Data retrieval from web pages using C#, ASP.NET, regular expressions
$30-5000 USD
Completed
Posted over 17 years ago
Paid on delivery
We developed a business directory. The web pages we created are still available, but the source data from which they were generated is missing. We need to recover that data from the web pages by parsing the source code of the HTML files.
You will develop a piece of software in C#, which will open a given input file containing the source HTML and parse it using one or more regular expressions.
It will write the resulting data to a Microsoft Access database. Not all of the data is present on every page: for example, Address 2 is present on some pages but not on others. The regular expressions must therefore be precise enough to leave a missing field empty rather than capture a neighbouring value.
It will open the file from the file system, not from the web. It does not need to loop through a series of input files, although we may enhance it to do so. The software need only open one source file at a time.
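To illustrate the kind of parsing described above, here is a minimal sketch, not a definitive implementation. The example input file is not reproduced in this posting, so the markup is an assumption: each field is taken to sit in a label cell followed by a value cell, such as `<td>Address 2</td><td>Unit 4</td>`. The class name `DirectoryPageParser`, the pattern template, and the method names are hypothetical; the real patterns would have to be written against the supplied file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

public static class DirectoryPageParser
{
    // ASSUMPTION: label cell followed by value cell. Adjust to the real markup.
    // {0} is replaced with the regex-escaped field label.
    private const string RowPatternTemplate =
        @"<td[^>]*>\s*{0}\s*:?\s*</td>\s*<td[^>]*>(?<value>.*?)</td>";

    // Field labels as listed in this posting.
    public static readonly string[] FieldLabels =
    {
        "Company name", "Address", "Address 2", "Locality", "Town", "County",
        "Postcode", "Region", "Type of Business", "SIC Code", "Premise Type",
        "No. of Employees", "Phone No.", "Website Address", "Years Established",
        "Financial Year End", "Turnover", "Est Turnover", "Profit Before Tax",
        "Net Worth", "Exports"
    };

    // Returns one entry per label; the value is null when the field is absent.
    public static Dictionary<string, string> Parse(string html)
    {
        var result = new Dictionary<string, string>();
        foreach (string label in FieldLabels)
        {
            // The label must fill the whole cell, so "Address" cannot match
            // an "Address 2" row and "Turnover" cannot match "Est Turnover".
            string pattern = string.Format(RowPatternTemplate, Regex.Escape(label));
            Match m = Regex.Match(html, pattern,
                RegexOptions.IgnoreCase | RegexOptions.Singleline);
            result[label] = m.Success
                ? WebUtility.HtmlDecode(m.Groups["value"].Value).Trim()
                : null;
        }
        return result;
    }

    // Opens a single file from the file system, not from the web.
    public static Dictionary<string, string> ParseFile(string path)
    {
        return Parse(File.ReadAllText(path));
    }
}
```

Running one anchored pattern per field, rather than a single large expression, means an absent field such as Address 2 simply yields no match instead of shifting the remaining captures.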
The following information must be extracted:
Company name
Address
Address 2
Locality
Town
County
Postcode
Region
Type of Business
SIC Code
Premise Type
No. of Employees
Phone No.
Website Address
Years Established
Financial Year End
Turnover
Est Turnover
Profit Before Tax
Net Worth
Exports
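As a rough sketch of how the fields above might be written to the Access database, the following uses `System.Data.OleDb` with a parameterised `INSERT`. The schema of the supplied database is not shown in this posting, so the column names, the table name passed in by the caller, and the class name `AccessWriter` are all hypothetical. The Jet 4.0 provider string is likewise an assumption that suits an `.mdb` file in a 32-bit process; on newer .NET versions `System.Data.OleDb` is a separate package.

```csharp
using System;
using System.Collections.Generic;
using System.Data.OleDb;
using System.Linq;

public static class AccessWriter
{
    // ASSUMPTION: hypothetical column names, one per field listed above.
    public static readonly string[] Columns =
    {
        "CompanyName", "Address", "Address2", "Locality", "Town", "County",
        "Postcode", "Region", "TypeOfBusiness", "SicCode", "PremiseType",
        "NoOfEmployees", "PhoneNo", "WebsiteAddress", "YearsEstablished",
        "FinancialYearEnd", "Turnover", "EstTurnover", "ProfitBeforeTax",
        "NetWorth", "Exports"
    };

    // Builds "INSERT INTO [table] ([c1], ...) VALUES (?, ...)".
    // OLE DB parameters are positional, hence the "?" placeholders.
    public static string BuildInsertSql(string table)
    {
        string cols = string.Join(", ", Columns.Select(c => "[" + c + "]"));
        string marks = string.Join(", ", Columns.Select(c => "?"));
        return "INSERT INTO [" + table + "] (" + cols + ") VALUES (" + marks + ")";
    }

    // Inserts one row; fields missing from the page are stored as NULL.
    public static void Insert(string mdbPath, string table,
                              IDictionary<string, string> values)
    {
        // ASSUMPTION: Jet provider for .mdb files.
        string connStr = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + mdbPath;
        using (var conn = new OleDbConnection(connStr))
        using (var cmd = new OleDbCommand(BuildInsertSql(table), conn))
        {
            // Parameters must be added in the same order as the columns.
            foreach (string col in Columns)
            {
                string v;
                values.TryGetValue(col, out v);
                cmd.Parameters.AddWithValue("@" + col, (object)v ?? DBNull.Value);
            }
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}
```

The dictionary passed to `Insert` is keyed by column name, so the caller would map each label extracted from the page to its column before writing the row.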
For compatibility with our systems, it should be written in C# using .NET. Ideally it should be an ASP.NET page which we can run on our intranet server. However, as long as it is written in C# on .NET, we will be happy.
I am supplying:
1 example input file
1 Microsoft Access database
## Deliverables
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
## Platform
Windows
IIS
Web browser (e.g. Internet Explorer)