Extract data from 2 specific categories on the www.superpages.com website and place it into CSV files.
$100-300 USD
Cancelled
Posted about 18 years ago
Paid on delivery
1. I don't need the program or the source code. I only need the data, so you can get it any way you need to, as long as it is legal. I have seen others on this site supply a program and source code to do this for $200. I have also researched software that does this for $350-$450 and can do it an unlimited number of times. I am looking in the $100 range for assistance getting the data only. Maybe (hopefully) you already have grabber software that does this and won't need to spend a lot of time. For example: [login to view URL] or [login to view URL]
2. Here are the 2 specific URLs I need the complete contact lists from:
a) [login to view URL];R=N&STYPE=S&C=&CID=00000480014&cbdt=Jewelers&catID=2947&L=&PS=15&RT=&RS=&RR=&OO=&search=Find+It
b) [login to view URL];R=N&STYPE=S&C=&CID=00000493919&cbdt=Gift+Shops&catID=15295&L=&PS=15&RT=&RS=&RR=&OO=&search=Find+It
3. Item 2a has 53474 records and item 2b has 89589 records. Please note that I cannot accept data from any other yellow pages service except [login to view URL]
4. Your spider or grabber program must parse the HTML and extract the business name, city, state, zip code, telephone number, email address (if applicable), and website (if applicable) into a CSV-formatted text file. Also, a field called "category" must be added before "business name". For item 2a, this field must contain "jewelers-retail" for all records. For item 2b, this field must contain "gift shops" for all records.
5. A general clean-up of the data must be done so the fields are as clean as possible. Most important are the phone numbers, which absolutely must be in the format 999-999-9999 and be totally clean (no extra characters such as semicolons or extra digits). Before submitting the files to me, merge the 2a and 2b data into one combined file and purge any records with duplicate phone numbers. When duplicate telephone numbers are found, the records with the least information must be the ones deleted. For example, if 2 records share a telephone number and one lists a fax number while the other doesn't, delete the one without the fax number. Addresses are less important than phone numbers and email addresses. If more than 2 business names share the same phone number, pick one randomly; just make sure one record is left with that phone number. Finally, I need the data sorted by category first, then by state.
6. The data is to be submitted to me in files of approximately 10000 records each. I would expect a total of 15 files (14 @ 10000 and 1 partial).
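For anyone bidding, the clean-up, de-duplication, sorting and file-splitting steps above could be sketched roughly as follows. This is a minimal Python 3 sketch under stated assumptions, not the poster's method: the HTML scraping step is not shown because the page structure is unknown; the `Listing` field names and all helper functions are hypothetical; "least information" is read as "fewest non-empty fields" (fax is not among the requested fields); a leading US country code "1" is stripped; records without a usable 10-digit number are dropped; and ties keep the first record seen, which the posting allows. As a sanity check on the file count: 53,474 + 89,589 = 143,063 records before de-duplication, which gives 14 full files of 10,000 plus one of 3,063. After duplicates are purged the total will be lower.

```python
import csv
import io
import re
from dataclasses import dataclass, fields, replace
from typing import Iterable, List, Optional


# Hypothetical record layout; "category" comes first, as the posting requires.
@dataclass
class Listing:
    category: str
    business_name: str
    city: str
    state: str
    zip_code: str
    phone: str
    email: str = ""
    website: str = ""


def normalize_phone(raw: str) -> Optional[str]:
    """Return a US number as 999-999-9999, or None if it is not 10 digits."""
    digits = re.sub(r"\D", "", raw)  # strip semicolons, spaces, parentheses, etc.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # assumption: a leading 1 is the US country code
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def completeness(rec: Listing) -> int:
    """Count non-empty fields; used to decide which duplicate to keep."""
    return sum(1 for f in fields(rec) if getattr(rec, f.name).strip())


def dedupe_by_phone(records: Iterable[Listing]) -> List[Listing]:
    """Keep one record per phone number, preferring the most complete one."""
    best = {}
    for rec in records:
        phone = normalize_phone(rec.phone)
        if phone is None:
            continue  # assumption: records without a usable number are dropped
        rec = replace(rec, phone=phone)
        kept = best.get(phone)
        # On a tie the first record seen is kept.
        if kept is None or completeness(rec) > completeness(kept):
            best[phone] = rec
    return list(best.values())


def sort_records(records: List[Listing]) -> List[Listing]:
    """Sort by category first, then by state."""
    return sorted(records, key=lambda r: (r.category, r.state))


def chunk(records: List[Listing], size: int = 10000) -> List[List[Listing]]:
    """Split into groups of at most `size` records, one group per file."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def to_csv(records: List[Listing]) -> str:
    """Render one group as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    names = [f.name for f in fields(Listing)]
    writer.writerow(names)
    for rec in records:
        writer.writerow([getattr(rec, n) for n in names])
    return buf.getvalue()


# Usage with made-up sample records.
rows = [
    Listing("jewelers-retail", "Acme Jewelry", "Springfield", "IL", "62701", "(217) 555-0100;"),
    Listing("jewelers-retail", "Acme Jewelry Inc", "Springfield", "IL", "62701",
            "217.555.0100", "info@example.com"),
    Listing("gift shops", "Gift Barn", "Austin", "TX", "73301", "1-512-555-0199"),
]
clean = sort_records(dedupe_by_phone(rows))
files = chunk(clean, size=10000)
```

In this sample, the two Acme records share a phone number, so only the one with an email address survives, and "gift shops" sorts ahead of "jewelers-retail". Each element of `files` would then be passed to `to_csv` and saved as its own file.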
My Requirements:
1. You must be easy to contact, either by phone or by e-mail; you will be required to answer any e-mail I send you within 12 hours.
2. Must speak and write English well.
3. You can keep any code or program. I only need the data in a format that can be imported into Excel and other software that reads CSV files.
4. I would like this done and emailed to me (I am willing to try downloading via FTP) no later than March 4th. I am on dial-up in a very rural area.
Project ID: 46916
About the project
6 proposals
Remote project
Active 18 yrs ago
6 freelancers are bidding on average $175 USD for this job