Create a web scraping script that extracts records from two websites and stores the data in well-structured XML. The script will be scheduled to run nightly as a PHP cron job.
## Deliverables
Create a PHP script that scrapes data from this website:
[login to view URL]
Note that there are actually two websites that need to be scraped: some job postings link to a second site.
All jobs from both sites will need to be scraped into an XML file that is consistently structured so it can be imported into a MySQL database later.
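As a rough illustration of what "consistently structured" could mean in practice, here is a minimal sketch that writes every job with the same elements in the same order, using PHP's DOM extension. The function name `jobsToXml`, the element names (`jobs`, `job`, `id`, `title`, `region`, `url`, `description`) and the `source` attribute are assumptions made for this example; the real layout should follow the uploaded sample XML file.

```php
<?php
// Sketch only: element and attribute names are hypothetical placeholders,
// not taken from the sample XML file mentioned in the brief.
function jobsToXml(array $jobs): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;
    $root = $doc->appendChild($doc->createElement('jobs'));

    foreach ($jobs as $job) {
        $node = $root->appendChild($doc->createElement('job'));
        // Record which of the two sites the job came from.
        $node->setAttribute('source', (string) ($job['source'] ?? ''));

        // Emit every field for every job, even when empty, so each <job>
        // has an identical shape for the later MySQL import.
        foreach (['id', 'title', 'region', 'url', 'description'] as $field) {
            $child = $doc->createElement($field);
            // createTextNode escapes &, < and > in scraped text.
            $child->appendChild($doc->createTextNode((string) ($job[$field] ?? '')));
            $node->appendChild($child);
        }
    }

    return $doc->saveXML();
}
```

Writing empty elements rather than omitting missing fields keeps the import step simple, since every record can be mapped to the same set of columns.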
This website contains about 40,000 jobs, all of which need to be scraped. Because the website limits each search to 1,000 results, the script will need to load and scrape each province/region individually.
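The per-region approach could be sketched roughly as follows. The search itself is passed in as a callable so that the loop stays independent of how the site is actually queried; `collectJobLinks`, `RESULT_CAP` and the region codes in the usage note are hypothetical names chosen for this example. The sketch also flags any region that reaches the 1,000-result limit, since such a region may have been truncated and would need a narrower search.

```php
<?php
// Per-search result limit described in the brief.
const RESULT_CAP = 1000;

/**
 * Runs one search per region and merges the job links.
 * $searchRegion is a hypothetical callable: it takes a region code and
 * returns an array of job URLs found for that region.
 */
function collectJobLinks(callable $searchRegion, array $regions): array
{
    $links = [];   // job URL => region where it was first seen
    $capped = [];  // regions that hit the cap and may be incomplete

    foreach ($regions as $region) {
        $found = $searchRegion($region);

        if (count($found) >= RESULT_CAP) {
            $capped[] = $region;
        }

        foreach ($found as $url) {
            // Keying by URL drops duplicates that appear in several regions.
            if (!isset($links[$url])) {
                $links[$url] = $region;
            }
        }
    }

    return ['links' => $links, 'capped' => $capped];
}
```

A call might look like `collectJobLinks($search, ['ON', 'QC'])`, where `$search` wraps the real HTTP request and result parsing.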
Jobs posted in the province of Quebec will show up in the search, but their details are on a secondary website. The script needs to follow the link to the second site and scrape the details from there.
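One possible way to handle the two sites is to pick a parser based on the host of the job link. The sketch below assumes the secondary site can be recognised by its host name; `SECONDARY_HOST`, the function names and all XPath expressions are placeholders, since the real markup of either site is not described here. Page fetching is injected as a callable so the parsing logic can be exercised without network access.

```php
<?php
// Hypothetical host name standing in for the secondary (Quebec) site.
const SECONDARY_HOST = 'secondary.example.com';

function isSecondarySite(string $url): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) && strcasecmp($host, SECONDARY_HOST) === 0;
}

/**
 * Fetches one job page and extracts its details.
 * $fetch is a hypothetical callable that takes a URL and returns HTML.
 */
function scrapeJob(string $url, callable $fetch): array
{
    $secondary = isSecondarySite($url);
    $job = [
        'url' => $url,
        'source' => $secondary ? 'secondary' : 'primary',
        'title' => '',
        'description' => '',
    ];

    $html = $fetch($url);
    if ($html === '') {
        return $job; // nothing to parse
    }

    // Scraped HTML is often invalid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xpath = new DOMXPath($doc);

    // Placeholder selectors: each site needs expressions matched to its markup.
    $queries = $secondary
        ? ['title' => '//h2[@class="titre"]', 'description' => '//div[@id="detail"]']
        : ['title' => '//h1', 'description' => '//div[@id="job-details"]'];

    foreach ($queries as $field => $query) {
        $nodes = $xpath->query($query);
        if ($nodes !== false && $nodes->length > 0) {
            $job[$field] = trim($nodes->item(0)->textContent);
        }
    }

    return $job;
}
```

Both branches return the same array keys, so jobs from either site can be passed to the same XML writer.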
* * *

This broadcast message was sent to all bidders on Wednesday Jan 13, 2010 9:57:45 PM:
I've uploaded an XML sample file to show what the scraping output should look like. Please review.