I need a web scraper written for the following url:
[login to view URL]
All pages will need to be retrieved, not just page one. The data on this site changes and page 2 does not always exist; however, we need to scrape the additional pages whenever they exist.
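The paging requirement above can be sketched as a loop that keeps fetching until a page is missing or empty. This is a hypothetical sketch only: the fetcher is passed in as a code ref (the real script would use LWP::UserAgent against the actual URL, which is not shown here), and the page cap is an assumed safety limit for unattended runs.

```perl
use strict;
use warnings;

# Hypothetical sketch of the "scrape additional pages if they exist" rule.
# $fetch is a code ref taking a page number and returning the page HTML,
# or undef when that page does not exist. The cap of 50 is an assumption.
sub fetch_all_pages {
    my ($fetch) = @_;
    my @pages;
    for my $page ( 1 .. 50 ) {
        my $html = $fetch->($page);
        last unless defined $html && $html =~ /\S/;   # missing or empty page -> stop
        push @pages, $html;
    }
    return @pages;
}
```

In the deliverable, the code ref would wrap an HTTP GET; separating the fetch from the loop also makes the paging logic easy to test offline.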
The number of rows will vary; the rows are separated by horizontal line segments.
The output should be a pipe (|) delimited file with the following column mappings:
origin_city --> the data in the "Pickup" column before the comma (,) but after the hyphen (-); if there is no hyphen, it is the data before the comma
origin_state --> data located in the "Pickup" column after the (,)
ship_date --> the data from the "Pickup On" column converted to YYYY-MM-DD format; if it says "Ready", use the current day's date in YYYY-MM-DD format
destination_city --> data located in the "DESTINATION" column before the (,)
destination_state --> data located in the "DESTINATION" column after the (,)
receive_date --> leave blank
trailer_type --> data located in the "Truck" column
load_size --> put the word "Full"
weight --> leave blank
length --> leave blank
width --> leave blank
height --> leave blank
trip_miles --> leave blank
pay_rate --> leave blank
contact_phone --> leave blank
contact_name --> leave blank
tarp_required --> leave blank
comment --> data located in the "Pickup" column, below the origin_city and origin_state; this data starts with the word "NOTES", and everything including and after the word "NOTES" must be included
if the data below the origin_city and origin_state contains the comment "HOLD UP UNTIL FURTHER NOT", do not include that data
load_number --> leave blank
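The Pickup, "Pickup On", and comment rules above can be sketched as small Perl helpers. This assumes the raw cell text has already been extracted from the HTML, and it assumes the site's dates arrive as MM/DD/YYYY (the posting does not say, so that regex is a guess to be adjusted against the real data); all sample inputs are hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Pickup cell: state is after the comma; city is before the comma,
# and after the hyphen when one is present.
sub parse_pickup {
    my ($cell) = @_;
    my ( $place, $state ) = split /\s*,\s*/, $cell, 2;
    $place =~ s/^.*-\s*//;    # keep only the text after the last "-"
    return ( $place, $state // '' );
}

# "Pickup On" cell: normalize to YYYY-MM-DD; "Ready" means today.
# Assumes an MM/DD/YYYY source format -- adjust to the real site.
sub parse_ship_date {
    my ($cell) = @_;
    return strftime( '%Y-%m-%d', localtime ) if $cell =~ /^\s*Ready\s*$/i;
    my ( $m, $d, $y ) = $cell =~ m{(\d{1,2})/(\d{1,2})/(\d{4})};
    return defined $y ? sprintf( '%04d-%02d-%02d', $y, $m, $d ) : '';
}

# Comment: everything from "NOTES" onward, skipping the hold message.
sub parse_comment {
    my ($below) = @_;
    return '' if $below =~ /HOLD UP UNTIL FURTHER NOT/;
    return $below =~ /(NOTES.*)/s ? $1 : '';
}
```

The deliverable would wrap these in the Modern::Perl boilerplate; they are split out here so each rule can be checked against sample cells in isolation.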
The first line of the output should contain all of the column headers.
Any field that contains no data should be left empty.
Please do not write words like "null" or "blank" in empty columns.
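The output rules above (header row first, pipe-delimited, truly empty fields) can be sketched like this; the row values shown are made-up sample data.

```perl
use strict;
use warnings;

# Column order from the mapping list above; the header row is the
# first line of the output file.
my @COLUMNS = qw(
    origin_city origin_state ship_date destination_city destination_state
    receive_date trailer_type load_size weight length width height
    trip_miles pay_rate contact_phone contact_name tarp_required
    comment load_number
);

# Join one row; missing fields become empty strings, never "null".
sub make_line {
    my (%row) = @_;
    return join '|', map { $row{$_} // '' } @COLUMNS;
}

print join( '|', @COLUMNS ), "\n";    # header row
print make_line(                       # one hypothetical load
    origin_city      => 'Dallas',
    origin_state     => 'TX',
    ship_date        => '2024-03-05',
    destination_city => 'Memphis',
    destination_state=> 'TN',
    trailer_type     => 'Flatbed',
    load_size        => 'Full',
), "\n";
```

Because `make_line` always maps over the full column list, every row has the same number of pipe separators regardless of which fields are blank.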
Below is a sample output of the first 5 columns using sample data:
The deliverable will be a Perl .pl file that must run on
Ubuntu Linux and must use Modern::Perl. The Perl .pl file
should be called '[login to view URL]' and the output file should be
called '[login to view URL]'
It will be scheduled in cron to run unattended every 15 minutes.
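A 15-minute schedule would look like the crontab fragment below; the script path and name are placeholders, since the real .pl filename is given in the URL above.

```
# Hypothetical crontab entry: run every 15 minutes, unattended.
*/15 * * * * /usr/bin/perl /home/user/scraper.pl >/dev/null 2>&1
```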
Please specify what language/OS/modules you plan to use.
Also, please include the word "raccoon" in your bid so I know that
you read this description.