MUST follow the coding instructions laid out below (no deviations or substitutions).
I have attached sample data and details for the 8 sites to scrape. The scraper definition is also attached so you can see proper formatting for JSON.
Note: I will have many more of these for developers who do a good job in a timely and cost-effective manner.
Thanks,
Scott
Scraping Specs
- Written in Ruby, NO TABS (2 spaces instead).
- Run from the command line taking two arguments - the first should be an integer for the scrape ID, the second should be the URL for the VENUE where the scrape starts:
./[login to view URL] <ID:integer> <URL:string>
./[login to view URL] 111 [login to view URL]
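The two-argument invocation above can be handled with a short sketch like this (the real script filename is hidden behind the site's login wall, so the `scrape.rb` in the usage string is only a stand-in):

```ruby
# Sketch of the required argument handling. The script name in the usage
# message is a placeholder -- the real filename is not visible in the spec.

def parse_args(argv)
  unless argv.size == 2
    abort "usage: ./scrape.rb <ID:integer> <URL:string>"
  end
  id  = Integer(argv[0]) # raises ArgumentError if the ID is not an integer
  url = argv[1]
  [id, url]
end
```

`Integer()` is used instead of `to_i` so that a non-numeric ID fails loudly rather than silently becoming `0`.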
- Must use Curl for GET-ing URLs
GEM: curb
- Must only use standard Ruby regex for parsing, OR hpricot OR nokogiri as an alternative
GEM: hpricot
GEM: nokogiri
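The fetch-and-parse steps above might be split into two small methods, one per allowed gem (a sketch only: the requires are kept inside the methods so the file loads even where curb/nokogiri are not installed, and the `.event-title` CSS selector is a made-up example, not anything from the target sites):

```ruby
# Fetch a page with curb, per the spec's GET requirement.
def get_page(url)
  require "curb"
  Curl::Easy.perform(url).body_str
end

# Parse event titles out of the HTML with nokogiri.
def extract_titles(html)
  require "nokogiri"
  doc = Nokogiri::HTML(html)
  doc.css(".event-title").map { |node| node.text.strip } # selector is hypothetical
end
```

Keeping the gem requires local to each method also makes it obvious that nothing outside the permitted four gems is being loaded.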
- Must output JSON as a finished product, sample data included below
GEM: json
- Must *NOT* use any other GEMS outside of these three: curb, hpricot, nokogiri, json
- The script must output exactly one of two things, formatted as JSON: an ERROR, or the scraped data if everything works.
- If there is any kind of error, it needs to output JSON as defined below with a specific error code and message, or at least the generic error code and a message:
{"scrape": {
  "id": <SCRAPE_ID_FROM_INITIAL_ARGUMENT_1>,
  "url": "<URL_FROM_INITIAL_ARGUMENT_2>",
  "success": <BOOLEAN: true/false>,
  "error": {
    "code": <VALID_ERROR_CODE>,
    "description": "<TEXT_WITH_WHATEVER_ERROR_MESSAGE_YOU_WANT>"
  }
}}
VALID ERROR CODES ARE:
10: (Generic error of any kind)
20: (URL GET error - any error involving GET-ing a URL)
30: (PARSE error - any error involving parsing the data)
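The three codes above can be centralized in one builder so every failure path emits the same JSON shape (a sketch using only the stdlib `json` gem; the symbolic names for the codes are my own, not from the spec):

```ruby
require "json" # json ships with Ruby, so this sketch needs no extra gems

# Map the spec's error codes to readable names (names are hypothetical).
ERROR_CODES = { generic: 10, get: 20, parse: 30 }

# Build the error document exactly as the spec defines it.
def error_json(id, url, code, description)
  { "scrape" => {
      "id"      => id,
      "url"     => url,
      "success" => false,
      "error"   => { "code" => code, "description" => description }
  } }.to_json
end
```

A top-level `begin/rescue` can then rescue GET failures to code 20, parse failures to code 30, and everything else to code 10.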
SAMPLE ERROR RETURN:
{"scrape": {
  "id": 111,
  "url": "http://foo.com/calendar",
  "success": false,
  "error": {
    "code": 10,
    "description": "Problem doing something in the foo function."
  }
}}
- If it succeeds, it needs to output JSON as defined below, including at least the fields marked REQUIRED, in the proper format:
{"scrape": {
  "id": <SCRAPE_ID_FROM_INITIAL_ARGUMENT_1>,
  "url": "<URL_FROM_INITIAL_ARGUMENT_2>",
  "success": <BOOLEAN: true/false>,
  "events": [
    {
      "title": "<STRING: Name of the event REQUIRED>",
      "start_date": "<DATE: date of the event, or date the event starts (MM/DD/YYYY) REQUIRED>",
      "start_time": "<DATETIME: date/time the event starts in *24 HOUR LOCAL TIME* (MM/DD/YYYY HH:MM) OPTIONAL>",
      "end_date": "<DATE: date the event ends (MM/DD/YYYY) OPTIONAL>",
      "end_time": "<DATETIME: date/time the event ends in *24 HOUR LOCAL TIME* (MM/DD/YYYY HH:MM) OPTIONAL>",
      "repeating": <INTEGER: 0 if the event happens once, 1 if the event repeats weekly REQUIRED>,
      "repeats_on": "<STRING: *full* name of the day of week the event repeats on (Thursday, Friday, etc.) OPTIONAL>",
      "repeats_until": "<DATE: date the event repeats until (MM/DD/YYYY) OPTIONAL>",
      "image_url": "<STRING: url for an image associated with this event OPTIONAL>",
      "ticket_url": "<STRING: url to buy tickets for this event OPTIONAL>",
      "ticket_prices": "<STRING: descriptional text about the ticket price OPTIONAL>",
      "description": "<STRING: any freeform descriptive text about the event OPTIONAL>",
      "bands": [
        { "name": "<STRING: band name>" },
        { "name": "<STRING: band name>" }
      ]
    }
  ]
}}
SAMPLE DATA:
{"scrape": {
  "id": 111,
  "url": "http://foo.com/calendar",
  "success": true,
  "events": [
    {
      "title": "2$ off Lone Star!",
      "start_date": "01/01/2010",
      "repeating": 1,
      "repeats_on": "Tuesday",
      "repeats_until": "01/01/2011",
      "image_url": "http://pictures.com/of/lone_star.jpg"
    },
    {
      "title": "Rock Your Mom's House",
      "start_date": "01/10/2010",
      "start_time": "01/10/2010 19:00",
      "end_time": "01/10/2010 22:00",
      "repeating": 0,
      "image_url": "http://yourmoms.com/house.gif",
      "ticket_url": "http://buytix.to/yourmoms",
      "ticket_prices": "$8.00 all ages",
      "description": "These people really know how to stick it to you.",
      "bands": [
        { "name": "Buttcheeck Falcons" },
        { "name": "Foo Fighters" }
      ]
    }
  ]
}}
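A success document like the sample above can be assembled with a small builder (a sketch using only the stdlib `json` gem; the event values are the spec's own sample data, and optional keys are simply left out when a site doesn't provide them):

```ruby
require "json"

# Assemble the success output; only title, start_date, and repeating are
# REQUIRED, so optional keys are omitted from the hash when absent.
def success_json(id, url, events)
  { "scrape" => {
      "id"      => id,
      "url"     => url,
      "success" => true,
      "events"  => events
  } }.to_json
end

event = {
  "title"      => "Rock Your Mom's House",
  "start_date" => "01/10/2010",
  "repeating"  => 0
}
puts success_json(111, "http://foo.com/calendar", [event])
```

Building a plain hash and calling `to_json` at the very end keeps the output guaranteed-valid JSON, which hand-concatenated strings do not.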
NOTES:
- All TIMES / DATETIMES should be in the LOCAL TIME of whatever VENUE is being scraped. Usually this will match the times posted on the site, but BE SURE.
- ALWAYS return a valid error code if anything goes wrong, even if it's just the generic error code (10) with a message.