I require a web scraper, written in Python using Scrapy, with multiple spiders for scraping multiple news websites and retrieving articles, Scrapy pipelines for filtering those articles by keyword matching, and storage of all relevant articles in a PostgreSQL database.
Spiders
- The specific websites that I would like scraped will be provided at project commencement.
- The spiders should scrape each news website's RSS feeds (where possible).
- The spiders should store the following information for each article:
* title
* author
* publication date
* publication name
* article URL
* article text (including all HTML formatting)
* keywords (either from the article itself or from HTML meta tags)
- The spiders should be as generic as possible, extending a common base spider class so that support for further sites can be added easily (see the sketch after this list).
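A minimal sketch of how the item and base spider might be structured. All names here (ArticleItem, BaseNewsSpider, the feed_urls attribute) are illustrative, not part of the requirements, and the feed parsing assumes standard RSS 2.0 `<item>`/`<link>` elements:

```python
import scrapy


class ArticleItem(scrapy.Item):
    # Field names are illustrative; they mirror the list above.
    title = scrapy.Field()
    author = scrapy.Field()
    date = scrapy.Field()
    publication = scrapy.Field()
    url = scrapy.Field()
    text = scrapy.Field()      # article body, HTML formatting preserved
    keywords = scrapy.Field()  # from the article or its HTML meta tags


class BaseNewsSpider(scrapy.Spider):
    """Shared RSS handling. Site-specific spiders subclass this, set
    name/publication/feed_urls, and implement parse_article()."""

    publication = None  # human-readable publication name
    feed_urls = []      # RSS feed URLs for the site

    def start_requests(self):
        for url in self.feed_urls:
            yield scrapy.Request(url, callback=self.parse_feed)

    def parse_feed(self, response):
        # RSS is XML: each <item><link> points at a full article page.
        for link in response.xpath("//item/link/text()").getall():
            yield scrapy.Request(link.strip(), callback=self.parse_article)

    def parse_article(self, response):
        # Site-specific extraction: build and yield an ArticleItem.
        raise NotImplementedError
```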
Pipelines
- A pipeline should filter articles by matching each article's keywords or text against a list of "interesting" keywords.
- A second pipeline should write all "interesting" articles to a PostgreSQL database. (Sketches of both pipelines follow this list.)
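Minimal sketches of both pipelines, assuming the ArticleItem fields from the spider sketch above, a keyword list supplied through an INTERESTING_KEYWORDS setting (a name chosen here for illustration), and psycopg2 for database access. The table schema in the INSERT is likewise illustrative (a unique url column, keywords stored as text[]):

```python
import psycopg2
from scrapy.exceptions import DropItem


class KeywordFilterPipeline:
    """Drop any article whose keywords and text contain none of the
    'interesting' keywords; everything else passes through unchanged."""

    def __init__(self, keywords):
        self.keywords = [kw.lower() for kw in keywords]

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.settings.getlist("INTERESTING_KEYWORDS"))

    def process_item(self, item, spider):
        haystack = " ".join(item.get("keywords", [])).lower()
        haystack += " " + (item.get("text") or "").lower()
        if any(kw in haystack for kw in self.keywords):
            return item
        raise DropItem(f"no interesting keywords: {item.get('url')}")


class PostgresPipeline:
    """Write each surviving article to PostgreSQL."""

    def open_spider(self, spider):
        # Real connection parameters would come from Scrapy settings.
        self.conn = psycopg2.connect(dbname="news", user="scraper",
                                     password="secret", host="localhost")
        self.cur = self.conn.cursor()

    def close_spider(self, spider):
        self.cur.close()
        self.conn.close()

    def process_item(self, item, spider):
        # Assumes a unique index on url; psycopg2 adapts the Python list
        # in 'keywords' to a PostgreSQL array (text[] column).
        self.cur.execute(
            """INSERT INTO articles
                   (title, author, date, publication, url, text, keywords)
               VALUES (%s, %s, %s, %s, %s, %s, %s)
               ON CONFLICT (url) DO NOTHING""",
            (item.get("title"), item.get("author"), item.get("date"),
             item.get("publication"), item.get("url"), item.get("text"),
             item.get("keywords")),
        )
        self.conn.commit()
        return item
```

Both pipelines would be enabled in settings.py with the filter running first, so only "interesting" articles reach the database (the project module name here is a placeholder):

```python
ITEM_PIPELINES = {
    "newsscraper.pipelines.KeywordFilterPipeline": 100,
    "newsscraper.pipelines.PostgresPipeline": 200,
}
```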