I will build a Java application to scrape data from websites.
Here is the complete description of the application.
The application will:
1. Crawl the website and extract all the product URLs.
2. Save these URLs to the database.
3. Fetch each URL from the database (at 10-second intervals), navigate to it, and extract all necessary data, e.g. category, subcategory, description, product name, images, size, color, and price.
4. Save all the data extracted from the website to the database.
5. Save the product brochures on the file system.
6. Optionally, dump all the data into a spreadsheet (CSV format).
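
As a rough illustration of step 3, here is a minimal sketch of the throttled loop. It is only a sketch under assumptions: the class name ThrottledScraper, the fetcher hook, and the in-memory queue (standing in for the URL table in the database) are all hypothetical, and the real application would read URLs from the database and do the actual page extraction inside the fetcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Function;

// Hypothetical sketch: the queue stands in for the URL table in the database.
class ThrottledScraper {

    private final long delayMillis;                 // pause between requests (10_000 for a 10-second interval)
    private final Function<String, String> fetcher; // assumed hook: URL in, extracted data out

    ThrottledScraper(long delayMillis, Function<String, String> fetcher) {
        this.delayMillis = delayMillis;
        this.fetcher = fetcher;
    }

    // Process every queued URL in order, sleeping between consecutive requests.
    List<String> run(Queue<String> pendingUrls) {
        List<String> results = new ArrayList<>();
        while (!pendingUrls.isEmpty()) {
            String url = pendingUrls.poll();
            results.add(fetcher.apply(url));
            if (!pendingUrls.isEmpty()) {
                try {
                    Thread.sleep(delayMillis); // no wait is needed after the last URL
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop early
                    break;
                }
            }
        }
        return results;
    }
}
```

With a delay of 10_000 ms this waits 10 seconds between consecutive product pages. Passing the delay in as a parameter keeps it easy to tune per website.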
I will also provide a script file that takes a website as an argument and scrapes the data of that particular website.
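
A minimal sketch of how the entry point behind such a script might validate its website argument is shown below. The class name ScraperCli, the usage message, and the exit code are illustrative assumptions, not a fixed design.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical entry point: validates the single website argument before scraping starts.
class ScraperCli {

    // Returns the start URI, or throws IllegalArgumentException with a readable message.
    static URI parseWebsite(String[] args) {
        if (args.length != 1) {
            throw new IllegalArgumentException("Usage: scraper <website-url>");
        }
        URI uri;
        try {
            uri = new URI(args[0].trim());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URL: " + args[0], e);
        }
        String scheme = uri.getScheme();
        boolean isHttp = scheme != null
                && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"));
        if (!isHttp || uri.getHost() == null) {
            throw new IllegalArgumentException("Expected an http(s) URL with a host: " + args[0]);
        }
        return uri;
    }

    public static void main(String[] args) {
        try {
            URI site = parseWebsite(args);
            // Placeholder: the real application would start the crawl here.
            System.out.println("Would start scraping " + site);
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
            System.exit(2);
        }
    }
}
```

Rejecting a malformed argument up front means a scheduled run fails fast with a clear message instead of part-way through a crawl.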
Note: It is advisable to use a proxy when scraping these sites.
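
One possible way to route requests through a proxy is sketched below, assuming Java 11 or later and the standard java.net.http client. The class name ProxiedClientFactory, the timeout value, and the host and port parameters are placeholders for whichever proxy service is chosen.

```java
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.http.HttpClient;
import java.time.Duration;

// Hypothetical factory: builds an HTTP client that sends every request through one proxy.
class ProxiedClientFactory {

    // proxyHost and proxyPort are placeholders for the chosen proxy service.
    static HttpClient create(String proxyHost, int proxyPort) {
        return HttpClient.newBuilder()
                .proxy(ProxySelector.of(new InetSocketAddress(proxyHost, proxyPort)))
                .connectTimeout(Duration.ofSeconds(20))      // assumed timeout, tune as needed
                .followRedirects(HttpClient.Redirect.NORMAL) // follow redirects, except HTTPS to HTTP
                .build();
    }
}
```

A client created this way can be reused for all requests to a site. Rotating between several proxies would need a custom ProxySelector, which this sketch does not cover.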
To automate the process, I will deploy the application on a remote server and set up cron jobs that trigger it to scrape a particular website.
I will design the database schema to save the scraped data in a structured manner.
In the past, I built a price comparison tool that compared products across 21 e-commerce websites. I scraped all the websites' data and compared the product prices.
Contact me to discuss my proposal further.