This is a reasonably simple project, so the maximum bid is 50 USD. If bids go higher, I'll make the time myself (time I currently lack) and do it later.
Functional specifications:
1. I need a simple PHP crawler that crawls websites and, for each crawled website, counts the number of sites it links to.
2. The crawler only parses the first page (main page) of each website.
3. To preserve system resources, the crawler must not crawl the same website more than once in the same month, but it should crawl it again in the following months.
4. The crawler may crawl at most $website_count (see below) websites per month.
5. Inputs in the configuration file (this is where the crawler is configured):
- base_url - the URL to start crawling from;
- website_count - integer;
- restrict_tlds - comma-separated list of TLDs (e.g. com, org); only links to these TLDs are counted;
- db_host - the database host;
- db_database - the database name;
- db_user - the database user;
- db_pass - the database user's password;
- db_table - the table to store the data in;
- user_agent - a configurable string sent as the User-Agent header, since some webmasters block PHP crawlers;
- sleep - delay (in seconds or milliseconds) between two consecutive crawls, to keep server load down.
6. Storage fields (in db_table):
| ID | Website | LinkCount | Date |
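For illustration only, a rough PHP 8 sketch of the core counting step (items 1 and 2, with restrict_tlds applied). It assumes the main page's HTML has already been downloaded. `count_linked_sites()` and `normalise_host()` are hypothetical names, and treating a "site" as a distinct host name (ignoring a leading `www.` and the crawled site's own host) is an assumption, not part of the spec above.

```php
<?php
declare(strict_types=1);

// Illustrative sketch (PHP 8+, ext-dom). Assumptions not in the spec: a "site" is a host name,
// compared case-insensitively with a leading "www." removed, and links to the page's own host
// are not counted. $allowedTlds holds lower-case TLDs without dots; an empty array means no restriction.

function normalise_host(string $host): string
{
    $host = strtolower(rtrim($host, '.'));
    return str_starts_with($host, 'www.') ? substr($host, 4) : $host;
}

function count_linked_sites(string $html, string $pageUrl, array $allowedTlds): int
{
    if (trim($html) === '') {
        return 0; // DOMDocument::loadHTML() rejects an empty string
    }
    $ownHost = normalise_host((string) parse_url($pageUrl, PHP_URL_HOST));

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid; silence parser warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hosts = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if (str_starts_with($href, '//')) {
            $href = 'https:' . $href; // protocol-relative link
        }
        $scheme = strtolower((string) parse_url($href, PHP_URL_SCHEME));
        $host = parse_url($href, PHP_URL_HOST);
        // Relative links, mailto:, javascript: etc. have no http(s) host, so they are skipped.
        if (!in_array($scheme, ['http', 'https'], true) || !is_string($host) || $host === '') {
            continue;
        }
        $host = normalise_host($host);
        $tld = substr(strrchr('.' . $host, '.'), 1); // last label, e.g. "org"
        if ($host === $ownHost || ($allowedTlds !== [] && !in_array($tld, $allowedTlds, true))) {
            continue;
        }
        $hosts[$host] = true; // array keys de-duplicate hosts
    }
    return count($hosts);
}
```

Using host names as the unit means `blog.example.com` and `example.com` count as two sites; grouping by registrable domain instead would need a public-suffix list.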
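A minimal sketch of the monthly rules (items 3 and 4), assuming the crawler looks up the site's most recent Date in db_table and the number of rows already stored this month before each crawl. `may_crawl()` is a hypothetical helper, and the database queries themselves are left out.

```php
<?php
declare(strict_types=1);

// Illustrative sketch of requirements 3 and 4. Assumptions: $lastCrawl is the Date of the site's most
// recent row in db_table (null if it was never crawled) and $crawledThisMonth is the number of rows
// already stored for the current month; fetching both from the database is not shown.
function may_crawl(
    ?DateTimeImmutable $lastCrawl,
    int $crawledThisMonth,
    int $websiteCount,
    DateTimeImmutable $now
): bool {
    // Requirement 4: stop once this month's budget of $website_count sites is used up.
    if ($crawledThisMonth >= $websiteCount) {
        return false;
    }
    // Requirement 3: allow the site if it was never crawled or was last crawled in another calendar month.
    return $lastCrawl === null || $lastCrawl->format('Y-m') !== $now->format('Y-m');
}
```

Comparing the `Y-m` strings means a site crawled on 31 January may be crawled again on 1 February; if "the same month" is meant as a rolling 30-day window instead, this comparison would change.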
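A minimal sketch of loading the settings from item 5, assuming an INI-style configuration file (the actual file name and format are not specified). `load_config()` is a hypothetical helper; it checks that every key listed above is present and normalises the ones that need it.

```php
<?php
declare(strict_types=1);

// Illustrative sketch. Assumption: the configuration file is in INI format; pass its contents,
// e.g. from file_get_contents(). String values containing special characters should be quoted.
function load_config(string $iniText): array
{
    $cfg = parse_ini_string($iniText);
    if ($cfg === false) {
        throw new RuntimeException('Could not parse the configuration');
    }
    $required = ['base_url', 'website_count', 'restrict_tlds', 'db_host', 'db_database',
                 'db_user', 'db_pass', 'db_table', 'user_agent', 'sleep'];
    foreach ($required as $key) {
        if (!array_key_exists($key, $cfg)) {
            throw new RuntimeException("Missing configuration key: $key");
        }
    }
    $cfg['website_count'] = (int) $cfg['website_count'];
    // "com, .ORG" becomes ['com', 'org']: trimmed, lower-cased, leading dots and empty entries dropped.
    $tlds = array_map(
        fn (string $t): string => strtolower(ltrim(trim($t), '.')),
        explode(',', (string) $cfg['restrict_tlds'])
    );
    $cfg['restrict_tlds'] = array_values(array_filter($tlds, fn (string $t): bool => $t !== ''));
    $cfg['sleep'] = (float) $cfg['sleep']; // unit (seconds or milliseconds) still to be agreed
    return $cfg;
}
```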