IMPORTANT 1: you will get specific Google URLs to scrape when you accept the job
Step 1: I enter a "keyword" into a form input box
Step 2: (using a random [login to view URL]) scrape the result URLs from the Google results' source code. They are formatted like this (class="l" href="[login to view URL]"), so you will want to use a regex. I will provide the special URLs and a [login to view URL] list; we will not access [login to view URL] directly, you will query a few URLs with changing parameters
Step 3: Stop when Google returns "did not match any documents"
Step 4: scrape the URLs from each result page's source code (using a random [login to view URL])
Step 5: check each URL on the page for Timeout, Host not Found, 301, 302, or 404; if any are found, write a line in this format to the screen/GUI or to a [login to view URL]:
"url from google","error code","dead url or redirect"
"[login to view URL]","404","[login to view URL]"
"[login to view URL]","301","[login to view URL]"
"[login to view URL]","Timeout","[login to view URL]"
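The steps above could be sketched roughly like this in Python. Everything here is an assumption made for illustration: the real Google URLs and user-agent list are only provided on job acceptance, the sample user agents and function names are mine, and the regex simply matches the class="l" href="..." pattern described in Step 2.

```python
import csv
import io
import random
import re
import urllib.error
import urllib.request

# Hypothetical user-agent list; the real one comes from the provided file.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]

# Google result links in the markup described in Step 2: class="l" href="..."
RESULT_LINK_RE = re.compile(r'class="l"\s+href="([^"]+)"')


def extract_result_links(html):
    """Pull the result URLs out of a Google results page's source (Step 2/4)."""
    return RESULT_LINK_RE.findall(html)


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Disable automatic redirect following so 301/302 surface as errors."""

    def redirect_request(self, *args, **kwargs):
        return None


def check_url(url, timeout=60):
    """Return an error label for a URL, or None if it responds OK (Step 5)."""
    req = urllib.request.Request(
        url, headers={"User-Agent": random.choice(USER_AGENTS)}
    )
    opener = urllib.request.build_opener(NoRedirect())
    try:
        opener.open(req, timeout=timeout)
        return None
    except urllib.error.HTTPError as e:
        return str(e.code)  # e.g. "404", "301", "302"
    except urllib.error.URLError as e:
        if isinstance(e.reason, TimeoutError):
            return "Timeout"
        return "Host not Found"
    except TimeoutError:
        return "Timeout"


def report_row(google_url, code, dead_url):
    """Format one line in the requested all-quoted CSV style."""
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_ALL).writerow([google_url, code, dead_url])
    return buf.getvalue().strip()
```

A real implementation would loop this over the provided query URLs, stopping on the "did not match any documents" marker from Step 3, and append each `report_row` result to the output file.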
Make everything editable in the [login to view URL] file, for example:
timeout = 60;
pages = 3;
crawlPerDomain = 10;
formatCSV = "googleDomain","responseCode","brokenDestinationURL";
etc.
...so that I can tune the speed, the crawl volume, the user-agent list, etc.
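A config file in that `key = value;` style could be loaded along these lines (a minimal sketch: the actual file name is redacted in the post, and the keys shown are just the examples above):

```python
def load_config(text):
    """Parse simple 'key = value;' lines into a dict (ints where possible)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip().rstrip(";")
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        value = value.strip()
        settings[key.strip()] = int(value) if value.isdigit() else value
    return settings


# Sample content mirroring the example settings from the post.
SAMPLE = """
timeout = 60;
pages = 3;
crawlPerDomain = 10;
"""
```

The tool would read the real file once at startup and use the resulting dict everywhere, so changing a value never requires touching the code.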
If possible, show progress in a window (so I can see what the tool is currently doing)
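Even without a full GUI window, a single self-updating console line would meet the "show what the tool is doing" requirement; a minimal sketch (the function name and format are my own assumptions):

```python
import sys


def show_progress(done, total, current_url):
    """Render a one-line status so the user can see the current activity."""
    line = f"[{done}/{total}] checking {current_url}"
    # \r returns to the start of the line so each update overwrites the last.
    sys.stdout.write("\r" + line.ljust(79))
    sys.stdout.flush()
    return line
```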
(Excuse my bad explanation, I am not a developer.)
Hi! I'm new here on freelancer.com, but I've done many projects in PHP/MySQL and I can do this job for you at the lowest possible price. You can check my work at:
[login to view URL]
Looking forward to working with you.
Regards,
Ali Mohyudin