I need spiders created for 4 websites. They should be separate applications, each written in C#. You will need to compile the apps, since I don't have Visual Studio, and I will also require the finished source code when the work is done.
The 4 sites to spider are:
1. Press Releases on [login to view URL]
2. [login to view URL]
3. [login to view URL]
4. [login to view URL]
Detailed specs:
Press release spider (for [login to view URL]):
use proxies (ability to provide a list of private proxies)
need two modes: mode 1 - spider all available pages of press releases until stopped; mode 2 - spider only newly added releases, then stop (this would be for running as a scheduled task)
spider every page
capture source (PR Newswire, BusinessWire, etc.)
capture publication time
capture email
capture phone via Regular Expression
capture title
capture article URL
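As a rough illustration of the phone/email capture and the two modes above, here is a minimal sketch. It is written in Python only for brevity (the delivered apps should still be C#). The regex patterns, the function names and the shape of the `listing` records are all assumptions made for the example, and the phone pattern only covers common North American formats.

```python
import re

# Assumed patterns: a loose North American phone format and a simple email shape.
PHONE_RE = re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_contacts(text):
    """Return the first phone number and email found in a release body, or None."""
    phone = PHONE_RE.search(text)
    email = EMAIL_RE.search(text)
    return {
        "phone": phone.group(0) if phone else None,
        "email": email.group(0) if email else None,
    }


def select_releases(listing, seen_urls, new_only):
    """Yield the releases to process from a newest-first listing.

    listing   -- iterable of dicts with at least a "url" key (hypothetical shape)
    seen_urls -- set of article URLs stored by earlier runs
    new_only  -- False for mode 1 (walk everything), True for mode 2
    """
    for release in listing:
        if release["url"] in seen_urls:
            if new_only:
                break  # mode 2: the first known release means we are caught up
            continue   # mode 1: skip duplicates but keep walking older pages
        seen_urls.add(release["url"])
        yield release
```

Mode 2 relies on the listing being ordered newest first, which would need to be confirmed against the actual site.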
[login to view URL]:
use proxies
1. Capture font name, URL of the font, font author (separate table to track unique font authors with their names and author/foundry URLs), date font was added, font price, discount. From: [login to view URL]
2. Capture for each font author: Foundry Name, Phone number, web URL
3. Spider BestSellers ([login to view URL]) - capture font/collection name, author ID (url/name - reference to authors table), date spidered
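To make the "separate table to track unique font authors" idea concrete, here is one possible layout, sketched in Python with SQLite purely as an illustration. The storage engine, table names and column names are assumptions, not part of the spec, and the C# apps could use any database.

```python
import sqlite3

# Hypothetical schema: one row per unique author, referenced by fonts and bestsellers.
SCHEMA = """
CREATE TABLE IF NOT EXISTS authors (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    foundry_url TEXT NOT NULL UNIQUE,
    phone       TEXT,
    website     TEXT
);
CREATE TABLE IF NOT EXISTS fonts (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    url        TEXT NOT NULL UNIQUE,
    author_id  INTEGER NOT NULL REFERENCES authors(id),
    date_added TEXT,
    price      REAL,
    discount   REAL
);
CREATE TABLE IF NOT EXISTS bestsellers (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    author_id     INTEGER NOT NULL REFERENCES authors(id),
    date_spidered TEXT NOT NULL
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def get_author_id(conn, name, foundry_url):
    """Return the id for this author, inserting a row only on first sight."""
    conn.execute(
        "INSERT OR IGNORE INTO authors (name, foundry_url) VALUES (?, ?)",
        (name, foundry_url),
    )
    row = conn.execute(
        "SELECT id FROM authors WHERE foundry_url = ?", (foundry_url,)
    ).fetchone()
    return row[0]
```

Keying authors on their foundry URL means the same author seen on a font page and on the BestSellers page resolves to a single record.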
[login to view URL]:
use proxies
1. Spider all the fonts on the site by cycling through each letter and number linked to from the homepage. For each letter/number, cycle through every page of results to capture each font.
2. For each font, capture: font name, font URL, unique author ID (create a record for each author that includes name, total number of downloads, number of downloads today, license terms (e.g. "free for personal use"), author URL on [login to view URL], author's website address)
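The letter/number and page cycling in step 1 could look roughly like the sketch below. It is in Python for illustration only, `fetch_page` is a hypothetical callable standing in for the real download-and-parse step, and the a-z plus 0-9 key list is an assumption about how the homepage index is laid out.

```python
import string
from itertools import cycle

# The index is assumed to be keyed by a-z and 0-9; the real site may differ.
INDEX_KEYS = list(string.ascii_lowercase) + list(string.digits)


def crawl_index(fetch_page, proxies, keys=INDEX_KEYS):
    """Yield every font record, walking each index key page by page.

    fetch_page -- hypothetical callable (key, page_number, proxy) -> list of fonts;
                  an empty list signals that the key has no more pages
    proxies    -- non-empty list of proxy addresses, used round-robin
    """
    if not proxies:
        raise ValueError("at least one proxy is required")
    proxy_pool = cycle(proxies)
    for key in keys:
        page = 1
        while True:
            fonts = fetch_page(key, page, next(proxy_pool))
            if not fonts:
                break  # no results: move on to the next letter or number
            yield from fonts
            page += 1
```

Each request takes the next proxy from the list in turn, so the load is spread evenly across the private proxies supplied.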
[login to view URL]:
use proxies
1. Spider each of the categories (click "Font Categories" in nav bar to reveal)
2. For each category, cycle through every page of the results
3. For each font, capture: font name, font URL, license type (free for commercial use, free for personal use, etc.), the author's PayPal email address if a Donate link is present, MyFonts font URL (if present), [login to view URL] URL (if present), author ID (create a unique author ID from the author's URL on [login to view URL] and the author's username).
4. Spider for each author: author's name (if provided on their user page, as Raymond Larabie does here [login to view URL]), author's website URL, author's Twitter handle
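One way to build the unique author ID from the author's URL and username is to normalise both and hash them, as in this sketch. It is in Python for illustration only. The function name, the normalisation rules and the 16-character SHA-1 prefix are all assumptions, and any stable scheme would do.

```python
import hashlib
from urllib.parse import urlsplit


def author_id(author_url, username):
    """Build a stable ID from the author's profile URL and username."""
    parts = urlsplit(author_url.strip())
    # Normalise so http/https, host case and trailing slashes do not create duplicates.
    normalised = parts.netloc.lower() + parts.path.rstrip("/")
    key = normalised + "|" + username.strip().lower()
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:16]
```

Because the ID depends only on the URL and username, the same author found under several categories always maps to the same record.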
In my reviews & portfolio, please see scrapers supporting proxies, scheduler, ajax sites, robust error recovery, rich reporting and more. May I know what kind of output or storage the scrapers will use? I can suggest a local db (MS Access, SQLite, SQL CE), a server db (MS SQL, MySQL) or a file (e.g. XML, JSON).
$250 USD in 3 days
7 freelancers are bidding on average $396 USD for this job
Hi,
I am a professional web data scraper specializing in Python programs, PHP scripts, .NET programs, crawlers and bots. My tool can search data and get information from Aa to Zz using an existing list of English words. Below is the link for your reference as a sample related to the tool I am developing. This demo captures doctors' names, addresses, zip codes, phone numbers, ratings and reviews from 4 different sites. The final output will be saved in *.XLSX format or as per your requirement. I can start as early as possible, depending on your approval and acceptance. For this application, rest assured that I will deliver high-quality, reliable, efficient and accurate output. Give me a try and I will try to get the best results and finish the project well before the deadline.
Thanks,
Ferdous
Hello,
I just went through your project and decided to place my bid for it.
I see that you're looking for a crawler script which can be used to gather information from various websites you mentioned.
I've been developing web-related applications for the past 3 years, and my approach to creating such a crawler would be to use PHP and JavaScript.
Although you mentioned that the project needs to be done in C#, I think an easier approach would be to use PHP. Ultimately, the decision is yours.
Let me know your thoughts on this.
I will be waiting for a reply.
Regards,
Morfys.
I can write crawlers/spiders for your given 4 websites.
I'll fetch all information that you mentioned.
Can you please provide me with a list of proxies?
I'll develop this in C#.