Website Scraping/Spidering - Speed it up!

$250-750 USD

Completed
Posted almost 10 years ago

Paid on delivery
I am looking to get some listing information about each advert from a site that has tens of thousands of adverts. I have a traditional scraping program that loads each page, reads the data and stores it into a database, which works well.

The problem is speed. It takes a long time to scrape the website. The delay is the time to access each page. The program runs through one advert (one page) at a time, taking around 2.6 seconds each. 50,000 adverts then means 36 hours! If I want the photos of each advert as well, then it takes many times longer. I want to speed the whole process up so that it runs in 10-30 minutes or less.

My question is how do I accomplish this? Is it even possible? Some things I can do are:

1. run the program on a fast dedicated server
2. get a faster internet connection
3. split the task among a few different servers, each doing a different segment of the site to scrape

Doing this will speed things up and I think I could probably bring the scraping time down to 5-6 hours if I use four servers. This would be expensive and I still would not get down to the 30 minute target.

What I need is a different approach. Maybe re-write the program to run multiple parallel threads? Maybe there is spidering technology that search engines use that would be more efficient than scraping? Maybe there is something out there that I know nothing about that can address this issue?

I am interested in hearing from anyone that knows how to do this, what suggestions you might have and if you have the skills to do it. If you have a solution that allows me to get the data from 50,000 adverts in 30 minutes then we can draw up a project. I will give you the source code of the existing program and all other details and you can then quote me on it. Looking forward to your responses.
Project ID: 6077738
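
The figures in the project description can be checked with a short calculation. The sketch below uses Python purely for illustration (the language of the existing program is not stated), and the function names are hypothetical. It reproduces the 36-hour estimate and works out the page rate and the number of simultaneous requests needed for the 10-30 minute target. It assumes each page still takes about 2.6 seconds when many are requested at once, which the posting does not confirm.

```python
import math


def sequential_hours(pages: int, seconds_per_page: float) -> float:
    """Total time in hours when pages are fetched one after another."""
    return pages * seconds_per_page / 3600


def required_rate(pages: int, target_minutes: float) -> float:
    """Pages per second needed to finish within the target time."""
    return pages / (target_minutes * 60)


def required_concurrency(pages: int, seconds_per_page: float,
                         target_minutes: float) -> int:
    """Simultaneous requests needed, ASSUMING per-page latency is unchanged.

    Requests in flight = arrival rate * time per request (Little's law).
    """
    return math.ceil(required_rate(pages, target_minutes) * seconds_per_page)


if __name__ == "__main__":
    print(f"sequential: {sequential_hours(50_000, 2.6):.1f} hours")
    for minutes in (30, 10):
        rate = required_rate(50_000, minutes)
        workers = required_concurrency(50_000, 2.6, minutes)
        print(f"{minutes} min target: {rate:.1f} pages/s, "
              f"about {workers} requests in flight")
```

Under that assumption, the target is a question of how many requests are in flight at once rather than of raw server speed, which matches the parallel-thread suggestions in the bids.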

About the project

18 proposals
Remote project
Active 10 yrs ago

Awarded to:
Hello, I am an expert in scraping and I wrote over 2000 site scrapers in my career while I worked at [login to view URL], which is a used cars search engine. For such big sites we used many threads (which means executing several things in parallel). I can extend this scraper of yours so it uses many threads, which can increase the speed of scraping significantly. Check my profile to see that people who worked with me are extremely satisfied with the results and speed. I have a 100% completion rate and I can start right away. Best regards, Dusan
$1,500 USD in 15 days
5.0 (9 reviews)
5.7
18 freelancers are bidding on average $665 USD for this job
Hello, The program has to be re-coded. From the project description, I understand that the program is using a browser, which is not the proper way to do web scraping. Also, scraping 50k pages in 10-30 minutes might not be possible; it depends on your internet connection and the website's stability.
$736 USD in 10 days
5.0 (78 reviews)
7.1
Hi sir, I am a scraping expert and I have done many similar projects; please check my feedback and you will see. Can you tell me more details? Then I will provide demo data for you. Thanks, Kimi
$631 USD in 6 days
4.9 (91 reviews)
6.5
Hi, I am sure I can provide you with a multi-threaded desktop application that will do the scraping much faster. I can work on a complete demo, so you could at least test it on 1000 adverts and see the time taken. Also, I need no upfront fee, and you can postpone accepting the bid until you have verified the demo. Waiting for more details to start working. Thanks
$888 USD in 3 days
5.0 (31 reviews)
6.3
Hi dear sir, I have 4 years of dedicated experience in PHP, HTML5, JavaScript, CSS and HTML. I also have a good team of 4 members. They are all experts in the fields you require. Our experience and dedication will certainly help you to complete your project accurately and on time. Please refer to PM for details... Thanks. Best Regards, Ferdous
$631 USD in 10 days
4.8 (68 reviews)
5.8
Hello, there are several ways to optimize your scraper. From my experience, I've written crawlers that crawl millions of pages on cheap dedicated ($200ish/mo) servers.
First of all, please tell me in which language it's written and how you run it (and show me some of the adverts).
"Maybe re-write the program to run multiple parallel threads" - exactly. This can be done, plus some clever optimizations.
"Maybe there is spidering technology that search engines use that would be more efficient than scraping" - this is not possible, as search engines request all pages as well.
"Maybe there is something out there that I know nothing about that can address this issue" - very probably.
50k URLs in under 30 minutes should not be a problem, given clean code and a normal dedicated server (or a KVM VPS with at least 2GB RAM).
Waiting for more details from you so we can negotiate cost & timeframe. Thanks
$599 USD in 10 days
5.0 (9 reviews)
5.6
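
The "multiple parallel threads" idea discussed in this bid could look roughly like the sketch below. It is an illustration only, not the bidder's or the client's code: the names fetch_page and scrape_all, the default of 50 workers and the 10-second timeout are all assumptions. It uses only the Python standard library, and the fetch function is passed in as a parameter so that the existing page-reading logic could be substituted.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple
from urllib.request import urlopen


def fetch_page(url: str, timeout: float = 10.0) -> bytes:
    """Download one page and return its raw bytes (hypothetical default)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def scrape_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = fetch_page,
    max_workers: int = 50,  # assumed value; tune against the target site
) -> Tuple[Dict[str, bytes], Dict[str, Exception]]:
    """Fetch many URLs in parallel; return (pages, errors) keyed by URL."""
    pages: Dict[str, bytes] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything up front; the pool runs max_workers at a time.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                pages[url] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[url] = exc
    return pages, errors
```

Because the time per page is dominated by waiting on the network, threads spend most of their time idle, so a single machine can keep many requests open at once. Failed URLs are collected rather than raised so that one bad page does not stop the run.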
Hi, I'm new as a freelancer here, but I'm a web scraping expert (please check my reviews: 5 stars with a 100% completion rate). I have a computer science degree with a specialization in web engineering and over 14 years' experience in web development (almost 10 with web scraping). I have a few questions for you. In which programming language is your script written? What kind of server do you currently use to run it? Could you tell me which site you're scraping? Please contact me so we can discuss details. Thanks for your time
$700 USD in 10 days
5.0 (19 reviews)
5.3
Hi, I have experience with multithreaded spiders/crawlers. Send me your script so I can take a look and I will tell you how fast we can go. Jose
$684 USD in 4 days
5.0 (4 reviews)
4.5
A proposal has not yet been provided
$555 USD in 10 days
5.0 (8 reviews)
3.6
Hi, May I know more details about what your program is doing? To suggest any improvement, I first need to know how your existing program does its job: the step-by-step of what it does (what you want to accomplish). If you give me the steps, I might come up with something. Regards,
$833 USD in 10 days
5.0 (2 reviews)
3.2
Hi there. Please consider my bid as a nominal one at this first stage. Prior to your acceptance, I'd like to take a preliminary look at your program, to see whether I can spot a way to make it run more efficiently 'as is', that is, through non-extensive changes, before resorting to re-engineering (which seems to be what the other bids so far are aiming at). Once we have assessed and discussed a possible method, we'd settle on an adjustment of my fee, if need be. So, as a first step, you could send me your program code and a single actual page URL (i.e. one advert), so I can analyze what data are being processed and how. Thank you, Grunty
$250 USD in 3 days
5.0 (1 review)
2.6
Hi. Certainly I can do it for you at that price. I have completed 100% of the projects for all my employers and they are very satisfied with my results. You can check this in my profile. Hope we have a deal. Thank you!
$466 USD in 10 days
0.0 (0 reviews)
2.2
A proposal has not yet been provided
$444 USD in 10 days
0.0 (0 reviews)
0.0
Sir, Your current program is too slow because it downloads one advert at a time, so it does not take full advantage of the processor and internet bandwidth. Computers have limited resources, so all we can do is create a program that takes full advantage of the resources that matter for this task (processor and internet bandwidth) to maximize its speed. To do so, I will use the same algorithm that file download programs use: multi-threaded downloads. This way we might run, for example, 100 threads at a time, splitting the scraping work among them so that we can scrape all your adverts at maximum speed.
NOTE: You must understand that the computer's processor and internet bandwidth are always limited. So if our new program cannot finish the task in the time you need, you can run it on a dedicated server with a better processor and more bandwidth to finish the task sooner. The new program will always take full advantage of the computer's processor and internet bandwidth, no matter whether it is a slow computer or a fast one.
Proposal: To rewrite your program in C# .NET to scrape all 50,000 adverts and save them in your Excel database, taking full advantage of computer resources for maximum speed. I will take your source code only to copy some code I might use. I will use HTML Agility Pack to scrape the information from your website, since it is the best technology for scraping information from the web in .NET.
$600 USD in 15 days
0.0 (0 reviews)
0.0
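
The bid above suggests "for example 100 threads" and notes that processor and bandwidth set the ceiling. One hedged way to find a workable thread count is to time a small sample of pages at several settings and see where throughput stops improving. The Python sketch below is an illustration only; pages_per_second, benchmark and the default worker counts are hypothetical, and the fetch function is supplied by the caller.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence


def pages_per_second(fetch: Callable[[str], object],
                     urls: Sequence[str], workers: int) -> float:
    """Fetch every URL with a fixed number of threads; return pages/sec."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces all downloads to finish before the clock stops.
        list(pool.map(fetch, urls))
    return len(urls) / (time.perf_counter() - start)


def benchmark(fetch: Callable[[str], object], urls: Sequence[str],
              worker_counts: Sequence[int] = (1, 10, 50, 100),
              ) -> Dict[int, float]:
    """Measure throughput for each thread count on the same sample."""
    return {n: pages_per_second(fetch, urls, n) for n in worker_counts}
```

Running benchmark on a few hundred advert URLs and multiplying out to 50,000 gives a rough estimate of the full run time. When adding threads no longer raises pages per second, the limit is likely the bandwidth, the processor, or the remote site itself.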
I would like to work on this project. I'm new on freelancer.com but not new to this sector. I have more than 4 years of experience in this sector, so I am willing to respond and discuss the project. Thanks and God bless you. Sincerely, Saminatinny.
$611 USD in 10 days
0.0 (0 reviews)
0.0
Hi, I'm a senior developer and I'm very interested in your project. I am a new freelancer here, but I have done many projects on other sites. I hope I will have a chance to work for you on this project. Please open a chat session to discuss the details. And I know a way to speed it up.
$500 USD in 5 days
0.0 (0 reviews)
0.0
You don't need multiple servers to scrape 50,000 URLs every 30 minutes (~28 pages/sec). One dedicated server will be enough. You will, however, need to rewrite the scraping script to use parallel threads to download the pages to be scraped. As long as the remote host can keep up with serving that many pages per second, the task can be done. I would recommend using Java or Scala for the task as they are great at multithreading. I can create the app for you within days. If you have any questions please message me. Regards, Matt
$1,160 USD in 10 days
0.0 (0 reviews)
0.0
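
The bid above puts the target at roughly 28 pages per second and makes it conditional on the remote host keeping up. One possible way to hold parallel threads near a chosen rate, so the host is not asked for more than that, is a shared rate limiter. The Python sketch below is an assumption-laden illustration: RateLimiter and polite_fetch are hypothetical names, and the rate to use would have to be agreed for the actual site.

```python
import threading
import time
from typing import Callable


class RateLimiter:
    """Hand out evenly spaced time slots shared by all worker threads."""

    def __init__(self, rate_per_second: float) -> None:
        self._interval = 1.0 / rate_per_second
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self) -> None:
        """Block until this caller's slot arrives."""
        with self._lock:
            now = time.monotonic()
            slot = max(self._next_slot, now)
            # Reserve the following slot for whichever thread calls next.
            self._next_slot = slot + self._interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)  # sleep outside the lock so others can queue


def polite_fetch(limiter: RateLimiter,
                 fetch: Callable[[str], bytes], url: str) -> bytes:
    """Wait for a slot, then fetch; intended to be called from each thread."""
    limiter.wait()
    return fetch(url)
```

With RateLimiter(28), requests start at most about 28 times per second in total, however many threads are running. A lower figure can be chosen if the remote host shows signs of strain.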

About the client

Bangkok, United Kingdom
5.0
15
Payment method verified
Member since Mar 5, 2005