Scraper/web crawler written in Python

$250-750 USD

Cancelled
Posted over 8 years ago

Paid on delivery
Hi, I have a Python web crawler/scraper written using PhantomJS and similar tools, but it has to be modified to use a better solution, maybe "requests". I need someone who can optimize this at short notice, preferably within a few hours, and who can operate the script on servers provided by me; if you would prefer to set up the servers yourself, I can compensate for that depending on the price. Today the scraper retrieves about 2,500-3,000 items an hour, and there are about 300,000 items. I need this done in three days, maybe four. The script I have might have some bugs, as it is not the latest version, so I am open to any improvements that can be made, and of course to running multiple instances on the same or different servers, whatever works. But I would like to be operating the server and making sure everything is running. Please note that it cannot be written in PHP. I hope someone will be interested.
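To put the numbers in the description side by side, here is a rough back-of-the-envelope sketch. It assumes throughput scales linearly with the number of instances, which the posting does not confirm; the function names are illustrative only.

```python
import math

def hours_needed(total_items: int, items_per_hour: float) -> float:
    """Hours a single instance needs at a steady rate."""
    return total_items / items_per_hour

def instances_needed(total_items: int, items_per_hour: float,
                     deadline_hours: float) -> int:
    """Instances needed to finish by the deadline.

    Assumption: each extra instance adds the same throughput (linear scaling).
    """
    required_rate = total_items / deadline_hours
    return math.ceil(required_rate / items_per_hour)

TOTAL_ITEMS = 300_000  # figure from the posting

for rate in (2500, 3000):        # items per hour, from the posting
    for days in (3, 4):          # deadline from the posting
        print(
            f"{rate}/h: one instance needs {hours_needed(TOTAL_ITEMS, rate):.0f} h; "
            f"{days}-day deadline needs {instances_needed(TOTAL_ITEMS, rate, days * 24)} instance(s)"
        )
```

At the stated rate a single instance would need 100 to 120 hours, so under this assumption at least two parallel instances (or a roughly 1.5x faster scraper) would be needed to meet a three-day deadline.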
Project ID: 8540715

About the project

26 proposals
Remote project
Active 9 yrs ago

26 freelancers are bidding on average $485 USD for this job
Hi, I can help you fix your script, and I can also use my own resources to get your data within 1 day, not 3-4 :) Let me know what site you are scraping so I can estimate how much that will cost. Thanks
$526 USD in 10 days
5.0 (272 reviews)
8.7
Hi sir, I am a scraping expert and have done many similar projects; please check my feedback to see for yourself. Can you tell me more details? Then I will provide demo data for you. Thanks, Kimi
$472 USD in 6 days
5.0 (372 reviews)
7.8
Hello, I can try to help you, but I need more details. I have experience creating Python scripts to extract data from various websites. Waiting for your response.
$277 USD in 3 days
5.0 (229 reviews)
7.1
Hi, I am a professional web data scraper specializing in Python programs, PHP scripts, .NET programs, crawlers, and bots. My tool can search data and retrieve information from Aa to Zz using an existing list of English words. Below is the link for your reference as a sample related to my tool being developed. This demo captures doctors' names, addresses, zip codes, phone numbers, ratings, and reviews from 4 different sites. The final output will be saved in *.XLSX format or as you require. I can start as early as possible, depending on your approval and acceptance. You can rest assured that I will deliver high-quality, reliable, efficient, and accurate output. Give me a try and I will try to get the best results and finish the project well before the deadline. Thanks, Ferdous
$526 USD in 10 days
4.8 (164 reviews)
7.0
Hello! We are a small team of Python/JS developers. We have experience developing custom web scrapers and other standalone scripts. We have some questions about your project. What kind of pages do you scrape? 300000 per hour is a rather high rate. What environment do you use? Is it OK for you if the script runs multiple threads? How do you launch the script and pass pages to it? Do you need to connect it to any kind of UI? We will be happy to help you! Thanks and regards!
$700 USD in 5 days
5.0 (45 reviews)
6.8
A proposal has not yet been provided
$631 USD in 4 days
5.0 (28 reviews)
6.3
Hello. I am a professional Python programmer with several years of web scraping experience. I can optimize the webcrawler you already have, although I would prefer to build my own from scratch if that would be okay. Please contact me so that we may speak further, and I hope to work with you soon.
$444 USD in 10 days
4.9 (68 reviews)
6.5
Hello! I'm a web scraping expert. I use the Python Scrapy framework. My scripts can run on Windows or Linux, but Linux is preferable. I can schedule scripts on a server if required. I have a lot of finished projects (Google scraping, Facebook scraping, Yellow Pages, LinkedIn, Amazon, webshops, and other sites with lists of items). I can scrape secured and protected sites; my crawlers can submit login forms, emulate AJAX requests, etc. If a site blocks an IP, I can use a proxy or Tor. I can try to bypass CAPTCHAs on a site in automatic or manual mode. I can export data to JSON, CSV (Excel), MySQL, or MongoDB.
$400 USD in 3 days
4.8 (111 reviews)
6.6
Hi, Could you make the source code available? I am not sure if I am understanding the requirements correctly: do you want to have the data scraped within 3-4 days? Does that include the project completion time? Could you provide a bit more detail in that aspect? Looking forward to your reply, Artur
$555 USD in 10 days
4.9 (70 reviews)
6.2
We have a good amount of experience in web scraping using Python, Django, and Node.js. This is our latest web scraping project using Python: Electronics Parts Intelligence Processing. eProductScrapper is mostly a scraping and data-mining oriented project, based on Scrapy and lxml, along with a Celery distributed environment via Redis. It focuses on electronics parts and fetches information such as product details, SKU, technical datasheet (PDF), product stock, and price history, which is used to present the product life cycle in a highly presentable manner so that non-authorized sellers, brokers, and aftermarket sellers are more aware of the market requirements of the products. Technology and frameworks used: Python, Django, Celery, Scrapy, Node.js, MongoDB, MySQL. We would love to have an ongoing relationship with your team and are ready to work on your time schedule, 40-50 hrs per week as required. Thanks
$1,000 USD in 10 days
4.9 (19 reviews)
6.3
Hello, Python will be used, as you like. I have built crawlers with Python + Scrapy + Selenium, etc. Maybe you can let me take a look at the script to see if I can do it. By the way, my skill is Python 2. Thank you.
$250 USD in 3 days
5.0 (29 reviews)
5.6
Python developer. I have experience parsing sites using various technologies: BeautifulSoup, Selenium, etc. Please provide the URL of the site so I can see whether I can make your script run faster.
$388 USD in 3 days
5.0 (13 reviews)
4.8
Hello, I can change the script to use multiprocessing if it should run on a single server, spawning several processes to use all CPU cores, or, if using different servers, use Celery to spread the tasks across multiple servers while also using all cores. Best regards, Thorsten Sanders
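A minimal sketch of the single-server half of this idea, using the standard library `multiprocessing.Pool`. The function `scrape_item` is a hypothetical placeholder for the real per-item scrape, and the cross-server step shown here is a plain static split of the item list, swapped in for illustration; it is not Celery.

```python
from multiprocessing import Pool, cpu_count

def scrape_item(item_id: int) -> dict:
    """Hypothetical placeholder for the real per-item scrape."""
    return {"id": item_id, "status": "ok"}

def split_for_servers(item_ids, n_servers):
    """Round-robin split so each server gets a near-equal share.

    Assumption: a static split stands in for a task queue such as Celery.
    """
    return [item_ids[i::n_servers] for i in range(n_servers)]

def scrape_on_this_server(item_ids, processes=None):
    """Scrape one server's share using one worker process per CPU core."""
    with Pool(processes or cpu_count()) as pool:
        # chunksize batches items per task to reduce inter-process overhead.
        return pool.map(scrape_item, item_ids, chunksize=100)
```

On each server one would call `scrape_on_this_server(shares[k])` from under an `if __name__ == "__main__":` guard, where `shares = split_for_servers(all_ids, n_servers)`. For network-bound scraping, threads or async I/O may use the cores less but still reach similar throughput, so the process count is worth measuring rather than assuming.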
$736 USD in 3 days
5.0 (6 reviews)
4.5
I have 7+ years of work experience in data collection, bulk email campaigns, Excel VBA, and Internet research in IT companies here. I can write and modify scripts and scrape data from sites using C++, Python, and Perl as per your requirements, delivering in Excel, with multiple IP rotation. I have dealt successfully with presidents, directors, and managers of US, UK, and Australian companies on web design and development projects, and I have good communication and writing skills. I am well versed in the Internet, MS Office applications, phone etiquette, and the latest technologies. I can accept your payment terms.
$388 USD in 3 days
4.0 (7 reviews)
4.3
I have been developing and modifying web scrapers for quite some time. I am also familiar with the requests library, if that's what you have in mind in your project description. In order to be more precise about the deadline, I need more details (and a specification of the tasks, too) on the project. Cheers
$444 USD in 3 days
5.0 (3 reviews)
3.1
Hi, expert programmer and web/data scraper here, with over 19 years of experience in programming and RDBMS. Please see my reviews. I use Python or Perl for this kind of job. My offer is 20 USD (without fees) per 1,000 records. I'm able to extract data fast.
$555 USD in 3 days
5.0 (4 reviews)
3.2
Dear Sirs, I'm a Python web scraping specialist. I know and can manage the Selenium/PhantomJS stack, but I also use other tools such as Scrapy, urllib, BeautifulSoup, requests, etc. I believe I can help you improve the extraction rate and propose alternative methods. Just give me an opportunity. Thank you, Alejandro.
$250 USD in 10 days
5.0 (3 reviews)
1.6
Hi, I have done exact same optimization for another client on Upwork already. The job was to scrape all books information for trade in from Amazon (it was around 500,000) and the script based on phantomjs worked too slow - the average scraping speed was around 1 book per second. When I redesigned it to use "urllib" module (similar to "requests" module), the scraping speed increased to 8 books per second, so all the 500,000 books were scraped in 19 hours. Both designs included multiprocessing, however urllib turned out to be much faster and robust than phantomjs. Please, contact me if you have any additional questions. Regards, Maksym
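As a rough illustration of the switch described here, the sketch below fetches a page over plain HTTP with the standard library `urllib` and parses it without a browser. It only works when the data is present in the raw HTML, with no JavaScript rendering required. The User-Agent string and the choice of the `<title>` element as the extracted field are assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

class TitleParser(HTMLParser):
    """Collects the text inside the first-level <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def extract_title(html: str) -> str:
    """Parse already-downloaded HTML; no browser or JavaScript engine involved."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()

def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download one page over plain HTTP(S). The User-Agent is a made-up example."""
    req = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

A per-item scrape would then be `extract_title(fetch_html(url))`, with the extraction logic replaced by whatever fields the target site actually exposes.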
$333 USD in 3 days
5.0 (3 reviews)
1.8
Hi, I have a master's degree in parallel programming and experience with Python. I think I can help you. I would create parallel code that produces a stream of fetched pages. It could scale with your connection. I would also create a pool of consumers that scrape the content and save it wherever you need. I would use Redis to centralize everything. As you have 300k items, the data probably fits in memory. If not, we can use a persistent solution like MongoDB, or even the filesystem. I don't know if you have a list of URLs or if it's built from previously scraped pages. If so, Redis could also hold the list of pages to be consumed, and the downloaders would act as consumers of this queue. We can use requests to download and lxml to parse the HTML. I think this architecture would optimize the two bottlenecks: the page fetch and the HTML scraping.
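The two-stage architecture described above could be sketched as follows. This is an in-process stand-in only: `queue.Queue` and threads replace Redis and separate worker machines, and `fetch` and `parse` are caller-supplied callables (hypothetical here) that would wrap requests and lxml in the proposed setup.

```python
import queue
import threading

def run_pipeline(urls, fetch, parse, n_downloaders=4, n_parsers=2):
    """Downloaders consume a URL queue; parsers consume the fetched pages.

    Assumption: queue.Queue stands in for the Redis lists in the proposal.
    """
    url_q = queue.Queue()
    page_q = queue.Queue()
    results = []
    lock = threading.Lock()

    def downloader():
        while True:
            url = url_q.get()
            if url is None:          # sentinel: no more URLs
                break
            page_q.put((url, fetch(url)))

    def parser():
        while True:
            job = page_q.get()
            if job is None:          # sentinel: no more pages
                break
            url, html = job
            record = parse(url, html)
            with lock:
                results.append(record)

    downloaders = [threading.Thread(target=downloader) for _ in range(n_downloaders)]
    parsers = [threading.Thread(target=parser) for _ in range(n_parsers)]
    for t in downloaders + parsers:
        t.start()

    for url in urls:
        url_q.put(url)
    for _ in downloaders:            # one sentinel per downloader
        url_q.put(None)
    for t in downloaders:
        t.join()
    for _ in parsers:                # all pages are queued once downloaders finish
        page_q.put(None)
    for t in parsers:
        t.join()
    return results
```

Results arrive in completion order, not input order, so each record should carry its own URL or item ID. Swapping the in-memory queues for a shared store is what would let the downloaders and parsers run on different servers.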
$444 USD in 7 days
0.0 (0 reviews)
0.0
I can rewrite it to use requests and XPath. If running the page's JavaScript is unnecessary, requests may be a better approach, with multiprocessing to run multiple instances on one machine or on several servers. If it runs on several servers, using Redis as a message queue may be a good option.
$400 USD in 10 days
0.0 (0 reviews)
0.0

About the client

Stockholm, Sweden
5.0
36
Member since Aug 11, 2013
