
Document HTML and PDF parsing data extraction

$30-250 USD

Completed
Posted over 9 years ago

Paid on delivery
Small data set from two file formats: one HTML, the other PDF. The files must be located by submitting a form via POST, which returns a list of their URLs. The script is to be written in Perl, using curl or LWP (LWP::UserAgent).
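The listing asks for Perl with curl or LWP. The same two-step flow (POST the search form, then pull the document URLs out of the returned HTML) can be sketched with Python's standard library. This is an illustrative sketch, not the delivered script. The form URL and field names are hypothetical, and it assumes the documents can be told apart by a `.pdf` or `.html`/`.htm` extension in their link paths.

```python
"""Sketch: submit a search form via POST and collect HTML/PDF document URLs."""
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def post_form(url, fields, timeout=30):
    """POST url-encoded form fields and return the response body as text."""
    data = urlencode(fields).encode("ascii")
    req = Request(url, data=data, method="POST")
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def document_urls(html, base_url):
    """Return absolute URLs of linked PDF and HTML files, grouped by type.

    Assumption: file type is inferred from the link path's extension.
    """
    parser = LinkCollector()
    parser.feed(html)
    found = {"html": [], "pdf": []}
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)
        # Ignore query string and fragment when checking the extension.
        path = absolute.split("?", 1)[0].split("#", 1)[0].lower()
        if path.endswith(".pdf"):
            found["pdf"].append(absolute)
        elif path.endswith((".html", ".htm")):
            found["html"].append(absolute)
    return found
```

Typical use would be `document_urls(post_form(search_url, {"query": "..."}), search_url)`, where `search_url` and the `query` field are placeholders for whatever the real form expects. Keeping the network call separate from the link extraction lets the parsing be tested on saved HTML.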
Project ID: 7165842

About the project

5 proposals
Remote project
Active 9 yrs ago

Awarded to:
Thank you for the invitation. I can create such a script for you in Perl, but for the PDF parsing part I recommend using third-party software called xpdf (it's free). My program will run the pdftotext program (from xpdf), get the text output and parse it. Thanks. Roman
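The pdftotext approach described in this bid can be sketched as follows, again in Python rather than the Perl the winning bid would have used. The sketch assumes pdftotext (from xpdf or Poppler) is on the PATH. Passing `-` as the output file sends the text to stdout. The `Label: value` layout that `parse_fields` expects is purely hypothetical, because the listing does not describe the PDF contents.

```python
"""Sketch: convert a PDF to text with pdftotext (xpdf/Poppler) and parse it."""
import shutil
import subprocess


def pdf_to_text(pdf_path):
    """Run pdftotext with "-" as the output file and return the captured text."""
    exe = shutil.which("pdftotext")
    if exe is None:
        raise FileNotFoundError("pdftotext not found on PATH (install xpdf or Poppler)")
    result = subprocess.run(
        # -layout keeps the physical layout; -enc UTF-8 fixes the output encoding.
        [exe, "-layout", "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def parse_fields(text):
    """Hypothetical parser: collect non-empty 'Label: value' lines into a dict."""
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() and value.strip():
            fields[label.strip()] = value.strip()
    return fields
```

Shelling out to an existing converter keeps the script small. The trade-off is an external dependency that must be installed on the machine running it.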
$155 USD in 2 days
5.0 (350 reviews)
7.3
Dear Sir, Thank you for your very interesting project. I am a programmer with database and system administration experience. I am conversant in Perl, Python, PHP and C. I specialize in back-end projects, and some of my recent projects include:
- Back-end for an email signing and certification system. Processed emails, used Amazon S3, SES and CloudSearch. Provided a REST API via Mojolicious.
- Created the back-end, including a JSON-based API, for an iOS 3D printing application.
- API implementation for PredatorBarrier: a mixture of XML and HTML parsing to gather data. Combined 100k school records with 1M predator records.
- Medianet XML feed processing and importing into MySQL. The largest feed was 125GB with 400M records.
- Web service to load and parse CSV, XLX and XML files. Parsing was based on an external dictionary, with change detection and MySQL uploading.
- Scraped [login to view URL], [login to view URL] and [login to view URL], matched info from all sites and stored it in MySQL.
- Real estate site scraper fix: needed to fix a large scraper built into a real estate aggregator site.
- Scraping from Forbes/Fortune2000, Crunchbase, AngelList and many others.
- Authentication and accounting module for a high-volume SMTP server (100k+ emails per hour).
Of course I am available via Skype, including audio and video chat, to answer any questions. Meanwhile, could you provide some more information? Looking forward to working together, Felix Enescu
$155 USD in 3 days
5.0 (15 reviews)
4.8
5 freelancers are bidding an average of $176 USD for this job
Hopefully the HTML doesn't come in thousands of variants for the period before 2014-04-03. Either way, I'd be glad to help. Thank you.
$222 USD in 5 days
4.9 (27 reviews)
5.2
Hello, greetings from Shweta. I can write a Perl script to parse the HTML and PDF files described in the project. I have done something similar, retrieving documents from the UN website. Please get back to me for further discussion. Thanks, Shweta
$250 USD in 3 days
4.9 (23 reviews)
4.7
Hello. More than 20 years of programming experience. Regards.
$100 USD in 5 days
4.4 (25 reviews)
5.0

About the client

Santa Cruz, United States
5.0
29
Payment method verified
Member since Sep 17, 2004

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)