Bulk Scanned PDFs doing selective area OCR Application Dev for vyadzmak

$250-750 USD

In Progress
Posted over 11 years ago

Paid on delivery
Project Description: I have thousands of scanned form PDFs. By "form" I don't mean editable or fillable PDFs; they are strictly rasterized, TIFF-based graphics. The forms are of different types, and the scan quality of some of the PDFs is medium at best. I need someone to develop Win32/64 desktop software that OCRs specific areas of a form and saves the captured data to a database.

The first application (call it templateApp from now on) picks out the areas of the form to be OCRed. I envision it as a Win32/64 desktop application in which an administrator (generally only a select few users who have rights to set up the capture and OCR specifics) opens a PDF and marks multiple "areas of interest", the same way we select an area to crop in MS Paint by dragging from the upper-left corner to the lower-right corner. This information will be stored in the database for use by the actual OCR application (call it ocrProcessingApp from now on).

Subsequently, ocrProcessingApp will process all incoming PDF files (of the same form format, in bulk batches of hundreds or thousands), OCR the text in these "areas of interest", and store all text found in a MySQL database.

The OCR requirement is English text only for this project. If you can OCR additional languages, please keep in mind that we can extend the project to a later phase that handles multiple languages. I don't have an OCR library of choice; to be seriously considered as a candidate, please PM me which OCR library you intend to use. It is very important to obtain high OCR accuracy when the incoming PDFs/images are clear. If you have done any customization such as pre-processing images/PDFs (e.g. scaling, de-noising, or anything else that makes images clearer) before feeding them into OCR engines like Tesseract, GOCR, etc., it would be a bonus. If you have experience with grid table parsing based on Tesseract/Cuneiform, that would be a bonus also.
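To make the hand-off between templateApp and ocrProcessingApp concrete, here is a minimal Python sketch of one way the "areas of interest" could be stored and reused. Everything in it is an assumption for illustration, not part of the brief: the table and column names, the `Area` class, and the choice to store each rectangle as fractions of the page size so that one template still fits scans made at different resolutions. It uses the standard-library `sqlite3` module as a stand-in for MySQL, and `ocr_area` is a hypothetical callable that would wrap page rendering plus whichever OCR engine is chosen.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass(frozen=True)
class Area:
    """One "area of interest", stored as fractions (0.0-1.0) of the page size."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def to_pixels(self, width: int, height: int) -> Box:
        """Convert the fractional rectangle to pixels for a page of this size."""
        return (round(self.left * width), round(self.top * height),
                round(self.right * width), round(self.bottom * height))


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the two hypothetical tables: templates in, captured text out."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS areas ("
               "form_type TEXT, name TEXT, left_f REAL, top_f REAL, "
               "right_f REAL, bottom_f REAL, PRIMARY KEY (form_type, name))")
    db.execute("CREATE TABLE IF NOT EXISTS captured ("
               "file_name TEXT, area_name TEXT, text TEXT)")
    return db


def save_template(db: sqlite3.Connection, form_type: str,
                  areas: Iterable[Area]) -> None:
    """What templateApp would do after the administrator marks the areas."""
    db.executemany(
        "INSERT OR REPLACE INTO areas VALUES (?, ?, ?, ?, ?, ?)",
        [(form_type, a.name, a.left, a.top, a.right, a.bottom) for a in areas])
    db.commit()


def load_template(db: sqlite3.Connection, form_type: str) -> List[Area]:
    """What ocrProcessingApp would read before processing a batch."""
    rows = db.execute(
        "SELECT name, left_f, top_f, right_f, bottom_f FROM areas "
        "WHERE form_type = ? ORDER BY name", (form_type,))
    return [Area(*row) for row in rows]


def process_batch(db: sqlite3.Connection, form_type: str,
                  pages: Iterable[Tuple[str, int, int]],
                  ocr_area: Callable[[str, Box], str]) -> int:
    """OCR every stored area on every page; return the number of rows saved.

    pages yields (file_name, width_px, height_px) for each rendered page.
    """
    areas = load_template(db, form_type)
    saved = 0
    for file_name, width, height in pages:
        for area in areas:
            text = ocr_area(file_name, area.to_pixels(width, height))
            db.execute("INSERT INTO captured VALUES (?, ?, ?)",
                       (file_name, area.name, text))
            saved += 1
    db.commit()
    return saved
```

Passing the OCR step in as a callable keeps the sketch independent of any particular engine, so the same loop could be tried against different candidate libraries before one is selected.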
Some of the forms have a grid between the values; others don't. Please PM me how you plan to tackle this issue so that the text we need to capture can be made out. Please keep in mind that I intend to create a general-purpose tool, i.e. one not geared towards a specific job only. You can assume all scanned PDFs are straight and not skewed.

You can use any programming language, but if it's not a .NET language, Java, C++ or Python, please check with me first. Please include in your quote (or PM me) which programming language you intend to use to develop the two applications (templateApp and ocrProcessingApp). The applications must work on XP, Vista, and Windows 7, on both 32-bit and 64-bit OS. Please make sure the application is bug free.

Please try not to use any 3rd-party components for which I have to pay for licenses. If you must, please include the 3rd-party component info and cost in your quote or PM me. Ultimately, cost, reliability, and licenses/royalties for distribution are the major factors. I need all source code and the rights to the source and binary code in the end.

Thank you for your interest in bidding on this project. There may be follow-on projects based on satisfactory work on this project. If you have any questions, please don't hesitate to ask. Thanks.

Skills required: .NET, C# Programming, Java, OCR, Visual Basic

Per our discussion previously via private messaging... Thanks.
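On the grid question raised in the description, one common starting point is a projection profile: in a straight, unskewed scan, a ruled line shows up as a pixel row or column that is almost entirely dark. The Python sketch below illustrates that idea only; it is not a method the brief prescribes. It assumes the cropped area has already been binarized into a list of rows of 0/1 values (1 = dark), and the function names and the `min_fill` threshold are hypothetical. A real implementation would more likely run on an image library's arrays.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # rows of pixels; 1 = dark, 0 = background


def find_grid_lines(img: Bitmap, min_fill: float = 0.8) -> Tuple[List[int], List[int]]:
    """Return (row indexes, column indexes) that look like ruled grid lines.

    A row or column counts as a line when at least min_fill of its pixels
    are dark. This relies on the scans being straight, as the brief states.
    """
    if not img or not img[0]:
        return [], []
    height, width = len(img), len(img[0])
    rows = [y for y in range(height) if sum(img[y]) >= min_fill * width]
    cols = [x for x in range(width)
            if sum(img[y][x] for y in range(height)) >= min_fill * height]
    return rows, cols


def erase_grid_lines(img: Bitmap, min_fill: float = 0.8) -> Bitmap:
    """Return a copy of img with the detected grid rows and columns blanked."""
    rows, cols = find_grid_lines(img, min_fill)
    row_set, col_set = set(rows), set(cols)
    return [[0 if (y in row_set or x in col_set) else px
             for x, px in enumerate(line)]
            for y, line in enumerate(img)]
```

Blanking whole rows and columns also removes any text pixels that touch a line, so this is best treated as a first pass to compare against OCR results on the untouched crop.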
Project ID: 3993579

About the project

3 proposals
Remote project
Active 11 yrs ago

About the client

Scarborough, Canada
4.9
47
Member since Jan 7, 2009