Closed

Parsing 20,000 HTML files in Chinese

We need to parse and clean data from about 20,000 HTML files in Chinese (see attached examples). The final goal is to remove all information from each file that does not belong to the relevant firm (highlighted in blue). The main challenge is dealing with the different structures of the files (see examples): some files contain only one firm (and therefore would not need to be edited), while others contain a long list of firms.
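The core step in the brief, keeping only the firm highlighted in blue, could be sketched in Python with the standard library's html.parser. This is a sketch under assumptions: the attached examples are not available here (several bidders also report that they arrived as text files without visible highlighting), so it assumes the highlight is expressed either as a color attribute (such as <font color="blue">) or as an inline style (color: blue, background-color: blue), and that the highlight wraps the firm's whole record. BLUE_VALUES, BlueTextExtractor and extract_highlighted are hypothetical names.

```python
from html.parser import HTMLParser

# Assumed colour values that mark the relevant firm (hypothetical; adjust to the real files).
BLUE_VALUES = {"blue", "#0000ff", "#00f"}

# Void elements never receive a closing tag, so they are not tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


def _is_blue(attrs):
    """Return True if a tag's attributes mark it as blue (assumed convention)."""
    for name, value in attrs:
        if value is None:
            continue  # boolean attribute such as <input disabled>
        value = value.lower().replace(" ", "")
        if name == "color" and value in BLUE_VALUES:
            return True
        if name == "style":
            for decl in value.split(";"):
                prop, _, val = decl.partition(":")
                if prop in ("color", "background-color", "background") and val in BLUE_VALUES:
                    return True
    return False


class BlueTextExtractor(HTMLParser):
    """Collects the text found inside blue-highlighted elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []      # (tag, is_blue) for every open element
        self.blue_depth = 0  # how many open elements around the cursor are blue
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        blue = _is_blue(attrs)
        self.stack.append((tag, blue))
        if blue:
            self.blue_depth += 1

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; unmatched end tags are ignored.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                self.blue_depth -= sum(1 for _, blue in self.stack[i:] if blue)
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.blue_depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_highlighted(html_text):
    """Return the highlighted text of one file, joined by single spaces."""
    parser = BlueTextExtractor()
    parser.feed(html_text)
    parser.close()
    return " ".join(parser.chunks)
```

If the real highlight only covers the firm name rather than its whole record, the extractor would instead need to keep the enclosing row or block of the highlighted element; and a file whose output is empty would be a candidate for the "only one firm, no edit needed" case rather than something to discard.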

We can only hire a firm that can provide us with a valid firm registration certificate.

Skills: Data Entry, Data Scraping, Web Scraping, ETL


About the Employer:
(4 reviews) London, United Kingdom

Project ID: #24568822

28 freelancers are bidding an average of £162 for this job

sohandas

And where do you want to store the parsed data: CSV, Excel or a database? I will write an HTML parser in Python to parse the data and save it wherever you want. Message me, thanks!
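For the storage question raised in this bid, a minimal Python sketch of writing the cleaned records to CSV with the standard csv module. The column layout (source_file, firm_text) and the function name save_records_csv are hypothetical; the posting does not say which output format the client wants.

```python
import csv
from pathlib import Path

# Hypothetical column names; the real schema depends on what the client wants kept.
FIELDNAMES = ["source_file", "firm_text"]


def save_records_csv(records, out_path):
    """Write parsed records (a list of dicts keyed by FIELDNAMES) to a CSV file.

    'utf-8-sig' writes a byte-order mark, which helps Excel recognise the
    file as UTF-8 so that Chinese characters display correctly.
    """
    # newline="" lets the csv module control line endings itself.
    with Path(out_path).open("w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(records)
```

A database target (for example the standard library's sqlite3) would follow the same pattern, with one INSERT per record instead of one CSV row.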

£200 GBP in 2 days
(333 Reviews)
7.6
schoudhary1553

Hello, I can help you with your project "Parsing 20,000 HTML files in Chinese". I have gone through your job posting and am very interested in working with you. I am an expert in this field. I have already comp…

£220 GBP in 5 days
(233 Reviews)
7.7
pandey2008

Hello sir, we have an 8-member team and can start work right now. Just give us one chance and we will do our best. Awaiting your reply, thanks.

£222 GBP in 4 days
(569 Reviews)
7.6
etuannv

Hi there, I am interested in your project. I would approach your project by making a program to parse and clean the data automatically. The program will be written in Python. It could be run on any operating system and…
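The "parse and clean automatically" approach described in this bid needs a batch driver around whatever cleaning step is chosen. A minimal sketch, assuming the files sit in one folder tree with a .html extension and that the cleaning step is a function from text to text; process_directory and its parameters are hypothetical names, not anything from the posting.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def process_directory(in_dir, out_dir, clean: Callable[[str], str],
                      encoding: str = "utf-8") -> Tuple[int, List[Tuple[Path, str]]]:
    """Apply `clean` to every .html file under in_dir, mirroring the folder tree in out_dir.

    Returns (number of files written, list of (path, error message) failures).
    pathlib takes care of path separators, so the same script runs unchanged
    on Windows, macOS and Linux.
    """
    in_root, out_root = Path(in_dir), Path(out_dir)
    written = 0
    failures: List[Tuple[Path, str]] = []
    # sorted() materialises the file list before any output is written.
    for src in sorted(in_root.rglob("*.html")):
        dst = out_root / src.relative_to(in_root)
        try:
            text = src.read_text(encoding=encoding)
            dst.parent.mkdir(parents=True, exist_ok=True)
            dst.write_text(clean(text), encoding="utf-8")
            written += 1
        except (OSError, ValueError) as exc:  # UnicodeDecodeError is a ValueError
            # Record the problem and continue instead of aborting the whole batch.
            failures.append((src, str(exc)))
    return written, failures
```

For example, process_directory("raw", "clean", my_clean_function, encoding="gb18030"), where my_clean_function is a placeholder, would write cleaned copies under clean/ and return the files that failed, so a few malformed pages among 20,000 do not stop the run. Files ending in .htm would need a second glob.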

£225 GBP in 5 days
(90 Reviews)
6.7
ineedWorkJob

Greetings sir, I have read your project details and I will process all 20k of your HTML files to remove any firm that does not have a valid registration certificate. Sir, I can deliver this project sooner than you exp…

£100 GBP in 1 day
(47 Reviews)
5.5
shantanupython

The number does not matter here, because we will have the files locally; what matters is parsing them properly using the correct encoding. Will you be sending the 10,000 files to me so I can parse the data, or do you want me to send yo…
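The encoding point raised in this bid matters for Chinese pages, which are often saved as GB2312 or GBK rather than UTF-8. A minimal sketch, assuming the files are read as raw bytes: honour a UTF-8 byte-order mark, then any charset declared in a <meta> tag, then try UTF-8 and finally GB18030 (a superset of GB2312 and GBK). The function name decode_html and the 4096-byte search window are illustrative choices, not anything specified in the posting.

```python
import codecs
import re
from typing import Tuple

# Finds charset=... in <meta charset="gbk"> or in content="text/html; charset=gb2312".
_META_CHARSET = re.compile(rb"""charset\s*=\s*["']?\s*([\w-]+)""", re.IGNORECASE)

# GB2312 and GBK are subsets of GB18030, so decoding with GB18030 is a safe upgrade.
_UPGRADE = {"gb2312": "gb18030", "gbk": "gb18030"}


def decode_html(raw: bytes) -> Tuple[str, str]:
    """Decode raw HTML bytes and return (text, encoding used)."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8"), "utf-8-sig"
    candidates = []
    match = _META_CHARSET.search(raw[:4096])  # the declaration normally sits in <head>
    if match:
        declared = match.group(1).decode("ascii").lower()
        candidates.append(_UPGRADE.get(declared, declared))
    candidates += ["utf-8", "gb18030"]  # common encodings for Chinese pages
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except (LookupError, UnicodeDecodeError):
            continue  # unknown codec name or wrong guess: try the next candidate
    # Nothing decoded cleanly: keep going with replacement characters.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"
```

Trying UTF-8 before GB18030 is deliberate: GBK bytes usually fail strict UTF-8 decoding quickly, whereas GB18030 accepts many byte sequences and would rarely reject a UTF-8 file outright.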

£80 GBP in 2 days
(43 Reviews)
5.2
iautomationus

No problem. I've read and understood that these documents need to be parsed as described. Relevant experience: Linux, Debian, CentOS, Ubuntu, server admin, Apache, PHP5, PHP7, Python, API integration.
- Scripts, so…

£60 GBP in 1 day
(27 Reviews)
4.6
kirilnberezebko

Hi. I have read your job description carefully, and your job is very interesting to me. I can start right now and I can do your project perfectly. If you give me a chance, you will get a good result. I hope to discuss more…

£200 GBP in 2 days
(6 Reviews)
4.3
abdulkarim97

Hello there, I am Kader Sardar, a student of computer science and engineering. I can do it because I have good experience in this field and I have been working here for 3 years. So please send me a message fo…

£135 GBP in 7 days
(21 Reviews)
4.5
sramsiks

Hello. I read your description, but you attached TXT files as examples, so unfortunately there is no highlighting. Can you show the target site where these pages are hosted and explain exactly what data is needed? I think I can do this…

£50 GBP in 7 days
(6 Reviews)
3.9
science64

Hello, I am a Python developer and I can write a Python script to parse the valuable data from the 20,000 HTML files. I believe this would be nearly impossible to finish by hand. In your text files, I could not detect the blue highlights. …

£240 GBP in 7 days
(5 Reviews)
2.9
kuanhuichia

A bit of my background: I am a native Chinese speaker who graduated from the University of Sheffield, UK. I used to work as a part-time translator, and document translation from English to Chinese was one of my strengths. I am…

£80 GBP in 7 days
(5 Reviews)
1.8
benliao

Dear Sir/Madam, I can provide exactly the work you are looking for. I have rich experience with Python, PHP and C#, and I specialise in web scraping, crawling, MySQL, Django, etc. We can discuss more in chat. I will be wai…

£100 GBP in 3 days
(1 Review)
2.0
nisarahmed57786

Hello, I read your project very carefully. I hope you give me a chance to prove myself. Regards, Nisar

£20 GBP in 1 day
(3 Reviews)
1.0
sahilbisla

Dear sir, I have checked your project description and requirements carefully. I will never disappoint you. If you give me a chance to work with you, I will provide high-quality work just for you. I am a…

£100 GBP in 4 days
(1 Review)
1.0
ahmedbarbary34

Hi, I have read your project details and I will be able to assist you with high-quality, professional work. Regards, Ahmed

£120 GBP in 7 days
(1 Review)
0.6
mycaree

Hi. I am a native Chinese speaker from Henan Province, one of the central provinces of mainland […] helps me a lot after I graduated from university. It has been many years since I became a translator. I…

£55 GBP in 3 days
(0 Reviews)
0.0
kristianuss

Hello, I can provide you with a valid firm registration certificate. 20k files can take up to 10 days to produce good results. Feel free to contact me.

£800 GBP in 7 days
(0 Reviews)
0.0
sea2sea

- Strong organizational skills and close attention to detail
- Ways of thinking: flexible, adaptable and creative
- Hard-working: I dedicate all my skills to each project and am able to complete it on time and acc…

£150 GBP in 10 days
(0 Reviews)
0.0
walaasiliman5

Hey, I am studying in the Chinese language department of the Faculty of Alsun at Ain Shams University, which is located in Cairo. I am in the last year of college and have been studying Chinese for more than 4 years. In 2019 I went to…

£130 GBP in 7 days
(0 Reviews)
0.0