We need to parse and clean about 20,000 Chinese-language HTML files (see attached examples). The goal is to remove from each file all information that does not belong to the relevant firm (highlighted in blue). The main challenge is that the files have different structures (see examples): some contain only one firm and therefore need no editing, while others list a large number of firms.
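To make the task concrete, here is a minimal Python sketch of one possible approach. It rests on assumptions that are not stated in the posting: that firms are listed one per table row (`<tr>`), that the relevant firm's name is known for each file, and that rows containing `<th>` cells are headers worth keeping. The names `FirmRowFilter` and `keep_only_firm` are hypothetical. It uses only the standard library's `html.parser` and does not handle nested tables or unclosed rows, so the real files would likely need per-layout rules.

```python
from html.parser import HTMLParser


class FirmRowFilter(HTMLParser):
    """Re-emit an HTML document, dropping <tr> rows that do not mention the firm.

    Assumption (not from the posting): one firm per table row, no nested tables.
    """

    def __init__(self, firm_name):
        # Keep character references unconverted so they can be re-emitted as-is.
        super().__init__(convert_charrefs=False)
        self.firm_name = firm_name
        self.out = []           # pieces of the cleaned document
        self.row = None         # buffer for the current <tr>, None outside rows
        self.is_header = False  # True if the current row contains a <th>

    def _emit(self, text):
        # Write into the row buffer while inside a row, else into the output.
        (self.out if self.row is None else self.row).append(text)

    def handle_starttag(self, tag, attrs):
        if tag == "tr" and self.row is None:
            self.row = []
            self.is_header = False
        if tag == "th":
            self.is_header = True
        self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        self._emit(f"</{tag}>")
        if tag == "tr" and self.row is not None:
            row_html = "".join(self.row)
            self.row = None
            # Keep header rows and rows that mention the relevant firm.
            if self.is_header or self.firm_name in row_html:
                self.out.append(row_html)

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")


def keep_only_firm(html, firm_name):
    """Return the HTML with rows for other firms removed."""
    parser = FirmRowFilter(firm_name)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

A file that contains only the relevant firm passes through unchanged under these assumptions, since its single row matches. When reading the files from disk, the encoding has to be chosen explicitly; Chinese pages are often GBK or GB18030 rather than UTF-8, but that is a guess that should be checked against the actual files.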
We can only hire a firm that can provide a valid firm registration certificate.
And where do you want to store the parsed data: CSV, Excel, or a database? I will write an HTML parser in Python to parse the data and save it wherever you want. Message me, thanks!