Python script to combine CSVs
$30-250 USD
Paid on delivery
This project is to write a python script and turn it into a .exe that can run on windows 7 or later, and a .app that can run on mac OSX 10.4 or later. The python script, the .exe, and the .app are all deliverables.
OVERVIEW
We are a renewable energy company that works in buildings. In every building we work in, we get dozens of .CSV, .XLS, and .XLSX files that need to be combined into a single CSV. Each file has differing header information in the top in multiple rows (not necessarily just 2, it could be 20 rows of header info), then multiple columns of data beneath the header information. Some files may only have two columns of data, whereas other files may have 20 or more columns of data.
Each file also has a column (or two) of date/time stamps, showing the date and time each measurement in a row was taken. The time stamps from different files do not necessarily start at the same time – e.g. one may start at 3pm on July 24th, whereas another may start at 7pm on July 27th. Furthermore, the time stamps do not increment in the same step size – e.g. one may increase by five minutes per row, one may increase by 15 minutes per row, and a third may increase randomly per row.
Here is what the program must do:
• Read every CSV, XLS, and XLSX file in whatever folder the .app or .exe is placed. This way we can copy the app to a new folder full of csvs to combine.
• Combine all the CSVs, XLS, and XLSX files into one CSV, with header information from each individual file and the name of that file preserved above the columns from that file.
• Make a master time stamp column on the left side of the document:
o Find the smallest increment of any date/time stamp column – e.g. if there are time stamps incrementing at 1 minute, 5 minutes, and random minutes, make the master column of date/time stamps on the left increment at 1 minute per row.
o The master date/time stamp should start with the date/time of the earliest data point from any file, and end at the date/time of the latest data point from any column.
• Line up and space out all data columns so they correspond correctly to the master time stamp on the left.
• Delete redundant date/time stamp columns so there is only the master date/time stamp column on the left and no other date/time columns
• Make sure all header information is preserved over the proper columns
o Add in the name of the original file above the columns!
• Write everything to a new CSV called “[url removed, login to view]”
I’ve uploaded examples of data files of the type that will need to be combined, as well as an example of the final output file needed so you can see what I’m talking about. The examples are "14836...", "18102", and "condensate pump". The example output file is "combined_data". (Please note, in “[url removed, login to view]", columns AD through the end are NOT empty. There is data in row 6615, for example.)
Thank you for your help, and I look forward to working with you!
Best,
Brenden
Project ID: #2518755
About the project
Awarded to:
I'm a python programmer, working mostly as a freelancer. I can have the project done in 4 days. Contact me for more information.
13 freelancers are bidding on average $245 for this job
Hi, I have experience in managing csv, xls, xlsx files programmatically in JAVA which will be cross platform so will work on mac, windows and linux. Please let me know if you are interested in JAVA application with u More
Hi, i am well versed with python and mainly text processing tasks. I also have prior experience working in such projects. I can guarantee you a good quality product within provided time frame. :)
can do it. have great exprience in combining csv. wrked for a long time with realty company. alex.
We can really help you with this task. We have experience in industrial data manipulation (we have worked in this kind of project for a big alarm company), focus on code quality and expertise in python programs.
This project can be done easily. i have a mac expert at my disposal as well as a graphics designer if you want the end user to be more than your in house team. im currently unemployed and available 24/7 on skype. so More