Python script for data cleaning using Pandas DataFrame
$30-60 USD
In Progress
Posted over 4 years ago
Paid on delivery
Please read all before bidding.
I need a Python script that cleans data from an Excel file and then saves the cleaned data to another Excel file.
I have written the structure of the script, describing exactly what I expect and how it should be written. I also have code for reading and saving the files. What is missing is the data cleaning part.
Detailed request:
#fill Regex in config file based on Type (manually)
#read confFile into config_DF
#save dataframe to modified_DF
# drop rows based on drop_rows in job file
#check that the column names exactly match those in the conf file; order is not important
#remove trailing and leading spaces in values
#replace cells that have the wrong format with an empty string (check against the Regex column in the configuration file)
#replace cells that have wrong values (below min, over max, is zero ...) with an empty string
#fill missing values based on the rules in conf df "Missing Value Fill Method" [using pandas, numpy and scipy only]
#implement the methods 'ma' (moving average of the previous and next available values from the same column; a missing first or last value is set to the closest available value), 'lr' (linear regression using sklearn) and 'knn' (using fancyimpute)
#apply sigma filtering on all columns based on the multiplier in the conf file. This should generate a df of bool, inSigmaDF, where each value is True if it lies within +/- (multiplier x std) of its column mean (after calculating the mean and std of each column), and False otherwise. Then delete all rows in modified_DF that contain at least one False in inSigmaDF
#save modified_DF to outputFile
#each function above should return a dataframe or zero on success and a non-zero code (1, 2, 3, ...) on failure/exception.
#if it works with the pipeline, modified_DF should be checked after each call; if it is an int, return that int
#use try/except. These functions should not throw exceptions to the calling function but return a non-zero code (1, 2, 3, ...) for different errors
#USE 'apply' and 'lambda', never loops, to operate on columns, e.g. data['date'] = data['date'].apply(lambda x: somefct(x))
# when calling functions, use pipes (pdpipe) in the form:
pipeline = [login to view URL](modified_DF)
pipeline+= [login to view URL](modified_DF, config_DF)
pipeline+=[login to view URL](modified_DF, config_DF)
...
outDF = pipeline(df)
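The "return a dataframe or an int code, and check after each call" rule can be sketched as follows. This is only an illustration under assumptions: plain Python functions stand in for pdpipe stages (the actual pdpipe calls are elided above, so none are assumed here), and the names `check_columns`, `strip_spaces`, `run_steps` and the two error codes are hypothetical.

```python
import pandas as pd

# Hypothetical error codes; the request only asks for non-zero ints (1, 2, 3, ...).
ERR_BAD_COLUMNS = 1
ERR_STRIP_FAILED = 2


def check_columns(df, expected):
    """Return df unchanged if its column names match `expected` (order ignored)."""
    try:
        return df if set(df.columns) == set(expected) else ERR_BAD_COLUMNS
    except Exception:
        return ERR_BAD_COLUMNS


def strip_spaces(df):
    """Remove leading and trailing spaces from every string cell."""
    try:
        return df.apply(
            lambda col: col.map(lambda v: v.strip() if isinstance(v, str) else v)
        )
    except Exception:
        return ERR_STRIP_FAILED


def run_steps(df, steps):
    """Call each step in turn; return the int code as soon as one step fails."""
    result = df
    for step in steps:  # iterates over the steps, not over columns or rows
        result = step(result)
        if isinstance(result, int):
            return result
    return result


raw = pd.DataFrame({"name": [" a ", "b "], "x": [1, 2]})
out = run_steps(raw, [lambda d: check_columns(d, ["x", "name"]), strip_spaces])
bad = run_steps(raw, [lambda d: check_columns(d, ["x"]), strip_spaces])
```

Here `out` is a cleaned dataframe, while `bad` is the int code from the failed column check, so the caller can tell the two apart with a single `isinstance` test.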
Mainly, the job consists of writing one main function (clean) and many small functions to clean the data.
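A few of the small functions described in the request might look like the sketch below. It is not a definitive implementation: the function names are hypothetical, the regex is assumed to have to match the whole cell, the sigma multipliers are passed as a plain dict instead of being read from the conf file (whose layout is not given), and the standard deviation is the pandas default (sample, ddof=1).

```python
import re

import numpy as np
import pandas as pd


def blank_bad_format(col, pattern):
    """Replace every value that does not fully match `pattern` with ''."""
    regex = re.compile(pattern)
    return col.apply(lambda v: v if regex.fullmatch(str(v)) else "")


def fill_moving_average(col):
    """'ma' fill: mean of the previous and next available values.

    A gap at the start or end has only one neighbour, so it takes the
    closest available value.
    """
    prev_vals = col.ffill()                  # previous available value
    next_vals = col.bfill()                  # next available value
    prev_vals = prev_vals.fillna(next_vals)  # leading gap: use next value
    next_vals = next_vals.fillna(prev_vals)  # trailing gap: use previous value
    return col.fillna((prev_vals + next_vals) / 2)


def sigma_filter(df, multipliers):
    """Drop rows with any value outside mean +/- multiplier * std of its column."""
    in_sigma_df = df.apply(
        lambda col: (col - col.mean()).abs() <= multipliers[col.name] * col.std()
    )
    # Missing values compare as False, so fill them before filtering.
    return df[in_sigma_df.all(axis=1)]


dates = blank_bad_format(pd.Series(["2020-01-31", "31/01/2020"]), r"\d{4}-\d{2}-\d{2}")
filled = fill_moving_average(pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan]))
data = pd.DataFrame({"a": [10.0, 11.0, 9.0, 10.0, 10.0, 50.0],
                     "b": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
kept = sigma_filter(data, {"a": 1.5, "b": 1.5})
```

In this toy data the last row is dropped because 50.0 lies more than 1.5 standard deviations from the mean of column `a`; all values of column `b` are within range.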
Notes:
Other details and info required will be discussed as needed
All code should be documented (functions should have comments explaining all variables and return values, and the main part of the code should be commented as well).
Python 3.6+ should be used
Create an environment (env) to run the code in
All python code should have [login to view URL] using pipreqs
Needed skills: Python, pandas, numpy, SciPy, sklearn
Extra skills: pdpipe, fancyimpute
Data scientist
I have a vast experience in an array of fields and I accept new challenges. I am available for hire to work on projects.
Statistics
Machine Learning
Deep Learning
Computer Vision
Natural Language Processing
Face Recognition
Data Analytics
Classification / Clustering
Supervised / Unsupervised Learning
Data Pipeline
Spark
Tools:
R
Python
JAVA
Tableau
Excel
Kafka
Spark
Prometheus
Grafana
Colab
AWS
Hi Client. I have read and understood your request. I am interested in your project and can do it very well.
I have wide experience in Python development and look forward to hearing from you.
I would like to discuss the project with you over chat. Thank you.
Hi There,
I have 6+ years of experience in IT, covering data science, Python and machine learning.
I have experience with the Pandas and NumPy libraries for data analytics, with Matplotlib, Seaborn and Plotly for data visualization, and with Scikit-Learn for supervised and unsupervised models in data science.
I can write efficient code with Python and Pandas. Please let me know if you are interested in discussing this further.
Regards,
Tejaswini
Hello,
I have been working with Python and pandas for a long time and have delivered various projects with them. I believe that I can successfully deliver your project.