
Python script for data cleaning using Pandas DataFrame

$30-60 USD

In Progress
Posted over 4 years ago

Paid on delivery
Please read everything before bidding. I need a Python script that cleans data from an Excel file and then saves the cleaned data to another Excel file. I have written the structure of the script, exactly what I expect, and how it should be written. I also have code for reading and saving the files. What is missing is the data-cleaning part.

Detailed request:
# fill Regex in config file based on Type (manually)
# read confFile into config_DF
# save dataframe to modified_DF
# drop rows based on drop_rows in job file
# check that the column names are exactly as in the conf file (order not important)
# remove leading and trailing spaces in values
# replace cells with a wrong format by an empty string (check the Regex column in the configuration file)
# replace cells with wrong values (below min, over max, is zero, ...) by an empty string
# fill missing values based on the rules in the conf df column "Missing Value Fill Method" [using pandas, numpy and scipy only]
# implement the methods 'ma' (moving average of the previous and next available values from the same column; a missing first or last value is set to the closest available value), 'lr' (linear regression using sklearn) and 'knn' (using fancyimpute ([login to view URL]))
# apply sigma filtering on all columns based on the multiplier in the conf file. This should generate a DataFrame of bools, inSigmaDF, where each value is True if it lies within mean ± (sigma multiplier × std) of its column, and False otherwise. Then delete every row in modified_DF that contains at least one False in inSigmaDF
# save modified_DF to outputFile
# each function above should return a DataFrame or zero on success, and a non-zero code 1, 2, 3 on failure/exception. If it works with a pipeline, modified_DF should be checked after each call and, if it is an int, that int should be returned
# use try/except: this function should not throw an exception to the calling function but return a non-zero code 1, 2, 3, ... for different errors
# USE 'apply' and 'lambda', never loops, to operate on columns.
For example:
    data['date'] = data['date'].apply(lambda x: somefct(x))
# when calling functions, use pipes (pdpipe, [login to view URL]) in the form:
    pipeline = [login to view URL](modified_DF)
    pipeline += [login to view URL](modified_DF, config_DF)
    pipeline += [login to view URL](modified_DF, config_DF)
    ...
    outDF = pipeline(df)

Mainly, the job consists of writing one main function (clean) and many small functions to clean the data.

Notes:
Other details and required information will be discussed as needed.
All code should be documented (functions should have comments explaining all variables and return values, and so should the main part of the code).
Python 3.6+ should be used.
Create an env to run the code in.
All Python code should have [login to view URL] using pipreqs.
Needed skills: Python, pandas, numpy, SciPy, sklearn
Extra skills: pdpipe, fancyimpute
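The posting asks for one main clean function that chains many small steps, each returning a DataFrame on success or an int error code, with that int passed straight back to the caller. A minimal sketch of that contract follows. It deliberately does not use pdpipe: `clean` is a plain `functools.reduce` stand-in for the pipeline, and the helper names, the 'Column Name' config column, the drop_rows format (a list of index labels) and the codes 6, 7 and 8 are all hypothetical.

```python
from functools import reduce

import pandas as pd


def drop_rows(df, rows):
    """Drop rows by index label.

    df: DataFrame to clean.
    rows: list of index labels to drop (hypothetical drop_rows format).
    Returns the reduced DataFrame, or 7 if a label is missing or anything fails.
    """
    try:
        return df.drop(index=rows)
    except Exception:
        return 7


def check_columns(df, config):
    """Check that df has exactly the columns listed in config['Column Name'].

    Column order is ignored. Returns df unchanged on success,
    or 6 on a mismatch or any failure.
    """
    try:
        return df if set(df.columns) == set(config["Column Name"]) else 6
    except Exception:
        return 6


def clean(df, steps):
    """Run each step on the previous step's result.

    steps: list of one-argument callables returning a DataFrame or an int code.
    Once a step returns an int, later steps are skipped and that int is returned.
    Any unexpected exception becomes code 8 instead of propagating.
    """
    try:
        return reduce(
            lambda acc, step: acc if isinstance(acc, int) else step(acc), steps, df
        )
    except Exception:
        return 8
```

Extra arguments are bound with a lambda per step, e.g. `clean(df, [lambda d: drop_rows(d, [3]), lambda d: check_columns(d, config_DF)])`.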
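The whitespace, format and value checks map naturally onto column-wise `apply` calls. The sketch below assumes the config DataFrame has one row per data column with the columns 'Column Name', 'Regex', 'Min' and 'Max'; only 'Regex' is named in the posting, so the other column names, the function names and the error codes 1 to 3 are hypothetical. Empty Regex/Min/Max cells disable that check, only the min/max rule is shown, and `Series.str.fullmatch` needs pandas 1.1 or later.

```python
import pandas as pd


def strip_spaces(df):
    """Remove leading/trailing spaces from every string cell.

    df: input DataFrame. Returns a new DataFrame, or 1 on failure.
    """
    try:
        return df.apply(
            lambda col: col.map(lambda v: v.strip() if isinstance(v, str) else v)
        )
    except Exception:
        return 1


def blank_bad_format(df, config):
    """Replace cells that do not fully match their column's regex with ''.

    config: DataFrame with (assumed) columns 'Column Name' and 'Regex'.
    Columns without a regex string are left unchanged.
    Returns a DataFrame, or 2 on failure.
    """
    try:
        patterns = dict(zip(config["Column Name"], config["Regex"]))
        return df.apply(
            lambda col: col.where(col.astype(str).str.fullmatch(patterns[col.name]), "")
            if isinstance(patterns.get(col.name), str)
            else col
        )
    except Exception:
        return 2


def blank_out_of_range(df, config):
    """Replace numeric cells below 'Min' or above 'Max' with ''.

    config: DataFrame with (assumed) columns 'Column Name', 'Min', 'Max'.
    A NaN limit never compares True, so it disables that bound.
    Returns a DataFrame, or 3 on failure.
    """
    try:
        limits = config.set_index("Column Name")[["Min", "Max"]]

        def check(col):
            if col.name not in limits.index:
                return col
            low, high = limits.loc[col.name]
            num = pd.to_numeric(col, errors="coerce")  # non-numbers become NaN
            return col.mask((num < low) | (num > high), "")

        return df.apply(check)
    except Exception:
        return 3
```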
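For the 'ma' fill method, one reading of the rule is: each gap takes the average of the nearest available value before it and the nearest available value after it in the same column, and a gap at the start or end takes the closest available value. The sketch below implements that reading with `ffill`/`bfill` instead of loops. The interpretation, the function names, the 'Column Name' config column and error code 4 are assumptions; only 'ma' is handled here ('lr' and 'knn' are not shown).

```python
import numpy as np
import pandas as pd


def fill_ma(col):
    """Fill NaNs in one numeric column with the mean of the previous and
    next available values; leading/trailing gaps take the closest value.

    col: numeric pandas Series. Returns the filled Series.
    """
    prev_val = col.ffill()  # last available value at or before each position
    next_val = col.bfill()  # next available value at or after each position
    mean = (prev_val + next_val) / 2
    # At the start or end only one side exists, so fall back to that side.
    return mean.fillna(prev_val).fillna(next_val)


def fill_missing(df, config):
    """Apply fill_ma to every column whose 'Missing Value Fill Method' is 'ma'.

    config: DataFrame with columns 'Column Name' (assumed) and
    'Missing Value Fill Method'. Returns a DataFrame, or 4 on failure.
    """
    try:
        methods = dict(zip(config["Column Name"], config["Missing Value Fill Method"]))
        # Empty strings left by the format/range checks count as missing.
        df = df.replace("", np.nan)
        return df.apply(
            lambda col: fill_ma(pd.to_numeric(col)) if methods.get(col.name) == "ma" else col
        )
    except Exception:
        return 4
```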
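The sigma filter can be written as one column-wise `apply` that builds inSigmaDF, followed by a row filter that keeps only rows that are True in every column. In this sketch the multipliers are passed as a plain dict `{column: multiplier}`, which is an assumption about how the config values would be read; it uses pandas' default sample standard deviation (ddof=1), and the function names and error code 5 are hypothetical. Missing values should be filled first, because NaN compares False and would cause the row to be dropped.

```python
import pandas as pd


def in_sigma_mask(df, multipliers):
    """Build inSigmaDF: True where a value lies within mean +/- k * std of its column.

    df: numeric DataFrame.
    multipliers: dict mapping column name -> sigma multiplier k.
    Raises KeyError if a column has no multiplier (sigma_filter catches it).
    """
    return df.apply(
        lambda col: (col - col.mean()).abs() <= multipliers[col.name] * col.std()
    )


def sigma_filter(df, multipliers):
    """Drop every row of df that has at least one False in inSigmaDF.

    Returns the filtered DataFrame, or 5 on failure.
    """
    try:
        in_sigma_df = in_sigma_mask(df, multipliers)
        return df[in_sigma_df.all(axis=1)]
    except Exception:
        return 5
```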
Project ID: 22637191

About the project

5 proposals
Remote project
Active 4 yrs ago

5 freelancers are bidding on average $42 USD for this job
Data scientist. I have vast experience in an array of fields and I welcome new challenges. I am available for hire to work on projects. Skills: Statistics, Machine Learning, Deep Learning, Computer Vision, Natural Language Processing, Face Recognition, Data Analytics, Classification / Clustering, Supervised / Unsupervised Learning, Data Pipelines, Spark. Tools: R, Python, Java, Tableau, Excel, Kafka, Spark, Prometheus, Grafana, Colab, AWS.
$45 USD in 1 day
4.8 (5 reviews)
3.0
Hi Client. I have read and understood your request. I am interested in your project and can do it very well. I have wide experience in Python development; please contact me, I look forward to hearing from you. I would like to consult with you via chat. Thank you.
$45 USD in 7 days
5.0 (1 review)
0.8
Hi there, I have 6+ years of experience in IT, in data science, Python and machine learning. I have experience with the Pandas and NumPy libraries for data analytics, with Matplotlib, Seaborn and Plotly for data visualization, and with Scikit-Learn for supervised and unsupervised models in data science. I can write efficient code with Python and Pandas. Please let me know if you are interested in discussing further. Regards, Tejaswini
$50 USD in 7 days
0.0 (0 reviews)
0.0
Hi, I can do this.
$35 USD in 8 days
0.0 (0 reviews)
0.0
Hello, I have been working with Python and pandas for a long time and have delivered various projects using them. I believe I can successfully deliver your project.
$35 USD in 1 day
0.0 (0 reviews)
0.0

About the client

Beirut, Lebanon
5.0
4
Payment method verified
Member since Apr 7, 2015

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)