
C or Fortran function to transform a .csv file into a sparse matrix

$30-5000 USD

Closed
Posted about 13 years ago

Paid on delivery
There are 2 datasets, A and B. Set A contains a list of 3 million unique ids. Set B contains, for a subset of these ids, 5 variables: date1, date2, product, cost and quantity. There are about 10,000 distinct products. Each row in set B represents a sale of a product to a customer (identified by the id), sold on date1 and paid on date2 (with date2 >= date1). There can be up to 100 million rows in set B.

The function I am looking for should transform sets A and B into two matrices, C and D. Both matrices should have:
 - a total number of rows equal to the total number of unique (customer) ids, i.e. equal to the number of rows in set A
 - a total number of columns equal to the total number of unique products (i.e. about 10,000)

Each cell in matrix C should contain the total quantity of a product sold to a customer, and each cell in matrix D should contain the total cost of a product sold to that customer.

Matrices C and D should be in CCS (compressed column storage) sparse matrix format (see e.g. [login to view URL]). You can use any free publicly available library or code as part of your program.

## Deliverables

Sets A and B and matrices C and D are as described above.
The function will require four additional inputs:
 - mindate1
 - maxdate1
 - mindate2
 - maxdate2

To build matrices C and D, only rows from set B that satisfy both of the following conditions should be processed:
 - mindate1 <= date1 <= maxdate1
 - mindate2 <= date2 <= maxdate2

It is expected that only about 0.1% of the cells in matrices C and D are non-zero. Matrices C and D should therefore be in CCS (compressed column storage) sparse matrix format (see e.g. [login to view URL]).

The function should not require more than 15GB RAM when executed on the data as specified above (the cost and quantity variables are both stored as doubles, i.e. 8 bytes).

Example (in this example we omit the mindate1...maxdate2 restrictions):

Inputs:

Set A:
    id
    15
    1
    2
    100

Set B:
    id, date1, date2, prod, cost, quantity
    100, '17/02/2008', '19/02/2008', C, 79, 30
    15, '11/01/2008', '11/01/2009', A, 100.51, 2
    100, '17/02/2008', '19/02/2008', A, 79, 7
    1, '15/03/2008', '11/01/2009', B, 3.71, 13
    15, '11/10/2008', '17/01/2009', A, 58, 1

Matrix C (column names would be: id, prod_A, prod_B, prod_C):
    1, 0, 13, 0
    2, 0, 0, 0
    15, 3, 0, 0
    100, 7, 0, 30

Output in CCS sparse format:
    row_ind = {1, 2, 3, 4, 3, 4, 1, 4}
    col_ptr = {1, 5, 7, 8}
    val = {1, 2, 15, 100, 3, 7, 13, 30}

Matrix D (column names would be: id, prod_A, prod_B, prod_C):
    1, 0, 3.71, 0
    2, 0, 0, 0
    15, 158.51, 0, 0
    100, 79, 0, 79

Output in CCS sparse format:
    row_ind = {1, 2, 3, 4, 3, 4, 1, 4}
    col_ptr = {1, 5, 7, 8}
    val = {1, 2, 15, 100, 158.51, 79, 3.71, 79}

I will provide you with sample datasets A and B and the corresponding matrices C and D for testing purposes.
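For illustration only, here is a minimal C sketch of the aggregation step, not a complete solution. It assumes the .csv parsing, the date filter, and the mapping of ids and products to 0-based row and column numbers have already been done. The names `Triplet`, `Ccs`, `sort_by` and `triplets_to_ccs` are hypothetical. It sorts the filtered sales by (column, row) with two stable counting sorts, then sums adjacent duplicates. One assumption differs from the example above: `col_ptr` gets one extra trailing entry equal to nnz + 1, which is the common CCS convention. On memory: 3 million rows x 10,000 columns x 0.1% is about 30 million non-zero cells per matrix, or roughly 360 MB per matrix at 8 bytes per value plus 4 bytes per row index. Even if all 100 million sales pass the filter, two buffers of 16-byte triplets take about 3.2 GB.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical type: one filtered sale, mapped to 0-based row and column. */
typedef struct { int row, col; double val; } Triplet;

/* Hypothetical CCS container using 1-based indices, as in the example. */
typedef struct {
    int nnz;        /* number of stored cells */
    int *col_ptr;   /* ncols + 1 entries; the last one is nnz + 1 */
    int *row_ind;   /* row number of each stored cell */
    double *val;    /* value of each stored cell */
} Ccs;

/* Stable counting sort of n triplets by row (by_col == 0) or by column. */
static int sort_by(const Triplet *in, Triplet *out, int n, int nkeys, int by_col)
{
    int *start = calloc((size_t)nkeys + 1, sizeof *start);
    if (!start) return -1;
    for (int i = 0; i < n; i++)
        start[(by_col ? in[i].col : in[i].row) + 1]++;
    for (int k = 0; k < nkeys; k++)
        start[k + 1] += start[k];          /* start[k] = first slot of key k */
    for (int i = 0; i < n; i++)
        out[start[by_col ? in[i].col : in[i].row]++] = in[i];
    free(start);
    return 0;
}

/* Builds a CCS matrix from n triplets, summing cells that occur more than
   once. Reorders t in place. Returns 0 on success, -1 if out of memory. */
int triplets_to_ccs(Triplet *t, int n, int nrows, int ncols, Ccs *m)
{
    for (int i = 0; i < n; i++) {
        assert(t[i].row >= 0 && t[i].row < nrows);
        assert(t[i].col >= 0 && t[i].col < ncols);
    }
    size_t cap = n > 0 ? (size_t)n : 1;
    Triplet *tmp = malloc(cap * sizeof *tmp);
    m->col_ptr = calloc((size_t)ncols + 1, sizeof *m->col_ptr);
    m->row_ind = malloc(cap * sizeof *m->row_ind);
    m->val = malloc(cap * sizeof *m->val);
    /* Sorting by row, then stably by column, orders t by (column, row). */
    if (!tmp || !m->col_ptr || !m->row_ind || !m->val ||
        sort_by(t, tmp, n, nrows, 0) || sort_by(tmp, t, n, ncols, 1)) {
        free(tmp); free(m->col_ptr); free(m->row_ind); free(m->val);
        return -1;
    }
    free(tmp);

    int nnz = 0;
    for (int i = 0; i < n; i++) {
        if (i > 0 && t[i].row == t[i - 1].row && t[i].col == t[i - 1].col) {
            m->val[nnz - 1] += t[i].val;   /* same cell again: accumulate */
        } else {
            m->row_ind[nnz] = t[i].row + 1;
            m->val[nnz] = t[i].val;
            m->col_ptr[t[i].col + 1]++;    /* count stored cells per column */
            nnz++;
        }
    }
    m->col_ptr[0] = 1;                     /* counts -> 1-based pointers */
    for (int c = 0; c < ncols; c++)
        m->col_ptr[c + 1] += m->col_ptr[c];
    m->nnz = nnz;
    return 0;
}
```

If the example's ids are stored as column 1 (as in the matrices shown above) and the five quantity values are passed in as triplets, this sketch reproduces the posted row_ind and val arrays for matrix C, and col_ptr becomes {1, 5, 7, 8, 9}. The output arrays are sized for n entries rather than nnz, so a real implementation might shrink them afterwards.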
Project ID: 3250970

About the project

12 proposals
Remote project
Active 13 yrs ago

12 freelancers are bidding on average $136 USD for this job
- $255 USD in 14 days, 5.0 (52 reviews), 6.0
- $85 USD in 14 days, 4.9 (39 reviews), 6.3
- $170 USD in 14 days, 5.0 (12 reviews), 5.7
- $143.65 USD in 14 days, 5.0 (44 reviews), 5.0
- $102 USD in 14 days, 5.0 (20 reviews), 4.3
- $170 USD in 14 days, 4.2 (32 reviews), 4.5
- $84.99 USD in 14 days, 5.0 (10 reviews), 3.0
- $178.50 USD in 14 days, 5.0 (6 reviews), 3.0
- $85 USD in 14 days, 5.0 (2 reviews), 2.3
- $212.50 USD in 14 days, 5.0 (2 reviews), 1.4
- $42.50 USD in 14 days, 0.0 (0 reviews), 0.0
- $106.25 USD in 14 days, 0.0 (0 reviews), 0.0

About the client

Netherlands
5.0
11
Payment method verified
Member since Nov 3, 2010

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)