It is often said that the Parquet format is more storage-efficient than JSON and CSV. The first link below says "Apache Parquet is a columnar file format that provides optimizations to speed up queries and is a far more efficient file format than CSV or JSON".
[login to view URL]
[login to view URL]
Now, let us try to demonstrate this. Download this CSV file (with 50,000 rows).
[login to view URL]
Load the file as a DataFrame in Spark, save the DataFrame again in JSON and Parquet formats, and check the resulting file sizes. Do you see differences in file sizes? Report them here.
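One possible way to do the conversion and size check is sketched below in PySpark. It is only an illustration under several assumptions: the CSV is on the local filesystem, has a header row, and uses column names that Parquet accepts. The function names, the app name, and the output folder names (`data_json`, `data_parquet`) are made up for this example. Spark writes each output as a directory of part files, so the sizes are summed over that directory.

```python
import os


def dir_size_bytes(path):
    """Total size of the data files under path (Spark writes a directory of part files)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            # Skip Spark's bookkeeping files (_SUCCESS markers and hidden .crc checksums).
            if name.startswith("_") or name.startswith("."):
                continue
            total += os.path.getsize(os.path.join(root, name))
    return total


def convert_and_compare(csv_path, out_dir):
    """Read a CSV with Spark, rewrite it as JSON and Parquet, return sizes in bytes."""
    # Imported here so dir_size_bytes stays usable without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("format-size-comparison").getOrCreate()
    # Assumption: the file has a header row; inferSchema gives typed columns.
    df = spark.read.csv(csv_path, header=True, inferSchema=True)

    # Hypothetical output locations, chosen only for this sketch.
    json_path = os.path.join(out_dir, "data_json")
    parquet_path = os.path.join(out_dir, "data_parquet")
    df.write.mode("overwrite").json(json_path)
    df.write.mode("overwrite").parquet(parquet_path)

    return {
        "csv": os.path.getsize(csv_path),
        "json": dir_size_bytes(json_path),
        "parquet": dir_size_bytes(parquet_path),
    }
```

Calling `convert_and_compare("data.csv", "out")` (both paths are placeholders) returns a dictionary with one byte count per format, which can be reported directly.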
Parquet is also supposed to be faster to query than CSV. Show one query result that demonstrates this (for example, counting the number of unique values in a certain column).
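A rough way to time such a query is sketched below. It assumes an existing `SparkSession` is passed in as `spark`, that a Parquet copy of the data has already been written, and that the CSV has a header row. The function names and the result layout are hypothetical. With only 50,000 rows the timings may be dominated by Spark's startup and scheduling overhead, so repeating each measurement a few times is advisable before drawing conclusions.

```python
import time


def timed(fn):
    """Run fn() once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def distinct_count(spark, fmt, path, column):
    """Count the distinct values of one column, reading path in the given format."""
    reader = spark.read.format(fmt)
    if fmt == "csv":
        # Assumption: the CSV has a header row that names the columns.
        reader = reader.option("header", True)
    return reader.load(path).select(column).distinct().count()


def compare_query_times(spark, csv_path, parquet_path, column):
    """Time the same distinct-count query against the CSV and Parquet copies."""
    results = {}
    for fmt, path in (("csv", csv_path), ("parquet", parquet_path)):
        # The lambda is called immediately, so it sees the current fmt and path.
        count, seconds = timed(lambda: distinct_count(spark, fmt, path, column))
        results[fmt] = {"distinct": count, "seconds": seconds}
    return results
```

Both formats should return the same distinct count; only the elapsed time is expected to differ. Because Parquet is columnar, Spark can read just the selected column, whereas the CSV reader has to scan every row in full.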
✔✔✔✔ Nice to see your posting ✔✔✔✔
Hi, sir.
I read your job posting and I am interested in Parquet.
So what do I have to do?
Please tell me via chat.
I hope to work with you.
Best regards.
Thanks!