Converting a Data File to a CSV File and Reading It in a Jupyter Notebook
Hi, and welcome. This is my first blog post, so thanks to the Medium platform for providing this opportunity. I am passionate about learning new technologies, and I love to share what I come across while learning. In this post I am going to share how to convert a data file into a CSV file and read it in Jupyter Notebook.
1. Dataset Extraction
There are various free sources where we can get data from many domains and derive business insights from it. Below I list some useful resources from which we can download data files for our projects.
- Kaggle: https://www.kaggle.com/tags/web-sites
- UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/index.php
- Data World: https://data.world/datasets/website
- Government data: https://data.gov.in/
2. Data file to CSV file through Excel
Some of the data files you download may not be in a condition where you can work with them directly; they may be unstructured. To make such a file easier to read, you need to convert it into structured data. Here I will show how to convert an unstructured data file into a structured CSV file and use it in Jupyter, which makes the work of a data scientist or data analyst simpler.
STEP 1: Download the file from the data source to your local desktop
STEP 2: Convert the data file to a text file that opens in Notepad
STEP 3: Open the Notepad file, copy its content, and paste it into Excel
STEP 4: Change the content to a structured format
After the modification, save the file as comma-separated values (CSV) and store it on your local desktop.
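The manual steps above could also be scripted. The following is only a minimal sketch, assuming the downloaded file is plain text with one record per line and a single delimiter character; the function name data_file_to_csv, the file names, and the three sample columns are hypothetical and chosen just for illustration.

```python
import csv
import tempfile
from pathlib import Path


def data_file_to_csv(src, dst, column_names, delimiter=","):
    """Copy a delimited text file to a CSV file, adding a header row.

    Hypothetical helper: assumes one record per line in `src`.
    """
    rows_written = 0
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.reader(fin, delimiter=delimiter)
        writer = csv.writer(fout)
        writer.writerow(column_names)  # header row goes first
        for row in reader:
            if not row:  # skip blank lines
                continue
            writer.writerow(row)
            rows_written += 1
    return rows_written


# Demo on a tiny made-up file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "sample.data"
    dst = Path(tmp) / "sample.csv"
    src.write_text("3,alfa-romero,gas\n1,audi,gas\n")
    n_rows = data_file_to_csv(src, dst, ["symboling", "make", "fuel-type"])
    csv_lines = dst.read_text().splitlines()

print(n_rows)
print(csv_lines)
```

If the source file uses another separator, such as a tab or a semicolon, pass it through the delimiter argument.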
Reading the file in Jupyter Notebook with Python code, without changing anything in Excel
Open the Jupyter notebook and type the following commands to import the CSV file.
Import the NumPy and pandas libraries, which are used to load and work with the dataset:
import numpy as np
import pandas as pd
After importing the libraries, read the CSV file:
df = pd.read_csv("car.csv")
The DataFrame has now been created, and we are free to explore the imported dataset and draw insights from it.
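A quick look at the new DataFrame helps confirm that the file was read as expected. The sketch below is an illustration only: it uses a small made-up CSV held in memory in place of car.csv, so the three columns and their values are assumptions, not the real dataset.

```python
import io

import pandas as pd

# Made-up stand-in for car.csv so the example runs without the real file.
csv_text = """make,fuel-type,price
audi,gas,13950
bmw,gas,16430
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first rows of the table
print(df.dtypes)  # data type pandas inferred for each column
```

With the real file, replace io.StringIO(csv_text) with the path to the CSV file.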
If the first row of the file already holds column names, explicitly pass header=0 to read_csv to be able to replace the existing names. The header can also be a list of integers that specify row locations for a multi-index on the columns, e.g. [0, 1, 3]. Refer: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
# Assign the column names. set_axis returns a new DataFrame
# (the inplace argument was removed in pandas 2.0).
df = df.set_axis(
    ["symboling", "normalized-losses", "make", "fuel-type", "aspiration",
     "num-of-doors", "body-style", "drive-wheels", "enginelocation",
     "wheelbase", "length", "width", "height", "curb-weight", "engine-type",
     "num-of-cylinders", "engine-size", "fuel-system", "bore", "stroke",
     "compression-ratio", "horsepower", "peak-rpm", "city-mpg", "highway-mpg",
     "price"],
    axis="columns",
)
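The column names can also be supplied while reading, through the header and names parameters of read_csv. The sketch below compares the two cases under stated assumptions: the short three-column list and the sample rows are made up for illustration and are not the real dataset.

```python
import io

import pandas as pd

names = ["symboling", "make", "price"]  # shortened, illustrative list

# Case 1: the file has no header row, so header=None keeps the first
# line as data and names supplies the column labels.
raw_no_header = "3,alfa-romero,13495\n1,audi,13950\n"
df_a = pd.read_csv(io.StringIO(raw_no_header), header=None, names=names)

# Case 2: the file has a header row with unwanted labels, so header=0
# marks that row as the header and names replaces it.
raw_with_header = "a,b,c\n3,alfa-romero,13495\n1,audi,13950\n"
df_b = pd.read_csv(io.StringIO(raw_with_header), header=0, names=names)

print(df_a)
print(df_b)
```

Both calls give the same two-row table. Reading a headerless file with the default settings would instead turn its first data row into column labels.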
Reading the data file directly through Python code is just as simple.
If your data is already in a comma-separated layout, you can read it in the notebook without changing anything in Excel.
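As a rough sketch of such a direct read, pandas can parse a comma-separated text file whatever its extension. The file name car.data, the shortened column list, and the use of "?" as a missing-value marker are assumptions made for this illustration; check your own file before relying on them.

```python
import tempfile
from pathlib import Path

import pandas as pd

column_names = ["symboling", "normalized-losses", "make"]  # shortened list

with tempfile.TemporaryDirectory() as tmp:
    # Made-up two-line stand-in for the downloaded data file.
    path = Path(tmp) / "car.data"
    path.write_text("3,?,alfa-romero\n2,164,audi\n")

    # header=None: the file has no header row.
    # na_values="?": treat "?" as a missing value (an assumption here).
    df = pd.read_csv(path, header=None, names=column_names, na_values="?")

print(df)
print(df.isna().sum())  # count of missing values per column
```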
To see the full code, refer to the GitHub repository below.
GitHub repository: https://github.com/vigneshsathish/Exploratory-Data-Analysis