Converting a Data File to a CSV File and Reading It in a Jupyter Notebook

Vignesh S
4 min read · Jul 3, 2021


Hi, welcome! This is my first blog, so thanks to the Medium platform for providing this opportunity. I am very passionate about learning new technologies, and I love to share what I come across while learning. In this blog I am going to share how to convert a data file into a CSV file and read it in a Jupyter Notebook.

Data File to CSV file

1. Dataset Extraction

There are various free sources where we can get data from almost any domain and use it to infer business insights. Below I list some useful resources where we can find data files for our projects.

  1. Kaggle: https://www.kaggle.com/tags/web-sites
  2. UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/index.php
  3. Data World: https://data.world/datasets/website
  4. Government data: https://data.gov.in/

2. Data file to CSV file through Excel

Some of the data files you download may not be in a condition where you can work with them directly on your platform; they may be unstructured. To make such a file easier to read, you need to convert it into structured data. Here I will show how to convert an unstructured data file into a structured CSV file and use it in Jupyter, which makes the work of data scientists and data analysts simpler.

STEP 1: Download the file from the data source to your local desktop

Data file

STEP 2: Convert the data file to a Notepad (text) file

Conversion to notepad

STEP 3: Open the Notepad file, copy the content, and paste it into Excel

Paste the content of the Notepad file into Excel.
Click Data in the menu bar, then click Text to Columns.

STEP 4: Change the content to a structured format

Use the Delimited option, choose the delimiter that splits the text into the relevant columns, and click Next.
The content is now arranged in the relevant rows and columns.

After this modification, save the file as comma-separated values (CSV) and store it on your local desktop.
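The same delimiter-splitting step can also be sketched in Python with the standard csv module. This is only a minimal illustration, not the workflow shown above: the function name `delimited_text_to_csv`, the semicolon delimiter, and the sample rows are all assumptions made for the example.

```python
import csv
import io


def delimited_text_to_csv(text, delimiter=";"):
    """Split each line of `text` on `delimiter` and return CSV-formatted text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if row:  # skip blank lines
            # Strip stray spaces around each field before writing it out
            writer.writerow([field.strip() for field in row])
    return out.getvalue()


# Hypothetical sample standing in for the text copied out of Notepad
sample = "make;fuel-type;price\naudi;gas;13950\nbmw;gas;16430\n"
csv_text = delimited_text_to_csv(sample)
print(csv_text)
```

For a real file you would read its text with open(...) and write the returned string to a .csv file; the right delimiter depends on how the downloaded file separates its values.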

Reading the file in Jupyter Notebook with Python code, without changing anything in Excel

Open the Jupyter Notebook and type the following commands to import the CSV file.

Import csv file to Jupyter

Import the basic libraries (NumPy and pandas) needed to load the dataset

import numpy as np

import pandas as pd

After importing the libraries, read the CSV file

df = pd.read_csv("car.csv")
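As a quick check after loading, it helps to look at the shape, the first rows, and the column types. The sketch below reads a small made-up sample from memory so that it runs without car.csv; the sample rows and the name `df_sample` are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical stand-in for car.csv; in a notebook you would pass the file path instead
sample_csv = io.StringIO("make,fuel-type,price\naudi,gas,13950\nbmw,gas,16430\n")

df_sample = pd.read_csv(sample_csv)

print(df_sample.shape)   # (number of rows, number of columns)
print(df_sample.head())  # first rows of the data frame
print(df_sample.dtypes)  # column types inferred by pandas
```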

The data frame has now been created, and we are free to work with the imported dataset and infer insights from it.

As the pandas documentation notes, explicitly pass header=0 when you want to replace the existing column names. The header can also be a list of integers that specify row locations for a multi-index on the columns, e.g. [0,1,3]. Refer: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
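A small sketch of what header=0 changes when it is combined with the names parameter of read_csv. The sample text and the replacement names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical file content whose first row holds unhelpful column names
raw = "a,b,c\naudi,gas,13950\nbmw,gas,16430\n"
new_names = ["make", "fuel-type", "price"]

# header=0 marks the first row as the header, so `names` replaces it
renamed = pd.read_csv(io.StringIO(raw), header=0, names=new_names)

# Without header=0, passing `names` makes pandas treat every row as data,
# so the old header row "a,b,c" is kept as the first record
kept = pd.read_csv(io.StringIO(raw), names=new_names)

print(len(renamed), len(kept))
```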

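To replace the column names on an existing data frame, pass the new labels to a method such as DataFrame.set_axis and assign the result back (recent pandas versions no longer accept inplace=True here):

df = df.set_axis(["symboling", "normalized-losses", "make", "fuel-type", "aspiration", "num-of-doors", "body-style", "drive-wheels", "enginelocation", "wheelbase", "length", "width", "height", "curb-weight", "engine-type", "num-of-cylinders", "engine-size", "fuel-system", "bore", "stroke", "compression-ratio", "horsepower", "peak-rpm", "city-mpg", "highway-mpg", "price"], axis="columns")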

Reading the data file directly through Python code is just as simple.

If your data is already in a CSV-like format, you can read it in the notebook with Python code, without changing anything in Excel.

To see the code, refer to the GitHub profile below.

GitHub profile: https://github.com/vigneshsathish/Exploratory-Data-Analysis
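For readers who want a starting point before opening the repository, here is a minimal sketch of reading a headerless, comma-separated data file directly with pandas. It is not the author's code from the repository. It assumes the file has no header row and marks missing values with "?"; the sample rows and the shortened column list are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical excerpt in the style of a headerless .data file
raw_data = "3,?,alfa-romero,gas,13495\n1,158,audi,gas,17710\n"

# Shortened, illustrative column list; a real file needs one name per column
columns = ["symboling", "normalized-losses", "make", "fuel-type", "price"]

df_direct = pd.read_csv(
    io.StringIO(raw_data),  # replace with the path to the downloaded file
    header=None,            # the file has no header row
    names=columns,          # supply the column names ourselves
    na_values="?",          # assumption: "?" marks a missing value
)

print(df_direct)
print(df_direct.isna().sum())  # count of missing values per column
```

Passing the file path in place of the io.StringIO object reads the downloaded file without any manual conversion step.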


Vignesh S

Aspiring data scientist, passionate about learning new technologies and sharing my thoughts with others.