pandas is one of the most widely used Python modules for analyzing data from CSV, JSON, or Excel files: it converts such data into dataframes and provides the tools to analyze them.

In this tutorial, we will see some basic usages of this module.

First things first, let’s import the module:

import pandas as pd

Now, let’s take a look at some basic tips and tricks.


Creating Series

We can create a Series from a list or a dictionary. For a list, the index column holds the default integer index values ($0,1,2,\dots$); for a dictionary, it holds the keys (in the example below, $a,b,c$).

mylist = ['a','b','c']
mynums = [1,2,3]
mydict = {'a':10,'b':20,'c':30}
pd.Series(data=mynums, index=mylist)

If we pass two lists to pd.Series, the items of the first list go into the data column and the items of the second list go into the index column.
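As a minimal sketch of the three cases above, the snippet below builds a Series from a list, from a dictionary, and from two lists. The variable names s_list, s_dict, and s_labeled are chosen here for illustration only.

```python
import pandas as pd

mylist = ['a', 'b', 'c']
mynums = [1, 2, 3]
mydict = {'a': 10, 'b': 20, 'c': 30}

# From a list: pandas assigns the default integer index 0, 1, 2
s_list = pd.Series(mynums)

# From a dictionary: the keys become the index
s_dict = pd.Series(mydict)

# From two lists: the first supplies the data, the second the index
s_labeled = pd.Series(data=mynums, index=mylist)

print(s_list.index.tolist())
print(s_dict.index.tolist())
print(s_labeled['b'])
```

Once the index holds labels, values can be looked up by label, as in s_labeled['b'].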


Creating Dataframes

We can create a dataframe with the pandas.DataFrame() constructor. The following example creates a $5 \times 4$ matrix of random numbers, then sets the index labels from A to E and the column labels from W to Z.

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5,4),index='A B C D E'.split(),columns='W X Y Z'.split())

Let’s take a look at another example:

import pandas as pd
data = [['X', 10], ['Y', 15], ['Z', 20]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])

The above example creates two columns: the first elements of the inner lists in data go under the column Name, and the second elements go under the column Age.
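To check how the two constructors lay out their data, one can inspect the shape, labels, and columns of the results. The sketch below repeats both examples; the names df_random and df_people are introduced here only to keep the two dataframes apart.

```python
import numpy as np
import pandas as pd

# 5 x 4 matrix of random numbers with row labels A..E and column labels W..Z
df_random = pd.DataFrame(
    np.random.randn(5, 4),
    index='A B C D E'.split(),
    columns='W X Y Z'.split(),
)

# Each inner list is one row; its items fill the columns from left to right
data = [['X', 10], ['Y', 15], ['Z', 20]]
df_people = pd.DataFrame(data, columns=['Name', 'Age'])

print(df_random.shape)
print(df_random.index.tolist(), df_random.columns.tolist())
print(df_people['Name'].tolist(), df_people['Age'].tolist())
```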

If we want to create a dataframe by reading a CSV file, we can use the following:

df = pd.read_csv('file_name.csv')

You can use a few other options described below:

  1. If you want to import only a subset of columns, name the columns to import with the usecols option
     pd.read_csv('file_name.csv', usecols=['column_name_1', 'column_name_2'])

    usecols also accepts the integer positions of the columns, for example usecols=[0, 1].

  2. If your CSV file does not have column headers, set header=None
     pd.read_csv('file_name.csv', header=None)
  3. If you want to use a particular column as index, use the option index_col
     pd.read_csv('file_name.csv', index_col='column_name_to_set_index')
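  4. Import a range of rows from the file. The following example reads data rows $31$ to $45$: skiprows=range(1, 31) skips the first $30$ data rows while keeping the header line, and nrows=15 reads the next $15$ rows. (A plain skiprows=30 would also skip the header line, and nrows=45 would read $45$ rows rather than stop at row $45$.)
     df = pd.read_csv('file_name.csv', dtype=float, skiprows=range(1, 31), nrows=15)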
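These options can be tried without a file on disk. The sketch below assumes a small made-up CSV held in memory with io.StringIO; the column names id, x, and y and all variable names are hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV content: 50 data rows, with and without a header line
rows = "\n".join(f"{i},{i * 2},{i * 3}" for i in range(1, 51))
csv_text = "id,x,y\n" + rows
csv_no_header = rows

# 1. Import a subset of columns, by name or by position
by_name = pd.read_csv(io.StringIO(csv_text), usecols=['id', 'y'])
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[0, 2])

# 2. No header line: pandas numbers the columns 0, 1, 2
no_header = pd.read_csv(io.StringIO(csv_no_header), header=None)

# 3. Use the id column as the index
indexed = pd.read_csv(io.StringIO(csv_text), index_col='id')

# 4. Keep the header, skip data rows 1 to 30, then read the next 15 rows
rows_31_to_45 = pd.read_csv(
    io.StringIO(csv_text), dtype=float, skiprows=range(1, 31), nrows=15
)

print(by_name.columns.tolist())
print(no_header.shape)
print(rows_31_to_45['id'].tolist())
```

Replacing io.StringIO(...) with a path such as 'file_name.csv' gives the same calls for a real file.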

Dataframe Operations

  1. Get a summary of the dataframe using describe()
  2. Get unique or non-unique values of a column
  3. Access a particular column
  4. Accessing multiple columns
  5. Accessing first $n$ or last $n$ number of rows. head(n) is used to access first $n$ rows and tail(n) is used to access last $n$ rows.
  6. Count the number of appearances of a value
  7. Creating a new column
     df['new_column_name'] = df['column_name_1'] + df['column_name_2']
  8. Deleting a column
  9. Deleting a row
  10. Selecting a row using loc or iloc
  11. Accessing a single value
  12. Accessing values from selected multiple rows and multiple columns
  13. Creating a new CSV file

    Exporting without the index:

    df.to_csv('file_name.csv', index=False)

    Sometimes you may get a UnicodeEncodeError. To avoid it, set the encoding explicitly:

    df.to_csv('file_name.csv', encoding='utf-8')

    Exporting particular columns only is done with the columns option of to_csv.
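The operations in this list can be sketched as follows. This shows one common way to do each step, not necessarily the only one; the dataframe, its row labels, and its column names (Name, Age, Score) are made up for illustration.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame(
    {'Name': ['X', 'Y', 'Z', 'X'], 'Age': [10, 15, 20, 10], 'Score': [1.5, 2.5, 3.5, 4.5]},
    index=['r1', 'r2', 'r3', 'r4'],
)

summary = df.describe()                           # 1. summary of the numeric columns
unique_names = df['Name'].unique()                # 2. unique values of a column
repeated = df[df['Name'].duplicated(keep=False)]  #    rows whose Name is not unique
ages = df['Age']                                  # 3. one column (a Series)
subset = df[['Name', 'Age']]                      # 4. several columns (a DataFrame)
first_two = df.head(2)                            # 5. first n rows
last_two = df.tail(2)                             #    last n rows
name_counts = df['Name'].value_counts()           # 6. how often each value appears
df['Total'] = df['Age'] + df['Score']             # 7. new column from existing ones
no_total = df.drop(columns='Total')               # 8. delete a column (returns a copy)
no_r2 = df.drop(index='r2')                       # 9. delete a row (returns a copy)
row_by_label = df.loc['r3']                       # 10. select a row by label
row_by_pos = df.iloc[2]                           #     or by integer position
single = df.at['r3', 'Age']                       # 11. a single value
block = df.loc[['r1', 'r3'], ['Name', 'Score']]   # 12. several rows and columns

# 13. Without a path, to_csv() returns the CSV text instead of writing a file
only_some = df.to_csv(index=False, columns=['Name', 'Age'])
print(only_some)
```

Note that drop() returns a new dataframe and leaves df unchanged; assign the result back if the deletion should persist.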


Dataframe Visualization (Graphs)

The pandas module offers some direct visualizations (using matplotlib in the background).

  1. bar plot

    or a stacked bar plot

  2. histogram
  3. line plot
  4. scatter plot

    using a colormap

  5. box plot
  6. density plot
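The plot types listed above are all available through the plot accessor of a dataframe. The sketch below uses made-up random data with hypothetical column names a, b, and c, and selects the non-interactive Agg backend so that it runs without a display. It assumes matplotlib is installed; the density plot additionally needs SciPy.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no window is required
import numpy as np
import pandas as pd

# Made-up data: 20 rows of random numbers in three columns
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((20, 3)), columns=['a', 'b', 'c'])

ax_bar = df.head(5).plot.bar()                   # 1. bar plot
ax_stacked = df.head(5).plot.bar(stacked=True)   #    or a stacked bar plot
ax_hist = df.plot.hist(bins=10, alpha=0.5)       # 2. histogram
ax_line = df.plot.line()                         # 3. line plot
ax_scatter = df.plot.scatter(x='a', y='b')       # 4. scatter plot
# Using a colormap: column c sets the colour of each point
ax_cmap = df.plot.scatter(x='a', y='b', c='c', colormap='viridis')
ax_box = df.plot.box()                           # 5. box plot

# 6. density plot; skipped here if SciPy is not installed
try:
    ax_kde = df.plot.density()
except ImportError:
    ax_kde = None
```

Each call returns a matplotlib Axes object, so a figure can be written to disk with, for example, ax_bar.figure.savefig('bar.png').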

In this post, I just showed a few basic uses of the module pandas. In the next post, we will take a more elaborate look at the module.

For all my posts related to data science in Python, check this post:

Collection of Data Science in Python Posts in my Blog.

Have a nice day, cheers!
