
A Comprehensive Guide to Pandas Data Structures

Pandas Data Structures

Hi Pandas lovers! In today’s article, I will talk about the Pandas data structures, which are the backbone of data handling in Pandas.

So, let’s get started.

Python Pandas is an essential library for data manipulation and analysis in Python. It provides two primary data structures that are the backbone of data handling in the library: Pandas Series and Pandas DataFrame.

Understanding these data structures is crucial for effectively using Pandas to work with data.

In this article, we’ll dive deep into these data structures, exploring their features, use cases, and operations.

Before we start, let’s understand some important terms: label, index, column, and row. They are essential to understanding how data is structured and accessed. Here’s an explanation of each:

Labels:

  • Label refers to the name associated with a particular element in the DataFrame. Labels are used for accessing and referencing data, whether it’s for rows (via the index) or columns.
  • Example: In a DataFrame, if you have a column named Age, then Age is the label of that column.

Index:

  • The index is the set of labels that identify the rows in a DataFrame. Index labels are usually unique, although Pandas does not require it.
  • By default, Pandas assigns an integer index starting from 0, but you can set custom labels for the index.
  • Example: In a DataFrame with rows labeled as 0, 1, 2, …, those numbers are the index.

Column:

  • A column in a DataFrame is a labeled data series, representing one attribute or feature of the dataset.
  • Columns have labels, and these labels are used to access the data within the column.
  • Example: In a DataFrame with columns Name and Age, Name and Age are the column labels.

Row:

  • A row in a DataFrame represents a single observation or data entry, which consists of values across all columns.
  • Rows are typically accessed using their index labels.
  • Example: In the student DataFrames used later in this article, each student (Alice, Bob, Charlie) is a row.
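The four terms above can be seen together in a small sketch. The student data here is hypothetical and only serves to illustrate the vocabulary:

```python
import pandas as pd

# Hypothetical student data, used only to illustrate the terms above
students = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]}
)

column_labels = list(students.columns)  # labels of the columns
index_labels = list(students.index)     # default integer index: 0, 1, 2
first_row = students.loc[0]             # one row, selected by its index label

print(column_labels)
print(index_labels)
print(first_row["Name"])
```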

Now, let’s move on to cover the Pandas data structures.

Pandas Data Structures Introduction

Here, I am explaining the Pandas Series and Pandas DataFrame one by one with examples.

Pandas Series

What is a Series?

A Pandas Series is a one-dimensional labeled array capable of holding any data type, such as integers, strings, floats, or even Python objects.
You can relate it to a column in an Excel spreadsheet or a single-dimensional array with labels.

Creating Pandas Series

Pandas provides a constructor called pd.Series() (note the capital S). It takes a list, dictionary, scalar value, or NumPy array as input.

Here’s how you can create a Series:

import pandas as pd

# Creating a Series from a list
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print(series)

Output

0    10
1    20
2    30
3    40
4    50
dtype: int64

Creating Pandas Series with Custom Index

You can also create a Series with a custom index, where each data point is associated with a label.

import pandas as pd

# Creating a Series with a custom index
data = [10, 20, 30, 40, 50]
index = ['a', 'b', 'c', 'd', 'e']
series = pd.Series(data, index=index)
print(series)

Output

a    10
b    20
c    30
d    40
e    50
dtype: int64

In the above example, ['a', 'b', 'c', 'd', 'e'] is the custom index.

Creating Pandas Series from a Dictionary

Another common way to create a Series is from a Python dictionary, where the keys become the index, and the values become the data of the series.

import pandas as pd

# Creating a Series from a dictionary
data = {'a': 10, 'b': 20, 'c': 30, 'd': 40, 'e': 50}
series = pd.Series(data)
print(series)

Output

a    10
b    20
c    30
d    40
e    50
dtype: int64
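The constructor also accepts a NumPy array or a scalar value. A minimal sketch of both follows; the variable names are illustrative only:

```python
import numpy as np
import pandas as pd

# From a NumPy array: the default integer index is used
from_array = pd.Series(np.array([1.5, 2.5, 3.5]))

# From a scalar: the index determines how many times the value is repeated
from_scalar = pd.Series(7, index=["x", "y", "z"])

print(from_array)
print(from_scalar)
```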

Series Operations

Pandas allows us to perform a wide range of operations on a Pandas Series, such as arithmetic, filtering, and applying functions.

import pandas as pd

# Creating a Series from a list
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print(series)


# Arithmetic operations
print(series + 5)  # Adding 5 to each element

# Filtering
print(series[series > 30])  # Filtering elements greater than 30

# Applying a function
print(series.apply(lambda x: x * 2))  # Doubling each element
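As a rough illustration of what these operations return, the sketch below stores each result in a variable. It also shows that, for simple arithmetic like doubling, a plain vectorized expression gives the same result as apply():

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50])

plus_five = series + 5          # element-wise addition
above_30 = series[series > 30]  # keeps the original index labels 3 and 4
doubled = series * 2            # vectorized equivalent of apply(lambda x: x * 2)

print(plus_five.tolist())
print(above_30.to_dict())
print(doubled.equals(series.apply(lambda x: x * 2)))
```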

Accessing Elements in a Series

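You can access elements in a Series by using the index label or the position. The example below uses a Series with a custom index, because the label 'a' does not exist in a Series with the default integer index.

import pandas as pd

series = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# Access by label
print(series['a'])

# Access by position (iloc is explicit about using positions)
print(series.iloc[0])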
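Label-based and position-based access can also be made explicit with .loc and .iloc. The sketch below assumes a Series with the custom index 'a' to 'e' and shows one difference worth knowing: label slices include the end label, while position slices exclude the end position.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

by_label = series.loc["b"]          # element with label 'b'
by_position = series.iloc[1]        # element at position 1 (the same element)
label_slice = series.loc["a":"c"]   # label slices include the end label
position_slice = series.iloc[0:2]   # position slices exclude the end position

print(by_label, by_position)
print(label_slice.tolist())
print(position_slice.tolist())
```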

This is how you can create and use a Pandas Series. To explore more Series functions, refer to the Pandas Series documentation.

Let’s move on to the second Pandas data structure, the DataFrame.

Pandas DataFrame

What is a Pandas DataFrame?

A Pandas DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). It can be seen as a collection of Series objects, sharing the same index.

In a Pandas DataFrame, each column is a Pandas Series.
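A quick way to check this relationship is to pull one column out of a DataFrame and inspect it. The small DataFrame below is a hypothetical example:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]}, index=["a", "b"])

age = df["Age"]

print(type(age).__name__)          # the column is a Series
print(age.index.equals(df.index))  # it shares the DataFrame's index
print(age.name)                    # its name is the column label
```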

Creating a DataFrame

You can create a DataFrame in several ways, such as from a dictionary of lists, a list of dictionaries, a NumPy array, from files like CSV, Excel, JSON, etc, or even another DataFrame.

import pandas as pd

# Creating a DataFrame from a dictionary of lists
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 45],
    'Gender': ['F', 'M', 'M', 'M', 'F']
}
df = pd.DataFrame(data)
print(df)

Output

      Name  Age Gender
0    Alice   25      F
1      Bob   30      M
2  Charlie   35      M
3    David   40      M
4      Eva   45      F
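Two of the other sources mentioned above, a NumPy array and a CSV file, can be sketched as follows. To keep the example self-contained, io.StringIO stands in for a real file path, and the data is made up:

```python
import io

import numpy as np
import pandas as pd

# From a 2-D NumPy array; the column names are supplied separately
array_df = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=["x", "y"])

# From CSV text; io.StringIO stands in for a file path here
csv_text = "Name,Age\nAlice,25\nBob,30\n"
csv_df = pd.read_csv(io.StringIO(csv_text))

print(array_df)
print(csv_df)
```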

Creating Pandas DataFrame from List of Dictionaries

Pandas also lets you create a DataFrame from a list of Python dictionaries, where each dictionary becomes one row and the keys become the column labels.

# Creating a DataFrame from a list of dictionaries
data = [
    {'Name': 'Vishvajit', 'Age': 25, 'Gender': 'M'},
    {'Name': 'Harshita', 'Age': 30, 'Gender': 'F'},
    {'Name': 'Pooja', 'Age': 35, 'Gender': 'M'}
]
df = pd.DataFrame(data)
print(df)
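One detail worth knowing with this approach: the dictionaries do not need identical keys. In the hypothetical sketch below, the second record has no 'Age' key, and Pandas fills the gap with NaN:

```python
import pandas as pd

records = [
    {"Name": "Alice", "Age": 25},
    {"Name": "Bob"},  # no 'Age' key, so this cell becomes NaN
]
df_records = pd.DataFrame(records)

print(df_records)
```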

Pandas DataFrame with Custom Index

As with a Series, you can create a Pandas DataFrame with a custom index.

# Creating a DataFrame with a custom index
df = pd.DataFrame(data, index=['a', 'b', 'c'])
print(df)

Selecting Data in a DataFrame

You can select data from a DataFrame using label-based (loc), position-based (iloc), or direct column access.

import pandas as pd

# Creating a DataFrame from a list of dictionaries
data = [
    {'Name': 'Alice', 'Age': 25, 'Gender': 'F'},
    {'Name': 'Bob', 'Age': 30, 'Gender': 'M'},
    {'Name': 'Charlie', 'Age': 35, 'Gender': 'M'}
]
df = pd.DataFrame(data, index=['a', 'b', 'c'])


# Selecting a single column
print(df['Name'])

# Selecting multiple columns
print(df[['Name', 'Age']])

# Selecting rows by label
print(df.loc['b'])

# Selecting rows by position
print(df.iloc[0])
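loc and iloc also accept a row selector and a column selector together. The following is a minimal sketch on a hypothetical DataFrame with the index 'a', 'b', 'c':

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=["a", "b", "c"],
)

bob_age = df.loc["b", "Age"]            # single value by row and column label
first_two = df.iloc[0:2]                # first two rows by position
names_a_to_b = df.loc["a":"b", "Name"]  # label slice, end label included

print(bob_age)
print(first_two)
print(names_a_to_b.tolist())
```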

Adding and Modifying Data

You can easily add or modify columns in a DataFrame.

import pandas as pd

# Creating a DataFrame from a list of dictionaries
data = [
    {'Name': 'Alice', 'Age': 25, 'Gender': 'F'},
    {'Name': 'Bob', 'Age': 30, 'Gender': 'M'},
    {'Name': 'Charlie', 'Age': 35, 'Gender': 'M'}
]
df = pd.DataFrame(data, index=['a', 'b', 'c'])


# Adding the new column to the dataframe
df['Country'] = ['india', 'USA', 'Europe']

# Add 1 to each student's age
df['Age'] = df['Age'] + 1
print(df)

Output

      Name  Age Gender Country
a    Alice   26      F   india
b      Bob   31      M     USA
c  Charlie   36      M  Europe
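Two related patterns are deriving a new column from an existing one and changing a single cell. In this sketch the column name 'AgeInMonths' is a hypothetical name chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]}, index=["a", "b"])

# New column derived from an existing one ('AgeInMonths' is a hypothetical name)
df["AgeInMonths"] = df["Age"] * 12

# Modify a single cell by row label and column label
df.loc["a", "Age"] = 26

print(df)
```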

Dropping from Pandas DataFrame

Dropping data from a DataFrame is straightforward, whether you’re removing rows or columns.

import pandas as pd

# Creating a DataFrame from a list of dictionaries
data = [
    {'Name': 'Alice', 'Age': 25, 'Gender': 'F'},
    {'Name': 'Bob', 'Age': 30, 'Gender': 'M'},
    {'Name': 'Charlie', 'Age': 35, 'Gender': 'M'}
]
df = pd.DataFrame(data, index=['a', 'b', 'c'])

# Adding the new column to the dataframe
df['Country'] = ['india', 'USA', 'Europe']

# Dropping a column
df = df.drop('Country', axis=1)
print(df)

# Dropping a row
df = df.drop('a')
print(df)
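drop() also accepts the columns= and index= keywords, which some readers find clearer than axis=1. The sketch below removes a column and two rows in one call, and shows that the original DataFrame is left unchanged because drop() returns a new object:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Gender": ["F", "M", "M"],
    },
    index=["a", "b", "c"],
)

# Remove one column and two rows; df itself is not modified
trimmed = df.drop(columns=["Gender"], index=["a", "c"])

print(trimmed)
print(df.shape)
```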

Handling Missing Data

Pandas provides various tools for handling missing data in a DataFrame, such as fillna() and dropna().

import pandas as pd
import numpy as np

# Creating a DataFrame from a list of dictionaries
data = [
    {'Name': 'Alice', 'Age': 25, 'Gender': 'F', 'Score': 100},
    {'Name': 'Bob', 'Age': 30, 'Gender': 'M', 'Score': 60},
    {'Name': 'Charlie', 'Age': 35, 'Gender': 'M', 'Score': 70}
]
df = pd.DataFrame(data, index=['a', 'b', 'c'])

# Introduce one missing value (Bob's age) so there is something to handle
df['Age'] = [25, np.nan, 35]

# Filling missing values with a default
df_filled = df.fillna(30)
print(df_filled)

# Dropping rows with missing data
df_dropped = df.dropna()
print(df_dropped)
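Before filling or dropping, it often helps to count the missing values per column, and to fill each column with a value that suits it. This is a minimal sketch on made-up data; using the column mean for Age and 0 for Score is an illustrative choice, not a recommendation:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, np.nan, 35],
        "Score": [100, 60, np.nan],
    }
)

# Count missing values in each column
missing_per_column = df.isna().sum()

# Fill each column with its own replacement value
filled = df.fillna({"Age": df["Age"].mean(), "Score": 0})

print(missing_per_column)
print(filled)
```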

Pandas DataFrame Operations

You can perform various operations on DataFrames, such as replacing column names, grouping, filtering, aggregating, and merging.

Let’s see each of these with an example, continuing with the DataFrame df from the previous section.

# replacing column name
df.rename(mapper={"Name": "Student Name"}, axis=1, inplace=True)
print(df)

inplace=True means the rename modifies the original DataFrame instead of returning a new one.
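For comparison, here is a sketch of the same rename without inplace=True. In that case rename() returns a new DataFrame and leaves the original untouched:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice"], "Age": [25]})

# Without inplace=True, the result must be assigned to a variable
renamed = df.rename(columns={"Name": "Student Name"})

print(list(df.columns))       # original is unchanged
print(list(renamed.columns))  # new DataFrame has the new label
```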

# apply group by on DataFrame, summing only the numeric columns
grouped = df.groupby("Gender").sum(numeric_only=True)
print(grouped)
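When different columns need different aggregations, agg() with named aggregations is one option. The data and the output column names (mean_age, total_score) in this sketch are hypothetical:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Gender": ["F", "M", "M"],
        "Age": [25, 30, 35],
        "Score": [100, 60, 70],
    }
)

# One output column per (source column, aggregation) pair
summary = df.groupby("Gender").agg(
    mean_age=("Age", "mean"),
    total_score=("Score", "sum"),
)

print(summary)
```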



# apply filter on DataFrame
filtered_df = df[df['Age'] > 30]
print(filtered_df)
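Filters can also combine several conditions. A minimal sketch on made-up data follows; note that each condition needs its own parentheses, and that & and | are used rather than the Python keywords and / or:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Gender": ["F", "M", "M"],
    }
)

# & means "and", | means "or"; each condition is wrapped in parentheses
older_men = df[(df["Age"] > 28) & (df["Gender"] == "M")]
young_or_female = df[(df["Age"] < 28) | (df["Gender"] == "F")]

print(older_men)
print(young_or_female)
```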


# Merging DataFrames
# df's 'Name' column was renamed to 'Student Name' above, so df2 uses the same key
df2 = pd.DataFrame({
    'Student Name': ['Alice', 'Bob', 'Charlie'],
    'Country': ["India", "USA", "India"]
})

merged_df = pd.merge(df, df2, on="Student Name")
print(merged_df)
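By default, pd.merge() performs an inner join, so rows without a match on the key are dropped. The sketch below, on hypothetical data, contrasts that with how="left", which keeps every row of the left DataFrame:

```python
import pandas as pd

left = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})
right = pd.DataFrame({"Name": ["Alice", "Bob"], "Country": ["India", "USA"]})

inner = pd.merge(left, right, on="Name")                  # default how="inner"
left_join = pd.merge(left, right, on="Name", how="left")  # keeps Charlie, Country is NaN

print(inner)
print(left_join)
```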

These are some of the operations that you can perform on a Pandas DataFrame. You can explore more about the Pandas DataFrame in our Pandas tutorials.

Conclusion

Pandas provides powerful and flexible data structures that are essential for any data manipulation and analysis task in Python. The Series and DataFrame structures allow for easy data handling, enabling operations ranging from simple data cleaning to complex aggregations and analyses. Whether you’re working with a single column of data or a multi-column dataset, Pandas has the tools you need to make your work more efficient and effective.

If you found this article helpful, please share it and keep visiting for more Python Pandas tutorials.
