Hi Pandas lovers! In today’s article, I will talk about Pandas data structures, which are the backbone of data handling in Pandas.
So let’s get started.
Python Pandas is an essential library for data manipulation and analysis in Python. It provides two primary data structures: the Pandas Series and the Pandas DataFrame.
Understanding these data structures is crucial for effectively using Pandas to work with data.
In this article, we’ll dive deep into these data structures, exploring their features, use cases, and operations.
Before we start, let’s understand some important terms: label, index, column, and row. They are essential to understanding how data is structured and accessed. Here’s an explanation of each:
Labels:
- A label is the name attached to a row or a column of a DataFrame. Labels are used for accessing and referencing data, whether it’s for rows (via the index) or columns.
- Example: In a DataFrame, if you have a column named Age, then Age is the label of that column.
Index:
- The index is the set of labels that identifies the rows in a DataFrame. Index labels are usually unique, although Pandas does not require this.
- By default, Pandas assigns an integer index starting from 0, but you can set custom labels for the index.
- Example: In a DataFrame with rows labeled as 0, 1, 2, …, those numbers are the index.
Column:
- A column in a DataFrame is a labeled data series, representing one attribute or feature of the dataset.
- Columns have labels, and these labels are used to access the data within the column.
- Example: In a DataFrame with columns Name and Age, Name and Age are the column labels.
Row:
- A row in a DataFrame represents a single observation or data entry, which consists of values across all columns.
- Rows are typically accessed using their index labels.
- Example: In a DataFrame of students, each student (Alice, Bob, Charlie) is a row.
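The four terms above can be seen together on one small DataFrame. This is only a minimal sketch: the student names come from the row example, while the Age values and the variable name students are made up for illustration.

```python
import pandas as pd

# Illustrative data: the Age values are assumptions, not real data
students = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# Column labels
print(list(students.columns))

# Index (row labels); the default is 0, 1, 2, ...
print(list(students.index))

# One row, accessed by its index label
print(students.loc[1])

# One column, accessed by its column label
print(students['Age'])
```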
Now, let’s move on to cover the Python Pandas data structures.
Table of Contents
- 1 Pandas Data Structures Introduction
- 2 Pandas Series
- 3 Pandas DataFrame
- 3.1 What is a Pandas DataFrame?
- 3.2 Creating a DataFrame
- 3.3 Creating Pandas DataFrame from List of Dictionaries
- 3.4 Pandas DataFrame with Custom Index
- 3.5 Selecting Data in a DataFrame
- 3.6 Adding and Modifying Data
- 3.7 Dropping from Pandas DataFrame
- 3.8 Handling Missing Data
- 3.9 Pandas DataFrame Operations
- 4 Conclusion
Pandas Data Structures Introduction
Here, I explain the Pandas Series and the Pandas DataFrame one by one, with examples.
Pandas Series
What is a Series?
A Pandas Series is a one-dimensional labeled array capable of holding any data type, such as integers, strings, floats, or even Python objects.
You can relate it to a column in an Excel spreadsheet or a single-dimensional array with labels.
Creating Pandas Series
Pandas provides a constructor called pd.Series(). It takes a list, dictionary, scalar value, or NumPy array as input and returns a Series.
Here’s how you can create a Series:
import pandas as pd

# Creating a Series from a list
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print(series)
Output
0 10
1 20
2 30
3 40
4 50
dtype: int64
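The list example covers only one of the input types mentioned earlier. As a rough sketch, here is how a scalar value and a NumPy array can be passed to pd.Series(); the variable names and values are chosen just for illustration.

```python
import numpy as np
import pandas as pd

# From a scalar: the value is repeated for every index label supplied
scalar_series = pd.Series(5, index=['a', 'b', 'c'])
print(scalar_series)

# From a NumPy array: the default integer index is used
array_series = pd.Series(np.array([1.5, 2.5, 3.5]))
print(array_series)
```

With a scalar, the index argument decides how many elements the Series has.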
Creating Pandas Series with Custom Index
You can also create a Series with a custom index, where each data point is associated with a label.
import pandas as pd

# Creating a Series with a custom index
data = [10, 20, 30, 40, 50]
index = ['a', 'b', 'c', 'd', 'e']
series = pd.Series(data, index=index)
print(series)
Output
a 10
b 20
c 30
d 40
e 50
dtype: int64
In the above example, ['a', 'b', 'c', 'd', 'e'] is the custom index.
Creating Pandas Series from a Dictionary
Another common way to create a Series is from a Python dictionary, where the keys become the index, and the values become the data of the series.
import pandas as pd

# Creating a Series from a dictionary
data = {'a': 10, 'b': 20, 'c': 30, 'd': 40, 'e': 50}
series = pd.Series(data)
print(series)
Output
a 10
b 20
c 30
d 40
e 50
dtype: int64
Series Operations
Pandas allows us to perform a wide range of operations on a Pandas Series, such as arithmetic, filtering, and applying functions.
import pandas as pd

# Creating a Series from a list
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print(series)

# Arithmetic operations
print(series + 5)  # Adding 5 to each element

# Filtering
print(series[series > 30])  # Filtering elements greater than 30

# Applying a function
print(series.apply(lambda x: x * 2))  # Doubling each element
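Arithmetic also works between two Series, and in that case Pandas matches the values by index label rather than by position. The following is a small sketch with made-up values; s1, s2, and total are illustrative names.

```python
import pandas as pd

# Two Series with partly overlapping labels (illustrative values)
s1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s2 = pd.Series([1, 2, 3], index=['b', 'c', 'd'])

# Values are matched by label; labels present in only one Series give NaN
total = s1 + s2
print(total)

# Common aggregations on a Series
print(s1.sum(), s1.mean(), s1.max())
```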
Accessing Elements in a Series
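You can access elements in a Series by using the index label, or by position with iloc.
import pandas as pd

# Creating a Series with a custom index
series = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# Access by label
print(series['a'])

# Access by position
print(series.iloc[0])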
This is how you can create and work with a Pandas Series. To explore more Series functions, you can refer to the Pandas Series docs.
Let’s move on to the second Pandas data structure, which is called the DataFrame.
Pandas DataFrame
What is a Pandas DataFrame?
A Pandas DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). It can be seen as a collection of Series objects, sharing the same index.
In a Pandas DataFrame, each column is a Pandas Series.
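The relationship between a DataFrame and its columns can be checked directly. This is a minimal sketch with illustrative data; the variable name age is an assumption made for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# Selecting one column returns a Series
age = df['Age']
print(type(age))

# The column shares the DataFrame's index
print(age.index.equals(df.index))

# The Series is named after the column label
print(age.name)
```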
Creating a DataFrame
You can create a DataFrame in several ways, such as from a dictionary of lists, a list of dictionaries, a NumPy array, files such as CSV, Excel, or JSON, or even another DataFrame.
import pandas as pd

# Creating a DataFrame from a dictionary of lists
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 45],
    'Gender': ['F', 'M', 'M', 'M', 'F']
}
df = pd.DataFrame(data)
print(df)
Output
Name Age Gender
0 Alice 25 F
1 Bob 30 M
2 Charlie 35 M
3 David 40 M
4 Eva 45 F
Creating Pandas DataFrame from List of Dictionaries
Pandas also lets you create a DataFrame from a list of Python dictionaries, where each dictionary becomes one row.
# Creating a DataFrame from a list of dictionaries
data = [
    {'Name': 'Vishvajit', 'Age': 25, 'Gender': 'M'},
    {'Name': 'Harshita', 'Age': 30, 'Gender': 'F'},
    {'Name': 'Pooja', 'Age': 35, 'Gender': 'M'}
]
df = pd.DataFrame(data)
print(df)
Pandas DataFrame with Custom Index
As with a Series, you can create a Pandas DataFrame with a custom index.
# Creating a DataFrame with a custom index
# (data is the list of dictionaries from the previous example)
df = pd.DataFrame(data, index=['a', 'b', 'c'])
print(df)
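Besides passing index labels at creation time, an existing column can also serve as the index. Here is a short sketch using set_index() and reset_index(); the variable name by_name is illustrative, and the data is a shortened copy of the example above.

```python
import pandas as pd

data = [
    {'Name': 'Vishvajit', 'Age': 25, 'Gender': 'M'},
    {'Name': 'Harshita', 'Age': 30, 'Gender': 'F'},
]
df = pd.DataFrame(data)

# Use the Name column as the index; set_index returns a new DataFrame
by_name = df.set_index('Name')
print(by_name)

# Rows can now be looked up by name
print(by_name.loc['Harshita', 'Age'])

# reset_index moves the index back into a regular column
print(by_name.reset_index())
```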
Selecting Data in a DataFrame
You can select data from a DataFrame using label-based (loc), position-based (iloc), or direct column access.
import pandas as pd

# Creating a DataFrame from a list of dictionaries
data = [
    {'Name': 'Alice', 'Age': 25, 'Gender': 'F'},
    {'Name': 'Bob', 'Age': 30, 'Gender': 'M'},
    {'Name': 'Charlie', 'Age': 35, 'Gender': 'M'}
]
df = pd.DataFrame(data, index=['a', 'b', 'c'])

# Selecting a single column
print(df['Name'])

# Selecting multiple columns
print(df[['Name', 'Age']])

# Selecting rows by label
print(df.loc['b'])

# Selecting rows by position
print(df.iloc[0])
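Both loc and iloc also accept a row selector and a column selector together, as well as slices and conditions. The sketch below reuses the same example data; the variable name older is made up for illustration.

```python
import pandas as pd

data = [
    {'Name': 'Alice', 'Age': 25, 'Gender': 'F'},
    {'Name': 'Bob', 'Age': 30, 'Gender': 'M'},
    {'Name': 'Charlie', 'Age': 35, 'Gender': 'M'},
]
df = pd.DataFrame(data, index=['a', 'b', 'c'])

# A single cell: row label 'b', column label 'Age'
print(df.loc['b', 'Age'])

# A label slice with loc includes both endpoints
print(df.loc['a':'b', ['Name', 'Age']])

# A position slice with iloc excludes the end position
print(df.iloc[0:2])

# Rows matching a condition, restricted to one column
older = df.loc[df['Age'] > 28, 'Name']
print(older)
```

Note the difference between the two slices: 'a':'b' with loc returns rows a and b, and 0:2 with iloc returns the rows at positions 0 and 1.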
Adding and Modifying Data
You can easily add or modify columns in a DataFrame.
import pandas as pd

# Creating a DataFrame from a list of dictionaries
data = [
    {'Name': 'Alice', 'Age': 25, 'Gender': 'F'},
    {'Name': 'Bob', 'Age': 30, 'Gender': 'M'},
    {'Name': 'Charlie', 'Age': 35, 'Gender': 'M'}
]
df = pd.DataFrame(data, index=['a', 'b', 'c'])

# Adding the new column to the DataFrame
df['Country'] = ['india', 'USA', 'Europe']

# Add 1 to each student's age
df['Age'] = df['Age'] + 1
print(df)
Output
Name Age Gender Country
a Alice 26 F india
b Bob 31 M USA
c Charlie 36 M Europe
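Modifications do not have to touch a whole column. As a rough sketch, here is how a single cell or only the rows matching a condition can be updated with loc, and how assign() adds a column without changing the original DataFrame. The IsAdult column and the variable name with_flag are hypothetical, added only for this example.

```python
import pandas as pd

data = [
    {'Name': 'Alice', 'Age': 25, 'Gender': 'F'},
    {'Name': 'Bob', 'Age': 30, 'Gender': 'M'},
    {'Name': 'Charlie', 'Age': 35, 'Gender': 'M'},
]
df = pd.DataFrame(data, index=['a', 'b', 'c'])

# Update a single cell by row label and column label
df.loc['a', 'Age'] = 27

# Update only the rows that match a condition
df.loc[df['Gender'] == 'M', 'Age'] += 1

# assign returns a new DataFrame and leaves df unchanged
with_flag = df.assign(IsAdult=df['Age'] >= 18)
print(df)
print(with_flag)
```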
Dropping from Pandas DataFrame
Dropping data from a DataFrame is straightforward, whether you’re removing rows or columns.
import pandas as pd

# Creating a DataFrame from a list of dictionaries
data = [
    {'Name': 'Alice', 'Age': 25, 'Gender': 'F'},
    {'Name': 'Bob', 'Age': 30, 'Gender': 'M'},
    {'Name': 'Charlie', 'Age': 35, 'Gender': 'M'}
]
df = pd.DataFrame(data, index=['a', 'b', 'c'])

# Adding the new column to the DataFrame
df['Country'] = ['india', 'USA', 'Europe']

# Dropping a column
df = df.drop('Country', axis=1)
print(df)

# Dropping a row
df = df.drop('a')
print(df)
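The drop() method also accepts the columns and index keywords, which some readers find easier to read than axis. The following is a small sketch on the same example data; the variable names no_extra, fewer_rows, and unchanged are illustrative.

```python
import pandas as pd

data = [
    {'Name': 'Alice', 'Age': 25, 'Gender': 'F'},
    {'Name': 'Bob', 'Age': 30, 'Gender': 'M'},
    {'Name': 'Charlie', 'Age': 35, 'Gender': 'M'},
]
df = pd.DataFrame(data, index=['a', 'b', 'c'])

# Drop several columns at once using the columns keyword
no_extra = df.drop(columns=['Age', 'Gender'])
print(no_extra)

# Drop several rows at once using the index keyword
fewer_rows = df.drop(index=['a', 'c'])
print(fewer_rows)

# errors='ignore' skips labels that do not exist instead of raising KeyError
unchanged = df.drop(columns=['Country'], errors='ignore')
print(unchanged)
```

In each case drop() returns a new DataFrame, so df itself still has all three rows and columns.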
Handling Missing Data
Pandas provides various tools for handling missing data in a DataFrame.
import pandas as pd
import numpy as np

# Creating a DataFrame from a list of dictionaries;
# Bob's Age is np.nan so that there is missing data to handle
data = [
    {'Name': 'Alice', 'Age': 25, 'Gender': 'F', 'Score': 100},
    {'Name': 'Bob', 'Age': np.nan, 'Gender': 'M', 'Score': 60},
    {'Name': 'Charlie', 'Age': 35, 'Gender': 'M', 'Score': 70}
]
df = pd.DataFrame(data, index=['a', 'b', 'c'])

# Handling missing data by filling it with a value
df_filled = df.fillna(30)
print(df_filled)

# Dropping rows with missing data
df_dropped = df.dropna()
print(df_dropped)
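Before filling or dropping, it often helps to see where the missing values are. Here is a minimal sketch, assuming a small DataFrame with made-up values, that counts missing values per column, fills each column with its own replacement, and drops rows based on one column only.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing Age and one missing Score
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, np.nan, 35],
    'Score': [100, 60, np.nan],
})

# Count missing values per column
missing_counts = df.isna().sum()
print(missing_counts)

# Fill each column with its own replacement value
filled = df.fillna({'Age': df['Age'].mean(), 'Score': 0})
print(filled)

# Drop rows only when a specific column is missing
has_age = df.dropna(subset=['Age'])
print(has_age)
```

Filling Age with the column mean is just one possible choice here; the right replacement depends on the data.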
Pandas DataFrame Operations
You can perform various operations on DataFrames, such as replacing column names, grouping, filtering, aggregating, and merging.
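Let’s see all of these with an example.
import pandas as pd

# Creating a DataFrame from a list of dictionaries
data = [
    {'Name': 'Alice', 'Age': 25, 'Gender': 'F', 'Score': 100},
    {'Name': 'Bob', 'Age': 30, 'Gender': 'M', 'Score': 60},
    {'Name': 'Charlie', 'Age': 35, 'Gender': 'M', 'Score': 70}
]
df = pd.DataFrame(data, index=['a', 'b', 'c'])

# Replacing a column name
# inplace=True means the change is made in the original Pandas DataFrame
df.rename(mapper={"Name": "Student Name"}, axis=1, inplace=True)
print(df)

# Applying group by on the DataFrame (summing the numeric columns)
grouped = df.groupby("Gender")[["Age", "Score"]].sum()
print(grouped)

# Applying a filter on the DataFrame
filtered_df = df[df['Age'] > 30]
print(filtered_df)

# Merging DataFrames
# The key is "Student Name" in df (after the rename) and "Name" in df2
df2 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Country': ["India", "USA", "India"]
})
merged_df = pd.merge(df, df2, left_on="Student Name", right_on="Name")
print(merged_df)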
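Grouping and merging both have a few more options worth knowing. The sketch below uses illustrative data to compute several aggregations per group with agg(), and to do a left merge with how='left'. The result column names mean_age and max_score are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Gender': ['F', 'M', 'M'],
    'Score': [100, 60, 70],
})

# Several aggregations per group, using named aggregation
summary = df.groupby('Gender').agg(
    mean_age=('Age', 'mean'),
    max_score=('Score', 'max'),
)
print(summary)

# A left merge keeps every row of df, even without a match in df2
df2 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Country': ['India', 'USA']})
merged = pd.merge(df, df2, on='Name', how='left')
print(merged)
```

Since Charlie has no entry in df2, his Country is missing (NaN) in the merged result.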
These are some of the operations that you can perform on a Pandas DataFrame. You can explore more about the Pandas DataFrame in our Pandas tutorials.
Conclusion
Pandas provides powerful and flexible data structures that are essential for data manipulation and analysis in Python. The Series and DataFrame structures allow for easy data handling, enabling operations ranging from simple data cleaning to complex aggregations and analyses. Whether you’re working with a single column of data or a multi-column dataset, Pandas has the tools you need to make your work more efficient and effective.
If you found this article helpful, please share it and keep visiting for more Python Pandas tutorials.