Mastery Data Selection: Loc and iLoc in Pandas

Hello Pandas lovers, in today’s article I will explain loc and iloc in Pandas with examples and explanations.

As a data analyst or data engineer, you should know loc and iloc in Pandas, because these two indexers are the main way to select and update data in a Pandas DataFrame or Series.

Throughout this article, we will cover multiple use cases of the Pandas iloc and loc.

Make sure you have installed the Pandas library in your Python environment. If you have not installed Python Pandas, you can refer to our Pandas installation tutorial.

Now let’s start the tutorial.

Master Loc and iLoc in Pandas DataFrame

loc and iloc in Pandas are used to access and manipulate data within a DataFrame or Series. They provide different methods for selecting data based on labels or integer-based positions.

To demonstrate Pandas loc and iloc, I have created a simple Pandas DataFrame, which you can see below.

import pandas as pd


data = {
    "name": ["Vishvajit", "Harsh", "Sonu", "Peter"],
    "age": [26, 25, 30, 33],
    "country": ["India", "India", "India", "USA"],
}

index = ['a', 'b', 'c', 'd']

df = pd.DataFrame(data, index=index)
print(df)

Output:

        name  age country
a  Vishvajit   26   India
b      Harsh   25   India
c       Sonu   30   India
d      Peter   33     USA

In the above Output, name, age, and country are the column labels and a, b, c, and d are the row labels.

Loc and iLoc in Pandas DataFrame

Let’s get started with Pandas loc.

Pandas Loc -> Label-Based Indexing

Pandas loc is used for label-based indexing. This means you use the row and column labels to select data from a Pandas DataFrame. When you slice with labels, both the start label and the end label are included in the result.

Example Usages

Now, let’s see various use cases of Pandas loc with examples.

To select data from a Pandas DataFrame, use the Pandas loc syntax below.

Syntax:

df.loc[row_labels, column_labels]

Selecting a Single Row by Label

Pass a single row label to select a single row from the Pandas DataFrame.

row_b = df.loc['b']
print(row_b)

Output

name       Harsh
age           25
country    India
Name: b, dtype: object

Note: df.loc['b'] returns a Pandas Series because a single row label is passed. To get a DataFrame instead, pass a list of labels (e.g., df.loc[['a', 'b']]).
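
The difference in return type can be checked directly. The sketch below rebuilds the sample DataFrame so that it runs on its own; the variable names as_series and as_frame are only illustrative.

```python
import pandas as pd

# Same sample data as the tutorial's DataFrame
df = pd.DataFrame(
    {
        "name": ["Vishvajit", "Harsh", "Sonu", "Peter"],
        "age": [26, 25, 30, 33],
        "country": ["India", "India", "India", "USA"],
    },
    index=["a", "b", "c", "d"],
)

as_series = df.loc["b"]    # single label -> Series
as_frame = df.loc[["b"]]   # list with one label -> one-row DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (1, 3)
```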

Selecting Multiple Rows by Label

Pass multiple row labels as a Python list to select multiple rows.

# Select rows with labels 'a' and 'c'
rows_ac = df.loc[['a', 'c']]
print(rows_ac)

Selecting Specific Rows and Columns by Label

To select specific rows and columns from the DataFrame, pass the row labels as the first argument and the column labels as the second argument.

# Select rows 'a' and 'b', and columns 'name' and 'country'
subset = df.loc[['a', 'b'], ['name', 'country']]
print(subset)

Output

        name country
a  Vishvajit   India
b      Harsh   India

In the above example, I selected rows a and b with a list, but you can also select rows with a label slice. With loc, the end label of the slice is included.

# Select rows from a to c, and columns 'name' and 'country'
subset = df.loc['a':'c', ['name', 'country']]
print(subset)
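
To make the slicing behavior concrete, here is a minimal sketch that compares a label slice with a position slice on the same sample data. The names by_label and by_position are illustrative and not part of the original example.

```python
import pandas as pd

# Same sample data as the tutorial's DataFrame
df = pd.DataFrame(
    {
        "name": ["Vishvajit", "Harsh", "Sonu", "Peter"],
        "age": [26, 25, 30, 33],
        "country": ["India", "India", "India", "USA"],
    },
    index=["a", "b", "c", "d"],
)

# Label slice: the end label 'c' is part of the result
by_label = df.loc["a":"c", ["name", "country"]]

# Position slice: the end position 2 is NOT part of the result
by_position = df.iloc[0:2, [0, 2]]

print(by_label.index.tolist())     # ['a', 'b', 'c']
print(by_position.index.tolist())  # ['a', 'b']
```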

Selecting Specific Rows and Columns by using Boolean

You can also pass a boolean list or array whose length matches the axis being selected; only the rows or columns marked True are kept.

result = df.loc[[True, True, False, True], [True, True, False]]
print(result)

Output

        name  age
a  Vishvajit   26
b      Harsh   25
d      Peter   33

Changing Value by Using Loc

You can also change values in a Pandas DataFrame by using Pandas loc.

Changing a Single Value

Changing the value at row label ‘b’ and column ‘age’.

df.loc['b', 'age'] = 26
print(df)
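
Column labels are case-sensitive, which matters when assigning with loc. The sketch below, run on a copy of the sample data, shows what happens when the label does not exist: instead of raising an error, pandas adds a new column. The name df_typo is only illustrative.

```python
import pandas as pd

# Same sample data as the tutorial's DataFrame
df = pd.DataFrame(
    {
        "name": ["Vishvajit", "Harsh", "Sonu", "Peter"],
        "age": [26, 25, 30, 33],
        "country": ["India", "India", "India", "USA"],
    },
    index=["a", "b", "c", "d"],
)

df_typo = df.copy()

# 'Age' (capital A) is not an existing column, so this assignment
# creates a new column instead of updating 'age'
df_typo.loc["b", "Age"] = 26

print(df_typo.columns.tolist())  # ['name', 'age', 'country', 'Age']
print(df_typo.loc["b", "age"])   # 25, the original column is unchanged
```

Checking df.columns before assigning is a cheap way to catch this kind of typo.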

Changing Multiple Values

# Change age for rows 'b' and 'c'
df.loc[['b', 'c'], 'age'] = [26, 31]
print(df)

Change All Values in a Column

To update all values in the ‘name’ column:

df.loc[:, 'name'] = df['name'].str.upper()
print(df)

Note: df.loc[:, 'name'] selects all rows of the name column.

Conditional Selection

Selecting records whose age is 30 or greater

You can filter rows by passing a condition to Pandas loc. For example, I want to select only those people whose age is 30 or greater. The outputs in this section are from the original DataFrame, before the value changes shown above.

condition = df['age'] >= 30
result = df.loc[condition]
print(result)

Output

    name  age country
c   Sonu   30   India
d  Peter   33     USA
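
Conditions can also be combined and paired with a column selection in the same loc call. This is a minimal sketch on the sample data; the second condition on country and the name mask are my own additions for illustration.

```python
import pandas as pd

# Same sample data as the tutorial's DataFrame
df = pd.DataFrame(
    {
        "name": ["Vishvajit", "Harsh", "Sonu", "Peter"],
        "age": [26, 25, 30, 33],
        "country": ["India", "India", "India", "USA"],
    },
    index=["a", "b", "c", "d"],
)

# Combine conditions with & (and) or | (or); each condition needs parentheses
mask = (df["age"] >= 30) & (df["country"] == "India")

# Rows where the mask is True, restricted to two columns
result = df.loc[mask, ["name", "age"]]
print(result)
```

Only row c satisfies both conditions, so the result has a single row.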

Selecting records whose name contains ‘sh’

# selecting people whose name contains 'sh'
condition = df['name'].str.contains('sh')
result = df.loc[condition]
print(result)

Output

        name  age country
a  Vishvajit   26   India
b      Harsh   25   India
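
A condition can be combined with assignment as well, so that only the matching rows are updated. The sketch below works on a copy so the sample DataFrame stays untouched; df_updated and the "add one year" rule are purely illustrative.

```python
import pandas as pd

# Same sample data as the tutorial's DataFrame
df = pd.DataFrame(
    {
        "name": ["Vishvajit", "Harsh", "Sonu", "Peter"],
        "age": [26, 25, 30, 33],
        "country": ["India", "India", "India", "USA"],
    },
    index=["a", "b", "c", "d"],
)

df_updated = df.copy()

# Add one year to the age of everyone who is 30 or older
df_updated.loc[df_updated["age"] >= 30, "age"] += 1

print(df_updated["age"].tolist())  # [26, 25, 31, 34]
```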

Pandas iLoc -> Integer-based Indexing

Pandas iloc is used for integer-based indexing. This means you use integer indices (positions) to access data.

Use the Pandas iloc syntax below to get data from a Pandas DataFrame.

df.iloc[row_indices, column_indices]

Let’s see the use cases of Pandas iloc.

Selecting a Single Row by Index Position

# Select the row at index position 1
row_1 = df.iloc[1]
print(row_1)

Output

name       Harsh
age           25
country    India
Name: b, dtype: object

Selecting Multiple Rows by Index Position

You can pass multiple integer index numbers as a Python list.

# Select the rows at index positions 0 and 2
result = df.iloc[[0, 2]]
print(result)

Output

        name  age country
a  Vishvajit   26   India
c       Sonu   30   India

Selecting Specific Rows and Columns by Index Position

You can pass position ranges (slices) for rows and columns to select the data. Unlike loc, an iloc slice excludes its end position, so 0:2 selects positions 0 and 1.

# Select rows at positions 0 and 1, and columns at positions 0 and 1
subset = df.iloc[0:2, 0:2]
print(subset)

Output

        name  age
a  Vishvajit   26
b      Harsh   25
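
Because iloc follows normal Python position rules, negative positions and slice steps work too. Here is a short sketch on the sample data; the variable names are illustrative.

```python
import pandas as pd

# Same sample data as the tutorial's DataFrame
df = pd.DataFrame(
    {
        "name": ["Vishvajit", "Harsh", "Sonu", "Peter"],
        "age": [26, 25, 30, 33],
        "country": ["India", "India", "India", "USA"],
    },
    index=["a", "b", "c", "d"],
)

last_row = df.iloc[-1]           # last row, returned as a Series
every_other = df.iloc[::2]       # rows at positions 0 and 2
last_two_cols = df.iloc[:, -2:]  # all rows, last two columns

print(last_row["name"])                # Peter
print(every_other.index.tolist())      # ['a', 'c']
print(last_two_cols.columns.tolist())  # ['age', 'country']
```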

Changing Value by Using iLoc

Change a Single Value

To change the value at row index 1 and column index 1 use the below code.

df.iloc[1, 1] = 26
print(df)
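
Hard-coded column positions are easy to get wrong, so it can help to look up the position from the column name. The sketch below shows two ways to mix a row position with a column name; it works on a copy, and age_pos and df_mixed are illustrative names.

```python
import pandas as pd

# Same sample data as the tutorial's DataFrame
df = pd.DataFrame(
    {
        "name": ["Vishvajit", "Harsh", "Sonu", "Peter"],
        "age": [26, 25, 30, 33],
        "country": ["India", "India", "India", "USA"],
    },
    index=["a", "b", "c", "d"],
)

df_mixed = df.copy()

# Option 1: translate the column name into a position, then use iloc
age_pos = df_mixed.columns.get_loc("age")
df_mixed.iloc[1, age_pos] = 26

# Option 2: translate the row position into a label, then use loc
df_mixed.loc[df_mixed.index[2], "age"] = 31

print(age_pos)                   # 1
print(df_mixed["age"].tolist())  # [26, 26, 31, 33]
```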

Change Multiple Values

To change the values of one column for multiple rows, use the below iloc example.

# Change the age for rows at index positions 1 and 2
df.iloc[1:3, 1] = [32, 37]
print(df)

Change All Values in a Column

To update all values in the first column, use this Pandas iloc example.

df.iloc[:, 0] = df.iloc[:, 0].str.upper()  # Assumes the column at position 0 contains strings
print(df)

Conditional Selection Using Indexing

Pandas iloc works with integer positions and ignores row labels. A condition on the DataFrame, however, produces a boolean Series indexed by those row labels, so we first need to find the integer positions of the labels that meet the condition.

Let’s see how we can do that.

Here, I have taken two examples.

Selecting rows whose age is 30 or greater:

# selecting rows whose age is 30 or greater
condition = df['age'] >= 30
print(condition)
indexes = df.index[condition].to_list()
print(indexes)
idx = [df.index.get_loc(label) for label in indexes]
print(idx)
result = df.iloc[idx]
print(result)

Output

a    False
b    False
c     True
d     True
Name: age, dtype: bool
['c', 'd']
[2, 3]
    name  age country
c   Sonu   30   India
d  Peter   33     USA

In the above example:

  • First, I apply a condition that checks whose age is 30 or more; this returns a boolean Series.
  • Second, I display that boolean Series.
  • Third, I get the row labels whose boolean value is True in condition and store them in indexes.
  • Fourth, I display indexes, which is [‘c’, ‘d’] as you can see in the output.
  • Fifth, I get the integer position of each label stored in indexes, because iloc works with integer positions, not row labels.
  • Sixth, I display those positions, which are [2, 3] for the labels [‘c’, ‘d’].
  • And at last, I pass the positions to iloc to get the rows from the DataFrame and display them.
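
The label lookup above can be shortened. As a sketch, and assuming NumPy is available (it is installed together with pandas), the boolean Series can be converted to a plain array, which iloc accepts as a mask, or to a list of positions with np.flatnonzero. Passing the boolean Series itself to iloc is expected to raise an error, which is why it is converted first.

```python
import numpy as np
import pandas as pd

# Same sample data as the tutorial's DataFrame
df = pd.DataFrame(
    {
        "name": ["Vishvajit", "Harsh", "Sonu", "Peter"],
        "age": [26, 25, 30, 33],
        "country": ["India", "India", "India", "USA"],
    },
    index=["a", "b", "c", "d"],
)

condition = df["age"] >= 30

# Option 1: use the condition as a plain boolean array (no labels attached)
via_mask = df.iloc[condition.to_numpy()]

# Option 2: turn the True entries into integer positions
positions = np.flatnonzero(condition.to_numpy())
via_positions = df.iloc[positions]

print(positions.tolist())              # [2, 3]
print(via_mask.equals(via_positions))  # True
```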

Now, let’s move on to the second example.

Selecting rows whose name contains ‘sh’:

The process is the same as above; only the condition changes.

# selecting rows whose name contains 'sh'
condition = df['name'].str.contains('sh')
indexes = df.index[condition].to_list()
idx = [df.index.get_loc(label) for label in indexes]
result = df.iloc[idx]
print(result)

Output

        name  age country
a  Vishvajit   26   India
b      Harsh   25   India

This works exactly like the first example, so the explanation above applies here as well.

Difference Between Loc and iLoc in Pandas

Here, I have listed some important points about Pandas loc and Pandas iLoc.

  • Use Pandas loc when you need to access rows or columns by their labels. It is more intuitive if you are working with labeled axes.
  • Use Pandas iloc when you need to access rows or columns by their integer position. This is useful when the exact label is unknown or when you are working with a large DataFrame and are more interested in position-based operations.
  • Both loc and iloc provide powerful ways to select and manipulate data in pandas DataFrames, making them essential tools for data analysis and manipulation.

Let’s summarize the difference between loc and iloc in Pandas in a table.

Feature | Pandas loc | Pandas iloc
--- | --- | ---
Access type | Label-based indexing | Integer-based indexing
Indexing | Uses row and column labels | Uses integer-based positions (indices)
Syntax | df.loc[row_label, column_label] | df.iloc[row_index, column_index]
Row/column selection | Selects rows and columns by their labels | Selects rows and columns by their integer positions
Slicing | Supports slicing with labels (e.g., 'a':'c') | Supports slicing with integer positions (e.g., 1:3)
Boolean indexing | Supported | Supported with a boolean list or array
Errors | Raises KeyError if the label is not found | Raises IndexError if the index is out of bounds
Includes end of slice | Yes, the end label is included in slicing | No, the end position is excluded from slicing
DataFrame example | df.loc['a':'c', 'name':'age'] | df.iloc[1:3, 0:2]
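
The error behavior listed in the table can be reproduced with a short sketch on the sample data. The label 'z' and the position 10 are chosen only because they do not exist in this DataFrame.

```python
import pandas as pd

# Same sample data as the tutorial's DataFrame
df = pd.DataFrame(
    {
        "name": ["Vishvajit", "Harsh", "Sonu", "Peter"],
        "age": [26, 25, 30, 33],
        "country": ["India", "India", "India", "USA"],
    },
    index=["a", "b", "c", "d"],
)

loc_error = None
iloc_error = None

try:
    df.loc["z"]  # no row is labeled 'z'
except KeyError as exc:
    loc_error = type(exc).__name__

try:
    df.iloc[10]  # the DataFrame has only 4 rows
except IndexError as exc:
    iloc_error = type(exc).__name__

print(loc_error, iloc_error)  # KeyError IndexError

# A position slice that runs past the end does not raise; it is simply cut short
print(df.iloc[2:10].index.tolist())  # ['c', 'd']
```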

Conclusion

So, throughout this article, we have covered loc and iloc in Pandas with the help of examples. Pandas loc accesses rows and columns by their labels, and iloc accesses rows and columns by their integer positions.

Pandas loc and iloc are among the most important tools for selecting data from a Pandas DataFrame.

If you found this article helpful, please share it and keep visiting for further Pandas tutorials.

Happy Learning…
