In this Pandas article, we will look at several ways to replace column values in a Pandas DataFrame based on a condition.
This question comes up often in data analysis, data engineering, and data science interviews. It is just as useful in real Pandas applications, where you frequently need to replace specific values in a column when a condition is met.
Pandas provides multiple ways to replace column values in a DataFrame. Let’s explore each of them with an example.
You can learn more about Python Pandas on the Python Pandas tutorial page.
I have prepared a sample CSV dataset that will be used throughout this article.
Requirement: replace the ‘Male’ value with ‘M’ and the ‘Female’ value with ‘F’ in the ‘emp_gender’ column of the Pandas DataFrame.
Table of Contents
- 1 Load CSV Data into Pandas DataFrame
- 2 Replace column values in Pandas DataFrame using Assignment Operator
- 3 Replace Column Values in Pandas DataFrame using the replace() Method
- 4 Replace Column Values in Pandas DataFrame using loc Property
- 5 Replace Column Values in Pandas DataFrame using np.where() Method
- 6 Replace Column Values in Pandas DataFrame using the mask() Method
- 7 Conclusion
Load CSV Data into Pandas DataFrame
Pandas provides the read_csv() function, which loads CSV data into a DataFrame. We need a DataFrame first, because every replacement method in this article operates on one.
Let’s load the CSV data into a Pandas DataFrame using the read_csv() function.
import pandas as pd
df = pd.read_csv(
'../../Datasets/employee_dataset.csv'
)
Now that the CSV dataset is loaded into a Pandas DataFrame, let’s explore the methods for replacing column values based on a condition.
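If you do not have the CSV file at hand, a small in-memory DataFrame is enough to follow along. The sketch below is a stand-in, not the article's real dataset: only the emp_gender column name comes from the requirement above, while emp_id, emp_name, and all row values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for employee_dataset.csv. Only the emp_gender
# column name comes from the article; the other columns and all values
# are invented for illustration.
df = pd.DataFrame(
    {
        "emp_id": [1, 2, 3, 4],
        "emp_name": ["Asha", "Ben", "Chen", "Dana"],
        "emp_gender": ["Male", "Female", "Female", "Male"],
    }
)

# Check which distinct values the column holds before replacing anything.
print(df["emp_gender"].unique())
```

Checking the distinct values first is a cheap way to confirm which replacements are actually needed.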
Replace column values in Pandas DataFrame using Assignment Operator
The assignment operator ( = ) can be used to overwrite a column of the Pandas DataFrame with new values. In the example below, I build the new values with the Series replace() method, replacing ‘Male’ with ‘M’ and ‘Female’ with ‘F’, and then assign the result back to the ‘emp_gender’ column.
df = pd.read_csv(
'../../Datasets/employee_dataset.csv'
)
df['emp_gender'] = df['emp_gender'].replace('Male', 'M').replace('Female', 'F')
print(df)
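As a variation, the two chained replace() calls can be collapsed into one call by passing a dictionary that maps old values to new ones. This is a minimal sketch on a small made-up DataFrame rather than the article's CSV file.

```python
import pandas as pd

# Hypothetical sample data; the real article loads a CSV file instead.
df = pd.DataFrame({"emp_gender": ["Male", "Female", "Male"]})

# A single replace() call with a mapping of old value -> new value,
# assigned back to the same column.
df["emp_gender"] = df["emp_gender"].replace({"Male": "M", "Female": "F"})
print(df)
```

The dictionary form keeps all replacements in one place, which is easier to extend if more values need to be mapped later.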
Replace Column Values in Pandas DataFrame using the replace() Method
Pandas DataFrame has a method called replace() that replaces values in the DataFrame. It accepts a single value, a list, or a dictionary, which makes it one of the most convenient ways to replace values.
By default, replace() returns a new DataFrame and leaves the original unchanged. To modify the existing DataFrame instead, pass inplace=True to replace(). Here I do not pass inplace=True; I assign the returned DataFrame back to df.
In the following example, I have replaced ‘Male’ with ‘M’ and ‘Female’ with ‘F’ using the replace() method.
df = pd.read_csv(
'../../Datasets/employee_dataset.csv'
)
df = df.replace({'emp_gender': {'Male': 'M', 'Female': 'F'}})
print(df)
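To make the behaviour of the nested dictionary concrete, here is a small sketch on made-up data. The note column and its values are hypothetical, added only to show that the outer key restricts the replacement to emp_gender and that the original DataFrame is left untouched when inplace=True is not used.

```python
import pandas as pd

# Hypothetical data: 'note' also contains the text 'Male' to show that
# the replacement is limited to the emp_gender column.
original = pd.DataFrame(
    {
        "emp_gender": ["Male", "Female", "Other"],
        "note": ["Male", "n/a", "n/a"],
    }
)

# Outer key = column to search, inner dict = old value -> new value.
replaced = original.replace({"emp_gender": {"Male": "M", "Female": "F"}})

print(replaced)   # emp_gender is updated, note is unchanged
print(original)   # still holds 'Male' and 'Female'
```

Values that are not listed in the inner dictionary, such as 'Other' here, pass through unchanged.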
Replace Column Values in Pandas DataFrame using loc Property
loc is a Pandas DataFrame property used to access a group of rows and columns by label or by a boolean array. It can also be used to assign new values, either to the rows that match a condition or to an entire column.
Remember, assigning through loc modifies the existing DataFrame in place. Be careful when you use it.
Let’s see how to use the loc property.
import pandas as pd
df = pd.read_csv(
'../../Datasets/employee_dataset.csv'
)
df.loc[df['emp_gender'] == 'Male', 'emp_gender'] = 'M'
df.loc[df['emp_gender'] == 'Female', 'emp_gender'] = 'F'
print(df)
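The same pattern can be written with named boolean masks, which makes each condition easier to read and reuse. The sketch below uses made-up data with an extra 'Unknown' value (an assumption, not part of the article's dataset) to show that rows matching neither condition are left alone.

```python
import pandas as pd

# Hypothetical sample data with a value that matches neither condition.
df = pd.DataFrame({"emp_gender": ["Male", "Female", "Unknown"]})

# Build the boolean masks first, then assign through loc.
is_male = df["emp_gender"] == "Male"
is_female = df["emp_gender"] == "Female"

df.loc[is_male, "emp_gender"] = "M"      # only rows where is_male is True
df.loc[is_female, "emp_gender"] = "F"    # only rows where is_female is True
print(df)
```

Because loc only touches the rows selected by the mask, it is a safe choice when the column may contain values you do not want to change.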
Replace Column Values in Pandas DataFrame using np.where() Method
NumPy is a popular open-source library that provides a function called where(), which chooses between two values element by element based on a condition. To use it, we need to import where() from the numpy module.
In the example below, I replace ‘Male’ with ‘M’ using NumPy’s where() function. Note that every value that is not ‘Male’ becomes ‘F’, not only ‘Female’, so this form is safe only when the column contains exactly these two values.
import pandas as pd
from numpy import where
df = pd.read_csv(
'../../Datasets/employee_dataset.csv'
)
df['emp_gender'] = where(df['emp_gender'] == 'Male', 'M', 'F')
print(df)
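When the column may hold more than two values, one option is to nest a second where() call so that anything other than 'Male' or 'Female' keeps its original value. This is a sketch on made-up data; the 'Unknown' value is an assumption added to demonstrate the fallback.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a third value to show what happens to it.
df = pd.DataFrame({"emp_gender": ["Male", "Female", "Unknown"]})
gender = df["emp_gender"]

# The inner where() maps 'Female' to 'F' and otherwise keeps the
# original value; the outer where() maps 'Male' to 'M'.
df["emp_gender"] = np.where(
    gender == "Male",
    "M",
    np.where(gender == "Female", "F", gender),
)
print(df)
```

Nesting gets hard to read beyond two or three conditions, at which point a dictionary passed to replace() is usually clearer.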
Replace Column Values in Pandas DataFrame using the mask() Method
mask() is another Pandas method that replaces existing values with a new value wherever a condition holds. It takes the condition as the first parameter and the replacement value as the second; values are replaced where the condition evaluates to True and kept where it evaluates to False.
Let’s see an example of using the mask() method to replace values in a specific column of a Pandas DataFrame.
import pandas as pd
df = pd.read_csv(
'../../Datasets/employee_dataset.csv'
)
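# Assign the result back to the column: with Copy-on-Write enabled,
# calling mask(..., inplace=True) on df['emp_gender'] does not update df.
df['emp_gender'] = df['emp_gender'].mask(df['emp_gender'] == 'Male', 'M')
df['emp_gender'] = df['emp_gender'].mask(df['emp_gender'] == 'Female', 'F')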
print(df)
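It can help to compare mask() with its counterpart where(), which keeps values where the condition is True and replaces the rest. The sketch below, on a small made-up Series, shows that the two produce the same result when the condition is inverted.

```python
import pandas as pd

# Hypothetical sample column.
gender = pd.Series(["Male", "Female", "Male"], name="emp_gender")

# mask(): replace values WHERE the condition is True.
masked = gender.mask(gender == "Male", "M")

# where(): keep values where the condition is True, replace the rest.
kept = gender.where(gender != "Male", "M")

print(masked.tolist())
print(masked.equals(kept))
```

Both calls return a new Series, so the original gender Series still holds its initial values afterwards.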
Conclusion
Throughout this article, we have seen several ways to replace column values in a Pandas DataFrame based on a condition, so you now have multiple ways to tackle this problem in interviews and in your own Python Pandas applications. You can use any of them when you need to replace specific values in a specific column of a Pandas DataFrame.