In this PySpark article, we will see how to format a string in a PySpark DataFrame using column values, with the help of an example. PySpark provides a string function called format_string() that builds a formatted string from the values of DataFrame columns.
Note: This question may come up in PySpark interviews.
PySpark Sample DataFrame
To apply the format_string() function, we need a PySpark DataFrame, so I have created a sample CSV file with some records, shown below.
employees.csv

    emp_full_name,emp_email,emp_gender,emp_salary,emp_department,date_of_joining,age
    Mayank Kumar,[email protected],Male,25000,BPO,11/1/2023,25
    Vishvajit Rao,[email protected],Male,40000,IT,11/2/2023,30
    Harshita Mathur,[email protected],Female,20000,Sales,11/3/2023,23
    Kavya Singh,[email protected],Female,20000,SEO,11/4/2023,24
    Vishal Kumar,[email protected],Male,60000,IT,11/5/2023,28
    Vaishali Mehta,[email protected],Female,35000,SEO,11/6/2023,27
    Vaishali Mehta,[email protected],Female,35000,SEO,11/6/2023,25
    James Bond,[email protected],Male,42000,IT,11/7/2023,23
    Mariya Katherine,[email protected],Female,32000,Sales,11/8/2023,29
    Mariya Katherine,[email protected],Female,40000,Sales,11/8/2023,31
    Harshali Kumari,[email protected],Female,21000,BPO,11/9/2023,20
    Vinay Singh,[email protected],Male,18000,BPO,11/10/2023,24
    Vinay Mehra,[email protected],Male,45000,IT,11/11/2023,33
    Akshara Singh,[email protected],Female,55000,IT,11/12/2023,30
Next, I create a PySpark DataFrame from the CSV data using the csv() method of the PySpark DataFrameReader class.
Example: Creating a DataFrame from a CSV file
    from pyspark.sql import SparkSession

    # creating spark session
    spark = (
        SparkSession.builder.master("local[*]")
        .appName("www.programmingfunda.com")
        .getOrCreate()
    )

    # creating DataFrame
    df = spark.read.option("header", "true").csv("../Datasets/employees.csv")
    df.show(truncate=False)

Executing the above code creates a new PySpark DataFrame and prints its rows.
Now, I am going to create a new column that stores a formatted string like “My name is ABC and I am x years old, Thanks”, where ABC is replaced with the value of the emp_full_name column and x is replaced with the value of the age column.
Format a String in PySpark DataFrame using Column Values
Before applying the format_string() function, let’s look briefly at the function and its parameters.
format_string(): A string function in pyspark.sql.functions that formats its arguments printf-style and returns the result as a string column.
It takes two parameters:
- format: A string containing printf-style placeholders (such as %s) that are replaced, in order, with the column values.
- cols: Column names or Column objects whose values are used in the formatting.
Note: To use the format_string() function, we have to import it from the pyspark.sql.functions module.
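To make the printf-style substitution concrete, here is a minimal plain-Python sketch of what happens for each row. It does not use Spark at all: the `rows` list and the `make_intro` helper are hypothetical names introduced only for illustration, and Python's `%` operator stands in for the row-by-row formatting that format_string() performs. The values are strings because a CSV read without schema inference returns every column as a string.

```python
# Plain-Python sketch of printf-style substitution (no Spark required).
# Two records copied from employees.csv; values are kept as strings,
# which is how the CSV reader returns them without schema inference.
rows = [
    {"emp_full_name": "Mayank Kumar", "age": "25"},
    {"emp_full_name": "Vishvajit Rao", "age": "30"},
]

template = "My name is %s and I am %s years old, Thanks"


def make_intro(row):
    # Each %s is filled, in order, by the next value in the tuple.
    return template % (row["emp_full_name"], row["age"])


intros = [make_intro(row) for row in rows]
for intro in intros:
    print(intro)
```

The order of the values matters: the first placeholder takes the first value, the second placeholder takes the second, and so on.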
Let’s apply the format_string() function to make a new column intro with the help of the other column values.
Example: Using the PySpark format_string() function
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import format_string

    # creating spark session
    spark = (
        SparkSession.builder.master("local[*]")
        .appName("www.programmingfunda.com")
        .getOrCreate()
    )

    # creating DataFrame
    df = spark.read.option("header", "true").csv("../Datasets/employees.csv")

    # adding the intro column built from emp_full_name and age
    new_df = df.withColumn(
        "intro",
        format_string(
            "My name is %s and I am %s years old, Thanks",
            df.emp_full_name,
            df.age,
        ),
    )
    new_df.show(truncate=False)
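After executing the above code, the new DataFrame contains an additional intro column that holds the formatted string for each row, for example “My name is Mayank Kumar and I am 25 years old, Thanks”.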
Conclusion
In this article, we have seen how to format a string in a PySpark DataFrame using column values, with the help of an example. The format_string() function takes a printf-style format string and one or more columns, and returns a new string column with the column values substituted into the format.
This question may come up in PySpark interviews. If you found this article helpful, please share it and keep visiting for further PySpark tutorials.
Thanks for visiting.