Python Regular Expression ( With Examples )

In this article, you will learn about Python regular expressions using the re module, which provides regular expression matching operations. If you want to search for a specific pattern across many strings, you can use the re module.

The re module is a built-in module, which means you don’t need to download or install it using pip, unlike a third-party module such as requests. When you installed Python on your machine, the re module was installed along with it.

In this guide, we will learn what Regular Expressions (RegEx) are and how to use them with the built-in re module, which is Python's standard tool for working with regular expressions.

Before starting with the re module, let’s explore a little about regular expressions.

Headings of Contents

1 What is Regular Expressions ( RegEx )?
2 Advantages of using RegEx
- 2.1 Import re module
- 2.2 Regular Expressions ( RegEx ) in Python
3 Python RegEx Functions
4 Metacharacters
5 Special Sequences
6 The findall() Function
7 split() Function
8 Conclusion

What is Regular Expressions ( RegEx )?

Regular Expressions or RegEx is a sequence of characters that form a search pattern.
RegEx is used to check if a string contains a specific search pattern.

Advantages of using RegEx

Using regular expressions in Python has several advantages.

Wide range of uses. You can create one regular expression pattern and reuse it in various places to validate various kinds of data.
You can avoid long chains of if and else statements when validating a pattern.
High productivity with less effort.
You can search for any kind of string in a collection of strings.

Import re module

To use the re module, first of all, you have to import the re module using the import keyword.
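As a minimal sketch of this step, the import looks like the following. The sample string "banana" is only an illustration and is not from the rest of this article.

```python
# The re module ships with the standard library, so a plain import is enough.
import re

# Calling one of the module's functions confirms that the import worked.
matches = re.findall("a", "banana")
print(matches)  # ['a', 'a', 'a']
```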

Regular Expressions ( RegEx ) in Python

Once you have imported the re module, you can work with regular expressions.

Example

Search a string that starts with 'The' and ends with 'programming'.


import re
txt = 'The Programming Funda is a best platform to learn programming'
result = re.search("^The.*programming$", txt)
print(result.string)

Output

The Programming Funda is a best platform to learn programming
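One thing worth keeping in mind with the example above: re.search() returns None when nothing matches, so accessing result.string on a failed search raises an error. Here is a small sketch of a defensive check. The second pattern (starting with 'Python') is a made-up case chosen only because it does not match.

```python
import re

txt = 'The Programming Funda is a best platform to learn programming'

# re.search returns None when nothing matches, so check before using the result.
found = re.search("^The.*programming$", txt)
missing = re.search("^Python.*programming$", txt)

for label, result in (("found", found), ("missing", missing)):
    if result is not None:
        print(label, "->", result.group())
    else:
        print(label, "-> no match")
```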

Python RegEx Functions

The re module provides a set of functions that are used to search a string for a match.

Functions	Description
findall()	Returns a list containing all the matches.
search()	Returns a match object if there is a match anywhere in the string.
split()	Returns a list where the string has been split at each match.
sub()	Replaces one or many matches with a string.
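To get a quick feel for these four functions before looking at each one in detail, here is a short sketch that runs all of them on one sample string. The string "cat bat rat" is just an illustration.

```python
import re

txt = "cat bat rat"

all_matches = re.findall("at", txt)   # every non-overlapping match
first_match = re.search("bat", txt)   # first match as a match object (or None)
pieces = re.split(" ", txt)           # split the string at each match
replaced = re.sub("at", "og", txt)    # replace each match

print(all_matches)         # ['at', 'at', 'at']
print(first_match.span())  # (4, 7)
print(pieces)              # ['cat', 'bat', 'rat']
print(replaced)            # cog bog rog
```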

Metacharacters

Metacharacters are symbols or characters that have special meaning in Python regular expressions.

[ ] Square Bracket

Square brackets specify a set of characters you want to match.

Here [ams] will match if the string you try to match contains any one of the characters a, m or s.

Expression	String	Match
[abc]	abcd	3 match at `abc`d
[hello]	Coding	1 Match

Example


import re
txt = 'Programmig Funda'
result1 = re.findall("[ams]", txt)
result2 = re.findall("[a-s]", txt)
print(result1)
print(result2)

Output


['a', 'm', 'm', 'a']
['r', 'o', 'g', 'r', 'a', 'm', 'm', 'i', 'g', 'n', 'd', 'a']

You can also specify a range of characters using - inside the square brackets.

[a-e] is same as [abcde]
[1-5] is same as [12345]
[0-45] is same as [012345]
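The equivalences above can be checked with a short sketch. The sample string is made up for illustration.

```python
import re

txt = "a1b2c3d4e5f6"

letters = re.findall("[a-e]", txt)  # same as [abcde]
digits = re.findall("[1-5]", txt)   # same as [12345]
mixed = re.findall("[0-45]", txt)   # the range 0-4 plus the single character 5

print(letters)  # ['a', 'b', 'c', 'd', 'e']
print(digits)   # ['1', '2', '3', '4', '5']
print(mixed)    # ['1', '2', '3', '4', '5']
```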

\ Backslash

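The backslash is used to escape characters, including all the metacharacters, so that they are matched literally. For example, \. matches a literal dot.
The backslash also introduces special sequences. For example, \d matches any digit.

. Dot

The dot matches any single character except a newline.

Expression	String	Match
.	a	1 match
..	abc	1 match at `ab`c
...	abcd	1 match at `abc`d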

Example


import re
txt = 'Programmig Funda'
result = re.findall("Fu..a", txt)
print(result)

The output will be:- ['Funda']
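To illustrate how the backslash changes the meaning of a metacharacter, here is a small sketch comparing an unescaped dot with an escaped one. The file names are hypothetical sample data.

```python
import re

txt = "file.txt fileAtxt"

# An unescaped dot matches any single character (except a newline).
any_char = re.findall(r"file.txt", txt)

# An escaped dot matches only a literal "." character.
literal_dot = re.findall(r"file\.txt", txt)

print(any_char)     # ['file.txt', 'fileAtxt']
print(literal_dot)  # ['file.txt']
```

Writing the pattern as a raw string (the r prefix) keeps Python from interpreting the backslash before the re module sees it.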

^ Caret

The caret symbol is used to check if the string starts with a certain character or pattern.

Expression	String	Match
^a	a	1 Match
^abc	abcd	1 Match
^c	Programming	No Match

Example


import re
txt = 'Programmig Funda'
result = re.findall("^Programmig", txt)
print(result)

The output will be:- ['Programmig']

$ Dollar

The dollar symbol is used to check if the string ends with a certain character or pattern.

Expression	String	Match
a$	a	1 Match
n$	Python	1 Match
a$	Coding	No Match

Example


import re
txt = 'Programmig Funda'
result = re.findall("Funda$", txt)
print(result)

Output

['Funda']

Star ( * )

The star symbol ( * ) matches zero or more occurrences of the pattern to its left.

Expression	String	Match
Ra*m	Rm	1 Match
Ra*m	Ram	1 Match
Ra*m	Raim	No Match because a is not followed by m

Example


import re
txt = 'Programming'
result = re.search("Program*ing", txt)
print(result.string)

The Output will be:- Programming

Plus ( + )

The plus symbol ( + ) matches one or more occurrences of the pattern to its left.

Expression	String	Match
Ra+im	Rm	No Match
Ra+im	Ram	No Match because a is not followed by im
Ra+im	Raaim	1 Match

Example


import re
txt = 'Raim'
result = re.findall("Ra+im", txt)
print(result)

The output will be:- ['Raim']

? Question Mark

The question mark matches zero or one occurrence of the pattern to its left.

Expression	String	Match
Ra?m	Rm	1 Match
Ra?m	Ram	1 Match
Ra?m	Raam	No Match because there is more than one a.

Example


import re
txt = 'Programming'
result = re.findall("Programmin?g", txt)
print(result)

Output

['Programming']
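The star, plus and question mark differ only in how many repetitions they allow. As a rough side-by-side comparison, this sketch tries all three on the same words. The results dictionary is just a helper name used for this illustration.

```python
import re

words = ["Rm", "Ram", "Raam"]

# For each quantifier, collect the words in which the pattern is found.
results = {}
for pattern in ("Ra*m", "Ra+m", "Ra?m"):
    results[pattern] = [w for w in words if re.search(pattern, w)]
    print(pattern, results[pattern])

# Ra*m ['Rm', 'Ram', 'Raam']   zero or more a
# Ra+m ['Ram', 'Raam']         one or more a
# Ra?m ['Rm', 'Ram']           zero or one a
```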

{ } Braces

The braces { } match a specified number of occurrences of the pattern to their left. For example, {n} means exactly n repetitions, and {n,m} means at least n and at most m repetitions. Write the range without a space after the comma.

Expression	String	Match
a{2,3}	abc, bcd	No Match
a{2,3}	aabc baad	2 Match at (`aa`bc and b`aa`d)
a{2,3}	abc baad	1 Match at ( b`aa`d )

Let’s understand a slightly more complex expression. This Python RegEx [0-9]{2,5} matches at least 2 digits but not more than 5 digits.

Expression	String	Match
[0-9]{2,5}	abcd123efgh	1 Match at abcd`123`efgh

Example


import re
txt = 'Hello Helllo'
result = re.findall("l{2}", txt)
result2 = re.findall("l{2,3}", txt)
print(result)
print(result2)

Output


['ll', 'll']
['ll', 'lll']
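The digit pattern from the table above can be tried out in the same way. The sketch below also shows a pitfall that is easy to miss: when a space follows the comma inside the braces, Python's re module does not treat the braces as a repetition count and matches the text literally instead.

```python
import re

# At least 2 and at most 5 digits in a row.
digits = re.findall("[0-9]{2,5}", "abcd123efgh")
print(digits)  # ['123']

# With a space after the comma, "{2, 3}" is not a quantifier,
# so the pattern looks for the literal text "a{2, 3}".
with_space = re.findall("a{2, 3}", "aabc baad")
without_space = re.findall("a{2,3}", "aabc baad")

print(with_space)     # []
print(without_space)  # ['aa', 'aa']
```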

| Alternation

The vertical bar ( | ) is used for alternation: it matches either the pattern on its left or the pattern on its right.

Expression	String	Match
a|c	abd	1 Match
a|c	abc	2 Match
a|c	afg	1 Match at `a`fg

Example


import re
txt = 'hello'
result = re.findall("h|s", txt)
print(result)

The output will be:- ['h']
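As a quick check of how alternation behaves, this sketch runs the pattern a|c against a few strings. The string "xyz" is added here only to show a case with no match.

```python
import re

# Map each sample string to the list of matches for a|c.
results = {s: re.findall("a|c", s) for s in ("abd", "abc", "afg", "xyz")}

for s, matches in results.items():
    print(s, matches)

# abd ['a']
# abc ['a', 'c']
# afg ['a']
# xyz []
```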

() Group

Parentheses () are used to group sub-patterns. For example, the pattern (x|y|z)abc matches any string that contains either x, y or z followed by abc.

Expression	String	Match
(x|y|z)abc	xy abc	No Match
(x|y|z)abc	xyabc	1 Match
(x|y|z)abc	xabc yabc	2 Match

Example


import re
txt = 'hello'
result = re.findall("(h|s)ello", txt) # Match h
print(result)

The output will be:- ['h']
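Notice that findall() returned only 'h' and not 'hello'. When a pattern contains a group, findall() returns the text captured by the group. If you want the whole match instead, one option is a non-capturing group, written (?:...). Here is a small sketch with a made-up sample string.

```python
import re

txt = "hello sello jello"

# A capturing group: findall returns only what the group captured.
captured = re.findall("(h|s)ello", txt)

# A non-capturing group: findall returns the whole match.
whole = re.findall("(?:h|s)ello", txt)

print(captured)  # ['h', 's']
print(whole)     # ['hello', 'sello']
```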

Special Sequences

Special sequences represent predefined character classes and positions that have a special meaning. Each special sequence stands for a common pattern, which makes it easy to use.

Special Sequence	Description
\A	Matches if the specified characters are at the start of the string.
\b	Matches if the specified characters are at the beginning or end of a word.
\B	The opposite of \b. It matches if the specified characters are not at the beginning or end of a word.
\d	Matches any decimal digit. For ASCII text it is equivalent to `[0-9]`.
\D	Matches any character that is not a decimal digit.
\s	Matches any whitespace character.
\S	Matches any character that is not whitespace.
\w	Matches any word character `(a-z, A-Z, 0-9, _)`.
\W	Matches any character that is not a word character `(a-z, A-Z, 0-9, _)`.
\Z	Matches if the specified characters are at the end of the string.
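To see several of these special sequences in action, here is a short sketch that applies them to one sample string. The string is made up for illustration, and the patterns are written as raw strings so that the backslashes reach the re module unchanged.

```python
import re

txt = "Room 42, floor_3"

digits = re.findall(r"\d", txt)           # single digits
non_digits = re.findall(r"\D+", txt)      # runs of non-digit characters
words = re.findall(r"\w+", txt)           # runs of word characters
spaces = re.findall(r"\s", txt)           # whitespace characters
word_start = re.findall(r"\bfloor", txt)  # "floor" at the start of a word
inside = re.findall(r"\Bloor", txt)       # "loor" not at the start of a word
at_start = re.findall(r"\ARoom", txt)     # "Room" at the start of the string
at_end = re.findall(r"3\Z", txt)          # "3" at the end of the string

print(digits)      # ['4', '2', '3']
print(non_digits)  # ['Room ', ', floor_']
print(words)       # ['Room', '42', 'floor_3']
print(spaces)      # [' ', ' ']
print(word_start)  # ['floor']
print(inside)      # ['loor']
print(at_start)    # ['Room']
print(at_end)      # ['3']
```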

The findall() Function

The findall() function returns a list containing all the matches.

Example: Extract ‘mm’ from a string.


import re
txt = 'Programming Funda'
result = re.findall('mm', txt)
print(result)

The output will be:- ['mm']

If there are no matches found, an empty list will be returned.

Example


import re
txt = 'Programming Funda'
result = re.findall('hello', txt)
print(result)

The output will be:- []

Example: Extract digits from a string.


import re
txt = 'Programming Funda 12 is the best tutorial site.1230'
result = re.findall(r"\d+", txt)
print(result)

The output will be:- ['12', '1230']
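When the pattern contains more than one group, findall() returns a list of tuples, with one item per group. The following sketch shows this on a made-up string of name=value pairs.

```python
import re

txt = "x=10, y=25"

# Two groups: one word character before "=", one or more digits after it.
pairs = re.findall(r"(\w)=(\d+)", txt)
print(pairs)  # [('x', '10'), ('y', '25')]
```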

split() Function

The re.split() function splits the string at each match and returns a list of strings.

Example:


import re
txt = 'Programming Funda 12 is the best 23 tutorial site.'
result = re.split(r"\d+", txt)
print(result)

Output

['Programming Funda ', ' is the best ', ' tutorial site.']

If the pattern is not found, re.split() returns a list containing the original string.

You can pass the maxsplit parameter to re.split(). It is the maximum number of splits that will occur.

Example


import re
txt = 'Programming Funda 12 is the best 23 tutorial site.1230'
result = re.split(r"\d+", txt, maxsplit=2)
print(result)

Output

['Programming Funda ', ' is the best ', ' tutorial site.1230']
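A common use of re.split() is splitting on several different delimiters at once, which the plain str.split() method cannot do in one call. Here is a sketch that assumes a made-up string separated by commas, semicolons and spaces.

```python
import re

txt = "apple, banana;cherry  date"

# Split on any run of commas, semicolons or whitespace.
parts = re.split(r"[,;\s]+", txt)
print(parts)  # ['apple', 'banana', 'cherry', 'date']

# With maxsplit=1 only the first delimiter is used.
limited = re.split(r"[,;\s]+", txt, maxsplit=1)
print(limited)  # ['apple', 'banana;cherry  date']
```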

The sub() Function

The re.sub() function is used to replace each match with the text of your choice.

Syntax

re.sub(pattern, replace, string)

Example


import re
txt = 'Programming Funda 12 is the best 23 tutorial site.1230'

# pattern to search
pattern = r"\s+"

# replace variable
replace = ''

result = re.sub(pattern, replace, txt)
print(result)

Output

ProgrammingFunda12isthebest23tutorialsite.1230

Note:- If the pattern is not found, re.sub() will return the original string.

You can control the number of replacements with the count parameter of re.sub().

Example


import re
txt = 'Programming Funda 12 is the best 23 tutorial site.1230'

# pattern 
pattern = r"\d+"

# replacement text
replace = ''


# replace only the first two matches
result = re.sub(pattern, replace, txt, count=2)
print(result)

Output

Programming Funda  is the best  tutorial site.1230
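The replacement does not have to be a fixed string. As a sketch of two other options that re.sub() supports, the replacement below first refers back to a captured group with \1, and then uses a function that receives each match object. The bracket format and the doubling are arbitrary choices for illustration.

```python
import re

txt = 'Programming Funda 12 is the best 23 tutorial site.1230'

# \1 in the replacement refers to the text captured by the first group.
bracketed = re.sub(r"(\d+)", r"[\1]", txt)
print(bracketed)
# Programming Funda [12] is the best [23] tutorial site.[1230]

# A function can compute the replacement from each match object.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), txt)
print(doubled)
# Programming Funda 24 is the best 46 tutorial site.2460
```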

The subn() Function

The re.subn() function is similar to re.sub(), but it returns a tuple of two items: the new string and the number of substitutions made.

Example


import re
txt = 'Programming Funda 12 is the best 23 tutorial site.1230'

# pattern 
pattern = r"\d+"

# variable
replace = ''


result = re.subn(pattern, replace, txt)
print(result)

Output

('Programming Funda  is the best  tutorial site.', 3)
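Since re.subn() returns a tuple, it is often convenient to unpack it into two variables. A small sketch follows, where the names new_text and count and the '#' replacement are chosen only for this illustration.

```python
import re

txt = 'Programming Funda 12 is the best 23 tutorial site.1230'

# Unpack the tuple into the new string and the number of substitutions.
new_text, count = re.subn(r"\d+", "#", txt)

print(new_text)  # Programming Funda # is the best # tutorial site.#
print(count)     # 3
```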

The search() Function

The re.search() function searches a string for a match and returns a match object.
If there is more than one match, only the first occurrence is returned.

Example


import re
txt = 'Programming Funda 12 is the best 23 tutorial site.1230'

# pattern 
pattern = r"\d+"

result = re.search(pattern, txt)
print(result)

Output

<re.Match object; span=(18, 20), match='12'>
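To make the "first occurrence only" behaviour concrete, this sketch compares re.search() with re.findall() on the same string, and also shows what comes back when nothing matches. The pattern "xyz" is a made-up example that does not occur in the text.

```python
import re

txt = 'Programming Funda 12 is the best 23 tutorial site.1230'

first = re.search(r"\d+", txt)         # only the first match
everything = re.findall(r"\d+", txt)   # all matches
nothing = re.search(r"xyz", txt)       # no match at all

print(first.group())  # 12
print(everything)     # ['12', '23', '1230']
print(nothing)        # None
```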

Match Object

You can list the methods and attributes of a match object using the dir() function.

Example


import re
txt = 'Programming Funda 12 is the best 23 tutorial site.1230'
result = re.search('mm',txt)
print(dir(result))

Output


['__class__', '__copy__', '__deepcopy__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'end', 'endpos', 'expand', 'group', 'groupdict', 'groups', 'lastgroup', 'lastindex', 'pos', 're', 'regs', 'span', 'start', 'string']

match.group()

The group() method returns the part of the string where there is a match.

Example


import re
txt = '39801 400 20 1111'
result = re.search(r'(\d{3}) (\d{2}) ', txt)
print(result.group())

The Output will be:- 400 20

The group() method can also return the part of the string matched by an individual group. Pass the number of the parenthesized group you want.

Example


import re
txt = '39801 400 20 1111'
result = re.search(r'(\d{3}) (\d{2}) ', txt)
print(result.group(1))
print(result.group(2))
print(result.group(1,2))
print(result.groups())

Output


400
20
('400', '20')
('400', '20')
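Groups can also be given names with the (?P<name>...) syntax, which makes the code easier to read than numeric indexes. The sketch below reuses the same text; the group names first and second are hypothetical and chosen only for this example.

```python
import re

txt = '39801 400 20 1111'

# Same pattern as above, but each group has a name.
result = re.search(r'(?P<first>\d{3}) (?P<second>\d{2}) ', txt)

print(result.group('first'))   # 400
print(result.group('second'))  # 20
print(result.groupdict())      # {'first': '400', 'second': '20'}
```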

match.start() and match.end() and match.span()

match.start() returns the index where the matched substring starts; similarly, match.end() returns the index where the matched substring ends.

Example


import re
txt = '39801 400 20 1111'
result = re.search(r'(\d{3}) (\d{2})', txt)
print(result.start(), end=' ')
print(result.end())

The Output will be:- 2 8

The span() method returns a tuple containing the start and end index of the matched part.

Example


import re
txt = '39801 400 20 1111'
result = re.search(r'(\d{3}) (\d{2})', txt)
print(result.span())

The output will be:- (2, 8)
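The indexes returned by span() can be used to slice the original string, which is a handy way to see exactly what was matched. Without the trailing space in the pattern, the first match here is '801 40' and not '400 20'. The sketch also passes a group number to span(), which gives the position of that group alone.

```python
import re

txt = '39801 400 20 1111'
result = re.search(r'(\d{3}) (\d{2})', txt)

# Slice the original string with the start and end index of the match.
start, end = result.span()
matched_text = txt[start:end]
print(matched_text)    # 801 40

# span() also accepts a group number.
print(result.span(1))  # (2, 5)
print(result.span(2))  # (6, 8)
```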

match.re and match.string

The match.re attribute returns the regular expression object used for the match; similarly, match.string returns the string passed to the function.

Example


import re
txt = '39801 400 20 1111'
result = re.search(r'(\d{3}) (\d{2})', txt)
print(result.re)
print(result.string)

Output


re.compile('(\\d{3}) (\\d{2})')
39801 400 20 1111
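Because match.re is a compiled pattern object, you can read its pattern text or reuse it on another string. This is only a sketch of one possible use; the second input string is made up for illustration.

```python
import re

txt = '39801 400 20 1111'
result = re.search(r'(\d{3}) (\d{2})', txt)

# The pattern text is available on the compiled pattern object.
print(result.re.pattern)  # (\d{3}) (\d{2})

# The same compiled pattern can be applied to a different string.
other = result.re.search('12345 678 90')
print(other.group())      # 345 67
```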

Conclusion

So in this article, you have learned about Python regular expressions with the help of examples. The re module in Python is going to be very helpful when you want to search for more complex string patterns.

If you are a Python developer, I highly recommend learning regular expressions, because they will help you in your day-to-day Python journey and in your real-world Python projects.

I hope this article has helped you. Please share it with someone who wants to learn regular expressions in Python.

If you like this article, please share it and keep visiting for further Python tutorials.


~ Thanks for reading
