In this article, you will learn about Python regular expressions using the re module, which provides regular expression matching operations. If you want to search for a specific pattern across many strings, the re module is the tool to use.
The re module is built in, which means you don't need to download or install it with pip the way you would a third-party module such as requests. It was installed along with Python on your machine.
In this guide, we will learn what regular expressions (RegEx) are and how to use them with the built-in re module.
Before starting with the re module, let's explore a little about regular expressions.
What are Regular Expressions ( RegEx )?
A regular expression, or RegEx, is a sequence of characters that forms a search pattern.
RegEx is used to check whether a string contains a specific search pattern.
Advantages of using RegEx
There are several advantages to using regular expressions in Python.
- Wide range of uses. You can write one regular expression pattern and reuse it in various places to validate various kinds of data.
- You can avoid chains of if and else statements when validating a pattern.
- High productivity with less effort.
- You can search for any kind of string within a collection of strings.
Import re module
To use the re module, first import it using the import keyword.
Regular Expressions ( RegEx ) in Python
Once you have imported the re module, you can work with regular expressions.
Example
Search for a string that starts with 'The' and ends with 'programming'.
import re
txt = 'The Programming Funda is a best platform to learn programming'
result = re.search("^The.*programming$", txt)
print(result.string)
Output
The Programming Funda is a best platform to learn programming
Python RegEx Functions
The Python re module provides a set of functions that search a string for a match.
Function | Description |
---|---|
findall() | Returns a list containing all the matches. |
search() | Returns a match object if there is a match anywhere in the string. |
split() | Returns a list where the string has been split at each match. |
sub() | Replaces one or many matches with a string. |
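As a rough illustration of the four functions in the table, the sketch below runs each of them on one sample string. The sample text and the variable names are made up for this example.

```python
import re

# Sample text invented for this illustration
text = "cat bat rat"

# findall(): every non-overlapping match, as a list
all_matches = re.findall(r"[cbr]at", text)

# search(): a match object for the first match (or None if there is none)
first = re.search(r"[br]at", text)

# split(): the pieces of the string between matches
pieces = re.split(r"\s", text)

# sub(): the string with each match replaced
replaced = re.sub(r"at", "ot", text)

print(all_matches)    # ['cat', 'bat', 'rat']
print(first.group())  # bat
print(pieces)         # ['cat', 'bat', 'rat']
print(replaced)       # cot bot rot
```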
Metacharacters
Metacharacters are symbols or characters that have special meaning in Python regular expressions.
[ ] Square Bracket
A square bracket specifies a set of characters you want to match.
For example, [ams] matches if the string contains any one of the characters a, m, or s.
Expression | String | Match |
---|---|---|
[abc] | abcd | 3 matches (a, b, c) |
[hello] | Coding | 1 match (o) |
Example
import re
txt = 'Programmig Funda'
result1 = re.findall("[ams]", txt)
result2 = re.findall("[a-s]", txt)
print(result1)
print(result2)
Output
['a', 'm', 'm', 'a']
['r', 'o', 'g', 'r', 'a', 'm', 'm', 'i', 'g', 'n', 'd', 'a']
You can also specify a range of characters using - inside the square brackets.
- [a-e] is the same as [abcde]
- [1-5] is the same as [12345]
- [0-45] is the same as [012345]
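The three ranges listed above can be checked with a short sketch. The sample string is an assumption chosen so that every letter and digit appears once.

```python
import re

# Sample string invented for this illustration
sample = "a1b2c3d4e5f6"

letters = re.findall(r"[a-e]", sample)  # same as [abcde]
digits = re.findall(r"[1-5]", sample)   # same as [12345]
mixed = re.findall(r"[0-45]", sample)   # the range 0-4 plus the single character 5

print(letters)  # ['a', 'b', 'c', 'd', 'e']
print(digits)   # ['1', '2', '3', '4', '5']
print(mixed)    # ['1', '2', '3', '4', '5']
```

Note that [0-45] does not mean "0 to 45": a range always covers single characters.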
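\ Backslash
The backslash is used to escape various characters, including all the metacharacters, so that they are matched literally. It also introduces special sequences; for example, \d matches only digits.
. Dot
The dot matches any single character except a newline.
Expression | String | Match |
---|---|---|
. | a | 1 match |
.. | abc | 1 match (ab) |
... | abcd | 1 match (abc) |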
Example
import re
txt = 'Programmig Funda'
result = re.findall("Fu..a", txt)
print(result)
The output will be:- ['Funda']
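The backslash and the dot are easiest to understand side by side. In the sketch below (the file names are invented for illustration), an unescaped dot matches any character, while an escaped dot matches only a literal ".".

```python
import re

# Sample text invented for this illustration
text = "file.txt filextxt"

# Unescaped dot: any character is accepted in that position
any_char = re.findall(r"file.txt", text)

# Escaped dot: only a literal "." is accepted
literal_dot = re.findall(r"file\.txt", text)

print(any_char)     # ['file.txt', 'filextxt']
print(literal_dot)  # ['file.txt']
```

The patterns are written as raw strings (r"...") so that Python passes the backslash to the re module unchanged.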
^ Caret
The caret symbol is used to check whether the string starts with the specified characters.
Expression | String | Match |
---|---|---|
^a | a | 1 Match |
^abc | abcd | 1 Match |
^c | Programming | No Match |
Example
import re
txt = 'Programmig Funda'
result = re.findall("^Programmig", txt)
print(result)
The output will be:- ['Programmig']
$ Dollar
The dollar symbol is used to check whether the string ends with the specified characters.
Expression | String | Match |
---|---|---|
a$ | a | 1 Match |
n$ | Python | 1 Match |
a$ | Coding | No Match |
Example
import re
txt = 'Programmig Funda'
result = re.findall("Funda$", txt)
print(result)
Output
['Funda']
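Used together, the caret and the dollar anchor a pattern to both ends of the string, which is a common way to validate a whole value. The helper name is_all_digits below is hypothetical and only serves this sketch.

```python
import re

def is_all_digits(value):
    """Return True when value consists only of digits (hypothetical helper)."""
    # ^ anchors the pattern at the start, $ at the end
    return re.search(r"^[0-9]+$", value) is not None

print(is_all_digits("12345"))  # True
print(is_all_digits("123a5"))  # False
print(is_all_digits(""))       # False
```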
Star ( * )
The star symbol ( * ) matches zero or more occurrences.
Expression | String | Match |
---|---|---|
Ra*m | Rm | 1 Match |
Ra*m | Ram | 1 Match |
Ra*m | Raim | No Match because a is not followed by m |
Example
import re
txt = 'Programming'
result = re.search("Program*ing", txt)
print(result.string)
The Output will be:- Programming
Plus ( + )
The Plus symbol ( + ) matches one or more occurrences.
Expression | String | Match |
---|---|---|
Ra+im | Rm | No Match |
Ra+im | Ram | No Match because there is no i after a |
Ra+im | Raaim | 1 Match |
Example
import re
txt = 'Raim'
result = re.findall("Ra+im", txt)
print(result)
The output will be:- ['Raim']
? Question Mark
The question mark matches zero or one occurrence of the pattern to its left.
Expression | String | Match |
---|---|---|
Ra?m | Rm | 1 Match |
Ra?m | Ram | 1 Match |
Ra?m | Raam | No Match because there is more than one a. |
Example
import re
txt = 'Programming'
result = re.findall("Programmin?g", txt)
print(result)
Output
['Programming']
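To compare the star, plus, and question mark directly, the following sketch tries all three quantifiers against the same three candidate strings. The candidate list and the results dictionary are assumptions made for this illustration.

```python
import re

# Candidate strings with zero, one, and two occurrences of "a"
candidates = ["Rm", "Ram", "Raam"]

results = {}
for pattern in (r"Ra*m", r"Ra+m", r"Ra?m"):
    # Keep the candidates in which the pattern is found
    results[pattern] = [s for s in candidates if re.search(pattern, s)]
    print(pattern, results[pattern])

# Ra*m ['Rm', 'Ram', 'Raam']
# Ra+m ['Ram', 'Raam']
# Ra?m ['Rm', 'Ram']
```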
{ } Braces
The braces { } match a specified number of occurrences of the pattern to their left. {n} means exactly n repetitions, and {n,m} means at least n and at most m repetitions. Write the quantifier without spaces inside the braces.
Expression | String | Match |
---|---|---|
a{2,3} | abc bcd | No Match |
a{2,3} | aabc baad | 2 Matches (aa in aabc and aa in baad) |
a{2,3} | abc baad | 1 Match (aa in baad) |
Let's look at a more complex expression. The pattern [0-9]{2,5} matches at least 2 digits but not more than 5 digits.
Expression | String | Match |
---|---|---|
[0-9]{2,5} | abcd123efgh | 1 Match (123) |
Example
import re
txt = 'Hello Helllo'
result = re.findall("l{2}", txt)
result2 = re.findall("l{2,3}", txt)
print(result)
print(result2)
Output
['ll', 'll']
['ll', 'lll']
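The digit pattern with a lower and an upper bound can be tried out in the same way. In this sketch the sample text is invented; it contains a run of three digits, a single digit, and a run of seven digits.

```python
import re

# Sample text invented for this illustration
text = "abcd123efgh 7 4567890"

# At least 2 and at most 5 consecutive digits per match
runs = re.findall(r"[0-9]{2,5}", text)

print(runs)  # ['123', '45678', '90']
```

The single digit 7 is too short to match, and the seven-digit run is split into a match of five digits followed by a match of the remaining two.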
| Alternation
The vertical bar ( | ) is used for alternation: it matches either the pattern on its left or the pattern on its right.
Expression | String | Match |
---|---|---|
a|c | abd | 1 Match |
a|c | abc | 2 Match |
a|c | efg | No Match |
Example
import re
txt = 'hello'
result = re.findall("h|s", txt)
print(result)
The output will be:- ['h']
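Alternation also works with whole words, not only single characters. A minimal sketch, assuming an invented sample sentence:

```python
import re

# Sample sentence invented for this illustration
text = "I have a cat, a dog and a bird"

# Match either the word "cat" or the word "dog"
pets = re.findall(r"cat|dog", text)

print(pets)  # ['cat', 'dog']
```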
() Group
The parentheses () are used to group sub-patterns. For example, the pattern (x|y|z)abc matches any string that contains x, y, or z followed by abc.
Expression | String | Match |
---|---|---|
(x|y|z)abc | xy abc | No Match |
(x|y|z)abc | xyabc | 1 Match |
(x|y|z)abc | xabc yabc | 2 Match |
Example
import re
txt = 'hello'
result = re.findall("(h|s)ello", txt) # Match h
print(result)
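The output will be:- ['h'], because when the pattern contains a group, findall() returns the text captured by the group rather than the whole match.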
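If you want findall() to return the whole match while still grouping the alternatives, one option is a non-capturing group, written (?:...). The sketch below compares the two forms on an invented sample string.

```python
import re

# Sample text invented for this illustration
txt = "hello sello"

# Capturing group: findall() returns only the captured part
captured = re.findall(r"(h|s)ello", txt)

# Non-capturing group: findall() returns the whole match
whole = re.findall(r"(?:h|s)ello", txt)

print(captured)  # ['h', 's']
print(whole)     # ['hello', 'sello']
```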
Special Sequences
Special sequences represent predefined character classes and positions that have a special meaning. Each special sequence stands for a common pattern and is easy to use.
Special Sequence | Description |
---|---|
\A | Matches if the specified characters are at the start of the string. |
\b | Matches if the specified characters are at the beginning or end of a word. |
\B | The opposite of \b. Matches if the specified characters are not at the beginning or end of a word. |
\d | Matches any decimal digit. For ASCII text, it is equivalent to [0-9]. |
\D | Matches any character that is not a digit. |
\s | Matches any whitespace character. |
\S | Matches any character that is not a whitespace character. |
\w | Matches any word character ( a-z, A-Z, 0-9, _ ). |
\W | Matches any character that is not a word character ( a-z, A-Z, 0-9, _ ). |
\Z | Matches if the specified characters are at the end of the string. |
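A few of the special sequences from the table can be tried on one sample string. The text below is invented for illustration; note how \b only matches at the edge of a word.

```python
import re

# Sample text invented for this illustration
text = "Order 66 shipped_on 2024"

digits = re.findall(r"\d+", text)           # runs of digits
words = re.findall(r"\w+", text)            # runs of word characters
spaces = re.findall(r"\s", text)            # single whitespace characters
whole_word = re.findall(r"\bship\b", text)  # "ship" as a complete word
prefix = re.findall(r"\bship", text)        # "ship" at the start of a word

print(digits)      # ['66', '2024']
print(words)       # ['Order', '66', 'shipped_on', '2024']
print(spaces)      # [' ', ' ', ' ']
print(whole_word)  # []
print(prefix)      # ['ship']
```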
The findall() Function
The findall() function returns a list containing all matches.
Example: Extract ‘mm’ from a string.
import re
txt = 'Programming Funda'
result = re.findall('mm', txt)
print(result)
The Output will be:- ['mm']
If there are no matches found, an empty list will be returned.
Example
import re
txt = 'Programming Funda'
result = re.findall('hello', txt)
print(result)
Example: Extract digits from a string.
import re
txt = 'Programming Funda 12 is the best tutorial site.1230'
result = re.findall(r"\d+", txt)
print(result)
The output will be:- ['12', '1230']
split() Function
The re.split() method splits the string at each match and returns a list of strings.
Example:
import re
txt = 'Programming Funda 12 is the best 23 tutorial site.'
result = re.split(r"\d+", txt)
print(result)
Output
['Programming Funda ', ' is the best ', ' tutorial site.']
If the pattern is not found, re.split() returns a list containing the original string.
You can pass the maxsplit parameter to re.split(). It sets the maximum number of splits that will occur.
Example
import re
txt = 'Programming Funda 12 is the best 23 tutorial site.1230'
result = re.split(r"\d+", txt, maxsplit=2)
print(result)
Output
['Programming Funda ', ' is the best ', ' tutorial site.1230']
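A typical use of re.split() is breaking up text whose separators are not consistent. The sketch below assumes an invented row of color names separated by commas or semicolons with uneven spacing.

```python
import re

# Sample row invented for this illustration
row = "red, green;blue ,  yellow"

# Split on a comma or semicolon, together with any surrounding whitespace
colors = re.split(r"\s*[,;]\s*", row)

print(colors)  # ['red', 'green', 'blue', 'yellow']
```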
The sub() Function
The re.sub() method is used to replace each match with the text of your choice.
Syntax
re.sub(pattern, replace, string)
Example
import re
txt = 'Programming Funda 12 is the best 23 tutorial site.1230'
# pattern to search
pattern = r"\s+"
# replace variable
replace = ''
result = re.sub(pattern, replace, txt)
print(result)
Output
ProgrammingFunda12isthebest23tutorialsite.1230
Note:- If the pattern is not found, re.sub() returns the original string.
You can control the number of replacements with the count parameter of re.sub().
Example
import re
txt = 'Programming Funda 12 is the best 23 tutorial site.1230'
# pattern
pattern = r"\d+"
# replacement text
replace = ''
result = re.sub(pattern, replace, txt, count=2)
print(result)
Output
Programming Funda  is the best  tutorial site.1230
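Another common use of re.sub() is cleaning up whitespace. In this sketch (the messy string is invented for illustration), every run of whitespace is collapsed into a single space, and the count parameter limits the replacement to the first run.

```python
import re

# Sample text invented for this illustration; \t is a tab character
messy = "Programming   Funda \t is   great"

# Replace every run of whitespace with one space
clean = re.sub(r"\s+", " ", messy)

# Replace only the first run of whitespace
once = re.sub(r"\s+", " ", messy, count=1)

print(clean)  # Programming Funda is great
print(once == "Programming Funda \t is   great")  # True
```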
The subn() Function
The re.subn() method is similar to the re.sub() method, but it returns a tuple of two items: the new string and the number of substitutions made.
Example
import re
txt = 'Programming Funda 12 is the best 23 tutorial site.1230'
# pattern
pattern = r"\d+"
# replacement text
replace = ''
result = re.subn(pattern, replace, txt)
print(result)
Output
('Programming Funda  is the best  tutorial site.', 3)
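Because re.subn() returns a tuple, you can unpack it into two variables. A small sketch, assuming "#" as an arbitrary replacement text:

```python
import re

txt = 'Programming Funda 12 is the best 23 tutorial site.1230'

# Unpack the new string and the number of substitutions
new_text, count = re.subn(r"\d+", "#", txt)

print(new_text)  # Programming Funda # is the best # tutorial site.#
print(count)     # 3
```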
The search() Function
The re.search() function searches a string for a match and returns a match object.
If there is more than one match, only the first occurrence is returned.
Example
import re
txt = 'Programming Funda 12 is the best 23 tutorial site.1230'
# pattern
pattern = r"\d+"
result = re.search(pattern,txt)
print(result)
Output
<re.Match object; span=(18, 20), match='12'>
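When nothing matches, re.search() returns None instead of a match object, so it is worth checking the result before calling methods on it. A minimal sketch with an invented sample string and message texts:

```python
import re

# Sample text without any digits, invented for this illustration
txt = 'Programming Funda'

result = re.search(r"\d+", txt)

# Calling result.group() on None would raise an AttributeError
if result is None:
    message = "no digits found"
else:
    message = "first number: " + result.group()

print(message)  # no digits found
```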
Match Object
You can list the methods and attributes of a match object using the dir() function.
Example
import re
txt = 'Programming Funda 12 is the best 23 tutorial site.1230'
result = re.search('mm',txt)
print(dir(result))
Output
['__class__', '__copy__', '__deepcopy__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'end', 'endpos', 'expand', 'group', 'groupdict', 'groups', 'lastgroup', 'lastindex', 'pos', 're', 'regs', 'span', 'start', 'string']
match.group()
The group() method returns the part of the string where there is a match.
Example
import re
txt = '39801 400 20 1111'
result = re.search(r'(\d{3}) (\d{2}) ', txt)
print(result.group())
The Output will be:- 400 20
Passing a group number to group() returns the part of the match captured by that pair of parentheses. Groups are numbered from 1, and groups() returns all of them as a tuple.
Example
import re
txt = '39801 400 20 1111'
result = re.search(r'(\d{3}) (\d{2}) ', txt)
print(result.group(1))
print(result.group(2))
print(result.group(1,2))
print(result.groups())
Output
400
20
('400', '20')
('400', '20')
match.start(), match.end() and match.span()
The match.start() method returns the index where the matched substring starts; similarly, match.end() returns the index where it ends.
Example
import re
txt = '39801 400 20 1111'
result = re.search(r'(\d{3}) (\d{2})', txt)
print(result.start(), end=' ')
print(result.end())
The Output will be:- 2 8
The span() method returns a tuple containing the start and end index of the matched part.
Example
import re
txt = '39801 400 20 1111'
result = re.search(r'(\d{3}) (\d{2})', txt)
print(result.span())
The output will be:- (2, 8)
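The indexes returned by span() can be used to slice the original string, which shows exactly which text was matched. The sketch below reuses the string and pattern from the example above; here the first match is '801 40', because the last three digits of 39801 already satisfy \d{3}.

```python
import re

txt = '39801 400 20 1111'
result = re.search(r'(\d{3}) (\d{2})', txt)

# Slice the original string with the start and end index of the match
start, end = result.span()
matched = txt[start:end]

print(matched)                          # 801 40
print(matched == result.group())        # True
print(result.span(1), result.span(2))   # (2, 5) (6, 8)
```

Passing a group number to span() gives the position of that group alone.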
match.re and match.string
The match.re attribute returns the regular expression object that produced the match; similarly, match.string returns the string that was passed to the function.
Example
import re
txt = '39801 400 20 1111'
result = re.search(r'(\d{3}) (\d{2})', txt)
print(result.re)
print(result.string)
Output
re.compile('(\\d{3}) (\\d{2})')
39801 400 20 1111
Conclusion
In this article, you have learned about Python regular expressions with the help of examples. The re module is going to be very helpful when you want to search for more complex string patterns.
If you are a Python developer, I highly recommend learning regular expressions, because they will be useful both in your day-to-day Python work and in your real-world Python projects.
I hope this article has helped you. If you liked it, please share it with someone who wants to learn regular expressions in Python, and keep visiting for further Python tutorials.
~ Thanks for reading