Python is a high-level, interpreted programming language and one of the most popular languages in the software industry.
One of the most important concepts in Python is the regular expression, or RegEx. Let us have a look at it in detail.
In Python, you often need to search for a particular pattern inside other strings, for example to extract more information from them.
We can use regular expressions, also known as RegEx, in such cases to match parts of strings.
To use regular expressions in Python, we use the re module.
For this, we first need to import it using the import keyword in the following manner:
import re
To aid in the working of regular expressions, Python's re module interprets certain characters in a special manner.
These characters are called metacharacters.
The metacharacters in Python re are as follows:
Meta Character | Description |
---|---|
[] | Used to specify a set of characters |
\ | Used to introduce a special sequence or to escape a metacharacter |
. | Used to match any single character except the newline character \n |
^ | Used to match the start of a string |
$ | Used to match the end of a string |
* | Used to match zero or more occurrences of the preceding pattern |
+ | Used to match one or more occurrences of the preceding pattern |
? | Used to match zero or one occurrence of the preceding pattern |
{} | Used to match exactly the number of occurrences specified |
\| | Used to match either of two expressions |
() | Used to group patterns |
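As a rough illustration of the table above, the following sketch tries each metacharacter on small sample strings. The sample strings and variable names are made up for this example, and the `(?:...)` non-capturing group is used only so that `findall()` returns the whole match.

```python
import re

# Hypothetical sample text, chosen only for illustration
text = "cat cot cut coat"

# [] : a set of characters; matches "cat" and "cot" but not "cut"
set_matches = re.findall(r"c[ao]t", text)

# .  : any single character except a newline
dot_matches = re.findall(r"c.t", text)

# ^ and $ : anchor the pattern to the start and end of the string
starts_with_cat = re.search(r"^cat", text) is not None
ends_with_coat = re.search(r"coat$", text) is not None

# *, + and ? : zero or more, one or more, zero or one of the preceding item
star = re.findall(r"ab*", "a ab abb")
plus = re.findall(r"ab+", "a ab abb")
optional = re.findall(r"colou?r", "color colour")

# {} : an exact repetition count
three_digits = re.findall(r"[0-9]{3}", "12 345 6789")

# | and () : alternation inside a (non-capturing) group
either = re.findall(r"(?:cat|dog)s", "cats dogs birds")

print(set_matches, dot_matches, star, plus, optional, three_digits, either)
```

Patterns are written as raw strings (the `r` prefix) so that Python does not interpret the backslashes itself before the re module sees them.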
Some patterns, such as "any digit" or "any whitespace character", occur far more often than others.
To give a shorter and more convenient way of writing these patterns, the re module provides special sequences.
The special sequences in Python re are as follows:
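Special sequence | Meaning |
---|---|
\A | Matches only at the start of the string |
\b | Matches at a word boundary, that is, the start or end of a word |
\B | Opposite of \b: matches where there is no word boundary |
\d | Matches any decimal digit, such as 0 to 9 |
\D | Opposite of \d: matches any character that is not a digit |
\s | Matches whitespace characters such as space, tab and newline |
\S | Opposite of \s: matches any non-whitespace character |
\w | Matches letters, digits or the underscore _ |
\W | Opposite of \w: matches any character that is not a letter, digit or _ |
\Z | Matches only at the end of the string (Python's re uses the uppercase \Z; the lowercase \z is not accepted by most Python versions) |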
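The special sequences can be tried out in the same way. This is a small sketch on made-up sample strings; the variable names are only for illustration.

```python
import re

# Hypothetical sample text, chosen only for illustration
text = "Order 66 shipped_to Dock 9"

# \d and \D : digits and non-digits
digits = re.findall(r"\d+", text)
only_digits = re.sub(r"\D", "", text)

# \w : letters, digits or underscore
words = re.findall(r"\w+", text)

# \s : whitespace, used here as the separator for split()
parts = re.split(r"\s+", text)

# \b : word boundary, so "cat" inside "concat" is skipped
whole_words = re.findall(r"\bcat\b", "cat concat cat.")

# \B : no word boundary, so only the "cat" inside "concat" is found
inside_words = re.findall(r"\Bcat", "cat concat")

# \A and \Z : start and end of the whole string
starts_with_order = re.search(r"\AOrder", text) is not None
ends_with_nine = re.search(r"9\Z", text) is not None

print(digits, only_digits, words, whole_words, inside_words)
```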
In Python, the re module provides several functions. Let us take a look at them here. We’ll consider four functions in this case.
The first one is the search() function, which is one of the most commonly used.
It scans a string for the first location where the given pattern matches.
On success, it returns what is known as a match object.
An important thing to note is that if there are multiple matches, search() returns only the first one encountered, and it returns None if no match is found.
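A minimal sketch of working with the match object follows. The sample text is made up; `group()` and `span()` are methods of the match object, and the `None` check guards against the no-match case.

```python
import re

# Hypothetical sample text containing two possible matches
text = "Hello world, hello again"

# search() stops at the first match, even though "hello" appears later too
match = re.search(r"[Hh]ello", text)

found = None
position = None
if match is not None:
    found = match.group()    # the matched text
    position = match.span()  # (start, end) indexes of the match

# When nothing matches, search() returns None instead of a match object
missing = re.search(r"goodbye", text)

print(found, position, missing)
```

Checking for `None` before calling `group()` avoids an AttributeError when the pattern is not present.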
The second function is the split() function. As the name suggests, it is used to split a given string into multiple substrings.
The string is split at every position where the regular expression matches, and an optional maxsplit argument limits the number of splits.
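As a small sketch, the example below splits a made-up string on commas or semicolons followed by optional whitespace, first without a limit and then with `maxsplit=1`.

```python
import re

# Hypothetical sample text with mixed separators
text = "a, b;c,  d"

# Split at every comma or semicolon, swallowing any whitespace after it
all_parts = re.split(r"[,;]\s*", text)

# maxsplit=1 performs only the first split and keeps the rest intact
first_split = re.split(r"[,;]\s*", text, maxsplit=1)

print(all_parts)
print(first_split)
```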
The third function is the findall() function. As the name suggests, this function is used to find all the matches of a given regular expression instead of just the first occurrence.
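One detail worth sketching: when the pattern contains groups, findall() returns the groups rather than the whole match. The sample text below is made up for illustration.

```python
import re

# Hypothetical sample text with key=value pairs
text = "x=1, y=22"

# No groups: each item is the full matched text
whole = re.findall(r"\w+=\d+", text)

# Two groups: each item is a tuple with one entry per group
pairs = re.findall(r"(\w+)=(\d+)", text)

print(whole)
print(pairs)
```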
The fourth function is the sub() function. As the name suggests, this function is used to find the matches of a regular expression in a given string and substitute or replace these matches with another substring.
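Two optional features of sub() can be sketched as follows: the `count` argument limits how many matches are replaced, and the replacement string can refer back to groups with `\1`, `\2` and so on. The sample strings are made up for illustration.

```python
import re

# count=1 replaces only the first match
first_only = re.sub(r"o", "0", "foo boo", count=1)

# Without count, every match is replaced
every = re.sub(r"o", "0", "foo boo")

# \1 and \2 in the replacement refer to the first and second group
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "Hello world")

print(first_only, every, swapped)
```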
Let us look at a piece of Python code and its corresponding output for the usage of these functions:
import re
# Defining a string
a = 'Hello world'
# Using the search function returns the match object
ans = re.search('world', a)
# Printing the answer
print(ans)
# Defining a string
a = 'Hello world'
# Using the findall function
ans = re.findall('o', a)
# Printing the answer
print(ans)
# Defining a string
a = 'Hello world'
# Using the split function to split at all whitespaces
ans = re.split(r'\s', a)
# Printing the answer
print(ans)
# Defining a string
a = 'Hello world'
# Using the sub function
ans = re.sub('Hello', 'Bye', a)
# Printing the answer
print(ans)
Output
<re.Match object; span=(6, 11), match='world'>
['o', 'o']
['Hello', 'world']
Bye world
In this topic, we have learned how to use regular expressions, also called RegEx, in a Python program, along with the related functions search(), split(), findall() and sub(). The simple running examples give an intuition of how this concept can be applied in real-world situations. Feel free to reach out to info.javaexercise@gmail.com in case of any suggestions.