Regular expressions (regex) are a powerful tool for searching, matching, and manipulating text. Python's built-in re module provides the functions for working with them.
1. Importing the re Module
The re module is part of the standard library and contains all the functions needed for regex operations.
import re
2. Basic Regex Functions in Python
Function | Description |
---|---|
re.search() | Finds the first occurrence of a pattern. |
re.match() | Matches a pattern only at the start of a string. |
re.findall() | Finds all occurrences of a pattern. |
re.finditer() | Returns an iterator with match objects. |
re.sub() | Replaces occurrences of a pattern. |
re.split() | Splits a string based on a pattern. |
re.compile() | Compiles a regex pattern for reuse. |
3. Simple Regex Matching
Using re.search()
Finds the first match in a string.
import re
text = "Hello, my email is example@email.com"
match = re.search(r"\w+@\w+\.\w+", text)
if match:
    print("Found:", match.group())  # Output: Found: example@email.com
🔹 \w+ → Matches one or more word characters (letters, digits, and underscores).
🔹 @ → Matches the @ symbol.
🔹 \. → Matches a literal . (dot).
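Since re.search() returns a match object on success and None otherwise, it helps to see both outcomes side by side. The sketch below reuses the text from the example above; the variable names found and missing are made up for illustration.

```python
import re

text = "Hello, my email is example@email.com"

# A successful search returns a match object.
found = re.search(r"\w+@\w+\.\w+", text)

# An unsuccessful search returns None (the text contains no digits).
missing = re.search(r"\d+", text)

if found:
    print(found.group())  # the matched substring
    print(found.span())   # (start, end) indexes of the match in text

print(missing)  # None
```

Checking the result before calling .group() avoids an AttributeError when nothing matches.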
4. re.match() vs re.search()
re.match() checks only the beginning of the string.
re.search() searches anywhere in the string.
text = "123 Python is powerful"
print(re.match(r"\d+", text))  # Matches at start: <re.Match object; span=(0, 3), match='123'>
print(re.search(r"Python", text))  # Finds 'Python' anywhere: <re.Match object; span=(4, 10), match='Python'>
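The difference is easiest to see when the pattern does not sit at the start of the string. The following sketch uses the same text; re.fullmatch() is not covered above and is included here only as a related function that requires the whole string to match.

```python
import re

text = "123 Python is powerful"

# "Python" is not at index 0, so re.match() fails while re.search() succeeds.
at_start = re.match(r"Python", text)
anywhere = re.search(r"Python", text)

# re.fullmatch() succeeds only if the entire string matches the pattern.
whole = re.fullmatch(r"\d+", "123")
partial = re.fullmatch(r"\d+", text)

print(at_start)          # None
print(anywhere.group())  # Python
print(whole.group())     # 123
print(partial)           # None
```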
5. Finding All Matches with re.findall()
text = "Emails: alice@gmail.com, bob@yahoo.com, charlie@outlook.com"
emails = re.findall(r"\w+@\w+\.\w+", text)
print(emails) # ['alice@gmail.com', 'bob@yahoo.com', 'charlie@outlook.com']
re.findall() returns a list of all non-overlapping matches.
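One detail worth knowing is that the shape of the returned list changes when the pattern contains capturing groups. A small sketch with the same email text (the names full, users, and pairs are illustrative):

```python
import re

text = "Emails: alice@gmail.com, bob@yahoo.com, charlie@outlook.com"

# No groups: each item is the full match.
full = re.findall(r"\w+@\w+\.\w+", text)

# One group: each item is only the text captured by that group.
users = re.findall(r"(\w+)@\w+\.\w+", text)

# Two groups: each item is a tuple with one entry per group.
pairs = re.findall(r"(\w+)@(\w+\.\w+)", text)

print(full)
print(users)  # ['alice', 'bob', 'charlie']
print(pairs)  # [('alice', 'gmail.com'), ('bob', 'yahoo.com'), ('charlie', 'outlook.com')]
```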
6. Using re.finditer() for More Control
Returns an iterator that yields a match object for each match.
text = "The price is $100 and the discount is $20."
matches = re.finditer(r"\$\d+", text)
for match in matches:
    print(match.group())  # Prints $100, then $20
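The extra control comes from the match objects themselves, which carry positions and captured groups that re.findall() does not expose. A minimal sketch, assuming we want each amount together with its position in the text:

```python
import re

text = "The price is $100 and the discount is $20."

# Capture the digits in a group so the number can be read without the "$".
results = []
for match in re.finditer(r"\$(\d+)", text):
    results.append((match.group(), match.start(), int(match.group(1))))

for full_text, position, amount in results:
    print(full_text, "at index", position, "is worth", amount)

total = sum(amount for _, _, amount in results)
print("Total:", total)  # Total: 120
```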
7. Replacing Text with re.sub()
text = "I love Java. Java is great!"
updated_text = re.sub(r"Java", "Python", text)
print(updated_text) # Output: I love Python. Python is great!
🔹 Replaces all occurrences of "Java" with "Python".
Limiting replacements:
text = "Java Java Java"
print(re.sub(r"Java", "Python", text, count=2))
# Output: Python Python Java
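The replacement does not have to be a fixed string. As a sketch beyond the examples above, re.sub() also accepts backreferences such as \1 and a function that builds the replacement from each match; re.subn() additionally reports how many replacements were made. The sample strings and the helper double_price are invented for illustration.

```python
import re

# Backreferences: \1 and \2 refer to the first and second captured groups.
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Lovelace, Ada")

# Callable replacement: the function receives each match object
# and returns the replacement text.
def double_price(match):
    return "$" + str(int(match.group(1)) * 2)

doubled = re.sub(r"\$(\d+)", double_price,
                 "The price is $100 and the discount is $20.")

# re.subn() returns a (new_string, number_of_replacements) tuple.
new_text, count = re.subn(r"Java", "Python", "Java Java Java")

print(swapped)   # Ada Lovelace
print(doubled)   # The price is $200 and the discount is $40.
print(new_text, count)  # Python Python Python 3
```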
8. Splitting Strings with re.split()
text = "apple, banana; mango|grapes"
words = re.split(r",|;|\|", text)
print(words)  # ['apple', ' banana', ' mango', 'grapes'] (spaces after the separators are kept)
,|;|\| → Matches , or ; or | as separators. The last | is escaped as \| because a bare | means OR.
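Splitting on the separator alone leaves any surrounding whitespace attached to the pieces. One possible way to get clean items, shown here as a sketch rather than the only approach, is to let the pattern absorb optional whitespace around a character class of separators. The maxsplit argument limits how many splits are performed.

```python
import re

text = "apple, banana; mango|grapes"

# [,;|] matches any one separator (| is literal inside brackets);
# \s* on both sides consumes the surrounding whitespace.
words = re.split(r"\s*[,;|]\s*", text)

# maxsplit=1 stops after the first split and keeps the rest intact.
first_and_rest = re.split(r"\s*[,;|]\s*", text, maxsplit=1)

print(words)           # ['apple', 'banana', 'mango', 'grapes']
print(first_and_rest)  # ['apple', 'banana; mango|grapes']
```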
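9. Using re.compile() for Reusability
Pre-compiling a pattern gives you a reusable pattern object. Python also caches recently used patterns internally, so the speed gain is usually modest; the main benefit is cleaner code when the same pattern is used many times.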
pattern = re.compile(r"\d{3}-\d{2}-\d{4}") # Matches SSN format
text = "My SSN is 123-45-6789."
match = pattern.search(text)
print(match.group()) # Output: 123-45-6789
\d{3}-\d{2}-\d{4} → Matches a pattern like 123-45-6789: three digits, two digits, and four digits, separated by hyphens.
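Reuse is where a compiled pattern pays off: the same object can be applied to many strings, and it offers the same methods as the module-level functions. The sample lines below are made up for this sketch.

```python
import re

# Hypothetical input lines, invented for illustration.
lines = [
    "My SSN is 123-45-6789.",
    "No number here.",
    "Two numbers: 111-22-3333 and 444-55-6666",
]

# Compile once, then reuse the pattern object for every line.
ssn_pattern = re.compile(r"\d{3}-\d{2}-\d{4}")

found = [ssn_pattern.findall(line) for line in lines]

for line, numbers in zip(lines, found):
    print(repr(line), "->", numbers)
```

A compiled pattern provides search(), match(), fullmatch(), findall(), finditer(), sub(), and split() as methods, so the pattern string does not need to be repeated at each call site.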
10. Special Regex Characters and Their Meanings
Symbol | Meaning | Example |
---|---|---|
. | Matches any character (except newline) | a.b matches acb, a8b |
\d | Matches digits (0-9) | \d+ matches 123 in abc123xyz |
\w | Matches letters, digits, underscore | \w+ matches Python3 |
\s | Matches whitespace (space, tab, newline) | \s+ matches spaces |
^ | Matches the start of a string | ^Hello matches "Hello world" |
$ | Matches the end of a string | world$ matches "Hello world" |
[] | Matches any one character inside the brackets | [aeiou] matches vowels |
[^] | Matches any character not inside the brackets | [^0-9] matches non-digits |
* | Matches 0 or more times | go* matches g, gooo |
+ | Matches 1 or more times | go+ matches go, gooo but not g |
? | Matches 0 or 1 time | colou?r matches color and colour |
{n} | Matches exactly n times | \d{3} matches 123 |
{n,} | Matches at least n times | \d{2,} matches 12, 123 |
{n,m} | Matches between n and m times | \d{2,4} matches 12, 1234 |
\| | OR operator (alternation) | cat\|dog matches cat or dog |
() | Groups expressions | (ab)+ matches "ababab" |
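The table entries can be checked directly in Python. The sketch below runs several of them through re.findall() and re.fullmatch(); the sample strings and the dictionary name checks are chosen for illustration.

```python
import re

checks = {
    "dot": re.findall(r"a.b", "acb a8b ab"),          # "ab" has no middle character
    "star": re.findall(r"go*", "g go gooo"),
    "plus": re.findall(r"go+", "g go gooo"),           # plain "g" is skipped
    "optional": re.findall(r"colou?r", "color colour"),
    "range": re.findall(r"\d{2,4}", "1 12 1234"),      # a single digit is too short
    "negated": re.findall(r"[^0-9]+", "abc123xyz"),
    "alternation": re.findall(r"cat|dog", "cat, dog, bird"),
    "group": re.fullmatch(r"(ab)+", "ababab") is not None,
}

for name, result in checks.items():
    print(name, "->", result)
```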
11. Using Regex to Validate User Input
Example: Validate an email address.
import re

def validate_email(email):
    # Simplified check: local part, @, domain, and a TLD of at least two letters.
    pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
    return bool(re.match(pattern, email))

print(validate_email("test@example.com"))  # True
print(validate_email("invalid-email"))  # False
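One subtlety with this kind of validation is that $ also matches just before a trailing newline, so an input ending in "\n" still passes. A stricter variant, sketched here under the assumption that the whole input must match, uses fullmatch() instead; the names EMAIL_PATTERN and validate_email_strict are hypothetical.

```python
import re

# Same character classes as the pattern above, without ^ and $,
# because fullmatch() already anchors both ends.
EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

def validate_email_strict(email):
    # True only if the entire string is a match.
    return EMAIL_PATTERN.fullmatch(email) is not None

# The ^...$ version accepts a trailing newline; the strict version does not.
loose_pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
loose_result = bool(re.match(loose_pattern, "test@example.com\n"))
strict_result = validate_email_strict("test@example.com\n")

print(loose_result)   # True
print(strict_result)  # False
print(validate_email_strict("test@example.com"))  # True
```

Either version is only a format check; it does not confirm that the address exists.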
12. Extracting Information Using Regex Groups
Use () to capture specific parts of a match.
text = "Phone: (123) 456-7890"
match = re.search(r"\((\d{3})\) (\d{3})-(\d{4})", text)
if match:
    print("Area Code:", match.group(1))  # 123
    print("First Part:", match.group(2))  # 456
    print("Second Part:", match.group(3))  # 7890
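Numbered groups become hard to follow as patterns grow. As a sketch of an alternative, the same phone pattern can use named groups with the (?P<name>...) syntax; the group names area, prefix, and line are chosen for illustration.

```python
import re

text = "Phone: (123) 456-7890"

# (?P<name>...) captures like () but lets you refer to the group by name.
pattern = r"\((?P<area>\d{3})\) (?P<prefix>\d{3})-(?P<line>\d{4})"
match = re.search(pattern, text)

parts = match.groupdict() if match else {}
all_groups = match.groups() if match else ()

print(parts)       # {'area': '123', 'prefix': '456', 'line': '7890'}
print(all_groups)  # ('123', '456', '7890')
if match:
    print(match.group("area"))  # 123
```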
13. Handling Case-Insensitive Matching
Use re.IGNORECASE (re.I) to ignore case.
text = "Python is amazing!"
print(re.search(r"python", text, re.I)) # Matches 'Python'
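The flag works with the other functions too, and can be combined with further flags using |. The sketch below adds re.MULTILINE (re.M), which is not covered above, purely to show the combination; the two-line sample text is made up.

```python
import re

text = "Python is amazing!\npython is popular."

# Case-insensitive findall picks up both spellings.
hits = re.findall(r"python", text, re.I)

# Without re.M, ^ matches only at the very start of the string.
first_line_only = re.findall(r"^python", text, re.I)

# With re.M, ^ also matches at the start of each line.
every_line = re.findall(r"^python", text, re.I | re.M)

# Flags can also be passed to re.compile() and then apply to every call.
pattern = re.compile(r"PYTHON", re.IGNORECASE)
replaced = pattern.sub("Regex", text)

print(hits)             # ['Python', 'python']
print(first_line_only)  # ['Python']
print(every_line)       # ['Python', 'python']
print(replaced)
```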