re: The Regular Expression Library

2025-02-17

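A regular expression is a powerful tool for matching and processing strings. Python supports regular expressions through the re module. This article walks through what the re module offers and how to use it.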

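Basic concepts

A regular expression is a pattern that describes a set of character sequences. It can be used to search, edit, or otherwise process text. Common uses include text search, search-and-replace, and input validation.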

Importing the `re` module

Import the re module before using regular expressions:

import re

Common functions

`re.search()`

Scans the string for the first location where the pattern matches and returns a match object for it. Returns None if nothing matches.

import re

pattern = 'cat'
string = 'The cat is on the roof.'
match = re.search(pattern, string)
if match:
    print("Found:", match.group())
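
The match object carries more than the matched text. As a small sketch using the same sentence (the variable names here are illustrative and not from the original example), `start()`, `end()`, and `span()` report where the match was found:

```python
import re

# Same sentence as above; the names below are illustrative only.
sentence = 'The cat is on the roof.'
found = re.search(r'cat', sentence)

if found is not None:
    print(found.group())   # the matched text: cat
    print(found.start())   # index of the first matched character: 4
    print(found.end())     # index just past the last matched character: 7
    print(found.span())    # (start, end) as a tuple: (4, 7)

# A pattern that does not occur yields None rather than raising an error.
missing = re.search(r'dog', sentence)
print(missing)             # None
```

Checking the result against None before calling `group()` avoids an AttributeError when there is no match.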

`re.findall()`

Returns all non-overlapping matches of the pattern in the string as a list.

matches = re.findall(r'\b\w{3}\b', 'The cat is on the roof.')
print(matches)  # Output: ['The', 'cat', 'the']
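
One detail worth knowing: when the pattern contains capturing groups, `re.findall()` returns the group contents instead of the whole match. A minimal sketch, using a made-up sample string:

```python
import re

# Hypothetical sample data, for illustration only.
sample = 'width=80, height=24'

# No groups: each item is the full match.
whole = re.findall(r'\w+=\d+', sample)
print(whole)   # ['width=80', 'height=24']

# Two groups: each item is a tuple of the captured parts.
pairs = re.findall(r'(\w+)=(\d+)', sample)
print(pairs)   # [('width', '80'), ('height', '24')]
```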

`re.sub()`

Replaces the parts of the string that match the pattern and returns the resulting string.

result = re.sub(r'cat', 'dog', 'The cat is on the roof.')
print(result)  # Output: The dog is on the roof.
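
`re.sub()` also accepts a `count` argument and a function as the replacement. The sketch below uses an invented sentence and a hypothetical helper named `shout` to show both, along with `re.subn()`, which reports how many replacements were made:

```python
import re

# Invented sentence for illustration.
sentence = 'The cat chased another cat.'

# count=1 limits the substitution to the first match.
first_only = re.sub(r'cat', 'dog', sentence, count=1)
print(first_only)   # The dog chased another cat.

# The replacement may be a function that receives each match object.
def shout(match):
    return match.group().upper()

shouted = re.sub(r'cat', shout, sentence)
print(shouted)      # The CAT chased another CAT.

# re.subn() returns the new string together with the replacement count.
replaced, n = re.subn(r'cat', 'dog', sentence)
print(replaced, n)  # The dog chased another dog. 2
```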

`re.split()`

Splits the string at each match of the pattern.

parts = re.split(r'\s+', 'The cat is on the roof.')
print(parts)  # Output: ['The', 'cat', 'is', 'on', 'the', 'roof.']
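
Splitting is not limited to whitespace. As a sketch with made-up input, the pattern below splits on commas or semicolons, and the later calls show the `maxsplit` argument and the effect of a capturing group in the pattern:

```python
import re

# Made-up record with inconsistent separators.
record = 'alpha, beta;gamma ,delta'

# Split on a comma or semicolon with optional surrounding whitespace.
fields = re.split(r'\s*[,;]\s*', record)
print(fields)     # ['alpha', 'beta', 'gamma', 'delta']

# maxsplit=1 stops after the first split.
head_rest = re.split(r'\s*[,;]\s*', record, maxsplit=1)
print(head_rest)  # ['alpha', 'beta;gamma ,delta']

# A capturing group keeps the separators in the result.
with_seps = re.split(r'([,;])', 'a,b;c')
print(with_seps)  # ['a', ',', 'b', ';', 'c']
```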

Regular expression syntax

Character matching

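. matches any character except a newline (with the re.DOTALL flag it matches a newline as well).
\d matches any decimal digit. For str patterns this includes Unicode digits; it is equivalent to [0-9] only with the re.ASCII flag or in bytes patterns.
\D matches any character that is not a digit.
\s matches any whitespace character, including spaces, tabs, newlines, and form feeds.
\S matches any character that is not whitespace.
\w matches any word character: letters, digits, and the underscore. It is equivalent to [a-zA-Z0-9_] only with the re.ASCII flag or in bytes patterns; otherwise it also matches Unicode letters and digits.
\W matches any character that is not a word character.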
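
The character classes above can be tried out quickly with `re.findall()`. The sample string below is invented for illustration; the last two calls show how the re.ASCII flag narrows `\d` for str patterns:

```python
import re

# Invented sample containing letters, digits, whitespace, and punctuation.
sample = 'Room 42,\tfloor_3!'

digits = re.findall(r'\d', sample)
spaces = re.findall(r'\s', sample)
words = re.findall(r'\w+', sample)
non_words = re.findall(r'\W', sample)
print(digits)     # ['4', '2', '3']
print(spaces)     # [' ', '\t']
print(words)      # ['Room', '42', 'floor_3']
print(non_words)  # [' ', ',', '\t', '!']

# . does not cross a line break unless re.DOTALL is given.
dot_default = re.findall(r'a.b', 'a\nb a-b')
dot_all = re.findall(r'a.b', 'a\nb a-b', re.DOTALL)
print(dot_default)  # ['a-b']
print(dot_all)      # ['a\nb', 'a-b']

# \uff15 is the full-width digit five; \d matches it unless re.ASCII is set.
unicode_digits = re.findall(r'\d', '5 \uff15')
ascii_digits = re.findall(r'\d', '5 \uff15', re.ASCII)
print(len(unicode_digits), len(ascii_digits))  # 2 1
```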

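Boundary matching

^ matches at the start of the string; with the re.MULTILINE flag it also matches right after each newline.
$ matches at the end of the string, or just before a newline at the very end; with the re.MULTILINE flag it also matches before each newline.
\b matches at a word boundary, that is, at the start or end of a word.
\B matches at any position that is not a word boundary.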
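
A short sketch of how the anchors behave, using an invented two-line string. The comments state what each pattern is expected to return for this particular input:

```python
import re

# Invented two-line sample.
text = 'cat concat cats\ncat'

# \b on both sides: only the standalone word "cat" matches.
standalone = re.findall(r'\bcat\b', text)
print(standalone)  # ['cat', 'cat']

# \B before "cat": only a "cat" preceded by another word character matches.
inside = re.findall(r'\Bcat', text)
print(inside)      # ['cat']  (the one inside "concat")

# ^ matches only at the very start by default ...
starts = re.findall(r'^cat', text)
print(starts)      # ['cat']

# ... and at the start of every line with re.MULTILINE.
starts_ml = re.findall(r'^cat', text, re.MULTILINE)
print(starts_ml)   # ['cat', 'cat']
```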

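Quantifiers

* matches the preceding subexpression zero or more times.
+ matches the preceding subexpression one or more times.
? matches the preceding subexpression zero or one time.
{n} matches the preceding subexpression exactly n times.
{n,} matches the preceding subexpression at least n times.
{n,m} matches the preceding subexpression at least n and at most m times.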
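
To see the quantifiers side by side, the sketch below applies each one to the letter a in the same invented word list, so the only thing that changes between calls is the quantifier:

```python
import re

# Invented word list with zero to three "a" characters between c and t.
words = 'ct cat caat caaat'

star = re.findall(r'\bca*t\b', words)
plus = re.findall(r'\bca+t\b', words)
optional = re.findall(r'\bca?t\b', words)
exactly_two = re.findall(r'\bca{2}t\b', words)
two_or_more = re.findall(r'\bca{2,}t\b', words)
one_to_two = re.findall(r'\bca{1,2}t\b', words)

print(star)         # ['ct', 'cat', 'caat', 'caaat']
print(plus)         # ['cat', 'caat', 'caaat']
print(optional)     # ['ct', 'cat']
print(exactly_two)  # ['caat']
print(two_or_more)  # ['caat', 'caaat']
print(one_to_two)   # ['cat', 'caat']
```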

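Groups and backreferences

() groups a subexpression and captures the text it matches.
(?P<name>...) defines a named group.
\1, \2, ... refer back to the text captured by the group with that number.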
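
The following sketch shows the three forms in use. The sample strings and the group names year and month are chosen for illustration only:

```python
import re

# Numbered groups follow the order of the opening parentheses.
date = re.search(r'(\d{4})-(\d{2})-(\d{2})', 'Published on 2025-02-17.')
print(date.group(0))   # 2025-02-17  (the whole match)
print(date.group(1))   # 2025
print(date.groups())   # ('2025', '02', '17')

# Named groups can be read by name or collected into a dict.
named = re.search(r'(?P<year>\d{4})-(?P<month>\d{2})', 'Published on 2025-02-17.')
print(named.group('year'))  # 2025
print(named.groupdict())    # {'year': '2025', 'month': '02'}

# \1 must match the same text that group 1 captured: here, a doubled word.
doubled = re.findall(r'\b(\w+) \1\b', 'This is is a test of of groups.')
print(doubled)              # ['is', 'of']
```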

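Example code

The following example extracts e-mail addresses from text. The pattern is deliberately simple and does not cover every valid address: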

import re

text = "Please contact us at support@example.com for further information."
pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
emails = re.findall(pattern, text)
print("Emails found:", emails)
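
When the same pattern is applied repeatedly, it can be compiled once with `re.compile()`. The sketch below reuses the pattern from the example above; the name EMAIL_RE, the sample text, and the second address are placeholders introduced here, not part of the original example:

```python
import re

# Same pattern as in the example above, compiled once for reuse.
# EMAIL_RE is an illustrative name.
EMAIL_RE = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

# Placeholder text with two addresses.
message = 'Write to support@example.com or sales@example.org today.'

# finditer() yields match objects, so the position of each hit is available too.
for m in EMAIL_RE.finditer(message):
    print(m.group(), m.span())

found_emails = [m.group() for m in EMAIL_RE.finditer(message)]
print(found_emails)  # ['support@example.com', 'sales@example.org']
```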

Copyright

Article link: /python/ryr1mpr5/

License: Attribution 4.0 International (CC-BY-4.0)
