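Learning: Regular Expressions (regex) and Python's re Module

Below are usage examples and detailed explanations of Python regular expressions (regex) and the re module, as demonstrated in the video: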

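1. Importing the `re` module

To use regular expressions in Python, first import the re module.

import re

The re module provides functions for searching, matching, splitting, and replacing text with regular expressions.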
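As a quick orientation, the sketch below calls a few of the module-level functions side by side. The sample strings and variable names are made up for illustration and do not come from the video.

```python
import re

text = "Order 66 shipped"

# re.search scans the whole string and returns the first match (or None).
found = re.search(r"\d+", text)

# re.match only succeeds if the pattern matches at the very start of the string.
at_start = re.match(r"\d+", text)

# re.findall returns every non-overlapping match as a list of strings.
all_numbers = re.findall(r"\d+", "7 and 42")

# re.split cuts the string wherever the pattern matches.
parts = re.split(r"\s+", text)

print(found.group(), at_start, all_numbers, parts)
```

Here `re.match` returns None because the text starts with a letter, while `re.search` still finds the number later in the string.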

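2. Raw strings

In regular expressions the backslash (\) has a special meaning, but it is also the escape character in ordinary Python string literals. To keep Python from interpreting backslashes before the regex engine sees them, patterns are usually written as raw strings by prefixing the literal with r.

pattern = r"\d{3}"

The backslash then reaches the regex engine unchanged, so \d is read as part of the regular expression.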
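A small sketch of what the r prefix changes in practice. The strings are illustrative only; the point is that "\b" in a normal literal is a backspace character, so the pattern silently stops meaning "word boundary".

```python
import re

# In a normal literal "\n" is one newline character; in a raw literal it is
# two characters: a backslash followed by "n".
normal_len = len("\n")
raw_len = len(r"\n")

# Without the r prefix, each backslash meant for the regex must be doubled.
same_pattern = "\\d{3}" == r"\d{3}"

# "\b" in a normal literal is a backspace, not a regex word boundary.
without_raw = re.findall("\bcat\b", "cat concat")
with_raw = re.findall(r"\bcat\b", "cat concat")

print(normal_len, raw_len, same_pattern, without_raw, with_raw)
```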

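3. The `re.compile` method

re.compile() compiles a regular expression into a pattern object that can be reused, so the expression does not have to be recompiled on every call.

pattern = re.compile(r"\d{3}")
matches = pattern.findall("My phone number is 123456 and 789")
print(matches)  # ['123', '456', '789']

The compiled expression \d{3} matches three consecutive digits. Because findall returns non-overlapping matches from left to right, 123456 yields both '123' and '456'.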
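To show the reuse that makes compiling worthwhile, here is a minimal sketch that applies one pattern object to several inputs. The list of lines and the name `three_digits` are hypothetical.

```python
import re

# Compile once, then reuse the same pattern object on several inputs.
three_digits = re.compile(r"\d{3}")

lines = ["call 555 now", "no digits here", "codes 100200"]
results = [three_digits.findall(line) for line in lines]

# The pattern object also exposes search, which returns a match object or None.
first = three_digits.search("ext. 4321")

print(results)
print(first.group())
```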

4. Matching literal characters

To match specific text such as "ABC", write it directly in the pattern:

pattern = re.compile(r"ABC")
matches = pattern.findall("ABC is the first match and ABC again")
print(matches)  # ['ABC', 'ABC']

This finds every occurrence of the string "ABC".
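Literal matching gets trickier when the text to find contains regex metacharacters. The sketch below uses `re.escape`, which the video may not cover, on a made-up search term to match it literally.

```python
import re

# Hypothetical search term that contains a regex metacharacter ("+").
term = "1+1"
text = "1+1 is not 11 or 111"

# Unescaped, "+" is a quantifier: the pattern means "one or more 1s, then a 1".
unescaped = re.findall(term, text)

# re.escape backslash-escapes metacharacters so the term is matched literally.
escaped = re.findall(re.escape(term), text)

print(unescaped)
print(escaped)
```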

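5. Matching any character (dot)

In a regular expression, the dot (.) matches any single character except a newline. To match a literal dot, escape it with a backslash.

pattern = re.compile(r"\.")
matches = pattern.findall("This is a sentence. This is another sentence.")
print(matches)  # ['.', '.']

This matches the literal periods in the string.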
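For contrast with the escaped dot, this sketch shows the unescaped dot acting as a wildcard, and how the `re.DOTALL` flag extends it to newlines. The sample text is invented for the example.

```python
import re

text = "abc a-c a\nc"

# An unescaped dot stands for any single character except a newline.
any_char = re.findall(r"a.c", text)

# With the re.DOTALL flag the dot also matches the newline.
with_newline = re.findall(r"a.c", text, flags=re.DOTALL)

print(any_char)
print(with_newline)
```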

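6. Matching word characters (`\w`)

\w matches letters (uppercase and lowercase), digits, and the underscore.

pattern = re.compile(r"\w")
matches = pattern.findall("Hello, world! 123")
print(matches)  # ['H', 'e', 'l', 'l', 'o', 'w', 'o', 'r', 'l', 'd', '1', '2', '3']

This matches every letter and digit in the text, one character at a time; the comma, spaces, and exclamation mark are skipped.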
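In practice \w is usually combined with a quantifier to pull out whole words rather than single characters. A minimal sketch with an invented sample string:

```python
import re

text = "Hello, world! 123 snake_case"

# \w+ groups consecutive word characters, so each word comes back whole.
words = re.findall(r"\w+", text)

# The underscore counts as a word character.
single_chars = re.findall(r"\w", "a_b")

print(words)
print(single_chars)
```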

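7. Matching word boundaries (`\b`)

\b matches a word boundary: the position at the start or end of a word. It matches a position and consumes no characters.

pattern = re.compile(r"\bword\b")
matches = pattern.findall("word is here, but not sword or words.")
print(matches)  # ['word']

This matches only the standalone word "word"; "sword" and "words" are skipped.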
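To see what the boundaries contribute, the sketch below runs the same text with no boundary and with a boundary on one side only. The variable names are illustrative.

```python
import re

text = "word is here, but not sword or words."

# Without \b the pattern also matches inside "sword" and "words".
no_boundary = re.findall(r"word", text)

# A leading \b plus \w* keeps words that start with "word" and returns them whole.
prefix_only = re.findall(r"\bword\w*", text)

print(no_boundary)
print(prefix_only)
```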

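8. Matching the start and end of a string (`^` and `$`)

^ matches at the start of the string, and $ matches at the end of the string (or just before a trailing newline).

pattern = re.compile(r"^Hello")
matches = pattern.findall("Hello there!")
print(matches)  # ['Hello']

pattern = re.compile(r"there!$")
matches = pattern.findall("Hello there!")
print(matches)  # ['there!']

The first pattern matches "Hello" only when the string starts with it, and the second matches "there!" only when the string ends with it.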
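By default the anchors apply to the whole string. The following sketch, using a made-up two-line string, shows how the `re.MULTILINE` flag makes them apply to each line instead.

```python
import re

text = "Hello there!\nHello again!"

# Default mode: ^ only matches at the very start of the string.
default_mode = re.findall(r"^Hello", text)

# re.MULTILINE: ^ and $ also match at the start and end of every line.
per_line = re.findall(r"^Hello", text, flags=re.MULTILINE)
line_ends = re.findall(r"\w+!$", text, flags=re.MULTILINE)

print(default_mode, per_line, line_ends)
```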

9. Matching digits (`\d`)

\d matches any single digit character.

pattern = re.compile(r"\d")
matches = pattern.findall("I have 2 apples and 3 oranges.")
print(matches)  # ['2', '3']

This matches every digit in the string, one digit at a time.
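Since \d matches one digit at a time, multi-digit numbers come back split up unless a quantifier is added. A short sketch with an invented sentence:

```python
import re

text = "Room 12 has 3 desks and 140 books"

# \d alone returns each digit separately.
digits = re.findall(r"\d", text)

# \d+ keeps consecutive digits together, which is usually what is wanted.
numbers = re.findall(r"\d+", text)

# findall returns strings, so convert explicitly when integers are needed.
as_ints = [int(n) for n in numbers]

print(digits, numbers, as_ints)
```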

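10. Matching a specific number of characters (quantifiers)

Quantifiers specify how many times the preceding element may occur. Common quantifiers are:

`*` matches zero or more times.
`+` matches one or more times.
`?` matches zero times or once.
`{n}` matches exactly n times.

For example, to match three consecutive digits:

pattern = re.compile(r"\d{3}")
matches = pattern.findall("My numbers are 123 and 456.")
print(matches)  # ['123', '456']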
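The four quantifiers listed above can be compared directly. This is a minimal sketch with made-up inputs, one line per quantifier:

```python
import re

text = "a ab abbb"

# "*" : "a" followed by zero or more "b".
star = re.findall(r"ab*", text)

# "+" : "a" followed by at least one "b".
plus = re.findall(r"ab+", text)

# "?" : the "u" is optional, so both spellings match.
optional = re.findall(r"colou?r", "color colour")

# "{n}" : exactly four digits; \b keeps longer or shorter numbers out.
exact = re.findall(r"\b\d{4}\b", "2025 and 12345 and 99")

print(star, plus, optional, exact)
```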

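11. Character sets

A character set is written with square brackets ([]) and matches any single character listed inside them. For example, [abc] matches "a", "b", or "c".

pattern = re.compile(r"[aeiou]")
matches = pattern.findall("Hello World!")
print(matches)  # ['e', 'o', 'o']

This matches every lowercase vowel in the string.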
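Character sets also accept ranges such as a-z or 0-9. The sketch below, with invented inputs, shows ranges and the `re.IGNORECASE` flag as one way to cover uppercase vowels.

```python
import re

# Ranges: any run of hexadecimal digits.
hex_runs = re.findall(r"[0-9a-fA-F]+", "ff 1A zz 09")

# One uppercase letter followed by lowercase letters.
capitalized = re.findall(r"[A-Z][a-z]+", "Hello World from Python")

# re.IGNORECASE lets [aeiou] match uppercase vowels too.
vowels = re.findall(r"[aeiou]", "Apple", flags=re.IGNORECASE)

print(hex_runs, capitalized, vowels)
```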

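12. Negated character sets

A negated character set matches any single character that is not listed inside the brackets. It is written by placing ^ immediately after the opening bracket, as in [^aeiou].

pattern = re.compile(r"[^aeiou]")
matches = pattern.findall("Hello World!")
print(matches)  # ['H', 'l', 'l', ' ', 'W', 'r', 'l', 'd', '!']

This matches every character that is not a lowercase vowel, including the space and the exclamation mark.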
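Negated sets are handy for "everything up to a delimiter" patterns. A brief sketch under that assumption, with hypothetical sample data:

```python
import re

# Runs of characters that are neither whitespace nor commas: a rough field splitter.
fields = re.findall(r"[^\s,]+", "red, green,blue")

# Remove everything that is not a digit.
digits_only = re.sub(r"[^0-9]", "", "(555) 123-4567")

print(fields)
print(digits_only)
```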

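13. Capturing groups

A capturing group is a part of the pattern enclosed in parentheses. Groups let you extract and work with individual pieces of the matched text.

pattern = re.compile(r"(\d{3})-(\d{2})-(\d{4})")
matches = pattern.findall("My number is 123-45-6789.")
print(matches)  # [('123', '45', '6789')]

Here the expression captures the three hyphen-separated parts of the number, and findall returns them together as a tuple.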
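Groups can also be read from a match object, by index or by name. The sketch below reuses the same number format; the group names `first`, `middle`, and `last` are hypothetical labels chosen for the example.

```python
import re

# (?P<name>...) gives a capturing group a name in addition to its index.
pattern = re.compile(r"(?P<first>\d{3})-(?P<middle>\d{2})-(?P<last>\d{4})")
match = pattern.search("My number is 123-45-6789.")

whole = match.group(0)         # the entire matched text
by_index = match.group(1)      # first group, by position
by_name = match.group("last")  # third group, by name
as_dict = match.groupdict()    # all named groups as a dictionary

print(whole, by_index, by_name, as_dict)
```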

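14. Substitution

re.sub() replaces the text matched by a pattern; a compiled pattern offers the same operation as pattern.sub(). The replacement string can refer to capturing groups with \1, \2, and so on.

pattern = re.compile(r"(\d{3})-(\d{2})-(\d{4})")
result = pattern.sub(r"\1-\2-****", "My number is 123-45-6789.")
print(result)  # My number is 123-45-****.

This replaces the last four digits of the number with ****.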
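Two further variations on substitution, sketched with made-up inputs: reordering groups in the replacement string, and passing a function as the replacement. The helper name `double` is hypothetical.

```python
import re

# Backreferences can reorder the captured parts.
iso_date = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
reordered = iso_date.sub(r"\3/\2/\1", "Posted on 2025-02-15.")

# The replacement may be a function that receives each match object.
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, "2 apples and 3 oranges")

# count limits how many matches are replaced.
first_only = re.sub(r"\d+", "N", "2 apples and 3 oranges", count=1)

print(reordered, doubled, first_only, sep="\n")
```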

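15. Applying a regex to file contents

Regular expressions can also be applied to the contents of a text file.

# Read the whole file, then collect every match in the text.
with open('data.txt', 'r') as file:
    content = file.read()
    matches = re.findall(r"\d{3}-\d{3}-\d{4}", content)
    print(matches)

This searches the file and lists every match that follows the 123-456-7890 phone number format.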
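For larger files it can be useful to scan line by line and record where each match occurs. The sketch below is self-contained: it writes a small temporary file first, and the file contents and names are invented for the example.

```python
import re
import tempfile
from pathlib import Path

phone = re.compile(r"\d{3}-\d{3}-\d{4}")

with tempfile.TemporaryDirectory() as tmp:
    # Create a small sample file so the example runs anywhere.
    path = Path(tmp) / "data.txt"
    path.write_text(
        "Alice: 555-123-4567\nBob: n/a\nCarol: 555-987-6543\n",
        encoding="utf-8",
    )

    # Scan line by line and remember the line number of each match.
    found = []
    with path.open(encoding="utf-8") as file:
        for line_number, line in enumerate(file, start=1):
            for number in phone.findall(line):
                found.append((line_number, number))

print(found)
```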

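Summary:

These are some common uses of Python's re module: writing regular expressions to match, capture, and replace text. Mastering these techniques makes text processing much more efficient, especially when complex search-and-replace operations are needed.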

Saturday, February 15, 2025
