
Regular Expression

Introduction

This document collects regular expressions that are useful for document processing.

Basics

Single-Character Match

Regex     Rule                                                   Can Match
A         A specified character                                  A
\u548c    A specified Unicode character                          和
.         Any character (except a line break, by default)        a, b, &, 0
\d        Digits 0~9                                             0~9
\w        Lowercase and uppercase letters, digits, underscore    a~z, A~Z, 0~9, _
\s        Whitespace such as space and tab                       Space, tab
\D        Non-digit                                              a, A, &, _, ...
\W        Any character not matched by \w                        &, @, 中 (when \w is ASCII-only), ...
\S        Non-whitespace                                         a, A, &, _, ...
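
The table above can be checked with a short script. This is a minimal sketch using Python's `re` module; the `CASES` list and the `check_single_char` helper are illustrative names, not part of the original document. The `re.ASCII` flag is an assumption made here so that `\w`, `\d`, and `\s` cover ASCII only, which is what lets `\W` match 中 as the table says.

```python
import re

# Each entry: (pattern, one character that should match, one that should not).
CASES = [
    (r"A", "A", "B"),
    (r"\u548c", "和", "A"),
    (r".", "&", "\n"),   # "." skips line breaks unless re.DOTALL is set
    (r"\d", "7", "a"),
    (r"\w", "_", "&"),
    (r"\s", "\t", "a"),
    (r"\D", "a", "7"),
    (r"\W", "中", "a"),
    (r"\S", "a", " "),
]


def check_single_char(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the pattern."""
    # re.ASCII limits \w, \d, \s to ASCII, so \W accepts 中.
    return re.fullmatch(pattern, text, flags=re.ASCII) is not None


for pattern, good, bad in CASES:
    print(pattern, check_single_char(pattern, good), check_single_char(pattern, bad))
```

Without `re.ASCII`, Python treats 中 as a word character, so `\w` would match it and `\W` would not. Other engines may differ on this point.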

Multi-Character Match

Regex     Rule              Can Match
A*        Zero or more A    Empty, A, AA, AAA, ...
A+        One or more A     A, AA, AAA, ...
A?        Zero or one A     Empty, A
A{3}      Exactly 3 A       AAA
A{2,3}    2 to 3 A          AA, AAA
A{2,}     At least 2 A      AA, AAA, AAAA, ...
A{0,3}    At most 3 A       Empty, A, AA, AAA
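
As a quick check of the quantifiers above, the following sketch tests each pattern against whole strings with Python's `re.fullmatch`. The `QUANTIFIERS` mapping and the `matches_entirely` helper are hypothetical names introduced only for this example.

```python
import re

# Pattern -> sample strings that the whole pattern should accept.
QUANTIFIERS = {
    r"A*": ["", "A", "AAA"],
    r"A+": ["A", "AA"],
    r"A?": ["", "A"],
    r"A{3}": ["AAA"],
    r"A{2,3}": ["AA", "AAA"],
    r"A{2,}": ["AA", "AAAA"],
    r"A{0,3}": ["", "A", "AAA"],
}


def matches_entirely(pattern: str, text: str) -> bool:
    """Return True only if the pattern consumes the entire text."""
    return re.fullmatch(pattern, text) is not None


for pattern, samples in QUANTIFIERS.items():
    print(pattern, [matches_entirely(pattern, s) for s in samples])
```

`fullmatch` is used instead of `search` because a pattern such as `A?` can match the empty string anywhere, so a plain search would report a match for every input.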

Complex Match

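Regex         Rule                                             Can Match
^             Starts with                                      ^string
$             Ends with                                        string$
[ABC]         Any one character in [...]                       A, B, C
[A-F0-9xy]    Any one character in the listed ranges or set    A, ..., F, 0, ..., 9, x, y
[^a-f]        Any one character not in the range               Anything other than a ~ f
AB|CD|EF      AB or CD or EF                                   AB, CD, EF
(?!_)         Negative lookahead: the next character is not _. Placed right after ^, the string cannot start with _
(?!.*?_$)     Negative lookahead: placed right after ^, the string cannot end with _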
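
The two lookaheads only restrict the start and end of a string when they are anchored with `^`. The sketch below combines them into one hypothetical validation rule (a name made of letters, digits, and `_` that neither starts nor ends with `_`), and also exercises the negated class and the alternation. The names `NAME`, `is_valid_name`, `strip_a_to_f`, and `find_pairs` are assumptions made for illustration.

```python
import re

# Hypothetical rule: letters, digits, and _, not starting or ending with _.
# Both lookaheads sit right after ^ so they inspect the whole string.
NAME = re.compile(r"^(?!_)(?!.*?_$)[A-Za-z0-9_]+$")


def is_valid_name(text: str) -> bool:
    """Return True if text satisfies the hypothetical naming rule."""
    return NAME.search(text) is not None


def strip_a_to_f(text: str) -> str:
    """Keep only the characters matched by the negated class [^a-f]."""
    return "".join(re.findall(r"[^a-f]", text))


def find_pairs(text: str) -> list:
    """Return every AB, CD, or EF found in text, left to right."""
    return re.findall(r"AB|CD|EF", text)


print(is_valid_name("user_name"), is_valid_name("_user"), is_valid_name("user_"))
print(strip_a_to_f("abcxyz"))
print(find_pairs("ABxCDyEF"))
```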

Use Cases

Regex                              Can Match
[\u4e00-\u9fa5]                    A Chinese character
^[\u4e00-\u9fa5_a-zA-Z0-9]+$       A string of at least 1 Chinese/English/digit/_ character
[\u4e00-\u9fa5_a-zA-Z0-9]{4,10}    4 to 10 Chinese/English/digit/_ characters
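
A minimal sketch of how these three patterns might be used in Python follows. The function names are hypothetical. Since the `{4,10}` pattern has no anchors of its own, the sketch assumes it is meant to validate a whole string and applies it with `fullmatch`.

```python
import re

CJK = re.compile(r"[\u4e00-\u9fa5]")
NAME_CHARS = re.compile(r"^[\u4e00-\u9fa5_a-zA-Z0-9]+$")
NAME_4_TO_10 = re.compile(r"[\u4e00-\u9fa5_a-zA-Z0-9]{4,10}")


def count_chinese(text: str) -> int:
    """Count characters that fall in the \\u4e00-\\u9fa5 range."""
    return len(CJK.findall(text))


def is_name(text: str) -> bool:
    """True if text has only Chinese/English/digit/_ characters."""
    return NAME_CHARS.search(text) is not None


def has_valid_length(text: str) -> bool:
    """True if the whole text is 4 to 10 allowed characters long."""
    # fullmatch supplies the anchoring that the pattern itself lacks.
    return NAME_4_TO_10.fullmatch(text) is not None


print(count_chinese("正则 regex 表达式"))
print(is_name("用户_name1"), is_name("user name"))
print(has_valid_length("用户名123"), has_valid_length("abc"))
```

Used with `search` instead of `fullmatch`, the unanchored `{4,10}` pattern would also accept longer strings, because any 4 to 10 character run inside them satisfies it.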
Regex                                                                            Can Match
\[[\s\S]*?\]\([\s\S]*?#[\s\S]*?\)                                                Links that end with #heading
\[\w*.\w*\]\(#[\s\S]*?\)                                                         Internal links within the same document
(https?|ftp|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|]          URLs
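
The link and URL patterns can be tried on a line of Markdown-style text. This is a sketch only: the sample text, `example.com`, and the `find_all` helper are made up for illustration. `finditer` with `group(0)` is used so that the capture group in the URL pattern does not change what is returned.

```python
import re

HEADING_LINK = re.compile(r"\[[\s\S]*?\]\([\s\S]*?#[\s\S]*?\)")
INTERNAL_LINK = re.compile(r"\[\w*.\w*\]\(#[\s\S]*?\)")
URL = re.compile(
    r"(https?|ftp|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|]"
)

# Made-up sample text for illustration.
TEXT = (
    "See [setup guide](guide.md#install), [Intro](#introduction), "
    "and https://example.com/docs?page=2."
)


def find_all(pattern: re.Pattern, text: str) -> list:
    """Return the full text of every match, ignoring capture groups."""
    return [m.group(0) for m in pattern.finditer(text)]


print(find_all(HEADING_LINK, TEXT))
print(find_all(INTERNAL_LINK, TEXT))
print(find_all(URL, TEXT))
```

Two details are worth noting. The heading-link pattern accepts any link whose target contains `#`, so it also picks up same-document links. The final character class of the URL pattern leaves out punctuation such as `.`, which keeps a sentence-ending period out of the matched URL.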

References