ECMAScript regular expressions are getting better!

Author: 迹忆客 · Last updated: 2023/01/08

In 1999, ECMAScript 3 added support for regular expressions.

Sixteen years later, ES6/ES2015 introduced Unicode mode (the u flag), sticky mode (the y flag), and the RegExp.prototype.flags getter.
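As a quick refresher on those three ES2015 additions, here is a minimal sketch. The variable names and sample strings are illustrative only and do not come from the original article.

```javascript
// Unicode mode: without u, "." matches a single UTF-16 code unit, so it
// cannot span an astral symbol such as U+1F4A9 (two code units).
const astral = '\u{1F4A9}';
const matchesWithoutU = /^.$/.test(astral);   // false
const matchesWithU = /^.$/u.test(astral);     // true

// Sticky mode: the match must start exactly at lastIndex.
const sticky = /foo/y;
sticky.lastIndex = 3;
const stickyHit = sticky.test('barfoo');      // true, "foo" starts at index 3
sticky.lastIndex = 0;
const stickyMiss = sticky.test('barfoo');     // false, no "foo" at index 0

// The flags getter returns the active flags in a fixed alphabetical order.
const flags = /a/yug.flags;                   // "guy"
```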

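This article highlights what is happening right now in the world of JavaScript regular expressions. Spoiler: it's quite a lot. There are currently more RegExp-related proposals advancing through the TC39 standardization process than there have been RegExp updates in the entire history of ECMAScript!

We'll discuss the following ES2018 features and ECMAScript proposals: dotAll mode, lookbehind assertions, named capture groups, Unicode property escapes, Unicode properties of strings, set notation, String.prototype.matchAll, and legacy RegExp features.

dotAll mode (the s flag)

By default, . matches any character except line terminators: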

/foo.bar/u.test('foo\nbar');
// → false

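(It also doesn't match astral Unicode symbols, but we fixed that by enabling the u flag.)

ES2018 introduces dotAll mode, enabled through the s flag. In dotAll mode, . matches line terminators as well.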

/foo.bar/su.test('foo\nbar');
// → true
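To put the s flag in context, the sketch below compares it with the character-class workaround that was common before ES2018, and shows the dotAll getter that accompanies the flag. The variable names are illustrative.

```javascript
const text = 'foo\nbar';

// Pre-ES2018 workaround: a character class that matches every character.
const workaround = /foo[\s\S]bar/u.test(text);   // true

// ES2018: the s flag makes "." match line terminators too.
const dotAllRegex = /foo.bar/su;
const withFlag = dotAllRegex.test(text);         // true

// The dotAll getter reflects whether the s flag is set.
const hasDotAll = dotAllRegex.dotAll;            // true
const flagString = dotAllRegex.flags;            // "su"
```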

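Lookbehind assertions

Lookarounds are zero-width assertions: they check whether a pattern matches without consuming any characters. ECMAScript currently supports lookahead assertions, which do this in the forward direction. Positive lookahead ensures a pattern is followed by another pattern: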

const pattern = /\d+(?= dollars)/u;
const result = pattern.exec('42 dollars');
// → result[0] === '42'

Negative lookahead ensures a pattern is not followed by another pattern:

const pattern = /\d+(?! dollars)/u;
const result = pattern.exec('42 pesos');
// → result[0] === '42'

ES2018 adds support for lookbehind assertions. Positive lookbehind ensures a pattern is preceded by another pattern:

const pattern = /(?<=\$)\d+/u;
const result = pattern.exec('$42');
// → result[0] === '42'

Negative lookbehind ensures a pattern is not preceded by another pattern:

const pattern = /(?<!\$)\d+/u;
const result = pattern.exec('€42');
// → result[0] === '42'

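For a more detailed treatment of lookahead and lookbehind assertions, see the separate article on lookahead and lookbehind assertions.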
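One subtlety with negative lookbehind is worth a sketch: the assertion is tested at every candidate start position, so /(?<!\$)\d+/ applied to '$42' still matches '2', because that digit is preceded by '4' rather than '$'. The example below, with a made-up receipt string, shows one way to guard against that by also excluding a preceding digit.

```javascript
const receipt = 'Coffee $4, bagel $12, tip €3';

// Positive lookbehind: digits directly preceded by a dollar sign.
const dollarAmounts = receipt.match(/(?<=\$)\d+/gu);

// Negative lookbehind: digits NOT preceded by "$" or by another digit.
// The extra \d in the lookbehind stops the match from restarting in the
// middle of a number such as "12".
const otherAmounts = receipt.match(/(?<![$\d])\d+/gu);

// Without the \d guard, the tail of "$42" slips through.
const naive = /(?<!\$)\d+/u.exec('$42');
```

Here dollarAmounts holds the two dollar prices, otherAmounts holds only the euro tip, and naive[0] is '2'.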

Named capture groups

Currently, each capture group in a regular expression is numbered and can be referenced using that number:

const pattern = /(\d{4})-(\d{2})-(\d{2})/u;
const result = pattern.exec('2017-01-25');
// → result[0] === '2017-01-25'
// → result[1] === '2017'
// → result[2] === '01'
// → result[3] === '25'

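This is useful, but not very readable or maintainable. Whenever the order of capture groups in the pattern changes, the indices need to be updated accordingly.

ES2018 adds support for named capture groups, making code more readable and maintainable.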

const pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u;
const result = pattern.exec('2017-01-25');
// → result.groups.year === '2017'
// → result.groups.month === '01'
// → result.groups.day === '25'
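Named groups can also be used beyond result.groups. The following sketch shows destructuring, the $<name> replacement syntax, and the \k<name> backreference; the variable names and the sample sentence are illustrative.

```javascript
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u;

// Destructure the named groups straight from the match result.
const { year, month, day } = datePattern.exec('2017-01-25').groups;

// Reference groups by name in a replacement string with $<name>.
const european = '2017-01-25'.replace(datePattern, '$<day>/$<month>/$<year>');
// european === '25/01/2017'

// Backreference a named group inside the pattern with \k<name>.
const doubledWord = /\b(?<word>\w+) \k<word>\b/u;
const hasDouble = doubledWord.test('this is is a typo');   // true
```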

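Unicode property escapes

The Unicode Standard assigns various properties and property values to every symbol. For example, to get the set of symbols used in the Greek script, search the Unicode database for symbols whose Script_Extensions property is set to Greek.

Unicode property escapes make it possible to access these Unicode character properties natively in ECMAScript regular expressions. For example, the pattern \p{Script_Extensions=Greek} matches every symbol that is used in the Greek script.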

const regexGreekSymbol = /\p{Script_Extensions=Greek}/u;
regexGreekSymbol.test('π');
// → true

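Previously, developers who wanted equivalent regular expressions in JavaScript had to resort to large run-time dependencies or build scripts, both of which lead to performance and maintainability problems. With built-in support for Unicode property escapes, creating regular expressions based on Unicode properties couldn't be easier.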
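Beyond Script_Extensions, property escapes also accept General_Category values, binary properties, and a negated form. The sketch below uses sample strings chosen purely for illustration.

```javascript
// General_Category values such as Letter can be used without a property name.
const onlyLetters = /^\p{Letter}+$/u;
const greekWord = onlyLetters.test('αβγ');        // true
const withDigits = onlyLetters.test('abc123');    // false

// \P{…} (uppercase P) negates the property: remove everything non-Greek.
const stripped = 'π ≈ 3.14'.replace(/\P{Script_Extensions=Greek}/gu, '');
// stripped === 'π'

// Binary properties such as Uppercase take no value.
const hasUppercase = /\p{Uppercase}/u.test('hello World');   // true
```

Note that \p{…} and \P{…} are only recognized when the u (or v) flag is set.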

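Unicode properties of strings

A separate proposal extends Unicode property escapes to Unicode properties that expand to sequences of characters, such as RGI_Emoji (which contains all emoji recommended for general interchange, regardless of whether they consist of a single code point or a sequence of code points):

const regexRgiEmoji = /\p{RGI_Emoji}/v;

// Note: although 4️⃣ looks like a single symbol, it consists
// of multiple Unicode code points.
regexRgiEmoji.test('4️⃣');
// → true

// Flag emoji consist of multiple code points.
regexRgiEmoji.test('🇧🇪');
// → true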

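This proposal makes it much easier to use regular expressions to match emoji (which can consist of multiple code points) and hashtags (which can contain emoji). As the Unicode Standard defines more sequence properties over time, JavaScript regular expressions can support those as well.

Note: this proposal is still going through the standardization process, so its syntax may change. The descriptions and code examples in this article match the latest version of the proposal at the time of writing. The proposal is currently at Stage 3 and could be included in ES2023 at the earliest.

Set notation

The proposed unicodeSets mode, enabled with the v flag, unlocks support for extended character classes, including not only properties of strings but also set notation, string literal syntax, and improved case-insensitive matching.

Set notation includes the -- syntax for difference/subtraction: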

// Match all Greek symbols except for “π”:
/[\p{Script_Extensions=Greek}--π]/v.test('π'); // → false

// Match all Greek symbols except for “α”, “β”, and “γ”:
/[\p{Script_Extensions=Greek}--[αβγ]]/v.test('α'); // → false
/[\p{Script_Extensions=Greek}--[α-γ]]/v.test('β'); // → false

// Match all RGI emoji tag sequences except for the flag of Scotland:
/^[\p{RGI_Emoji_Tag_Sequence}--\q{🏴󠁧󠁢󠁳󠁣󠁴󠁿}]$/v.test('🏴󠁧󠁢󠁳󠁣󠁴󠁿'); // → false

Intersection is done using the new && syntax:

// Match all Greek letters:
const re = /[\p{Script_Extensions=Greek}&&\p{Letter}]/v;
// U+03C0 GREEK SMALL LETTER PI
re.test('π'); // → true
// U+1018A GREEK ZERO SIGN
re.test('𐆊'); // → false

String.prototype.matchAll

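A common use case for global (g) or sticky (y) regular expressions is applying them to a string and iterating over all the matches, including capture groups. The String.prototype.matchAll proposal makes this easier than ever.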

const string = 'Magic hex numbers: DEADBEEF CAFE 8BADF00D';
const regex = /\b[0-9a-fA-F]+\b/g;
for (const match of string.matchAll(regex)) {
    console.log(match);
}

The match object for each loop iteration is equivalent to what regex.exec(string) would return.

// Iteration 1:
[
    'DEADBEEF',
    index: 19,
    input: 'Magic hex numbers: DEADBEEF CAFE 8BADF00D'
]

// Iteration 2:
[
    'CAFE',
    index: 28,
    input: 'Magic hex numbers: DEADBEEF CAFE 8BADF00D'
]

// Iteration 3:
[
    '8BADF00D',
    index: 33,
    input: 'Magic hex numbers: DEADBEEF CAFE 8BADF00D'
]
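For comparison, this is roughly what the same iteration looks like with a manual exec() loop. The sketch reuses the sample string from above under new, illustrative variable names and collects only the matched text and index.

```javascript
const hexString = 'Magic hex numbers: DEADBEEF CAFE 8BADF00D';
const hexRegex = /\b[0-9a-fA-F]+\b/g;

// Manual loop: exec() advances hexRegex.lastIndex after each match and
// returns null (resetting lastIndex to 0) once there are no more matches.
const viaExec = [];
let current;
while ((current = hexRegex.exec(hexString)) !== null) {
    viaExec.push([current[0], current.index]);
}

// matchAll() iterates over a clone of the regex, so the original
// regex's lastIndex is left untouched.
const viaMatchAll = Array.from(
    hexString.matchAll(hexRegex),
    (match) => [match[0], match.index]
);
```

Both arrays contain the same three [text, index] pairs. In the version of matchAll that was eventually standardized, passing a regular expression without the g flag throws a TypeError.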

String.prototype.matchAll is especially useful for regular expressions with capture groups:

const string = 'Favorite GitHub repos: tc39/ecma262 v8/v8.dev tc39/test262';
const regex = /\b(?<owner>[a-z0-9]+)\/(?<repo>[a-z0-9\.]+)\b/g;

for (const match of string.matchAll(regex)) {
    console.log(`${match[0]} at ${match.index} with '${match.input}'`);
    console.log(`→ owner: ${match.groups.owner}`);
    console.log(`→ repo: ${match.groups.repo}`);
}

// Output:
//
// tc39/ecma262 at 23 with 'Favorite GitHub repos: tc39/ecma262 v8/v8.dev tc39/test262'
// → owner: tc39
// → repo: ecma262
// v8/v8.dev at 36 with 'Favorite GitHub repos: tc39/ecma262 v8/v8.dev tc39/test262'
// → owner: v8
// → repo: v8.dev
// tc39/test262 at 46 with 'Favorite GitHub repos: tc39/ecma262 v8/v8.dev tc39/test262'
// → owner: tc39
// → repo: test262

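Legacy RegExp features

Another proposal specifies certain legacy RegExp features, such as the RegExp.prototype.compile method and the static properties from RegExp.$1 to RegExp.$9. Although these features are deprecated, they unfortunately cannot be removed from the web platform without introducing compatibility issues. Standardizing their behavior and getting engines to align their implementations is therefore the best way forward. This proposal is important for web compatibility.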
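For readers who have not met these legacy features, the sketch below shows how they behave in engines that implement them, such as V8. It is for illustration only: both features are deprecated, and new code should use match results and fresh RegExp objects instead.

```javascript
// After a successful match, the legacy static properties expose the
// captures of the most recent match.
/(\d{4})-(\d{2})/.exec('2017-01');
const legacyYear = RegExp.$1;     // '2017'
const legacyMonth = RegExp.$2;    // '01'

// compile() re-initializes an existing RegExp object in place with a
// new source and new flags.
const reused = /a+/;
reused.compile('b+', 'g');
const recompiledSource = reused.source;   // 'b+'
const recompiledFlags = reused.flags;     // 'g'
```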
