Regex in C++
Published: 2019-06-25


   http://blog.csdn.net/mycwq/article/details/18838151#comments

Copyright notice: this is an original article by "没有开花的树" and may not be reproduced without the author's permission.

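In C++ there are three regex options to choose from: C++ regex, C regex, and Boost regex. When developing C++ on Windows, the latter two are not supported by default, so if you want to get going quickly, C++ regex is clearly the more convenient choice. This article discusses how to use C++ regex regular expressions.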

C++ regex provides three functions: regex_match, regex_search, and regex_replace.

regex_match

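regex_match checks whether a regular expression matches an entire target sequence; the example below shows how it is used. For a systematic treatment, consult the reference documentation for regex_match.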

[cpp]

#include <iostream>
#include <regex>
#include <string>

int main() {
    // Match a string literal against a temporary regex.
    if (std::regex_match("subject", std::regex("(sub)(.*)"))) {
        std::cout << "string literal matched\n";
    }

    std::string s("subject");
    std::regex e("(sub)(.*)");

    // Match a std::string object.
    if (std::regex_match(s, e)) {
        std::cout << "string object matched\n";
    }

    // cmatch holds results for C strings; size() counts the whole match plus one entry per group.
    std::cmatch cm;
    std::regex_match("subject", cm, e);
    std::cout << "string literal with " << cm.size() << " matches\n";

    // smatch holds results for std::string.
    std::smatch sm;
    std::regex_match(s, sm, e);
    std::cout << "string object with " << sm.size() << " matches\n";

    // The same match over an iterator range.
    std::regex_match(s.cbegin(), s.cend(), sm, e);
    std::cout << "range with " << sm.size() << " matches\n";

    // Match flags can be passed explicitly.
    std::regex_match("subject", cm, e, std::regex_constants::match_default);

    // sm.str() without an argument always returns the whole match (sm[0]).
    std::cout << "the matches were: ";
    for (std::size_t i = 0; i < sm.size(); ++i) {
        std::cout << "[" << sm.str() << "]";
    }
    std::cout << '\n';

    // sm[i] is the i-th sub-match: 0 is the whole match, 1 and 2 are the groups.
    for (std::size_t i = 0; i < sm.size(); ++i) {
        std::cout << "[" << sm[i] << "]";
    }
    std::cout << '\n';
}

Output:

[plain]

string literal matched
string object matched
string literal with 3 matches
string object with 3 matches
range with 3 matches
the matches were: [subject][subject][subject]
[subject][sub][ject]

regex_search

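regex_search is the other regular expression matching function; an example follows. The main difference between the two is that regex_match must match the entire target sequence, whereas regex_search looks for a matching substring anywhere inside it. For a systematic treatment, consult the reference documentation for regex_search.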

[cpp]

// regex_search example
#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string s("this subject has a submarine as a subsequence");
    std::smatch m;
    std::regex e("\\b(sub)([^ ]*)");  // matches words beginning with "sub"

    std::cout << "Target sequence: " << s << std::endl;
    std::cout << "Regular expression: /\\b(sub)([^ ]*)/" << std::endl;
    std::cout << "The following matches and submatches were found:" << std::endl;

    while (std::regex_search(s, m, e)) {
        // Print the whole match followed by each sub-match.
        for (auto x = m.begin(); x != m.end(); ++x) {
            std::cout << x->str() << " ";
        }
        std::cout << "--> ([^ ]*) match " << m.format("$2") << std::endl;
        // Continue the search in the text that follows this match.
        s = m.suffix().str();
    }
}

Output:

[plain]

Target sequence: this subject has a submarine as a subsequence
Regular expression: /\b(sub)([^ ]*)/
The following matches and submatches were found:
subject sub ject --> ([^ ]*) match ject
submarine sub marine --> ([^ ]*) match marine
subsequence sub sequence --> ([^ ]*) match sequence


regex_replace

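regex_replace replaces the content matched by a regular expression; an example follows. For a systematic treatment, consult the reference documentation for regex_replace.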

[cpp]

#include <cstring>
#include <iostream>
#include <regex>
#include <string>

int main() {
    char buf[20];
    const char* first = "axayaz";
    const char* last = first + std::strlen(first);
    std::regex rx("a");
    std::string fmt("A");
    // format_first_only replaces only the first match.
    std::regex_constants::match_flag_type fonly =
        std::regex_constants::format_first_only;

    // Iterator overload: writes into buf and returns the end of the output,
    // which is then null-terminated.
    *std::regex_replace(&buf[0], first, last, rx, fmt) = '\0';
    std::cout << &buf[0] << std::endl;

    *std::regex_replace(&buf[0], first, last, rx, fmt, fonly) = '\0';
    std::cout << &buf[0] << std::endl;

    // String overload: returns a new std::string.
    std::string str("adaeaf");
    std::cout << std::regex_replace(str, rx, fmt) << std::endl;
    std::cout << std::regex_replace(str, rx, fmt, fonly) << std::endl;

    return 0;
}

Output:

[plain]

AxAyAz
Axayaz
AdAeAf
Adaeaf

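The rules of C++ regex regular expressions (the default grammar is ECMAScript) are much the same as in other programming languages, as follows: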

Special characters (used to match characters that are hard to describe):

characters   description               matches
.            not newline               any character except line terminators (LF, CR, LS, PS).
\t           tab (HT)                  a horizontal tab character (same as \u0009).
\n           newline (LF)              a newline (line feed) character (same as \u000A).
\v           vertical tab (VT)         a vertical tab character (same as \u000B).
\f           form feed (FF)            a form feed character (same as \u000C).
\r           carriage return (CR)      a carriage return character (same as \u000D).
\cletter     control code              a control code character whose code unit value is the same as the remainder of dividing the code unit value of letter by 32. For example: \ca is the same as \u0001, \cb the same as \u0002, and so on.
\xhh         ASCII character           a character whose code unit value has a hex value equivalent to the two hex digits hh. For example: \x4c is the same as L, or \x23 the same as #.
\uhhhh       unicode character         a character whose code unit value has a hex value equivalent to the four hex digits hhhh.
\0           null                      a null character (same as \u0000).
\int         backreference             the result of the submatch whose opening parenthesis is the int-th (int shall begin with a digit other than 0). See groups below for more info.
\d           digit                     a decimal digit character.
\D           not digit                 any character that is not a decimal digit character.
\s           whitespace                a whitespace character.
\S           not whitespace            any character that is not a whitespace character.
\w           word                      an alphanumeric or underscore character.
\W           not word                  any character that is not an alphanumeric or underscore character.
\character   character                 the character character as it is, without interpreting its special meaning within a regex expression. Any character can be escaped except those which form any of the special character sequences above. Needed for: ^ $ \ . * + ? ( ) [ ] { } |
[class]      character class           the target character is part of the class.
[^class]     negated character class   the target character is not part of the class.

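Note that in a C++ string literal the backslash (\) is itself an escape character, so every backslash in a pattern has to be doubled: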

[cpp]

std::regex e1("\\d");    // \d -> matches a digit character
std::regex e2("\\\\");   // \\ -> matches a backslash character

Quantifiers

characters   times                 effects
*            0 or more             The preceding atom is matched 0 or more times.
+            1 or more             The preceding atom is matched 1 or more times.
?            0 or 1                The preceding atom is optional (matched either 0 times or once).
{int}        int                   The preceding atom is matched exactly int times.
{int,}       int or more           The preceding atom is matched int or more times.
{min,max}    between min and max   The preceding atom is matched at least min times, but not more than max.

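Note that quantifiers are greedy by default and become lazy when followed by ?: matched against "aardvark", the group in the pattern "(a+).*" captures aa, while the group in "(a+?).*" captures only a.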

Groups (used to match a sequence of several consecutive characters):

characters        description     effects
(subpattern)      Group           Creates a backreference.
(?:subpattern)    Passive group   Does not create a backreference.

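Note that the first form creates a backreference, which can be used to extract the matched content, while the second does not and therefore avoids that overhead.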

characters   description         condition for match
^            Beginning of line   Either it is the beginning of the target sequence, or follows a line terminator.
$            End of line         Either it is the end of the target sequence, or precedes a line terminator.
|            Separator           Separates two alternative patterns or subpatterns.

Single characters

[abc] matches a, b, or c.

[^xyz] matches any character other than x, y, or z.

Ranges

[a-z] matches any lowercase letter (a, b, c, ..., z).
[abc1-5] matches a, b, c, or a digit from 1 to 5.

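C++ regex also supports POSIX-like character class names, which are written inside a bracket expression, for example [[:alpha:]]: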

class        description                              equivalent (with regex_traits, default locale)
[:alnum:]    alpha-numerical character                isalnum
[:alpha:]    alphabetic character                     isalpha
[:blank:]    blank character                          isblank
[:cntrl:]    control character                        iscntrl
[:digit:]    decimal digit character                  isdigit
[:graph:]    character with graphical representation  isgraph
[:lower:]    lowercase letter                         islower
[:print:]    printable character                      isprint
[:punct:]    punctuation mark character               ispunct
[:space:]    whitespace character                     isspace
[:upper:]    uppercase letter                         isupper
[:xdigit:]   hexadecimal digit character              isxdigit
[:d:]        decimal digit character                  isdigit
[:w:]        word character                           isalnum, plus the underscore
[:s:]        whitespace character                     isspace

This article is reposted from 神ge's 51CTO blog. Original link: http://blog.51cto.com/12218412/1872058
