About regular expressions (repost)

Posted by Longe on 2009-11-9 13:04:38

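Part 1:
-----------------
Regular expressions (REs) are often mistakenly regarded as a mysterious language that only a few people understand. On the surface they do look like a jumble, and if you don't know the syntax, an expression is just a pile of meaningless characters. In fact, regular expressions are quite simple and easy to understand. After reading this article, you will be familiar with the general syntax of regular expressions.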

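Support on many platforms

Regular expressions were first described by the mathematician Stephen Kleene in 1956, in his work on modeling nerve nets with finite automata. Regular expressions with a full syntax were used for character pattern matching and were later adopted widely in information technology. Since then, regular expressions have gone through several stages of development, and the current standard has been approved by ISO (the International Organization for Standardization) and defined by the Open Group.

Regular expressions are not a dedicated programming language; they are a standard notation for finding and replacing text in a file or string. The standard defines two flavors: basic regular expressions (BRE) and extended regular expressions (ERE). ERE includes the functionality of BRE plus some additional concepts.

Many programs use regular expressions, including xsh, egrep, sed, vi, and other programs on UNIX platforms. They have also been adopted by many languages, such as HTML and XML, though these usually implement only a subset of the full standard.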

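More common than you think
As regular expressions have been ported to cross-platform programming languages, their functionality has become more complete and their use more widespread. Search engines on the web use them, and so do e-mail programs. Even if you are not a UNIX programmer, you can use regular expressions to simplify your programs and shorten your development time.

Regular expressions 101
Much of the syntax of regular expressions will look familiar even if you have never studied them. Wildcards, for instance, are one kind of RE construct: the repetition operator. Let's start with the most common basic syntax of the ERE standard. To give examples with concrete uses, I will use several different programs.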

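Part 2:
----------------------
Character matching

The key to a regular expression is specifying what you want to match; without that, REs are useless.

Every expression contains an instruction about what to look for, as shown in Table A.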

Table A: Character-matching regular expressions
---------------
Operator: .
Meaning: Match any one character
Example: grep '.ord' sample.txt
Result: Matches "ford", "lord", "2ord", etc. in the file sample.txt.
-----------------
Operator: [ ]
Meaning: Match any one character listed between the brackets
Example: grep '[cng]ord' sample.txt
Result: Matches only "cord", "nord", and "gord"
---------------------
Operator: [^ ]
Meaning: Match any one character not listed between the brackets
Example: grep '[^cn]ord' sample.txt
Result: Matches "lord", "2ord", etc., but not "cord" or "nord"
---------------------
Operator: - (range inside brackets)
Meaning: Match any one character in (or, with ^, outside) the given range
Example: grep '[a-zA-Z]ord' sample.txt
Result: Matches "aord", "bord", "Aord", "Bord", etc.
Example: grep '[^0-9]ord' sample.txt
Result: Matches "Aord", "aord", etc., but not "2ord", etc.
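
For readers who would rather try Table A in Java than at a shell, here is a minimal sketch using java.util.regex. The class name CharacterMatchingDemo, the helper grep, and the sample words are made up for illustration; the helper only imitates grep on a list of one-word lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class CharacterMatchingDemo {
    // Hypothetical helper: returns the words that contain a match for the
    // regex, roughly what grep prints for a file with one word per line.
    static List<String> grep(String regex, String... words) {
        Pattern p = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String w : words) {
            if (p.matcher(w).find()) {
                hits.add(w);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] sample = {"ford", "cord", "nord", "2ord", "ord"};
        System.out.println(grep(".ord", sample));      // [ford, cord, nord, 2ord]
        System.out.println(grep("[cng]ord", sample));  // [cord, nord]
        System.out.println(grep("[^cn]ord", sample));  // [ford, 2ord]
        System.out.println(grep("[^0-9]ord", sample)); // [ford, cord, nord]
    }
}
```

Note that the bare word "ord" never matches, because every one of these patterns requires one character in front of it.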

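Repetition operators
Repetition operators, or quantifiers, describe how many times a particular element is to be matched. They are often combined with the character-matching syntax to match runs of several characters; see Table B.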

Table B: Regular expression repetition operators
---------------
Operator: ?
Meaning: Match the preceding element zero or one time
Example: egrep '.?erd' sample.txt
Result: Matches "berd", "herd", etc. and "erd"
------------------
Operator: *
Meaning: Match the preceding element zero or more times
Example: egrep 'n.*rd' sample.txt
Result: Matches "nerd", "nrd", "neard", etc.
-------------------
Operator: +
Meaning: Match the preceding element one or more times
Example: egrep '[n]+erd' sample.txt
Result: Matches "nerd", "nnerd", etc., but not "erd"
--------------------
Operator: {n}
Meaning: Match the preceding element exactly n times
Example: egrep '[a-z]{2}erd' sample.txt
Result: Matches "cherd", "blerd", etc., but not "nerd" or "erd". Because egrep looks for the pattern anywhere in a line, "buzzerd" also matches through its substring "zzerd" unless the pattern is anchored.
------------------------
Operator: {n,}
Meaning: Match the preceding element at least n times
Example: egrep '.{2,}erd' sample.txt
Result: Matches "cherd" and "buzzerd", but not "nerd" on a line by itself
------------------------
Operator: {n,N}
Meaning: Match the preceding element at least n times, but not more than N times
Example: egrep 'n[e]{1,2}rd' sample.txt
Result: Matches "nerd" and "neerd"
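
The quantifiers in Table B behave the same way in java.util.regex, so they can be checked with a small Java sketch. The class name RepetitionDemo and the helper found are hypothetical; found performs an unanchored search, which is how egrep treats a pattern.

```java
import java.util.regex.Pattern;

class RepetitionDemo {
    // Hypothetical helper: true if the regex matches somewhere inside the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(found(".?erd", "erd"));             // true: "?" allows zero occurrences
        System.out.println(found("n.*rd", "nrd"));             // true: "*" allows zero occurrences
        System.out.println(found("[n]+erd", "erd"));           // false: "+" needs at least one "n"
        System.out.println(found("[a-z]{2}erd", "nerd"));      // false: only one letter before "erd"
        System.out.println(found("[a-z]{2}erd", "buzzerd"));   // true: the substring "zzerd" matches
        System.out.println(found("^[a-z]{2}erd$", "buzzerd")); // false: anchored, so the whole text must fit
        System.out.println(found("n[e]{1,2}rd", "neerd"));     // true
    }
}
```

The two "buzzerd" lines show why an unanchored search can report a match in a longer word than the quantifier seems to allow.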

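Part 3:
----------------
Anchors
An anchor specifies the position at which a pattern must match, as shown in Table C. Anchors make it easier to find common character combinations in a particular position. In the examples I use the vi line-editor command :s, which stands for substitute. The basic syntax of this command is:

s/pattern_to_match/pattern_to_substitute/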

Table C: Regular expression anchors
-------------
Operator: ^
Meaning: Match at the beginning of a line
Example: s/^/blah /
Result: Inserts "blah " at the beginning of the line
---------------
Operator: $
Meaning: Match at the end of a line
Example: s/$/ blah/
Result: Inserts " blah" at the end of the line
---------------
Operator: \<
Meaning: Match at the beginning of a word
Example: s/\</blah/
Result: Inserts "blah" at the beginning of the word
Example: egrep '\<blah' sample.txt
Result: Matches "blahfield", etc.
------------------
Operator: \>
Meaning: Match at the end of a word
Example: s/\>/blah/
Result: Inserts "blah" at the end of the word
Example: egrep 'blah\>' sample.txt
Result: Matches "soupblah", etc.
---------------
Operator: \b
Meaning: Match at the beginning or end of a word
Example: egrep '\bblah' sample.txt
Result: Matches "blahcake"; the mirrored pattern 'blah\b' matches "countblah"
-----------------
Operator: \B
Meaning: Match inside a word (at any position that is not a word boundary)
Example: egrep '\Bblah' sample.txt
Result: Matches "sublahper", etc.
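
The same anchors can be tried in Java, with one caveat: java.util.regex supports ^, $, \b and \B, but has no \< or \> operators, so word starts and ends are both written with \b. The sketch below uses a hypothetical class AnchorDemo and helper found; the sample words come from Table C.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // Hypothetical helper: true if the regex matches somewhere inside the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // ^ and $ match empty positions, so replacing them inserts text,
        // like s/^/blah / and s/$/ blah/ in vi.
        System.out.println("line".replaceAll("^", "blah ")); // blah line
        System.out.println("line".replaceAll("$", " blah")); // line blah

        // \b is a word boundary; \B is any position that is not one.
        System.out.println(found("\\bblah", "blahcake"));    // true
        System.out.println(found("\\bblah", "countblah"));   // false
        System.out.println(found("blah\\b", "countblah"));   // true
        System.out.println(found("\\Bblah", "sublahper"));   // true
        System.out.println(found("\\Bblah", "blahcake"));    // false
    }
}
```

In Java source the backslash itself has to be escaped, which is why \b appears as "\\b" inside the string literals.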

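Alternation

Another handy feature of REs is the alternation (or interval) operator. In effect it works as an OR and is written with the | symbol. The following command returns the lines of sample.txt that contain "nerd" or "merd":

egrep '(n|m)erd' sample.txt

Alternation is very powerful, especially when you are searching for different spellings of a word, although in this case you can get the same result with the following:

egrep '[nm]erd' sample.txt

The real usefulness of alternation shows when you combine it with the more advanced features of REs.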
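
A short Java sketch of the difference between alternation and a bracket expression follows. The class name AlternationDemo, the helper found, and the extra sample patterns (gr(a|e)y, (ne|bi)rd) are made up for illustration and are not from the original article.

```java
import java.util.regex.Pattern;

class AlternationDemo {
    // Hypothetical helper: true if the regex matches somewhere inside the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // For single characters, alternation and brackets are interchangeable.
        System.out.println(found("(n|m)erd", "merd")); // true
        System.out.println(found("[nm]erd", "merd"));  // true

        // Spelling variants of one word.
        System.out.println(found("gr(a|e)y", "grey")); // true

        // Alternatives longer than one character need alternation;
        // a bracket expression only ever matches a single character.
        System.out.println(found("(ne|bi)rd", "bird")); // true
        System.out.println(found("(ne|bi)rd", "herd")); // false
    }
}
```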

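Part 4:
----------------
Reserved characters
The last important feature of REs is reserved characters (also called special characters). For example, suppose you want to find the literal strings "ne*rd" and "ni*rd". The pattern "n[ei]*rd" matches "neeeeerd" and "nieieierd", which is not what you are looking for. Because '*' (the asterisk) is a reserved character, you must escape it with a backslash: "n[ei]\*rd". The other reserved characters include:

^ (caret)
. (period)
[ (left bracket)
$ (dollar sign)
( (left parenthesis)
) (right parenthesis)
| (pipe)
* (asterisk)
+ (plus symbol)
? (question mark)
{ (left curly bracket, or left brace)
\ (backslash)
Once you include these characters in your search patterns, REs undoubtedly become very hard to read. For example, the following PHP eregi call is difficult to read.

eregi("^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$",$sendto)

As you can see, the intent of the code is hard to grasp. But if you leave out the escapes for the reserved characters, the pattern will often mean something other than what you intended.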
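
The "ne*rd" case can be reproduced in Java. This is a minimal sketch with a hypothetical class EscapeDemo; Pattern.matches and Pattern.quote are standard java.util.regex methods, and Pattern.quote is shown as one way to escape a whole literal at once instead of escaping each reserved character by hand.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Hypothetical helper: true if the regex matches the entire text.
    static boolean matches(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        // Unescaped, "*" is a quantifier applied to [ei].
        System.out.println(matches("n[ei]*rd", "neeeeerd")); // true
        System.out.println(matches("n[ei]*rd", "ne*rd"));    // false

        // Escaped, "\*" stands for a literal asterisk.
        System.out.println(matches("n[ei]\\*rd", "ne*rd"));  // true
        System.out.println(matches("n[ei]\\*rd", "ni*rd"));  // true

        // Pattern.quote turns a whole string into a literal pattern.
        System.out.println(matches(Pattern.quote("ne*rd"), "ne*rd")); // true
        System.out.println(matches(Pattern.quote("ne*rd"), "nerd"));  // false
    }
}
```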

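Summary
In this article we have lifted the veil of mystery from regular expressions and listed the common syntax of the ERE standard. For the complete description of the rules by the Open Group, see: Regular Expressions. You are welcome to post your questions or opinions in the discussion area there.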

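Another article
----------------------------------------
Regular expressions and the Java programming language
-----------------------------------------
Classes and methods

The following classes match character sequences against patterns specified by regular expressions.

The Pattern class

An instance of the Pattern class represents a regular expression specified as a string, in a syntax similar to that used by Perl.

A regular expression specified as a string must first be compiled into an instance of the Pattern class. The resulting pattern is used to create a Matcher object, which matches arbitrary character sequences against the regular expression. Many matchers can share the same pattern because a pattern is stateless.

The compile method compiles a given regular expression into a pattern, and the matcher method then creates a matcher that will match the given input against this pattern. The pattern method returns the regular expression from which the pattern was compiled.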
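
The compile, matcher, and pattern methods described above can be sketched as follows. The class name PatternBasics, the helper firstMatch, and the sample inputs are assumptions made for illustration; the point is only that one compiled Pattern serves several independent Matcher objects.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternBasics {
    // Hypothetical helper: returns the first match of p in the input, or null.
    static String firstMatch(Pattern p, CharSequence input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Compile once ...
        Pattern p = Pattern.compile("[nm]erd");

        // ... and share the pattern between matchers for different inputs.
        System.out.println(firstMatch(p, "a nerd"));            // nerd
        System.out.println(firstMatch(p, "a merd and a herd")); // merd
        System.out.println(firstMatch(p, "no match here"));     // null

        // pattern() returns the source text the Pattern was compiled from.
        System.out.println(p.pattern());                        // [nm]erd
    }
}
```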

The split method is a convenience method that splits the given input sequence around matches of this pattern. The following example demonstrates it:

/*
 * Uses split to break up an input string whose fields are
 * separated by commas and/or whitespace.
 */
import java.util.regex.*;

public class Splitter {
    public static void main(String[] args) throws Exception {
        // Create a pattern to match breaks
        Pattern p = Pattern.compile("[,\\s]+");
        // Split input with the pattern
        String[] result =
            p.split("one,two, three four , five");
        // Print each field on its own line
        for (int i = 0; i < result.length; i++)
            System.out.println(result[i]);
    }
}

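The Matcher class

An instance of the Matcher class matches character sequences against a given pattern. Input is supplied to the matcher through the CharSequence interface, so that matching is supported for characters from a wide variety of input sources.

A matcher is created from a pattern by calling the pattern's matcher method. Once created, a matcher can be used to perform three different kinds of match operations:

The matches method attempts to match the entire input sequence against the pattern.
The lookingAt method attempts to match the input sequence against the pattern, starting at the beginning.
The find method scans the input sequence looking for the next subsequence that matches the pattern.

Each of these methods returns a boolean indicating success or failure. When a match succeeds, more information can be obtained by querying the state of the matcher.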
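
The difference between the three match operations is easiest to see side by side. The sketch below is an illustration only: the class name MatchKinds, the helper tryAll, and the sample inputs are made up, while matches, lookingAt, and find are the Matcher methods described above.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class MatchKinds {
    // Hypothetical helper: returns { matches(), lookingAt(), find() }
    // for one regex and one input, each on a fresh matcher.
    static boolean[] tryAll(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        return new boolean[] {
            p.matcher(input).matches(),   // whole input must match
            p.matcher(input).lookingAt(), // a prefix of the input must match
            p.matcher(input).find()       // any subsequence may match
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tryAll("cat", "cat")));     // [true, true, true]
        System.out.println(Arrays.toString(tryAll("cat", "catalog"))); // [false, true, true]
        System.out.println(Arrays.toString(tryAll("cat", "one cat"))); // [false, false, true]
    }
}
```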
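
This class also defines methods for replacing matched subsequences with new strings whose contents can, if desired, be computed from the match result.

The appendReplacement method first appends all characters of the input from the current position up to the next match, and then appends the replacement value. The appendTail method appends the part of the input that follows the last match, through to the end.

For example, when replacing cat with dog in the string blahcatblahcatblah, the first appendReplacement appends blahdog. The second appendReplacement appends blahdog, and appendTail then appends blah, producing blahdogblahdogblah. See the example Simple word replacement.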
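
The CharSequence interface

The CharSequence interface provides uniform, read-only access to many different kinds of character sequences. You supply the data to be searched from different sources. String, StringBuffer, and CharBuffer all implement CharSequence, so it is easy to obtain the data to search from them. If none of these available sources is suitable, you can write your own input source by implementing the CharSequence interface.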
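
As a rough sketch of what writing your own input source can look like, the class below wraps a char array in a read-only CharSequence and hands it to a matcher. CharArrayView and firstMatch are hypothetical names, not part of the JDK, and the sketch omits bounds checking for brevity.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical input source: a read-only view over part of a char array.
final class CharArrayView implements CharSequence {
    private final char[] data;
    private final int start;
    private final int end;

    CharArrayView(char[] data) {
        this(data, 0, data.length);
    }

    private CharArrayView(char[] data, int start, int end) {
        this.data = data;
        this.start = start;
        this.end = end;
    }

    @Override public int length() { return end - start; }

    @Override public char charAt(int index) { return data[start + index]; }

    // The matcher uses subSequence and toString to build matched text.
    @Override public CharSequence subSequence(int from, int to) {
        return new CharArrayView(data, start + from, start + to);
    }

    @Override public String toString() {
        return new String(data, start, end - start);
    }

    // Returns the first match of the regex in the input, or null.
    static String firstMatch(String regex, CharSequence input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        CharSequence input = new CharArrayView("one cat, two cats".toCharArray());
        System.out.println(firstMatch("c[a-z]+s", input)); // cats
    }
}
```

The matcher only ever calls the four methods implemented here, so the same pattern code works unchanged on a String, a StringBuffer, or a custom view like this one.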

Regex scenarios

The following code samples demonstrate the use of the java.util.regex package in various common scenarios:

Simple word replacement

/*
 * This code writes "one dog, two dogs in the yard"
 * to the standard-output stream:
 */
import java.util.regex.*;

public class Replacement {
    public static void main(String[] args)
            throws Exception {
        // Create a pattern to match cat
        Pattern p = Pattern.compile("cat");
        // Create a matcher with an input string
        Matcher m = p.matcher("one cat," +
            " two cats in the yard");
        StringBuffer sb = new StringBuffer();
        boolean result = m.find();
        // Loop through and create a new String
        // with the replacements
        while (result) {
            m.appendReplacement(sb, "dog");
            result = m.find();
        }
        // Add the last segment of input to
        // the new String
        m.appendTail(sb);
        System.out.println(sb.toString());
    }
}

E-mail validation

The following code is an example of how you might check whether some characters form an e-mail address. It is not a complete e-mail validation program that covers every possible case, but it can be added to as needed.

/*
 * Checks for invalid characters
 * in email addresses
 */
import java.util.regex.*;

public class EmailValidation {
    public static void main(String[] args)
            throws Exception {

        String input = "@sun.com";
        // Checks for email addresses starting with
        // inappropriate symbols like dots or @ signs.
        Pattern p = Pattern.compile("^\\.|^\\@");
        Matcher m = p.matcher(input);
        if (m.find())
            System.err.println("Email addresses don't start" +
                " with dots or @ signs.");
        // Checks for email addresses that start with
        // www. and prints a message if it does.
        p = Pattern.compile("^www\\.");
        m = p.matcher(input);
        if (m.find()) {
            System.out.println("Email addresses don't start" +
                " with \"www.\", only web pages do.");
        }
        // Matches runs of characters that are not allowed in the address
        p = Pattern.compile("[^A-Za-z0-9\\.\\@_\\-~#]+");
        m = p.matcher(input);
        StringBuffer sb = new StringBuffer();
        boolean result = m.find();
        boolean deletedIllegalChars = false;

        while (result) {
            deletedIllegalChars = true;
            m.appendReplacement(sb, "");
            result = m.find();
        }

        // Add the last segment of input to the new String
        m.appendTail(sb);

        input = sb.toString();

        if (deletedIllegalChars) {
            System.out.println("It contained incorrect characters" +
                ", such as spaces or commas.");
        }
    }
}

Removing control characters from a file

/*
 * This class removes control characters from a named
 * file.
 */
import java.util.regex.*;
import java.io.*;

public class Control {
    public static void main(String[] args)
            throws Exception {

        // Create file objects for the input and output file names:
        File fin = new File("fileName1");
        File fout = new File("fileName2");
        // Open an input and an output stream
        FileInputStream fis =
            new FileInputStream(fin);
        FileOutputStream fos =
            new FileOutputStream(fout);

        BufferedReader in = new BufferedReader(
            new InputStreamReader(fis));
        BufferedWriter out = new BufferedWriter(
            new OutputStreamWriter(fos));

        // The pattern matches control characters
        Pattern p = Pattern.compile("\\p{Cntrl}");
        Matcher m = p.matcher("");
        String aLine = null;
        while ((aLine = in.readLine()) != null) {
            m.reset(aLine);
            // Replaces control characters with an empty
            // string.
            String result = m.replaceAll("");
            out.write(result);
            out.newLine();
        }
        in.close();
        out.close();
    }
}

Searching a file

/*
 * Prints out the comments found in a .java file.
 */
import java.util.regex.*;
import java.io.*;
import java.nio.*;
import java.nio.charset.*;
import java.nio.channels.*;

public class CharBufferExample {
    public static void main(String[] args) throws Exception {
        // Create a pattern to match comments
        Pattern p =
            Pattern.compile("//.*$", Pattern.MULTILINE);

        // Get a Channel for the source file
        File f = new File("Replacement.java");
        FileInputStream fis = new FileInputStream(f);
        FileChannel fc = fis.getChannel();

        // Get a CharBuffer from the source file
        ByteBuffer bb =
            fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
        Charset cs = Charset.forName("ISO-8859-1");
        CharsetDecoder cd = cs.newDecoder();
        CharBuffer cb = cd.decode(bb);

        // Run some matches
        Matcher m = p.matcher(cb);
        while (m.find())
            System.out.println("Found comment: " + m.group());
        fis.close();
    }
}

Conclusion
Pattern matching in the Java programming language is now as flexible as in many other programming languages. Regular expressions can be used in applications to make sure data is correctly formatted before it is entered into a database or sent to another part of the application, and they can also be used for a wide variety of administrative tasks. In short, you can use regular expressions anywhere you need pattern matching in Java programming.

Regular expressions in JDK 1.4
written by william chen (06/19/2002)

--------------------------------------------------------------------------------

What are regular expressions?

They are a special kind of expression used to search and replace within files and strings.

Because many system settings on unix are stored in text files, network administrators and programmers often need to search and replace text.

So a special kind of command called the regular expression was developed.

We can very easily use "s/
So jdk1.4 provides a regular expression package for everyone to use.

If you are on a version below jdk1.4, you can get a package with related functionality from http://jakarta.apache.org/oro

The string of symbols just listed, "s/
Regular expression syntax that applies to j2sdk1.4

"." stands for any single character

Regex    Input    Matched text
.        ab       a
..       abc      ab

"+" means one or more of the preceding element
"*" means zero or more of the preceding element

Regex    Input    Matched text
.+       ab       ab
.*       abc      abc

"( )" grouping

Regex    Input    Matched text
(ab)*    aabab    abab

Character classes

Regex           Input       Matched text
[a-dA-D0-9]*    abczA0      abc, A0
[^a-d]*         abe0        e0
[a-d]*          abcdefgh    abcd

Because "*" also allows zero repetitions, these patterns can match the empty string as well; the tables list only the non-empty matches.

Shorthand classes

\d is equivalent to [0-9], a digit
\D is equivalent to [^0-9], a non-digit
\s is equivalent to [ \t\n\x0B\f\r], a whitespace character
\S is equivalent to [^ \t\n\x0B\f\r], a non-whitespace character
\w is equivalent to [a-zA-Z_0-9], a word character (letter, digit, or underscore)
\W is equivalent to [^a-zA-Z_0-9], a non-word character
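
A few one-line uses of these shorthand classes, as a minimal sketch. The class name ShorthandDemo, its three helper methods, and the sample strings are all made up for illustration; String.replaceAll and String.split take regular expressions in standard Java.

```java
import java.util.Arrays;

class ShorthandDemo {
    // \D matches any non-digit, so removing it leaves only the digits.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // \s+ matches a run of whitespace (spaces, tabs, newlines, ...).
    static String[] fields(String s) {
        return s.trim().split("\\s+");
    }

    // \W matches anything that is not a letter, digit, or underscore.
    static String sanitize(String s) {
        return s.replaceAll("\\W", "-");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("tel: 555 0199"));           // 5550199
        System.out.println(Arrays.toString(fields("a \t b\nc")));  // [a, b, c]
        System.out.println(sanitize("user_1@example"));            // user_1-example
    }
}
```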
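
Beginning or end of each line

^ matches at the beginning of a line
$ matches at the end of a line
In java.util.regex these anchors apply to the whole input by default; they match at every line only when the pattern is compiled with Pattern.MULTILINE.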
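
The effect of Pattern.MULTILINE on ^ and $ can be seen by counting matches with and without the flag. This is a sketch under assumed names: LineAnchorDemo, countMatches, and the three-line sample text are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineAnchorDemo {
    // Hypothetical helper: counts the matches of a regex compiled with the given flags.
    static int countMatches(String regex, int flags, String text) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "one\ntwo\nthree";

        // Without MULTILINE, ^ and $ refer to the whole input.
        System.out.println(countMatches("^t", 0, text));                 // 0
        System.out.println(countMatches("e$", 0, text));                 // 1

        // With MULTILINE, they apply to every line.
        System.out.println(countMatches("^t", Pattern.MULTILINE, text)); // 2
        System.out.println(countMatches("e$", Pattern.MULTILINE, text)); // 2
    }
}
```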

--------------------------------------------------------------------------------

Classes related to regular expressions in java.util.regex

Pattern: the class that represents a compiled regular expression
Matcher: the result of applying a pattern to an input
PatternSyntaxException: exception thrown while attempting to compile a regular expression

Example 1: replace every "<" character in a string with "lt;"

import java.io.*;
import java.util.regex.*;

// Wrapper class and main added so the method compiles and runs on its own
public class Replace01 {
    /**
     * Replaces every "<" character in each input line with "lt;"
     */
    public static void replace01() {
        // BufferedReader lets us read line-by-line
        Reader r = new InputStreamReader(System.in);
        BufferedReader br = new BufferedReader(r);
        Pattern pattern = Pattern.compile("<"); // finds every '<' character in a string
        try {
            while (true) {
                String line = br.readLine();
                // Null line means input is exhausted
                if (line == null)
                    break;
                Matcher a = pattern.matcher(line);
                while (a.find()) {
                    System.out.println("Found character: " + a.group());
                }
                System.out.println(a.replaceAll("lt;")); // replaces every match with lt;
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    public static void main(String[] args) {
        replace01();
    }
}

Example 2:

import java.io.*;
import java.util.regex.*;

// Wrapper class and main added so the method compiles and runs on its own
public class Search01 {
    /**
     * Works like StringTokenizer:
     * splits each line on "," and reports which token is longest
     */
    public static void search01() {
        // BufferedReader lets us read line-by-line
        Reader r = new InputStreamReader(System.in);
        BufferedReader br = new BufferedReader(r);
        Pattern pattern = Pattern.compile(",\\s*"); // a "," plus any whitespace after it
        try {
            while (true) {
                String line = br.readLine();
                // Null line means input is exhausted; check before splitting
                if (line == null)
                    break;
                String[] words = pattern.split(line);
                // -1 means we haven't found a word yet
                int longest = -1;
                int longestLength = 0;
                for (int i = 0; i < words.length; i++) {
                    System.out.println("Token: " + words[i]);
                    if (words[i].length() > longestLength) {
                        longest = i;
                        longestLength = words[i].length();
                    }
                }
                // Skip lines that produced no non-empty token
                if (longest >= 0)
                    System.out.println("Longest token: " + words[longest]);
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    public static void main(String[] args) {
        search01();
    }
}

--------------------------------------------------------------------------------

Other regular expression syntax

/^\s* # ignore whitespace at the start of each line
(M(s|r|rs)\.) # matches Ms., Mrs., and Mr. (titles)