
About Regular Expressions (reposted)

Longe


Posted 2009-11-9 13:04:38

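Part 1:
-----------------
Regular expressions (REs) are often mistakenly regarded as an arcane language that only a few people understand. On the surface they do look chaotic: if you do not know the syntax, an expression is just a jumble of characters. In fact, regular expressions are quite simple and easy to understand. After reading this article you will know the general syntax of regular expressions.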

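Support across many platforms

Regular expressions were introduced by the mathematician Stephen Kleene in 1956, growing out of his work on formal models of computation (finite automata). Regular expressions with a complete syntax were first used for character pattern matching and were later adopted throughout information technology. Since then they have gone through several phases of development, and the current standard has been ratified by ISO (the International Organization for Standardization) and defined by the Open Group.

Regular expressions are not a dedicated programming language; they are a standard notation for finding and replacing text in a file or string. There are two standard flavors: basic regular expressions (BRE) and extended regular expressions (ERE). ERE includes the functionality of BRE plus additional concepts.

Many programs use regular expressions, including xsh, egrep, sed, vi, and other programs on UNIX platforms. Many languages, such as HTML and XML, have adopted them as well, usually as only a subset of the full standard.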

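More common than you think
As regular expressions have been ported to cross-platform programming languages, their functionality has become more complete and their use more widespread. Search engines on the web use them, e-mail programs use them, and even if you are not a UNIX programmer you can use regular expressions to simplify your programs and shorten your development time.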

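Regular Expressions 101
Much regular expression syntax will look familiar even if you have never studied it before. Wildcards, for instance, are one kind of RE construct: a repetition operator. Let's first look at the most common basic syntax of the ERE standard. To provide examples with specific uses, I will use several different programs.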

Part 2:
----------------------
Character matching

The key to a regular expression is specifying what you want to search for and match; without this concept, REs would be useless.

Every expression contains instructions about what to look for, as shown in Table A.

Table A: Character-matching regular expressions
Format of each entry:
---------------
Operator:
Explanation:
Example:
Result:
----------------
.
Match any one character
grep '.ord' sample.txt
Will match "ford", "lord", "2ord", etc. in the file sample.txt.
-----------------
[ ]
Match any one character listed between the brackets
grep '[cng]ord' sample.txt
Will match only "cord", "nord", and "gord"
---------------------
[^ ]
Match any one character not listed between the brackets

grep '[^cn]ord' sample.txt
Will match "lord", "2ord", etc. but not "cord" or "nord"

A range such as a-z or 0-9 may be used inside the brackets:

grep '[a-zA-Z]ord' sample.txt
Will match "aord", "bord", "Aord", "Bord", etc.

grep '[^0-9]ord' sample.txt
Will match "Aord", "aord", etc. but not "2ord", etc.
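
The operators in Table A can also be tried outside of grep. The following is a minimal sketch in Java (the language used later in this post); the class name CharMatchDemo, the helper grep and the sample words are made up for illustration, and the helper only imitates grep on a list of one-word lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class CharMatchDemo {
    // Returns the words that contain a match for the regex,
    // roughly what grep does with a file of one-word lines.
    static List<String> grep(String regex, String... words) {
        Pattern p = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String w : words) {
            if (p.matcher(w).find()) {
                hits.add(w);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] sample = {"ford", "cord", "nord", "2ord", "Aord", "ord"};
        System.out.println(grep(".ord", sample));        // [ford, cord, nord, 2ord, Aord]
        System.out.println(grep("[cng]ord", sample));    // [cord, nord]
        System.out.println(grep("[^cn]ord", sample));    // [ford, 2ord, Aord]
        System.out.println(grep("[a-zA-Z]ord", sample)); // [ford, cord, nord, Aord]
        System.out.println(grep("[^0-9]ord", sample));   // [ford, cord, nord, Aord]
    }
}
```

Note that the bare word "ord" never appears in the results: each of these operators must consume exactly one character before "ord".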

Repetition operators
Repetition operators, or quantifiers, describe how many times a particular element should be matched. They are often combined with the character-matching syntax to find runs of several characters; see Table B. In the results below, each word is assumed to be alone on its line.

Table B: Regular expression repetition operators
Format of each entry:
---------------
Operator:
Explanation:
Example:
Result:
----------------
?
Match the preceding element zero or one time
egrep '.?erd' sample.txt
Will match "berd", "herd", etc. and "erd"
------------------
*
Match the preceding element zero or more times
egrep 'n.*rd' sample.txt
Will match "nerd", "nrd", "neard", etc.
-------------------
+
Match the preceding element one or more times
egrep '[n]+erd' sample.txt
Will match "nerd", "nnerd", etc., but not "erd"
--------------------
{n}
Match the preceding element exactly n times
egrep '[a-z]{2}erd' sample.txt
Will match "cherd", "blerd", etc. but not "nerd" or "erd". Because grep matches substrings, "buzzerd" is still reported (it contains "zzerd") unless the pattern is anchored.
------------------------
{n,}
Match the preceding element at least n times
egrep '.{2,}erd' sample.txt
Will match "cherd" and "buzzerd", but not "nerd"
------------------------
{n,N}
Match the preceding element at least n times, but not more than N times
egrep 'n[e]{1,2}rd' sample.txt
Will match "nerd" and "neerd"
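
The quantifiers in Table B behave the same way in Java's java.util.regex. The sketch below is only an illustration (the class QuantifierDemo and the helper whole are hypothetical names); it uses Pattern.matches, which requires the whole word to match, so the substring effect noted for grep does not come into play.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // True only when the entire word matches the regex.
    static boolean whole(String regex, String word) {
        return Pattern.matches(regex, word);
    }

    public static void main(String[] args) {
        System.out.println(whole(".?erd", "erd"));           // true: zero characters before "erd"
        System.out.println(whole(".?erd", "cherd"));         // false: two characters is too many
        System.out.println(whole("n.*rd", "nrd"));           // true: ".*" may match nothing
        System.out.println(whole("[n]+erd", "erd"));         // false: at least one "n" is required
        System.out.println(whole("[a-z]{2}erd", "cherd"));   // true: exactly two letters
        System.out.println(whole("[a-z]{2}erd", "buzzerd")); // false: four letters before "erd"
        System.out.println(whole(".{2,}erd", "buzzerd"));    // true: two or more characters
        System.out.println(whole("n[e]{1,2}rd", "neeerd"));  // false: three "e" exceeds the maximum
    }
}
```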

Part 3:
----------------
Anchors
Anchors specify where in the text a pattern must match, as shown in Table C. They make it easy to find common character combinations at a particular position. The examples below use the vi line-editor command :s, which stands for substitute. The basic syntax of this command is:

s/pattern_to_match/pattern_to_substitute/

Table C: Regular expression anchors
-------------
Operator
Explanation
Example
Result
---------------
^
Match at the beginning of a line
s/^/blah /
Inserts "blah " at the beginning of the line
---------------
$
Match at the end of a line
s/$/ blah/
Inserts " blah" at the end of the line
---------------
\<
Match at the beginning of a word
s/\</blah/
Inserts "blah" at the beginning of the word

egrep '\<blah' sample.txt
Matches "blahfield", etc.
------------------
\>
Match at the end of a word
s/\>/blah/
Inserts "blah" at the end of the word

egrep 'blah\>' sample.txt
Matches "soupblah", etc.
---------------
\b
Match at the beginning or end of a word
egrep '\bblah' sample.txt
Matches "blahcake"

egrep 'blah\b' sample.txt
Matches "countblah"
-----------------
\B
Match in the middle of a word
egrep '\Bblah' sample.txt
Matches "sublahper", etc.
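
For comparison, here is a small Java sketch of the same anchors. It is an illustration only: AnchorDemo and found are hypothetical names. Java's regex syntax supports ^, $, \b and \B, but not \< and \> (there, a backslash before "<" simply yields a literal "<"), so \b stands in for both word-boundary anchors.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // True when the regex matches somewhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Counterparts of s/^/blah / and s/$/ blah/
        System.out.println("line".replaceAll("^", "blah ")); // blah line
        System.out.println("line".replaceAll("$", " blah")); // line blah

        System.out.println(found("\\bblah", "blahcake"));    // true: "blah" starts the word
        System.out.println(found("\\bblah", "countblah"));   // false: "blah" is not at a word start
        System.out.println(found("blah\\b", "countblah"));   // true: "blah" ends the word
        System.out.println(found("\\Bblah", "sublahper"));   // true: "blah" is inside the word
        System.out.println(found("\\Bblah", "blahcake"));    // false: "blah" is at a word start
    }
}
```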

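Alternation

Another convenience of REs is the alternation (or interval) operator, written with the | symbol, which acts as an OR. The following command returns the occurrences of "nerd" and "merd" in the file sample.txt:

egrep '(n|m)erd' sample.txt

Alternation is very powerful, especially when you are searching a file for different spellings, but in this case you can get the same result with:

egrep '[nm]erd' sample.txt

The real usefulness of alternation shows when it is combined with the more advanced features of REs.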
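
A short Java sketch can make the comparison concrete. The names AlternationDemo and whole are hypothetical, and the third pattern, (ne|bi)rd, is an added illustration of multi-character alternatives, which a bracket expression cannot express.

```java
import java.util.regex.Pattern;

class AlternationDemo {
    // True only when the entire word matches the regex.
    static boolean whole(String regex, String word) {
        return Pattern.matches(regex, word);
    }

    public static void main(String[] args) {
        String[] words = {"nerd", "merd", "herd", "bird"};
        for (String w : words) {
            System.out.println(w
                + "  (n|m)erd=" + whole("(n|m)erd", w)    // single-character alternatives
                + "  [nm]erd=" + whole("[nm]erd", w)      // same result with a bracket expression
                + "  (ne|bi)rd=" + whole("(ne|bi)rd", w)); // multi-character alternatives
        }
    }
}
```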

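Part 4:
----------------
Reserved characters
The last essential feature of REs is reserved characters (also called special characters). For example, suppose you want to find the literal strings "ne*rd" and "ni*rd". The pattern "n[ei]*rd" matches "neeeeerd" and "nieieierd", which is not what you are looking for. Because '*' (the asterisk) is a reserved character, you must escape it with a backslash: "n[ei]\*rd". The other reserved characters are:

^ (caret)
. (period)
[ (left bracket)
$ (dollar sign)
( (left parenthesis)
) (right parenthesis)
| (pipe)
* (asterisk)
+ (plus symbol)
? (question mark)
{ (left curly bracket, or left brace)
\ (backslash)
Once you include these characters in your search patterns, REs undoubtedly become hard to read. The following eregi search in PHP, for example, is difficult to read.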

eregi("^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$",$sendto)

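As you can see, the intent of the code is hard to grasp. But if you leave out the escapes for the reserved characters, the pattern will often mean something quite different from what you intended.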
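
The effect of escaping the asterisk can be checked with a few lines of Java. This is a sketch for illustration: EscapeDemo and whole are hypothetical names. In a Java string literal the backslash itself must be doubled, and Pattern.quote is a standard way to treat an entire string as a literal.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // True only when the entire text matches the regex.
    static boolean whole(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        // Unescaped: "*" repeats the bracket expression.
        System.out.println(whole("n[ei]*rd", "neeeeerd"));   // true
        System.out.println(whole("n[ei]*rd", "ne*rd"));      // false

        // Escaped: "\*" matches a literal asterisk.
        System.out.println(whole("n[ei]\\*rd", "ne*rd"));    // true
        System.out.println(whole("n[ei]\\*rd", "neeeeerd")); // false

        // Pattern.quote escapes every reserved character in the string.
        System.out.println(whole(Pattern.quote("ne*rd"), "ne*rd")); // true
    }
}
```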

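Summary
In this article we lifted the veil of mystery from regular expressions and listed the general syntax of the ERE standard. For the Open Group's complete description of the rules, see: Regular Expressions. You are welcome to post your questions or comments in the discussion area there.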

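Another article
----------------------------------------
Regular Expressions and the Java Programming Language
-----------------------------------------
Classes and methods

The following classes match character sequences against patterns specified by regular expressions.

The Pattern class

An instance of the Pattern class represents a regular expression that is specified in string form, in a syntax similar to that used by Perl.

A regular expression specified as a string must first be compiled into an instance of the Pattern class. The resulting pattern is used to create a Matcher object that matches arbitrary character sequences against the regular expression. Many matchers can share the same pattern because a pattern is stateless.

The compile method compiles the given regular expression into a pattern, and the matcher method then creates a matcher that will match the given input against this pattern. The pattern method returns the regular expression from which the pattern was compiled.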
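
As a rough sketch of the three methods just described, the following compiles one pattern and reuses it for several inputs. The class PatternBasics, the helper firstMatch and the sample strings are assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternBasics {
    // Creates a fresh matcher from the shared pattern and returns
    // the first match in the input, or null when there is none.
    static String firstMatch(Pattern p, CharSequence input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // compile: turn the regular expression string into a Pattern
        Pattern p = Pattern.compile("[a-z]+erd");

        // pattern: the regular expression the Pattern was compiled from
        System.out.println(p.pattern());                     // [a-z]+erd

        // matcher: one Pattern, several independent Matchers
        System.out.println(firstMatch(p, "a herd of 12"));   // herd
        System.out.println(firstMatch(p, "nerd and nnerd")); // nerd
        System.out.println(firstMatch(p, "12"));             // null
    }
}
```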

The split method is a convenience method that splits the given input sequence around matches of this pattern. The following example demonstrates:

/*
 * Uses split to break up a string of input separated by
 * commas and/or whitespace.
 */
import java.util.regex.*;

public class Splitter {
    public static void main(String[] args) throws Exception {
        // Create a pattern to match breaks
        Pattern p = Pattern.compile("[,\\s]+");
        // Split input with the pattern
        String[] result =
            p.split("one,two, three four , five");
        // Print each token on its own line
        for (int i = 0; i < result.length; i++)
            System.out.println(result[i]);
    }
}

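The Matcher class

Instances of the Matcher class match character sequences against a given pattern. Input is provided to matchers through the CharSequence interface, so that characters from a wide variety of input sources can be matched.

A matcher is created from a pattern by invoking the pattern's matcher method. Once created, a matcher can be used to perform three different kinds of match operations:

The matches method attempts to match the entire input sequence against the pattern.
The lookingAt method attempts to match the input sequence, starting at the beginning, against the pattern.
The find method scans the input sequence looking for the next subsequence that matches the pattern.

Each of these methods returns a boolean indicating success or failure. More information about a successful match can be obtained by querying the state of the matcher.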
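
The difference between the three operations is easiest to see side by side. The sketch below is illustrative only (MatchKindsDemo and describe are hypothetical names); it also queries the matcher state through start() and end() after a successful find.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchKindsDemo {
    // Runs matches, lookingAt and find for the same regex and input
    // and reports the results, plus the position of the first find.
    static String describe(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        boolean matches = p.matcher(input).matches();     // whole input must match
        boolean lookingAt = p.matcher(input).lookingAt(); // a prefix of the input must match
        Matcher m = p.matcher(input);
        boolean found = m.find();                         // any subsequence may match
        String where = found ? " at [" + m.start() + "," + m.end() + ")" : "";
        return "matches=" + matches + " lookingAt=" + lookingAt
            + " find=" + found + where;
    }

    public static void main(String[] args) {
        System.out.println(describe("cat", "cat"));     // matches=true lookingAt=true find=true at [0,3)
        System.out.println(describe("cat", "catalog")); // matches=false lookingAt=true find=true at [0,3)
        System.out.println(describe("cat", "bobcat"));  // matches=false lookingAt=false find=true at [3,6)
        System.out.println(describe("cat", "dog"));     // matches=false lookingAt=false find=false
    }
}
```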

This class also defines methods for replacing matched sequences by new strings whose contents can, if desired, be computed from the match result.

The appendReplacement method first appends all characters of the input between the current position and the next match, and then appends the replacement value. The appendTail method appends the part of the input from after the last match through to the end.

For instance, in the string blahcatblahcatblah, the first appendReplacement appends blahdog. The second appendReplacement appends blahdog, and appendTail then appends blah, resulting in: blahdogblahdogblah. See the Simple word replacement example.

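The CharSequence interface

The CharSequence interface provides uniform, read-only access to many different kinds of character sequences. You supply the data to be searched from different sources. String, StringBuffer and CharBuffer implement CharSequence, so it is easy to obtain the data to search from them. If none of these available sources suits you, you can write your own input source by implementing the CharSequence interface.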
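
To illustrate, the same matching code can be handed a String, a StringBuffer or a CharBuffer without change. In this sketch the class CharSequenceDemo, the helper countNumbers and the sample inputs are made up for the example.

```java
import java.nio.CharBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharSequenceDemo {
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Counts the runs of digits in any CharSequence.
    static int countNumbers(CharSequence input) {
        Matcher m = NUMBER.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "a1 b22 c333";
        StringBuffer sb = new StringBuffer("no digits here");
        CharBuffer cb = CharBuffer.wrap("7 and 8");

        System.out.println(countNumbers(s));  // 3
        System.out.println(countNumbers(sb)); // 0
        System.out.println(countNumbers(cb)); // 2
    }
}
```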

Regex scenario examples

The following code samples demonstrate the use of the java.util.regex package in various common scenarios:

Simple word replacement

/*
 * This code writes "one dog, two dogs in the yard"
 * to the standard-output stream:
 */
import java.util.regex.*;

public class Replacement {
    public static void main(String[] args)
            throws Exception {
        // Create a pattern to match cat
        Pattern p = Pattern.compile("cat");
        // Create a matcher with an input string
        Matcher m = p.matcher("one cat," +
            " two cats in the yard");
        StringBuffer sb = new StringBuffer();
        boolean result = m.find();
        // Loop through and create a new String
        // with the replacements
        while(result) {
            m.appendReplacement(sb, "dog");
            result = m.find();
        }
        // Add the last segment of input to
        // the new String
        m.appendTail(sb);
        System.out.println(sb.toString());
    }
}

Email validation

The following code is an example of how to check whether some characters form an e-mail address. It is not a complete e-mail validation program that handles every possible case, but it can be added to as needed.

/*
 * Checks for invalid characters
 * in email addresses
 */
import java.util.regex.*;

public class EmailValidation {
    public static void main(String[] args)
            throws Exception {

        String input = "@sun.com";
        //Checks for email addresses starting with
        //inappropriate symbols like dots or @ signs.
        Pattern p = Pattern.compile("^\\.|^\\@");
        Matcher m = p.matcher(input);
        if (m.find())
            System.err.println("Email addresses don't start" +
                " with dots or @ signs.");
        //Checks for email addresses that start with
        //www. and prints a message if it does.
        p = Pattern.compile("^www\\.");
        m = p.matcher(input);
        if (m.find()) {
            System.out.println("Email addresses don't start" +
                " with \"www.\", only web pages do.");
        }
        //Matches runs of characters that are not allowed
        p = Pattern.compile("[^A-Za-z0-9\\.\\@_\\-~#]+");
        m = p.matcher(input);
        StringBuffer sb = new StringBuffer();
        boolean result = m.find();
        boolean deletedIllegalChars = false;

        while(result) {
            deletedIllegalChars = true;
            m.appendReplacement(sb, "");
            result = m.find();
        }

        // Add the last segment of input to the new String
        m.appendTail(sb);

        input = sb.toString();

        if (deletedIllegalChars) {
            System.out.println("It contained incorrect characters" +
                " , such as spaces or commas.");
        }
    }
}

Removing control characters from a file

/* This class removes control characters from a named
 * file.
 */
import java.util.regex.*;
import java.io.*;

public class Control {
    public static void main(String[] args)
            throws Exception {

        //Create file objects for the input and
        //output file names:
        File fin = new File("fileName1");
        File fout = new File("fileName2");
        //Open an input and an output stream
        FileInputStream fis =
            new FileInputStream(fin);
        FileOutputStream fos =
            new FileOutputStream(fout);

        BufferedReader in = new BufferedReader(
            new InputStreamReader(fis));
        BufferedWriter out = new BufferedWriter(
            new OutputStreamWriter(fos));

        // The pattern matches control characters
        Pattern p = Pattern.compile("\\p{Cntrl}");
        Matcher m = p.matcher("");
        String aLine = null;
        while((aLine = in.readLine()) != null) {
            m.reset(aLine);
            //Replaces control characters with an empty
            //string.
            String result = m.replaceAll("");
            out.write(result);
            out.newLine();
        }
        in.close();
        out.close();
    }
}

File search

/*
 * Prints out the comments found in a .java file.
 */
import java.util.regex.*;
import java.io.*;
import java.nio.*;
import java.nio.charset.*;
import java.nio.channels.*;

public class CharBufferExample {
    public static void main(String[] args) throws Exception {
        // Create a pattern to match comments
        Pattern p =
            Pattern.compile("//.*$", Pattern.MULTILINE);

        // Get a Channel for the source file
        File f = new File("Replacement.java");
        FileInputStream fis = new FileInputStream(f);
        FileChannel fc = fis.getChannel();

        // Get a CharBuffer from the source file
        ByteBuffer bb =
            fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
        Charset cs = Charset.forName("ISO-8859-1");
        CharsetDecoder cd = cs.newDecoder();
        CharBuffer cb = cd.decode(bb);

        // Run some matches
        Matcher m = p.matcher(cb);
        while (m.find())
            System.out.println("Found comment: "+m.group());
    }
}

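Conclusion
Pattern matching in the Java programming language is now as flexible as in many other programming languages. Regular expressions can be used in applications to make sure that data is correctly formatted before it is entered into a database or sent to another part of the application, and they can be used for a wide variety of administrative tasks. In short, you can use regular expressions anywhere you need pattern matching in Java programming.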

Regular expressions in JDK 1.4
written by william chen (06/19/2002)

--------------------------------------------------------------------------------

What are regular expressions?

They are a special kind of expression used to search and replace within files and strings.

Because many system settings on unix are stored in text files, network administrators and programmers often need to search and replace,

so a special kind of command called the regular expression was developed.

We can very simply use "s/
Therefore jdk1.4 provides a regular expression package for everyone to use.

For versions below jdk1.4, a package with similar functionality is available from http://jakarta.apache.org/oro

The string of symbols just listed, " s/
Regular expression syntax for j2sdk1.4

"." stands for any single character

Regex           Source string    Matching string
.               ab               a
..              abc              ab

"+" stands for one or more occurrences of the preceding element
"*" stands for zero or more occurrences of the preceding element

Regex           Source string    Matching string
.+              ab               ab
.*              abc              abc

"( )" grouping

Regex           Source string    Matching string
(ab)*           aabab            abab

Character classes

Regex           Source string    Matching string
[a-dA-D0-9]*    abczA0           abc, A0
[^a-d]*         abe0             e0
[a-d]*          abcdefgh         abcd
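
The starred rows above can be checked with a find loop. This is a sketch under one assumption worth stating: because "*" also matches the empty string, find reports empty matches wherever the class does not apply, so the hypothetical helper nonEmpty below filters them out and keeps only the matches a reader would expect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonEmptyMatches {
    // Collects every non-empty match of the regex in the input.
    static List<String> nonEmpty(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            if (!m.group().isEmpty()) {
                out.add(m.group());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(nonEmpty("(ab)*", "aabab"));         // [abab]
        System.out.println(nonEmpty("[a-dA-D0-9]*", "abczA0")); // [abc, A0]
        System.out.println(nonEmpty("[^a-d]*", "abe0"));        // [e0]
        System.out.println(nonEmpty("[a-d]*", "abcdefgh"));     // [abcd]
    }
}
```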


Shorthand

\d  equals [0-9]               digit
\D  equals [^0-9]              non-digit
\s  equals [ \t\n\x0B\f\r]     whitespace character
\S  equals [^ \t\n\x0B\f\r]    non-whitespace character
\w  equals [a-zA-Z_0-9]        word character (letter, digit, or underscore)
\W  equals [^a-zA-Z_0-9]       non-word character

Beginning or end of each line

^ matches the beginning of each line (with the Pattern.MULTILINE flag; otherwise only the beginning of the input)
$ matches the end of each line (with the Pattern.MULTILINE flag; otherwise only the end of the input)
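
A brief sketch of the shorthand classes and the line anchors in use follows. ShorthandDemo, findAll and the sample strings are assumptions made for this illustration; the last three lines show how Pattern.MULTILINE changes what ^ and $ match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ShorthandDemo {
    // Collects every match of the pattern in the input.
    static List<String> findAll(Pattern p, CharSequence input) {
        Matcher m = p.matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // \d: replace every digit
        System.out.println("Order 66 shipped".replaceAll("\\d", "#")); // Order ## shipped
        // \s: split on runs of whitespace
        System.out.println(Arrays.toString("a  b\tc".split("\\s+")));  // [a, b, c]

        String text = "one two\nthree four";
        // Without MULTILINE, ^ matches only at the start of the input
        System.out.println(findAll(Pattern.compile("^\\w+"), text));   // [one]
        // With MULTILINE, ^ and $ match at every line boundary
        System.out.println(findAll(Pattern.compile("^\\w+", Pattern.MULTILINE), text)); // [one, three]
        System.out.println(findAll(Pattern.compile("\\w+$", Pattern.MULTILINE), text)); // [two, four]
    }
}
```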

--------------------------------------------------------------------------------

Classes related to regular expressions in java.util.regex

Pattern: the compiled representation of a regular expression
Matcher: the engine that matches a Pattern against input and holds the result
PatternSyntaxException: Exception thrown while attempting to compile a regular expression

Example 1: replace every "<" character in a string with "lt;"

import java.io.*;
import java.util.regex.*;
/**
 * Replaces every "<" character in a string with "lt;"
 */
public static void replace01(){
    // BufferedReader lets us read line-by-line
    Reader r = new InputStreamReader( System.in );
    BufferedReader br = new BufferedReader( r );
    Pattern pattern = Pattern.compile( "<" ); // find every '<' character in the string
    try{
        while (true) {
            String line = br.readLine();
            // Null line means input is exhausted
            if (line==null)
                break;
            Matcher a = pattern.matcher(line);
            while(a.find()){
                System.out.println("Found character: " + a.group());
            }
            System.out.println(a.replaceAll("lt;")); // replace every match with lt;
        }
    }catch(Exception ex){ex.printStackTrace();}
}

Example 2:

import java.io.*;
import java.util.regex.*;
/**
 * Works like StringTokenizer:
 * splits a string on "," and finds which token is longest
 */
public static void search01(){
    // BufferedReader lets us read line-by-line
    Reader r = new InputStreamReader( System.in );
    BufferedReader br = new BufferedReader( r );
    Pattern pattern = Pattern.compile( ",\\s*" ); // match each "," and any whitespace after it
    try{
        while (true) {
            String line = br.readLine();
            // Null line means input is exhausted
            if (line==null)
                break;
            // Split only after the null check to avoid a NullPointerException
            String words[] = pattern.split(line);
            // -1 means we haven't found a word yet
            int longest=-1;
            int longestLength=0;
            for (int i=0; i<words.length; i++) {
                System.out.println("Segment: " + words[i]);
                if (words[i].length() > longestLength) {
                    longest = i;
                    longestLength = words[i].length();
                }
            }
            // An empty line yields no non-empty token
            if (longest >= 0)
                System.out.println( "Longest: " + words[longest] );
        }
    }catch(Exception ex){ex.printStackTrace();}
}

--------------------------------------------------------------------------------

Other regular expression syntax

/^\s* # ignore whitespace at the beginning of each line
(M(s|r|rs)\.) # matches Ms., Mrs., and Mr. (titles)