
About Regular Expressions (reposted)

Longe


Posted on 2009-11-9 13:04:38

Part 1:
-----------------
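Regular expressions (REs) are often mistakenly thought of as a cryptic language that only a few people understand. On the surface they do look chaotic, and if you don't know the syntax, the code looks like nothing more than a heap of textual garbage. In fact, regular expressions are quite simple and can be understood. After reading this article, you will know the general syntax of regular expressions.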

Supported on many platforms

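Regular expressions were first described by the mathematician Stephen Kleene in 1956, building on earlier incremental research into nerve nets (early models of neural networks). Regular expressions with a full syntax were first used for matching character patterns and were later adopted throughout information technology. Since then they have gone through several stages of development, and the current standard has been approved by ISO (the International Organization for Standardization) and recognized by the Open Group.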

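Regular expressions are not a language in their own right. They are a standard for finding and replacing text in a file or string. There are two standards: Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE). ERE includes the BRE functionality plus some additional concepts.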

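Many programs use regular expressions, including xsh, egrep, sed, vi and other programs on UNIX platforms. Many languages, such as HTML and XML, have also adopted them, though these adoptions usually support only a subset of the full standard.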

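More common than you think
As regular expressions have been carried into cross-platform programming languages, their features have become more complete and their use more widespread. Web search engines use them, and so do e-mail programs. Even if you are not a UNIX programmer, you can use regular expressions to simplify your programs and shorten your development time.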

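Regular Expressions 101
Much regular expression syntax will look familiar, even if you have never studied it. The wildcard, for example, is one kind of RE construct: a repetition operator. Let's first look at the most common basic syntax of the ERE standard. I will use several different programs to give examples for specific uses.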

Part 2:
----------------------
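Character matching

The key to regular expressions is specifying what you want to search for and match. Without this concept, REs would be useless.

Each expression contains an instruction describing what to look for, as shown in Table A.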

Table A: Character-matching regular expressions
Each entry lists:
---------------
Operator
Description
Example
Result
----------------
.
Match any one character
grep '.ord' sample.txt
Will match "ford", "lord", "2ord", etc. in the file sample.txt.
-----------------
[ ]
Match any one character listed between the brackets
grep '[cng]ord' sample.txt
Will match only "cord", "nord", and "gord"
---------------------
[^ ]
Match any one character not listed between the brackets

grep '[^cn]ord' sample.txt
Will match "lord", "2ord", etc. but not "cord" or "nord"

Ranges can also be used inside the brackets:

grep '[a-zA-Z]ord' sample.txt
Will match "aord", "bord", "Aord", "Bord", etc.

grep '[^0-9]ord' sample.txt
Will match "Aord", "aord", etc. but not "2ord", etc.
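The same Table A patterns also work with Java's java.util.regex, which the second article below covers. The minimal sketch below imitates grep: it keeps every word in which find() locates a match. The class name CharMatchDemo and the word list standing in for sample.txt are both hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class CharMatchDemo {
    // Hypothetical stand-in for the lines of sample.txt.
    static final String[] SAMPLE = {"ford", "lord", "2ord", "cord", "nord", "gord", "Aord"};

    // Like grep: keep every word in which the regex matches somewhere.
    static List<String> grep(String regex) {
        Pattern p = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String word : SAMPLE) {
            if (p.matcher(word).find()) {
                hits.add(word);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(grep(".ord"));        // all seven words
        System.out.println(grep("[cng]ord"));    // [cord, nord, gord]
        System.out.println(grep("[^cn]ord"));    // [ford, lord, 2ord, gord, Aord]
        System.out.println(grep("[a-zA-Z]ord")); // every word except 2ord
        System.out.println(grep("[^0-9]ord"));   // every word except 2ord
    }
}
```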

Repetition operators
Repetition operators, or quantifiers, describe how many times to match a particular character. They are often combined with the character-matching syntax to find runs of several characters, as shown in Table B.

Table B: Regular expression repetition operators
Each entry lists:
---------------
Operator
Description
Example
Result
----------------
?
Match the preceding element zero or one time
egrep '.?erd' sample.txt
Will match "berd", "herd", etc. and "erd"
------------------
*
Match the preceding element zero or more times
egrep 'n.*rd' sample.txt
Will match "nerd", "nrd", "neard", etc.
-------------------
+
Match the preceding element one or more times
egrep '[n]+erd' sample.txt
Will match "nerd", "nnerd", etc., but not "erd"
--------------------
{n}
Match the preceding element exactly n times
egrep '[a-z]{2}erd' sample.txt
Will match "cherd", "blerd", etc. but not "nerd" or "erd". Because grep matches substrings, "buzzerd" also matches (through "zzerd").
------------------------
{n,}
Match the preceding element at least n times
egrep '.{2,}erd' sample.txt
Will match "cherd" and "buzzerd", but not "nerd"
------------------------
{n,N}
Match the preceding element at least n times, but not more than N times
egrep 'n[e]{1,2}rd' sample.txt
Will match "nerd" and "neerd"
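Here is a minimal Java sketch of the same quantifiers. Unlike grep, it uses Pattern.matches(), which requires the whole word to match. For that reason "buzzerd" is rejected by [a-z]{2}erd here. The class and helper names are illustrative only.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // True only if the regex matches the entire word, not just part of it.
    static boolean whole(String regex, String word) {
        return Pattern.matches(regex, word);
    }

    public static void main(String[] args) {
        System.out.println(whole(".?erd", "herd"));          // true: one optional character
        System.out.println(whole(".?erd", "erd"));           // true: zero characters
        System.out.println(whole("n.*rd", "neard"));         // true
        System.out.println(whole("n+erd", "nnerd"));         // true
        System.out.println(whole("n+erd", "erd"));           // false: needs at least one n
        System.out.println(whole("[a-z]{2}erd", "cherd"));   // true
        System.out.println(whole("[a-z]{2}erd", "buzzerd")); // false as a whole word
        System.out.println(whole(".{2,}erd", "buzzerd"));    // true
        System.out.println(whole("n[e]{1,2}rd", "neerd"));   // true
        System.out.println(whole("n[e]{1,2}rd", "neeerd"));  // false: three e's
    }
}
```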

Part 3:
----------------
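Anchors
Anchors specify where a pattern must match, as shown in Table C. They make it easy to locate common character combinations. In the examples I use the vi line-editor command :s, which stands for substitute. The basic syntax of this command is: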

s/pattern_to_match/pattern_to_substitute/

Table C: Regular expression anchors
-------------
Operator
Description
Example
Result
---------------
^
Match at the beginning of a line
s/^/blah /
Inserts "blah " at the beginning of the line
---------------
$
Match at the end of a line
s/$/ blah/
Inserts " blah" at the end of the line
---------------
\<
Match at the beginning of a word
s/\</blah/
Inserts "blah" at the beginning of the word

egrep '\<blah' sample.txt
Matches "blahfield", etc.
------------------
\>
Match at the end of a word
s/\>/blah/
Inserts "blah" at the end of the word

egrep 'blah\>' sample.txt
Matches "soupblah", etc.
---------------
\b
Match at the beginning or end of a word
egrep '\bblah' sample.txt
Matches "blahcake", etc., but not "countblah" (use 'blah\b' for that)
-----------------
\B
Match in the middle of a word
egrep '\Bblah' sample.txt
Matches "sublahper", etc.
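Java's java.util.regex supports ^, $, \b and \B, but it does not provide the GNU \< and \> word anchors. In Java, a backslash before < or > simply matches the literal character. The sketch below mimics the vi substitutions with String.replaceFirst, and uses \b together with a lookaround in place of \< and \>. The helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorDemo {
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    // Roughly vi's :s/\</text/ : insert before the start of the first word.
    static String atFirstWordStart(String line, String text) {
        return line.replaceFirst("\\b(?=\\w)", Matcher.quoteReplacement(text));
    }

    // Roughly vi's :s/\>/text/ : insert after the end of the first word.
    static String atFirstWordEnd(String line, String text) {
        return line.replaceFirst("(?<=\\w)\\b", Matcher.quoteReplacement(text));
    }

    public static void main(String[] args) {
        String line = "soup field";
        System.out.println(line.replaceFirst("^", "blah ")); // blah soup field
        System.out.println(line.replaceFirst("$", " blah")); // soup field blah
        System.out.println(atFirstWordStart(line, "blah"));  // blahsoup field
        System.out.println(atFirstWordEnd(line, "blah"));    // soupblah field
        System.out.println(found("\\bblah", "blahcake"));    // true
        System.out.println(found("\\bblah", "countblah"));   // false
        System.out.println(found("blah\\b", "countblah"));   // true
        System.out.println(found("\\Bblah", "sublahper"));   // true
    }
}
```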

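Alternation

Another handy feature of REs is the alternation symbol. It is written as | and works like an OR statement. The following command returns the lines of sample.txt that contain "nerd" or "merd":

egrep '(n|m)erd' sample.txt

Alternation is very powerful, especially when you are looking for different spellings of a word. In this case, however, you can get the same result with:

egrep '[nm]erd' sample.txt

Alternation shows its real usefulness when you combine it with the more advanced features of REs.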
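A small Java sketch of the point above. For single characters, (n|m)erd and [nm]erd accept the same words. Alternatives longer than one character, such as (nerd|geek), can only be written with |, and the capturing group also reports which alternative matched. The names and sample words are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationDemo {
    static final Pattern ALT = Pattern.compile("(n|m)erd");
    static final Pattern BRACKET = Pattern.compile("[nm]erd");
    // Multi-character alternatives cannot be expressed with a bracket expression.
    static final Pattern WORDS = Pattern.compile("(nerd|geek)s?");

    // True when alternation and the bracket expression give the same answer.
    static boolean agree(String word) {
        return ALT.matcher(word).matches() == BRACKET.matcher(word).matches();
    }

    // Returns the alternative of WORDS that matched first in the text, or null.
    static String whichWord(String text) {
        Matcher m = WORDS.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"nerd", "merd", "herd"}) {
            System.out.println(w + " " + ALT.matcher(w).matches() + " " + agree(w));
        }
        System.out.println(whichWord("three geeks")); // geek
        System.out.println(whichWord("one nerd"));    // nerd
    }
}
```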

Part 4:
----------------
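Some reserved characters
The last important feature of REs is reserved characters (also called special characters). Suppose, for example, that you want to find the strings "ne*rd" and "ni*rd". The pattern "n[ei]*rd" matches "neeeeerd" and "nieieierd", which are not what you are looking for. Because '*' (asterisk) is a reserved character, you must escape it with a backslash: "n[ei]\*rd". The other reserved characters include:

^ (caret)
. (period)
[ (left bracket)
$ (dollar sign)
( (left parenthesis)
) (right parenthesis)
| (pipe)
* (asterisk)
+ (plus symbol)
? (question mark)
{ (left curly bracket, or left brace)
\ (backslash)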
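Here is the escaping rule in Java, as a minimal sketch. Inside a Java string literal the backslash itself must be doubled, so n[ei]\*rd is written as the Java string "n[ei]\\*rd". Pattern.quote escapes a whole string at once. EscapeDemo and the sample strings are illustrative.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    static boolean whole(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        // Unescaped: * is a quantifier, so the bracket expression repeats.
        System.out.println(whole("n[ei]*rd", "neeeeerd"));   // true
        System.out.println(whole("n[ei]*rd", "ne*rd"));      // false
        // Escaped: \* is a literal asterisk (the backslash is doubled in Java source).
        System.out.println(whole("n[ei]\\*rd", "ne*rd"));    // true
        System.out.println(whole("n[ei]\\*rd", "neeeeerd")); // false
        // Pattern.quote treats every reserved character in the string literally.
        System.out.println(whole(Pattern.quote("1+1=2?"), "1+1=2?")); // true
    }
}
```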
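Once you include these characters in your searches, REs certainly become very hard to read. For example, the following email-address check written with PHP's eregi function is hard to read (eregi was later deprecated in PHP 5.3 and removed in PHP 7; preg_match with the i modifier is its replacement):

eregi("^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$",$sendto)

As you can see, the intent of the code is hard to grasp. But if you skip over the reserved characters, you will often misread what the code means.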
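One way to make such an expression easier to read is sketched below in Java. It assumes that the eregi call is a case-insensitive whole-string check. The sketch compiles the same pattern with Pattern.CASE_INSENSITIVE, then compares it with a version written in Pattern.COMMENTS mode, where whitespace is ignored and # starts a comment. EmailPatternDemo and the test addresses are hypothetical. Like the original, this is not a complete email validator.

```java
import java.util.regex.Pattern;

public class EmailPatternDemo {
    // The same expression as the PHP eregi() call, compiled case-insensitively.
    static final Pattern DENSE = Pattern.compile(
        "^[_a-z0-9-]+(\\.[_a-z0-9-]+)*@[a-z0-9-]+(\\.[a-z0-9-]+)*$",
        Pattern.CASE_INSENSITIVE);

    // The same expression in COMMENTS mode: whitespace is ignored, # starts a comment.
    static final Pattern READABLE = Pattern.compile(
        "^ [_a-z0-9-]+          # first part of the local name\n" +
        "  (\\.[_a-z0-9-]+)*    # optional further dot-separated parts\n" +
        "  @                    # the at sign\n" +
        "  [a-z0-9-]+           # first domain label\n" +
        "  (\\.[a-z0-9-]+)* $   # optional further labels, then end of input\n",
        Pattern.CASE_INSENSITIVE | Pattern.COMMENTS);

    public static void main(String[] args) {
        for (String s : new String[] {"John.Doe@Example.com", "@sun.com", "a b@c.d"}) {
            System.out.println(s + " -> " + DENSE.matcher(s).matches()
                    + " " + READABLE.matcher(s).matches());
        }
    }
}
```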

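Summary
In this article we have demystified regular expressions and listed the general syntax of the ERE standard. For the Open Group's complete description of the rules, see: Regular Expressions. You are welcome to post your questions or views in its discussion area.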

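Another article
----------------------------------------
Regular Expressions and the Java Programming Language
-----------------------------------------
Classes and methods

The following classes match character sequences against patterns specified by regular expressions.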

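The Pattern class

An instance of the Pattern class represents a regular expression specified in string form, in a syntax similar to the one Perl uses.

A regular expression specified as a string must first be compiled into an instance of the Pattern class. The resulting pattern is used to create a Matcher object, which matches arbitrary character sequences against the regular expression. Many matchers can share the same pattern, because a pattern is stateless.

The compile method compiles the given regular expression into a pattern. The matcher method then creates a matcher that matches the given input against this pattern. The pattern method returns the regular expression from which the pattern was compiled.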
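Below is a minimal sketch of compile, matcher and pattern, in which one shared Pattern serves several matchers. The class name, the [0-9]+ expression and the inputs are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CompileDemo {
    // Compiled once and shared: a Pattern holds no match state.
    static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Each call creates its own Matcher, which tracks its own position in its own input.
    static String firstNumber(String text) {
        Matcher m = DIGITS.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("room 101"));  // 101
        System.out.println(firstNumber("42 apples")); // 42
        System.out.println(DIGITS.pattern());         // [0-9]+
    }
}
```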

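The split method is a convenience method that splits the given input sequence around matches of this pattern. The following example demonstrates it: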

/*
 * Uses split to break up an input string whose parts are
 * separated by commas and/or whitespace.
 */
import java.util.regex.*;

public class Splitter {
    public static void main(String[] args) throws Exception {
        // Create a pattern to match breaks
        Pattern p = Pattern.compile("[,\\s]+");
        // Split input with the pattern
        String[] result =
                 p.split("one,two, three four , five");
        for (int i=0; i<result.length; i++)
            System.out.println(result[i]);
    }
}

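The Matcher class

An instance of the Matcher class matches a character sequence against a given pattern. Input is supplied to a matcher through the CharSequence interface, so that characters from a wide variety of input sources can be matched.

A matcher is created from a pattern by calling the pattern's matcher method. Once created, a matcher can perform three different kinds of match operations:

The matches method attempts to match the entire input sequence against the pattern.
The lookingAt method attempts to match the input sequence against the pattern, starting at the beginning.
The find method scans the input sequence, looking for the next subsequence that matches the pattern.

Each of these methods returns a boolean indicating success or failure. After a successful match, more information can be obtained by querying the matcher's state.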
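The following small sketch makes the difference between the three operations concrete. The class name and the inputs are illustrative. A fresh Matcher is used for each operation so that the three results cannot influence one another, because a matcher remembers where its last match ended.

```java
import java.util.regex.Pattern;

public class MatchKindsDemo {
    static final Pattern FOO = Pattern.compile("foo");

    // Reports matches / lookingAt / find for one input, each on a new Matcher.
    static String kinds(String input) {
        boolean whole = FOO.matcher(input).matches();      // entire input must match
        boolean prefix = FOO.matcher(input).lookingAt();   // match must start at index 0
        boolean anywhere = FOO.matcher(input).find();      // match may be anywhere
        return whole + " " + prefix + " " + anywhere;
    }

    public static void main(String[] args) {
        System.out.println(kinds("foo"));    // true true true
        System.out.println(kinds("foobar")); // false true true
        System.out.println(kinds("barfoo")); // false false true
    }
}
```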

This class also defines methods for replacing matched subsequences with new strings whose contents can, if needed, be computed from the match result.

The appendReplacement method appends all characters from the current position up to the next match, and then appends the replacement. appendTail appends the part of the string that follows the last match, through to the end.

For example, in the string blahcatblahcatblah, the first appendReplacement appends blahdog. The second appendReplacement appends blahdog, and then appendTail appends blah, producing blahdogblahdogblah. See the example Simple word replacement.

The CharSequence interface

The CharSequence interface provides uniform, read-only access to many different kinds of character sequences. You supply the data to be searched from different sources. String, StringBuffer and CharBuffer implement CharSequence, so they are convenient sources of data to search. If none of these sources suits you, you can write your own input source by implementing the CharSequence interface.
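The paragraph above mentions writing your own input source. Here is a minimal sketch of one. It assumes a hypothetical CharArraySequence that exposes part of a char[] without first copying it into a String. A Matcher only needs length, charAt, subSequence and toString. The class names and the id=... input are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharSequenceDemo {
    // Hypothetical input source: a read-only view of part of a char[].
    public static final class CharArraySequence implements CharSequence {
        private final char[] data;
        private final int start;
        private final int end;

        public CharArraySequence(char[] data, int start, int end) {
            if (start < 0 || end > data.length || start > end) {
                throw new IndexOutOfBoundsException();
            }
            this.data = data;
            this.start = start;
            this.end = end;
        }

        @Override public int length() { return end - start; }

        @Override public char charAt(int index) {
            if (index < 0 || index >= length()) {
                throw new IndexOutOfBoundsException(String.valueOf(index));
            }
            return data[start + index];
        }

        @Override public CharSequence subSequence(int from, int to) {
            if (from < 0 || to > length() || from > to) {
                throw new IndexOutOfBoundsException();
            }
            return new CharArraySequence(data, start + from, start + to);
        }

        @Override public String toString() { return new String(data, start, length()); }
    }

    // Collects the digits after each "id=" in any CharSequence.
    static List<String> ids(CharSequence input) {
        Matcher m = Pattern.compile("id=(\\d+)").matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        char[] buffer = "id=17; id=42".toCharArray();
        System.out.println(ids(new CharArraySequence(buffer, 0, buffer.length))); // [17, 42]
        // StringBuilder also implements CharSequence, so it can be searched directly.
        System.out.println(ids(new StringBuilder("id=7")));                        // [7]
    }
}
```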

Regex scenarios

The following code samples demonstrate how the java.util.regex package is used in various common situations:

Simple word replacement

/*
 * This code writes "one dog, two dogs in the yard"
 * to the standard-output stream:
 */
import java.util.regex.*;

public class Replacement {
    public static void main(String[] args)
                         throws Exception {
        // Create a pattern to match cat
        Pattern p = Pattern.compile("cat");
        // Create a matcher with an input string
        Matcher m = p.matcher("one cat," +
                       " two cats in the yard");
        StringBuffer sb = new StringBuffer();
        boolean result = m.find();
        // Loop through and create a new String
        // with the replacements
        while(result) {
            m.appendReplacement(sb, "dog");
            result = m.find();
        }
        // Add the last segment of input to
        // the new String
        m.appendTail(sb);
        System.out.println(sb.toString());
    }
}

Email validation

The following code is an example of checking whether a string of characters could be an email address. It is not a complete email validator that covers every possible case, but you can extend it as needed.

/*
 * Checks for invalid characters
 * in email addresses
 */
import java.util.regex.*;

public class EmailValidation {
    public static void main(String[] args)
                                 throws Exception {

        String input = "@sun.com";
        //Checks for email addresses starting with
        //inappropriate symbols like dots or @ signs.
        Pattern p = Pattern.compile("^\\.|^\\@");
        Matcher m = p.matcher(input);
        if (m.find())
            System.err.println("Email addresses don't start" +
                               " with dots or @ signs.");
        //Checks for email addresses that start with
        //www. and prints a message if it does.
        p = Pattern.compile("^www\\.");
        m = p.matcher(input);
        if (m.find()) {
            System.out.println("Email addresses don't start" +
                    " with \"www.\", only web pages do.");
        }
        p = Pattern.compile("[^A-Za-z0-9\\.\\@_\\-~#]+");
        m = p.matcher(input);
        StringBuffer sb = new StringBuffer();
        boolean result = m.find();
        boolean deletedIllegalChars = false;

        while(result) {
            deletedIllegalChars = true;
            m.appendReplacement(sb, "");
            result = m.find();
        }

        // Add the last segment of input to the new String
        m.appendTail(sb);

        input = sb.toString();

        if (deletedIllegalChars) {
            System.out.println("It contained incorrect characters" +
                               ", such as spaces or commas.");
        }
    }
}

Removing control characters from a file

/* This class removes control characters from a named
 * file.
 */
import java.util.regex.*;
import java.io.*;

public class Control {
    public static void main(String[] args)
                                 throws Exception {

        //Create File objects for the input and output files
        //(fileName1 and fileName2 are placeholder names):
        File fin = new File("fileName1");
        File fout = new File("fileName2");
        //Open an input and an output stream
        FileInputStream fis =
                          new FileInputStream(fin);
        FileOutputStream fos =
                        new FileOutputStream(fout);

        BufferedReader in = new BufferedReader(
                       new InputStreamReader(fis));
        BufferedWriter out = new BufferedWriter(
                      new OutputStreamWriter(fos));

        // The pattern matches control characters
        Pattern p = Pattern.compile("\\p{Cntrl}");
        Matcher m = p.matcher("");
        String aLine = null;
        while((aLine = in.readLine()) != null) {
            m.reset(aLine);
            //Replaces control characters with an empty
            //string.
            String result = m.replaceAll("");
            out.write(result);
            out.newLine();
        }
        in.close();
        out.close();
    }
}

Searching a file

/*
 * Prints out the comments found in a .java file.
 */
import java.util.regex.*;
import java.io.*;
import java.nio.*;
import java.nio.charset.*;
import java.nio.channels.*;

public class CharBufferExample {
    public static void main(String[] args) throws Exception {
        // Create a pattern to match comments
        Pattern p =
            Pattern.compile("//.*$", Pattern.MULTILINE);

        // Get a Channel for the source file
        File f = new File("Replacement.java");
        FileInputStream fis = new FileInputStream(f);
        FileChannel fc = fis.getChannel();

        // Get a CharBuffer from the source file
        ByteBuffer bb =
            fc.map(FileChannel.MapMode.READ_ONLY, 0, (int)fc.size());
        Charset cs = Charset.forName("ISO-8859-1");
        CharsetDecoder cd = cs.newDecoder();
        CharBuffer cb = cd.decode(bb);

        // Run some matches
        Matcher m = p.matcher(cb);
        while (m.find())
            System.out.println("Found comment: "+m.group());
    }
}

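Conclusion
Pattern matching in the Java programming language is now as flexible as in many other programming languages. You can use regular expressions in applications to make sure data is correctly formatted before it is entered into a database or sent to other parts of the application, and regular expressions are also useful for a wide variety of administrative tasks. In short, you can use regular expressions anywhere in Java programming where you need pattern matching.
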
Regular Expressions in JDK 1.4
written by william chen (06/19/2002)

--------------------------------------------------------------------------------

What are regular expressions?

Put simply, they are a way to search and replace in files and strings by means of a special kind of expression.

Many system settings on UNIX are stored in text files, so network administrators and programmers frequently need to search and replace text. That need led to a special kind of command called regular expressions.

We can very simply use "s/
For this reason, JDK 1.4 provides a regular expression package for everyone to use.

For JDK versions earlier than 1.4, a package with similar functionality is available from http://jakarta.apache.org/oro

The string of symbols just listed, "s/
Regular expression syntax for J2SDK 1.4

"." represents any character

Regex    Input string    Matched string
.        ab              a
..       abc             ab

"+" means one or more occurrences of the preceding element
"*" means zero or more occurrences of the preceding element

Regex    Input string    Matched string
.+       ab              ab
.*       abc             abc

"( )" grouping

Regex    Input string    Matched string
(ab)*    aabab           abab

Character classes

Regex           Input string    Matched string
[a-dA-D0-9]*    abczA0          abc, A0
[^a-d]*         abe0            e0
[a-d]*          abcdefgh        abcd

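The tables above can be checked with a short sketch that prints every non-empty match find() reports. It brings out two details. For "." and ".." the tables show only the first match. A pattern such as (ab)* or [^a-d]* also produces empty matches (for example at position 0 of "aabab"), and the sketch skips those. TableCheck is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TableCheck {
    // Collects every non-empty match that repeated find() calls report.
    static List<String> nonEmptyMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            if (!m.group().isEmpty()) {
                found.add(m.group());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {".", "ab"}, {"..", "abc"}, {".+", "ab"}, {".*", "abc"},
            {"(ab)*", "aabab"}, {"[a-dA-D0-9]*", "abczA0"},
            {"[^a-d]*", "abe0"}, {"[a-d]*", "abcdefgh"}
        };
        for (String[] row : rows) {
            System.out.println(row[0] + " on " + row[1] + " -> "
                    + nonEmptyMatches(row[0], row[1]));
        }
    }
}
```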

Shorthand classes

\d  equals [0-9]                  a digit
\D  equals [^0-9]                 a non-digit
\s  equals [ \t\n\x0B\f\r]        a whitespace character
\S  equals [^ \t\n\x0B\f\r]       a non-whitespace character
\w  equals [a-zA-Z_0-9]           a word character (letter, digit or underscore)
\W  equals [^a-zA-Z_0-9]          a non-word character
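The quick sketch below counts how many characters of a short string each shorthand class matches. The names and the sample text are illustrative. As the list above states, \d and [0-9] give the same count.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShorthandDemo {
    // Counts the non-overlapping matches of regex in text.
    static int count(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "a_1 b\t2"; // a, _, 1, space, b, tab, 2
        String[] classes = {"\\d", "[0-9]", "\\D", "\\s", "\\S", "\\w", "\\W"};
        for (String c : classes) {
            System.out.println(c + " -> " + count(c, text));
        }
    }
}
```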

Beginning or end of each line

^ marks the beginning of a line
$ marks the end of a line
(In Java, ^ and $ match at every line only when the pattern is compiled with Pattern.MULTILINE; by default they match only at the beginning and end of the whole input.)
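A small Java sketch of the note above. The class name and the two-line sample text are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineDemo {
    // Collects every match of p in text.
    static List<String> all(Pattern p, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line";
        // Default: ^ matches only at the very start of the input.
        System.out.println(all(Pattern.compile("^\\w+"), text));                    // [first]
        // MULTILINE: ^ also matches right after each line terminator.
        System.out.println(all(Pattern.compile("^\\w+", Pattern.MULTILINE), text)); // [first, second]
        // Likewise for $ at the end of each line.
        System.out.println(all(Pattern.compile("\\w+$"), text).size());                    // 1
        System.out.println(all(Pattern.compile("\\w+$", Pattern.MULTILINE), text).size()); // 2
    }
}
```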

--------------------------------------------------------------------------------

Classes in java.util.regex

Pattern: a compiled regular expression
Matcher: the engine that performs match operations on a character sequence and holds the results
PatternSyntaxException: exception thrown while attempting to compile a regular expression

Example 1: replace every "<" character in a string with "&lt;"

import java.io.*;
import java.util.regex.*;

// Wrapper class added so the method compiles and runs on its own
public class Replace01 {
    /**
     * Replaces every "<" character in each input line with "&lt;"
     */
    public static void replace01(){
        // BufferedReader lets us read line-by-line
        Reader r = new InputStreamReader( System.in );
        BufferedReader br = new BufferedReader( r );
        Pattern pattern = Pattern.compile( "<" ); // matches every '<' in the line
        try{
            while (true) {
                String line = br.readLine();
                // Null line means input is exhausted
                if (line==null)
                    break;
                Matcher a = pattern.matcher(line);
                while(a.find()){
                    System.out.println("Found character: " + a.group());
                }
                System.out.println(a.replaceAll("&lt;")); // replace every match with &lt;
            }
        }catch(Exception ex){ex.printStackTrace();}
    }

    public static void main(String[] args) {
        replace01();
    }
}

Example 2: split a string on "," and find the longest token

import java.io.*;
import java.util.regex.*;

// Wrapper class added so the method compiles and runs on its own
public class Search01 {
    /**
     * Works like StringTokenizer: splits each line on ","
     * and reports which token is the longest
     */
    public static void search01(){
        // BufferedReader lets us read line-by-line
        Reader r = new InputStreamReader( System.in );
        BufferedReader br = new BufferedReader( r );
        Pattern pattern = Pattern.compile( ",\\s*" ); // a comma plus any following whitespace
        try{
            while (true) {
                String line = br.readLine();
                // Null line means input is exhausted
                if (line==null)
                    break;
                String[] words = pattern.split(line);
                // -1 means we haven't found a word yet
                int longest=-1;
                int longestLength=0;
                for (int i=0; i<words.length; i++) {
                    System.out.println("Token: " + words[i]);
                    if (words[i].length() > longestLength) {
                        longest = i;
                        longestLength = words[i].length();
                    }
                }
                if (longest >= 0)
                    System.out.println( "Longest token: " + words[longest] );
            }
        }catch(Exception ex){ex.printStackTrace();}
    }

    public static void main(String[] args) {
        search01();
    }
}

--------------------------------------------------------------------------------

Other regex syntax

/^\s*           # skip whitespace at the start of each line
(M(s|r|rs)\.)   # matches Ms., Mrs., and Mr. (titles)
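These two lines look like fragments of a Perl-style pattern written in extended mode, where whitespace is ignored and # starts a comment. Java has no /.../ delimiters, but Pattern.COMMENTS gives the same commenting style. The sketch below assumes how the fragments were meant to fit together: it compiles them with MULTILINE so that ^ applies to each line. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentsModeDemo {
    // COMMENTS: whitespace in the pattern is ignored and # runs to the end of the line.
    static final Pattern TITLE = Pattern.compile(
        "^\\s*          # skip whitespace at the start of each line\n" +
        "(M(s|r|rs)\\.) # matches Ms., Mrs., and Mr. (titles)\n",
        Pattern.COMMENTS | Pattern.MULTILINE);

    // Returns the title found at the start of a line, or null if there is none.
    static String title(String line) {
        Matcher m = TITLE.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(title("   Mrs. Smith")); // Mrs.
        System.out.println(title("Mr. Jones"));     // Mr.
        System.out.println(title("Ms. Lee"));       // Ms.
        System.out.println(title("Dr. Who"));       // null
    }
}
```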