On Regular Expressions (reposted)

Longe

Posted 2009-11-9 13:04:38

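Part 1:
-----------------
Regular expressions (REs) are often mistaken for an arcane language that only a few people understand. On the surface they do look like a jumble, and if you do not know the syntax, an RE is just a pile of junk characters. In fact, regular expressions are quite simple and easy to understand. After reading this article you will be familiar with their common syntax.

Support on many platforms

Regular expressions were first proposed in 1956 by the mathematician Stephen Kleene, as an outgrowth of his work on natural language. With a complete syntax, they were used for matching patterns of characters and were later applied to information technology. Since then they have gone through several stages of development, and the current standard has been approved by ISO (the International Organization for Standardization) and defined by the Open Group.

A regular expression is not a dedicated language; rather, it is a standard for finding and replacing text in a file or a string. There are two standards: basic regular expressions (BRE) and extended regular expressions (ERE). ERE includes the functionality of BRE plus some additional concepts.

Many programs use regular expressions, including xsh, egrep, sed, vi, and other programs on UNIX platforms. They have also been adopted by many languages, such as HTML and XML, usually as only a subset of the full standard.

More common than you think
As regular expressions have been ported to cross-platform programming languages, their features have grown more complete and their use more widespread. Search engines on the web use them, e-mail programs use them, and even if you are not a UNIX programmer you can use regular expressions to simplify your programs and shorten your development time.

Regular expressions 101
Much of the regular expression syntax will look familiar even if you have never studied it before. Wildcards, for example, are one kind of RE construct, namely repetition operators. Let's start with the most common basic syntax types of the ERE standard. To give examples with specific uses, I will use several different programs.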

Part 2:
----------------------
Character matching

The key to a regular expression is deciding what you want to search for and match; without that, REs are useless.

Every expression contains an instruction describing what to look for, as shown in Table A.

Table A: Character-matching regular expressions
---------------
Operator: .
Meaning:  Match any one character
Example:  grep '.ord' sample.txt
Result:   Matches "ford", "lord", "2ord", etc. in the file sample.txt
---------------
Operator: [ ]
Meaning:  Match any one character listed between the brackets
Example:  grep '[cng]ord' sample.txt
Result:   Matches only "cord", "nord", and "gord"
---------------
Operator: [^ ]
Meaning:  Match any one character not listed between the brackets
Example:  grep '[^cn]ord' sample.txt
Result:   Matches "lord", "2ord", etc. but not "cord" or "nord"

A hyphen inside the brackets gives a range of characters:

Example:  grep '[a-zA-Z]ord' sample.txt
Result:   Matches "aord", "bord", "Aord", "Bord", etc.

Example:  grep '[^0-9]ord' sample.txt
Result:   Matches "Aord", "aord", etc. but not "2ord", etc.
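
The operators in Table A carry over unchanged to Java's java.util.regex package. The following is a minimal sketch, not part of the original article; the class name CharClassDemo, the helper matching, and the sample words are assumptions chosen for illustration. It tests each word as a whole instead of searching a file as grep does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class CharClassDemo {
    // Hypothetical helper: returns the words that the regex matches in full.
    static List<String> matching(String regex, String... words) {
        Pattern p = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String w : words) {
            if (p.matcher(w).matches()) {
                hits.add(w);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] words = {"ford", "cord", "nord", "2ord", "Aord"};
        System.out.println(matching(".ord", words));        // [ford, cord, nord, 2ord, Aord]
        System.out.println(matching("[cng]ord", words));    // [cord, nord]
        System.out.println(matching("[^cn]ord", words));    // [ford, 2ord, Aord]
        System.out.println(matching("[a-zA-Z]ord", words)); // [ford, cord, nord, Aord]
        System.out.println(matching("[^0-9]ord", words));   // [ford, cord, nord, Aord]
    }
}
```

Note that a negated class such as [^cn] still has to match one character, so it does not match the bare string "ord".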

Repetition operators
Repetition operators, or quantifiers, describe how many times a particular element is to be matched. They are often combined with character-matching syntax to match runs of several characters; see Table B.

Table B: Regular expression repetition operators
---------------
Operator: ?
Meaning:  Match the preceding element zero or one time
Example:  egrep '.?erd' sample.txt
Result:   Matches "berd", "herd", etc. and "erd"
---------------
Operator: *
Meaning:  Match the preceding element zero or more times
Example:  egrep 'n.*rd' sample.txt
Result:   Matches "nerd", "nrd", "neard", etc.
---------------
Operator: +
Meaning:  Match the preceding element one or more times
Example:  egrep '[n]+erd' sample.txt
Result:   Matches "nerd", "nnerd", etc., but not "erd"
---------------
Operator: {n}
Meaning:  Match the preceding element exactly n times
Example:  egrep -w '[a-z]{2}erd' sample.txt
Result:   Matches "cherd", "blerd", etc. but not "nerd", "erd", "buzzerd", etc. (-w restricts matches to whole words; without it, "buzzerd" matches through its substring "zzerd")
---------------
Operator: {n,}
Meaning:  Match the preceding element at least n times
Example:  egrep '.{2,}erd' sample.txt
Result:   Matches "cherd" and "buzzerd", but not "nerd"
---------------
Operator: {n,N}
Meaning:  Match the preceding element at least n times, but not more than N times
Example:  egrep 'n[e]{1,2}rd' sample.txt
Result:   Matches "nerd" and "neerd"
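
The quantifiers in Table B can be checked the same way in Java. This is only a sketch under stated assumptions: the class QuantifierDemo and the helper whole are hypothetical names, and each word is matched in full, which plays the role of grep's whole-word matching.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: true if the whole word matches the regex.
    static boolean whole(String regex, String word) {
        return Pattern.matches(regex, word);
    }

    public static void main(String[] args) {
        System.out.println(whole("n?erd", "erd"));            // true: zero or one "n"
        System.out.println(whole("n.*rd", "nrd"));            // true: ".*" may match nothing
        System.out.println(whole("n+erd", "erd"));            // false: "+" needs at least one "n"
        System.out.println(whole("[a-z]{2}erd", "cherd"));    // true: exactly two letters
        System.out.println(whole("[a-z]{2}erd", "buzzerd"));  // false: four letters before "erd"
        System.out.println(whole(".{2,}erd", "buzzerd"));     // true: at least two characters
        System.out.println(whole("ne{1,2}rd", "neerd"));      // true: one or two "e"
        System.out.println(whole("ne{1,2}rd", "neeerd"));     // false: three "e" is too many
    }
}
```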

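Part 3:
----------------
Anchors
Anchors specify where in the text a pattern must match, as shown in Table C. They make it easy to find common combinations of characters. For the examples I use the vi line-editor command :s, which stands for substitute. The basic syntax of this command is:

s/pattern_to_match/pattern_to_substitute/

Table C: Regular expression anchors
---------------
Operator: ^
Meaning:  Match at the beginning of a line
Example:  s/^/blah /
Result:   Inserts "blah " at the beginning of the line
---------------
Operator: $
Meaning:  Match at the end of a line
Example:  s/$/ blah/
Result:   Inserts " blah" at the end of the line
---------------
Operator: \<
Meaning:  Match at the beginning of a word
Example:  s/\</blah/
Result:   Inserts "blah" at the beginning of the word
Example:  egrep '\<blah' sample.txt
Result:   Matches "blahfield", etc.
---------------
Operator: \>
Meaning:  Match at the end of a word
Example:  s/\>/blah/
Result:   Inserts "blah" at the end of the word
Example:  egrep 'blah\>' sample.txt
Result:   Matches "soupblah", etc.
---------------
Operator: \b
Meaning:  Match at the beginning or end of a word
Example:  egrep '\bblah' sample.txt
Result:   Matches "blahcake"
Example:  egrep 'blah\b' sample.txt
Result:   Matches "countblah"
---------------
Operator: \B
Meaning:  Match in the middle of a word (anywhere that is not a word boundary)
Example:  egrep '\Bblah' sample.txt
Result:   Matches "sublahper", etc.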
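
For comparison, here is a sketch of the same anchors in Java. The class AnchorDemo and its helpers are hypothetical names. java.util.regex supports ^, $, \b and \B, but it has no \< or \> anchors (a backslash before < or > there simply denotes the literal character), so \b stands in for both.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // Hypothetical helper: true if the regex matches somewhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    // Equivalent of the vi command s/^/blah / applied to one line.
    static String prefix(String line) {
        return line.replaceAll("^", "blah ");
    }

    public static void main(String[] args) {
        System.out.println(prefix("some text"));                 // blah some text
        System.out.println(found("\\bblah", "blahcake"));        // true: "blah" starts a word
        System.out.println(found("\\bblah", "countblah"));       // false: "blah" is not at a word start
        System.out.println(found("blah\\b", "countblah"));       // true: "blah" ends a word
        System.out.println(found("\\Bblah", "sublahper"));       // true: "blah" is inside a word
    }
}
```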

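Alternation

Another convenience of REs is the alternation operator. It acts as an OR and is written with the | symbol. The following command returns the lines of sample.txt that contain "nerd" or "merd":

egrep '(n|m)erd' sample.txt

Alternation is very powerful, especially when you are searching a file for variant spellings, although in this case the following gives the same result:

egrep '[nm]erd' sample.txt

The real value of alternation shows when you combine it with the more advanced features of REs.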
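
A short Java sketch of the two forms follows. AlternationDemo, whole, and the spelling example col(o|ou)r are assumptions added for illustration; the last case shows what a bracket list cannot do, because its alternatives differ in length.

```java
import java.util.regex.Pattern;

class AlternationDemo {
    // Hypothetical helper: true if the whole word matches the regex.
    static boolean whole(String regex, String word) {
        return Pattern.matches(regex, word);
    }

    public static void main(String[] args) {
        // For single characters, alternation and a bracket list agree.
        for (String w : new String[] {"nerd", "merd", "herd"}) {
            System.out.println(w + ": " + whole("(n|m)erd", w) + " " + whole("[nm]erd", w));
        }
        // Alternatives of different lengths need alternation.
        System.out.println(whole("col(o|ou)r", "color"));   // true
        System.out.println(whole("col(o|ou)r", "colour"));  // true
    }
}
```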

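Part 4:
----------------
Reserved characters
The last major feature of REs is reserved characters (also called special characters). Suppose you want to find the literal strings "ne*rd" and "ni*rd". The pattern "n[ei]*rd" matches "neeeeerd" and "nieieierd", which is not what you are looking for. Because '*' (asterisk) is a reserved character, you must escape it with a backslash: "n[ei]\*rd". The reserved characters are:

^ (caret)
. (period)
[ (left bracket)
$ (dollar sign)
( (left parenthesis)
) (right parenthesis)
| (pipe)
* (asterisk)
+ (plus symbol)
? (question mark)
{ (left curly bracket, or left brace)
\ (backslash)
Once you include these characters in your searches, REs undeniably become hard to read. For example, the following PHP eregi call is hard to read:

eregi("^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$",$sendto)

As you can see, the intent of the code is hard to grasp. But if you skip over the reserved characters, you will often misread what the code means.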
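
The escaping rule can be tried out in Java as follows. This is a sketch with hypothetical names (EscapeDemo, whole); note that inside a Java string literal the backslash itself must be doubled, and that Pattern.quote is a standard method that escapes an entire literal string at once.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Hypothetical helper: true if the whole string matches the regex.
    static boolean whole(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        System.out.println(whole("n[ei]*rd", "neeeeerd"));           // true: * repeats [ei]
        System.out.println(whole("n[ei]*rd", "ne*rd"));              // false: * is not literal here
        System.out.println(whole("n[ei]\\*rd", "ne*rd"));            // true: \* is a literal asterisk
        System.out.println(whole("n[ei]\\*rd", "ni*rd"));            // true
        System.out.println(whole(Pattern.quote("ne*rd"), "ne*rd"));  // true: whole string quoted
    }
}
```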

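Summary
In this article we have lifted the veil on regular expressions and listed the common syntax of the ERE standard. For the complete specification, see the Open Group's Regular Expressions document. You are welcome to post your questions or comments in the discussion area.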

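Another article
----------------------------------------
Regular expressions and the Java programming language
-----------------------------------------
Classes and methods

The following classes match character sequences against patterns specified by regular expressions.

The Pattern class

An instance of the Pattern class represents a regular expression that is specified as a string, in a syntax similar to that used by Perl.

A regular expression specified as a string must first be compiled into an instance of Pattern. The resulting pattern is used to create a Matcher object, which matches arbitrary character sequences against the regular expression. Many matchers can share the same pattern because it is stateless.

The compile method compiles the given regular expression into a pattern, and the matcher method then creates a matcher that will match the given input against that pattern. The pattern method returns the regular expression from which the pattern was compiled.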
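
The compile, matcher and pattern methods described above can be sketched as follows. CompileDemo, fullMatch, and the sample regex are assumptions for illustration; the point is that a single compiled Pattern serves several independent matchers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CompileDemo {
    // Hypothetical helper: creates a fresh matcher from a shared pattern.
    static boolean fullMatch(Pattern p, CharSequence input) {
        Matcher m = p.matcher(input);
        return m.matches();
    }

    public static void main(String[] args) {
        // Compile once, then reuse the pattern for several inputs.
        Pattern p = Pattern.compile("[nm]erd");
        System.out.println(fullMatch(p, "nerd"));  // true
        System.out.println(fullMatch(p, "herd"));  // false
        // pattern() returns the source text the pattern was compiled from.
        System.out.println(p.pattern());           // [nm]erd
    }
}
```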

The split method is a convenience method that splits the given input sequence around matches of the pattern. The following example demonstrates it:

/*
 * Uses split to break up a string of input separated by
 * commas and/or whitespace.
 */
import java.util.regex.*;

public class Splitter {
    public static void main(String[] args) throws Exception {
        // Create a pattern to match breaks
        Pattern p = Pattern.compile("[,\\s]+");
        // Split input with the pattern
        String[] result =
            p.split("one,two, three four , five");
        // Print each piece on its own line
        for (int i = 0; i < result.length; i++)
            System.out.println(result[i]);
    }
}

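The Matcher class

An instance of the Matcher class is used to match character sequences against a given pattern. Input is provided to matchers via the CharSequence interface, in order to support matching against characters from a wide variety of input sources.

A matcher is created from a pattern by invoking the pattern's matcher method. Once created, a matcher can be used to perform three different kinds of match operations:

The matches method attempts to match the entire input sequence against the pattern.
The lookingAt method attempts to match the input sequence, starting at the beginning, against the pattern.
The find method scans the input sequence looking for the next subsequence that matches the pattern.

Each of these methods returns a boolean indicating success or failure. More information about a successful match can be obtained by querying the state of the matcher.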
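
The difference between the three operations is easiest to see side by side. The following sketch uses hypothetical names (MatchKindsDemo, probe) and made-up inputs; it also queries the matcher state after a successful find.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchKindsDemo {
    // Hypothetical helper: results of matches(), lookingAt() and find(), in that order.
    static boolean[] probe(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        return new boolean[] {
            p.matcher(input).matches(),    // whole input must match
            p.matcher(input).lookingAt(),  // a prefix of the input must match
            p.matcher(input).find()        // any subsequence may match
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(probe("cat", "cat")));      // [true, true, true]
        System.out.println(Arrays.toString(probe("cat", "catalog")));  // [false, true, true]
        System.out.println(Arrays.toString(probe("cat", "one cat")));  // [false, false, true]

        // After a successful find, the matcher can report what matched and where.
        Matcher m = Pattern.compile("cat").matcher("one cat");
        if (m.find()) {
            System.out.println(m.group() + " at " + m.start() + "-" + m.end());  // cat at 4-7
        }
    }
}
```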
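
This class also defines methods for replacing matched sequences with new strings whose contents can, if desired, be computed from the match result.

The appendReplacement method first appends all characters in the input from the current position up to the next match, and then appends the replacement value. The appendTail method appends the part of the input that follows the last match, up to the end.

For example, when cat is replaced with dog in the string blahcatblahcatblah, the first appendReplacement appends blahdog. The second appendReplacement appends blahdog, and then appendTail appends blah, producing blahdogblahdogblah. See the simple word replacement example.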
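
To make the intermediate states visible, the sketch below records the buffer after every step. AppendDemo and steps are hypothetical names; the input string and the cat-to-dog replacement follow the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendDemo {
    // Hypothetical helper: buffer contents after each appendReplacement and after appendTail.
    static List<String> steps(String input) {
        Matcher m = Pattern.compile("cat").matcher(input);
        StringBuffer sb = new StringBuffer();
        List<String> states = new ArrayList<>();
        while (m.find()) {
            // Copies the text before the match, then the replacement.
            m.appendReplacement(sb, "dog");
            states.add(sb.toString());
        }
        // Copies whatever follows the last match.
        m.appendTail(sb);
        states.add(sb.toString());
        return states;
    }

    public static void main(String[] args) {
        // Prints [blahdog, blahdogblahdog, blahdogblahdogblah]
        System.out.println(steps("blahcatblahcatblah"));
    }
}
```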

The CharSequence interface

The CharSequence interface provides uniform, read-only access to many different kinds of character sequences. You supply the data to be searched from different sources. String, StringBuffer and CharBuffer all implement CharSequence, so they are easy sources of data to search. If none of these sources suits you, you can write your own input source by implementing the CharSequence interface.
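
As a sketch of what this buys you, the helper below accepts any CharSequence and is called with three different implementations. CharSequenceDemo, countMatches, and the sample inputs are assumptions for illustration; StringBuilder is used as one more standard class that implements CharSequence.

```java
import java.nio.CharBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharSequenceDemo {
    // Hypothetical helper: counts matches in any kind of character sequence.
    static int countMatches(Pattern p, CharSequence input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("cat");
        System.out.println(countMatches(p, "one cat, two cats"));               // 2 (String)
        System.out.println(countMatches(p, new StringBuilder("concatenate")));  // 1 (StringBuilder)
        System.out.println(countMatches(p, CharBuffer.wrap("cat cat cat")));    // 3 (CharBuffer)
    }
}
```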

Regex scenarios

The following code samples demonstrate the use of the java.util.regex package in various common scenarios:

Simple word replacement

/*
 * This code writes "one dog, two dogs in the yard"
 * to the standard-output stream:
 */
import java.util.regex.*;

public class Replacement {
    public static void main(String[] args)
            throws Exception {
        // Create a pattern to match cat
        Pattern p = Pattern.compile("cat");
        // Create a matcher with an input string
        Matcher m = p.matcher("one cat," +
            " two cats in the yard");
        StringBuffer sb = new StringBuffer();
        boolean result = m.find();
        // Loop through and create a new String
        // with the replacements
        while (result) {
            m.appendReplacement(sb, "dog");
            result = m.find();
        }
        // Add the last segment of input to
        // the new String
        m.appendTail(sb);
        System.out.println(sb.toString());
    }
}

E-mail validation

The following code is an example of checking whether a string of characters could be an e-mail address. It is not a complete e-mail validation program that handles every possible case, but it can be added to as needed.

/*
 * Checks for invalid characters
 * in email addresses
 */
import java.util.regex.*;

public class EmailValidation {
    public static void main(String[] args)
            throws Exception {

        String input = "@sun.com";
        // Checks for email addresses starting with
        // inappropriate symbols like dots or @ signs.
        Pattern p = Pattern.compile("^\\.|^\\@");
        Matcher m = p.matcher(input);
        if (m.find())
            System.err.println("Email addresses don't start" +
                " with dots or @ signs.");
        // Checks for email addresses that start with
        // www. and prints a message if it does.
        p = Pattern.compile("^www\\.");
        m = p.matcher(input);
        if (m.find()) {
            System.out.println("Email addresses don't start" +
                " with \"www.\", only web pages do.");
        }
        // Finds runs of characters that are not allowed
        p = Pattern.compile("[^A-Za-z0-9\\.\\@_\\-~#]+");
        m = p.matcher(input);
        StringBuffer sb = new StringBuffer();
        boolean result = m.find();
        boolean deletedIllegalChars = false;

        while (result) {
            deletedIllegalChars = true;
            m.appendReplacement(sb, "");
            result = m.find();
        }

        // Add the last segment of input to the new String
        m.appendTail(sb);

        input = sb.toString();

        if (deletedIllegalChars) {
            System.out.println("It contained incorrect characters" +
                " , such as spaces or commas.");
        }
    }
}

Removing control characters from a file

/*
 * This class removes control characters from a named
 * file.
 */
import java.util.regex.*;
import java.io.*;

public class Control {
    public static void main(String[] args)
            throws Exception {

        // Create file objects for the input and
        // output file names:
        File fin = new File("fileName1");
        File fout = new File("fileName2");
        // Open an input and an output stream
        FileInputStream fis =
            new FileInputStream(fin);
        FileOutputStream fos =
            new FileOutputStream(fout);

        BufferedReader in = new BufferedReader(
            new InputStreamReader(fis));
        BufferedWriter out = new BufferedWriter(
            new OutputStreamWriter(fos));

        // The pattern matches control characters
        // (\p{Cntrl} is the POSIX control-character class)
        Pattern p = Pattern.compile("\\p{Cntrl}");
        Matcher m = p.matcher("");
        String aLine = null;
        while ((aLine = in.readLine()) != null) {
            m.reset(aLine);
            // Replaces control characters with an empty
            // string.
            String result = m.replaceAll("");
            out.write(result);
            out.newLine();
        }
        in.close();
        out.close();
    }
}

File searching

/*
 * Prints out the comments found in a .java file.
 */
import java.util.regex.*;
import java.io.*;
import java.nio.*;
import java.nio.charset.*;
import java.nio.channels.*;

public class CharBufferExample {
    public static void main(String[] args) throws Exception {
        // Create a pattern to match comments
        Pattern p =
            Pattern.compile("//.*$", Pattern.MULTILINE);

        // Get a Channel for the source file
        File f = new File("Replacement.java");
        FileInputStream fis = new FileInputStream(f);
        FileChannel fc = fis.getChannel();

        // Get a CharBuffer from the source file
        ByteBuffer bb =
            fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
        Charset cs = Charset.forName("ISO-8859-1");
        CharsetDecoder cd = cs.newDecoder();
        CharBuffer cb = cd.decode(bb);

        // Run some matches
        Matcher m = p.matcher(cb);
        while (m.find())
            System.out.println("Found comment: " + m.group());
        fis.close();
    }
}
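
Conclusion
Pattern matching in the Java programming language is now as flexible as in many other programming languages. Regular expressions can be used in applications to ensure that data is correctly formatted before it is entered into a database or sent to another part of the application, and they can be used for a wide variety of administrative tasks. In short, you can use regular expressions anywhere you need pattern matching in your Java programming.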

Regular expressions in JDK 1.4
written by william chen (06/19/2002)

--------------------------------------------------------------------------------

What are regular expressions?

They are a special kind of expression for performing search and replace on files and strings.

Because many system settings on UNIX are stored in text files, network administrators and programmers often need to search and replace text,

so a special kind of command called the regular expression was developed.

We can do this very simply with a command of the form "s/…
JDK 1.4 therefore provides a regular expression package for everyone to use.

For versions earlier than JDK 1.4, a package with similar functionality is available from http://jakarta.apache.org/oro.

The string of symbols shown above, "s/…
Regular expression syntax supported by j2sdk1.4

"." matches any single character

Regex           Original string    Matching string
.               ab                 a
..              abc                ab

"+" means one or more characters (one or more repetitions of the preceding element)
"*" means zero or more characters (zero or more repetitions of the preceding element)

Regex           Original string    Matching string
+               ab                 ab
*               abc                abc

"( )" grouping

Regex           Original string    Matching string
(ab)*           aabab              abab

Character classes

Regex           Original string    Matching string
[a-dA-D0-9]*    abczA0             abcA0
[^a-d]*         abe0               e0
[a-d]*          abcdefgh           abcd

Shorthand classes

\d equals [0-9]              a digit
\D equals [^0-9]             a non-digit
\s equals [ \t\n\x0B\f\r]    a whitespace character
\S equals [^ \t\n\x0B\f\r]   a non-whitespace character
\w equals [a-zA-Z_0-9]       a letter, digit or underscore
\W equals [^a-zA-Z_0-9]      anything other than a letter, digit or underscore
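
The shorthand classes can be exercised with the regex methods built into String. This is only a sketch; ShorthandDemo, its helper names, and the sample strings are assumptions.

```java
import java.util.Arrays;

class ShorthandDemo {
    // Replaces every digit (\d) with '#'.
    static String maskDigits(String s) {
        return s.replaceAll("\\d", "#");
    }

    // Removes every non-digit (\D).
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // Splits on runs of whitespace (\s+).
    static String[] words(String s) {
        return s.split("\\s+");
    }

    // Replaces every non-word character (\W) with a space.
    static String wordCharsOnly(String s) {
        return s.replaceAll("\\W", " ");
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("a1b22c333"));                   // a#b##c###
        System.out.println(digitsOnly("a1b22c333"));                   // 122333
        System.out.println(Arrays.toString(words("one  two\tthree"))); // [one, two, three]
        System.out.println(wordCharsOnly("user_name-42"));             // user_name 42
    }
}
```

The underscore survives in the last line because \w includes it.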

Start and end of each line

^ matches the start of each line
$ matches the end of each line
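
One caveat worth illustrating: in java.util.regex, ^ and $ match at the start and end of every line only when the pattern is compiled with Pattern.MULTILINE; by default they anchor to the whole input. The sketch below uses hypothetical names (LineAnchorDemo, count) and a made-up three-line input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineAnchorDemo {
    // Hypothetical helper: number of matches of p in text.
    static int count(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "alpha\nbeta\ngamma";
        // Default mode: ^ and $ refer to the whole input, which contains newlines.
        System.out.println(count(Pattern.compile("^\\w+$"), text));                     // 0
        // MULTILINE mode: ^ and $ match at every line boundary.
        System.out.println(count(Pattern.compile("^\\w+$", Pattern.MULTILINE), text));  // 3
    }
}
```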

--------------------------------------------------------------------------------

Classes related to regular expressions in java.util.regex

Pattern: the class that represents a regular expression
Matcher: the result of applying a pattern to input
PatternSyntaxException: Exception thrown while attempting to compile a regular expression

Example 1: Replace every "<" in a string with "lt;"

import java.io.*;
import java.util.regex.*;

public class Replace01 {
    /**
     * Replaces every "<" in each input line with "lt;"
     */
    public static void replace01() {
        // BufferedReader lets us read line-by-line
        Reader r = new InputStreamReader(System.in);
        BufferedReader br = new BufferedReader(r);
        Pattern pattern = Pattern.compile("<"); // find every '<' in the string
        try {
            while (true) {
                String line = br.readLine();
                // Null line means input is exhausted
                if (line == null)
                    break;
                Matcher a = pattern.matcher(line);
                while (a.find()) {
                    System.out.println("Found character: " + a.group());
                }
                // replaceAll resets the matcher, then replaces every match with lt;
                System.out.println(a.replaceAll("lt;"));
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    public static void main(String[] args) {
        replace01();
    }
}

Example 2:

import java.io.*;
import java.util.regex.*;

public class Search01 {
    /**
     * Works like StringTokenizer:
     * splits each line on "," and reports which token is longest
     */
    public static void search01() {
        // BufferedReader lets us read line-by-line
        Reader r = new InputStreamReader(System.in);
        BufferedReader br = new BufferedReader(r);
        Pattern pattern = Pattern.compile(",\\s*"); // a comma plus any following whitespace
        try {
            while (true) {
                String line = br.readLine();
                // Null line means input is exhausted; check before splitting
                if (line == null)
                    break;
                String[] words = pattern.split(line);
                // -1 means we haven't found a word yet
                int longest = -1;
                int longestLength = 0;
                for (int i = 0; i < words.length; i++) {
                    System.out.println("Token: " + words[i]);
                    if (words[i].length() > longestLength) {
                        longest = i;
                        longestLength = words[i].length();
                    }
                }
                // Skip lines that produced no non-empty token
                if (longest >= 0)
                    System.out.println("Longest token: " + words[longest]);
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    public static void main(String[] args) {
        search01();
    }
}

--------------------------------------------------------------------------------

Other regular expression syntax

/^\s* # ignore whitespace at the start of each line
(M(s|r|rs)\.) # matches Ms., Mrs., and Mr. (titles)