Author: fperrad
Date: Tue Feb 13 01:09:31 2007
New Revision: 16964

Modified:
   trunk/languages/lua/lib/luaregex.pir
   trunk/languages/lua/t/string.t

Log:
[Lua]
- regex : add doc & test

Modified: trunk/languages/lua/lib/luaregex.pir
==============================================================================
--- trunk/languages/lua/lib/luaregex.pir (original)
+++ trunk/languages/lua/lib/luaregex.pir Tue Feb 13 01:09:31 2007
@@ -10,6 +10,170 @@
 See "Lua 5.1 Reference Manual", section 5.4.1 "Patterns",
 L<http://www.lua.org/manual/5.1/manual.html#5.4.1>.
 
+=head2 Character Class:
+
+A I<character class> is used to represent a set of characters. The following
+combinations are allowed in describing a character class:
+
+=over 4
+
+=item B<x>
+
+(where I<x> is not one of the I<magic characters> C<^$()%.[]*+-?>) represents
+the character I<x> itself.
+
+=item B<.>
+
+(a dot) represents all characters.
+
+=item B<%a>
+
+represents all letters.
+
+=item B<%c>
+
+represents all control characters.
+
+=item B<%d>
+
+represents all digits.
+
+=item B<%l>
+
+represents all lowercase letters.
+
+=item B<%p>
+
+represents all punctuation characters.
+
+=item B<%s>
+
+represents all space characters.
+
+=item B<%u>
+
+represents all uppercase letters.
+
+=item B<%w>
+
+represents all alphanumeric characters.
+
+=item B<%x>
+
+represents all hexadecimal digits.
+
+=item B<%z>
+
+represents the character with representation 0.
+
+=item B<%x>
+
+(where I<x> is any non-alphanumeric character) represents the character I<x>.
+This is the standard way to escape the magic characters. Any punctuation
+character (even the non magic) can be preceded by a C<'%'> when used to
+represent itself in a pattern.
+
+=item B<[set]>
+
+represents the class which is the union of all characters in I<set>. A range of
+characters may be specified by separating the end characters of the range with
+a C<'-'>. All classes C<%x> described above may also be used as components in
+I<set>. All other characters in I<set> represent themselves. For example,
+C<[%w_]> (or C<[_%w]>) represents all alphanumeric characters plus the
+underscore, C<[0-7]> represents the octal digits, and C<[0-7%l%-]> represents
+the octal digits plus the lowercase letters plus the C<'-'> character.
+
+The interaction between ranges and classes is not defined. Therefore, patterns
+like C<[%a-z]> or C<[a-%%]> have no meaning.
+
+=item B<[^set]>
+
+represents the complement of I<set>, where I<set> is interpreted as above.
+
+=back
+
+For all classes represented by single letters (C<%a>, C<%c>, etc.), the
+corresponding uppercase letter represents the complement of the class. For
+instance, C<%S> represents all non-space characters.
+
+The definitions of letter, space, and other character groups depend on the
+current locale. In particular, the class C<[a-z]> may not be equivalent to
+C<%l>.
+
+=head2 Pattern Item:
+
+A I<pattern item> may be
+
+=over 4
+
+=item *
+
+a single character class, which matches any single character in the class;
+
+=item *
+
+a single character class followed by C<'*'>, which matches 0 or more
+repetitions of characters in the class. These repetition items will always
+match the longest possible sequence;
+
+=item *
+
+a single character class followed by C<'+'>, which matches 1 or more
+repetitions of characters in the class. These repetition items will always
+match the longest possible sequence;
+
+=item *
+
+a single character class followed by C<'-'>, which also matches 0 or more
+repetitions of characters in the class. Unlike C<'*'>, these repetition items
+will always match the I<shortest> possible sequence;
+
+=item *
+
+a single character class followed by C<'?'>, which matches 0 or 1
+occurrence of a character in the class;
+
+=item *
+
+C<%n>, for I<n> between 1 and 9; such item matches a substring equal to
+the I<n>-th captured string (see below);
+
+=item *
+
+C<%bxy>, where I<x> and I<y> are two distinct characters; such item
+matches strings that start with I<x>, end with I<y>, and where the I<x> and
+I<y> are I<balanced>. This means that, if one reads the string from left to
+right, counting I<+1> for an I<x> and I<-1> for a I<y>, the ending I<y> is the
+first I<y> where the count reaches 0. For instance, the item C<%b()> matches
+expressions with balanced parentheses.
+
+=back
+
+=head2 Pattern:
+
+A I<pattern> is a sequence of pattern items. A C<'^'> at the beginning of a
+pattern anchors the match at the beginning of the subject string. A C<'$'> at
+the end of a pattern anchors the match at the end of the subject string. At
+other positions, C<'^'> and C<'$'> have no special meaning and represent
+themselves.
+
+=head2 Captures:
+
+A pattern may contain sub-patterns enclosed in parentheses; they describe
+I<captures>. When a match succeeds, the substrings of the subject string that
+match captures are stored (I<captured>) for future use. Captures are numbered
+according to their left parentheses. For instance, in the pattern
+C<"(a*(.)%w(%s*))">, the part of the string matching C<"a*(.)%w(%s*)"> is
+stored as the first capture (and therefore has number 1); the character
+matching C<"."> is captured with number 2, and the part matching C<"%s*"> has
+number 3.
+
+As a special case, the empty capture C<()> captures the current string
+position (a number). For instance, if we apply the pattern C<"()aa()"> on the
+string C<"flaaap">, there will be two captures: 3 and 5.
+
+A pattern cannot contain embedded zeros. Use C<%z> instead.
+
 =head1 HISTORY
 
 Mostly taken from F<compilers/pge/PGE/P5Regex.pir>.

Modified: trunk/languages/lua/t/string.t
==============================================================================
--- trunk/languages/lua/t/string.t (original)
+++ trunk/languages/lua/t/string.t Tue Feb 13 01:09:31 2007
@@ -101,12 +101,16 @@
 print(string.find(s, "W.rld"))
 print(string.find(s, "^(h.ll.)"))
 print(string.find(s, "^(h.)l(l.)"))
+s = "Deadline is 30/05/1999, firm"
+date = "%d%d/%d%d/%d%d%d%d"
+print(string.sub(s, string.find(s, date)))
 CODE
 1 5
 7 11
 nil
 1 5 hello
 1 5 he lo
+30/05/1999
 OUTPUT
 
 language_output_is( 'lua', << 'CODE', << 'OUTPUT', 'function string.format' );
@@ -237,6 +241,12 @@
 date = "Today is 17/7/1990"
 d = string.match(date, "%d+/%d+/%d+")
 print(d)
+d, m, y = string.match(date, "(%d+)/(%d+)/(%d+)")
+print(d, m, y)
+print(string.match("The number 1298 is even", "%d+"))
+pair = "name = Anna"
+key, value = string.match(pair, "(%a+)%s*=%s*(%a+)")
+print(key, value)
 CODE
 hello
 world
@@ -244,6 +254,9 @@
 hello
 he lo
 17/7/1990
+17 7 1990
+1298
+name Anna
 OUTPUT
 
 language_output_is( 'lua', << 'CODE', << 'OUTPUT', 'function string.rep' );
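
The pattern items documented in luaregex.pir but not yet exercised by the tests in this revision can be sketched as follows. This is not part of the commit: the subject strings are made up for illustration, and the expected values assume a stock Lua 5.1 interpreter following section 5.4.1 of the reference manual. Whether the Parrot implementation at this revision handles every case is not stated in the log.

```lua
-- Illustrative only: strings are hypothetical, behaviour is that of
-- reference Lua 5.1 as described in manual section 5.4.1.

-- '*' takes the longest match, '-' the shortest; both allow 0 repetitions.
local tag = "<b>bold</b>"
print(string.match(tag, "<.*>"))   -- <b>bold</b>
print(string.match(tag, "<.->"))   -- <b>

-- %bxy matches a balanced run starting with x and ending with y.
print(string.match("f(a(b)c) tail", "%b()"))   -- (a(b)c)

-- %1 matches a copy of the first capture (here, the opening quote).
local quote, body = string.match([[say "it's" now]], "([\"'])(.-)%1")
print(quote, body)   -- the double quote, then it's

-- Captures are numbered by the position of their left parenthesis.
local c1, c2, c3 = string.match("aab  ", "(a*(.)%w(%s*))")
print(c1 == "aab  ", c2 == "a", c3 == "  ")   -- true true true

-- The empty capture () yields the current position as a number.
print(string.match("flaaap", "()aa()"))   -- 3 and 5

-- '$' is an anchor only at the end of the pattern; elsewhere it is literal.
print(string.find("cost $5", "$5"))   -- 6 and 7
```

If cases like these were added to t/string.t, they would follow the existing layout: the calls in the CODE heredoc and the printed values in OUTPUT.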