Lua string functions

Lua's string functions:

    Indices in arguments start at 1. A negative index counts backward from the end of the string, so -1 refers to the last character.
    String values can be used in object-oriented style: the string library sets a metatable for strings whose __index field points to the string table, so string.byte(s, i) can be written as s:byte(i).
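    The sketch below illustrates both points: negative indices and method-call syntax. The sample string is arbitrary, and the values in the comments follow from the rules just described.

```lua
local s = "hello"

-- Negative indices count backward from the end: -1 is the last character.
local last = string.sub(s, -1)      -- "o"

-- Method-call syntax works because strings share a metatable whose
-- __index is the string table: s:byte(1) means string.byte(s, 1).
local code = s:byte(1)              -- 104, the code of 'h'
local upper = s:upper()             -- "HELLO"

print(last, code, upper)
```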
 

    Function list
--------------------------------------------------------------------------------
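    string.byte(s [, i [, j]])  returns the numeric codes of the characters s[i] .. s[j] (i defaults to 1, j defaults to i)
    string.char(...)            converts zero or more numeric codes into a string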
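    A minimal round trip between the two functions follows. It assumes nothing beyond the standard library. The only wrinkle is that the unpack function lives at table.unpack in Lua 5.2+ and is a global in 5.1.

```lua
-- table.unpack in Lua 5.2+, global unpack in Lua 5.1
local unpack = table.unpack or unpack

-- i = 1, j = -1 returns the code of every character in the string
local codes = { string.byte("Lua", 1, -1) }   -- {76, 117, 97}

-- string.char turns the codes back into a string
local back = string.char(unpack(codes))       -- "Lua"

print(#codes, back)
```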
    
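    string.sub(s, i [, j])  returns the substring of s from position i to position j, inclusive. j defaults to -1, and both i and j may be negative.
    string.sub(s, 1, j) returns a prefix of s;
    string.sub(s, -i) returns a suffix of s (its last i characters).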
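    A few sample calls showing prefix, suffix, and negative end positions. The sample string is arbitrary.

```lua
local s = "hello world"

local prefix = string.sub(s, 1, 5)   -- "hello"
local suffix = string.sub(s, -5)     -- "world", the last 5 characters
local middle = s:sub(3, -3)          -- "llo wor": positions 3 through 9
local empty  = string.sub(s, 20)     -- "": a start past the end yields an empty string

print(prefix, suffix, middle, #empty)
```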
    
    string.rep(s, n)     returns a string made of n copies of s concatenated
    string.reverse(s)    returns s with the order of its characters reversed
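    Some quick uses of rep and reverse. is_palindrome is a hypothetical helper written only for this illustration.

```lua
local line = string.rep("-", 10)     -- "----------"
local abab = string.rep("ab", 3)     -- "ababab"
local rev  = string.reverse("Lua")   -- "auL"

-- Hypothetical helper: case-insensitive palindrome check built on reverse
local function is_palindrome(w)
  w = w:lower()
  return w == w:reverse()
end

print(line, abab, rev, is_palindrome("Level"))
```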
    
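    string.format(formatstring, ...)
    This function works much like C's printf. The differences are that the options *, l, L, n, p and h are not supported, and that there is an extra option, %q.
    %q formats a string so that the Lua interpreter can safely read it back: the string is written between double quotes, and any double quotes, newlines, embedded zeros and backslashes inside it are escaped.
    Reference: http://www.cnblogs.com/whiteyun/archive/2009/08/07/1540899.html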
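    Below is an illustration of ordinary C-style formatting and of %q. The exact escaping %q produces varies slightly between Lua versions, so the sketch only relies on the documented guarantee: the result reads back as the original string. loadstring exists in Lua 5.1, and load accepts a string in 5.2+.

```lua
-- C-style formatting: width 5 with 2 decimals, an integer, a string
local s = string.format("%5.2f|%d|%s", 3.14159, 42, "x")   -- " 3.14|42|x"

-- A string containing double quotes, a backslash and a newline
local original = 'say "hi"\\ and\nbye'
local quoted = string.format("%q", original)

-- Read the quoted literal back as Lua source
local loader = loadstring or load
local roundtrip = loader("return " .. quoted)()

print(s)
print(quoted)
```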
    
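    string.len(s)      returns the length of s
    string.lower(s)    returns a copy of s with uppercase letters converted to lowercase
    string.upper(s)    returns a copy of s with lowercase letters converted to uppercase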
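    A short check of these three functions. It also shows that the length operator # gives the same result as string.len, and that embedded zero bytes count toward the length.

```lua
local s = "Hello, Lua"

local n     = string.len(s)          -- 10, the same as #s
local lower = s:lower()              -- "hello, lua"
local upper = s:upper()              -- "HELLO, LUA"

-- Lua strings can hold any byte; len counts embedded zeros too
local withZero = string.len("a\0b")  -- 3

print(n, lower, upper, withZero)
```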
    
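    string.dump(f) returns a string containing a binary representation of the function f. Calling loadstring on that string returns a copy of the function. This makes function serialization possible: a function can be stored, passed around, or even sent to another process and rebuilt in a different scope there. In Lua 5.1, f must be a Lua function without upvalues, and the binary chunk can only be loaded by a compatible Lua version on a compatible platform.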
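    A minimal sketch of the dump/load round trip inside a single interpreter. Sending the string to another process is not shown. The function add is an arbitrary example that uses only its parameters, so it has no upvalues. loadstring is used in 5.1, and load accepts a binary string in 5.2+.

```lua
-- A function with no upvalues: it only touches its own parameters
local function add(a, b)
  return a + b
end

-- Binary chunk as an ordinary Lua string
local bytes = string.dump(add)

-- Rebuild a new function from the chunk
local loader = loadstring or load
local copy = loader(bytes)

print(copy(2, 3))   --> 5
```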


    Pattern matching
--------------------------------------------------------------------------------
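    string.find(s, pattern [, init [, plain]])
    pattern    if it contains no magic characters, find does a plain substring search; otherwise it does pattern matching
    init       the position where the search starts (default 1; may be negative)
    plain      if true, pattern matching is turned off and pattern is treated as a plain substring, so none of its characters are special. Because plain is the fourth argument, init must be given as well. If you do not need pattern matching, set this to true; otherwise a search string that happens to contain magic characters may not match as expected.
    find returns the start and end indices of the first match (followed by any captures), or nil if there is no match. Basic use is simple. When pattern contains magic characters, find searches the subject for substrings that fit the pattern.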
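    The following sketch shows find's return values and the effect of plain. The subject strings are arbitrary.

```lua
local s = "hello world"
local i, j = string.find(s, "wor")           -- 7, 9
local none = string.find(s, "xyz")           -- nil: no match

-- "." is a magic character: as a pattern it matches any character
local pi = string.find("a.b", ".")           -- 1, it matches "a"

-- With plain = true (init must be given too), "." is a literal dot
local li = string.find("a.b", ".", 1, true)  -- 2

print(i, j, none, pi, li)
```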
    
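    Tip: to find every match in the subject, call find in a loop using init, starting each search just after the previous match:
    -- find the positions of all newlines in a string
    local str = "hello world\nhaha\nnihaoma"
    local t = {}
    local i = 0
    while true do
        i = string.find(str, "\n", i + 1)  -- search from just after the previous match
        if i == nil then break end
        table.insert(t, i)
    end
    -- t is now {12, 17}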
    
    
    Patterns in detail:
    Character class: a class represents a set of characters. A single class matches exactly one character.
    x     (where x is not one of the magic characters ^$()%.[]*+-?) represents the character x itself
    .     (a dot) represents all characters
    %a    represents all letters
    %c    represents all control characters
    %d    represents all digits
    %l    represents all lowercase letters
    %p    represents all punctuation characters
    %s    represents all space characters
    %u    represents all uppercase letters
    %w    represents all alphanumeric characters
    %x    represents all hexadecimal digits
    %z    represents the character with representation 0
    [set] a custom class built from the characters in square brackets (Lua calls it a char-set; it corresponds to the bracket expression of traditional regular expressions). A range between two characters is written with "-", so [0-7] matches one character from 0 to 7. A set can include other classes: [%d%l] or [%l%d] matches a lowercase letter or a digit. Characters can also be listed directly: [01] matches a binary digit. [^set] is the complement of set, so [^%p] matches any character that is not punctuation.
    
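    Tip: the uppercase form of each class above represents the complement of the lowercase one. For example, '%A' matches any character that is not a letter.
    Lua's character classes depend on the current locale, so '[a-z]' may not denote the same set as '%l'. In a suitable locale (for example, a Portuguese one), '%l' includes 'ç' and 'ã' while '[a-z]' does not. Prefer '%l' for letters unless you have a specific reason not to, because it is simpler, more portable and slightly more efficient.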
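    Here are several of the classes, sets and complements above applied to arbitrary sample strings. Each local takes only the first result of gsub, which is the new string.

```lua
local digits     = string.match("price: 42 USD", "%d+")   -- "42"
local noPunct    = string.gsub("hello, world!", "%p", "") -- "hello world"
local onlyDigits = string.gsub("a1b2c3", "%D", "")        -- "123": %D is the complement of %d
local octal      = string.match("0755 rest", "[0-7]+")    -- "0755": a range inside a set
local lowerUnder = string.match("abc_DEF", "[%l_]+")      -- "abc_": a class plus a literal
local nonSpace   = string.match("  key  ", "[^%s]+")      -- "key": a complemented set

print(digits, noPunct, onlyDigits, octal, lowerUnder, nonSpace)
```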
    
    
    Pattern items: a quantifier may follow a character class to say how many times the class is matched.
    a single character class matches any single character in the class;
    a single character class followed by '*' matches 0 or more repetitions of characters in the class, always the longest possible sequence;
    a single character class followed by '-' also matches 0 or more repetitions but, unlike '*', always the shortest possible sequence;
    a single character class followed by '+' matches 1 or more repetitions, always the longest possible sequence;
    a single character class followed by '?' matches 0 or 1 occurrence of a character in the class.
    For example, to match comments in C:
    "/%*.*%*/"   longest (greedy) match
    "/%*.-%*/"   shortest (lazy) match
    To match identifiers in Lua:
    "[_%a][_%w]*"
    
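    Applying the example patterns to a small made-up input shows the difference between '*' and '-'. A '?' example is included as well. The C snippet and the numbers are arbitrary samples.

```lua
local code = "int x; /* a */ int y; /* b */"

-- '*' is greedy: it runs to the LAST "*/"
local greedy = string.match(code, "/%*.*%*/")   -- "/* a */ int y; /* b */"
-- '-' is lazy: it stops at the FIRST "*/"
local lazy = string.match(code, "/%*.-%*/")     -- "/* a */"

local ident = string.match("  _foo1 = 3", "[_%a][_%w]*")   -- "_foo1"

-- '?' makes the sign optional; '+' requires at least one digit
local int = string.match("-12", "^[+-]?%d+$")   -- "-12"
local bad = string.match("12a", "^[+-]?%d+$")   -- nil

print(greedy)
print(lazy, ident, int, bad)
```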
    
    Magic characters: ^$()%.[]*+-?
    %       escapes a magic character: %% is a percent sign, %. is a dot, and so on
    ^ and $ a pattern is a sequence of pattern items. A '^' at the beginning of a pattern anchors the match at the beginning of the subject string, and a '$' at the end of a pattern anchors the match at the end of the subject string. At other positions, '^' and '$' have no special meaning and represent themselves.
    For example, to remove leading and trailing whitespace from a string:
    function trim (s)
      -- the extra parentheses keep only gsub's first result (the string) and drop the match count
      return (string.gsub(s, "^%s*(.-)%s*$", "%1"))
    end
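    A few find calls illustrate the anchoring rules and the '%' escape. Note that '^' and '$' are only special at the very start and end of a pattern. The subject strings are arbitrary.

```lua
local a = string.find("hello", "^h")      -- 1
local b = string.find("hello", "^e")      -- nil: 'e' is not at the start
local c = string.find("hello", "lo$")     -- 4: "lo" ends the subject
local d = string.find("a^b$c", "a^b$c")   -- 1: '^' and '$' in the middle are literal
local e = string.find("100%", "%d+%%$")   -- 1: %% matches a literal '%'

print(a, b, c, d, e)
```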
    
    
    Captures: 
    A pattern can contain sub-patterns enclosed in parentheses; they describe captures. When a match succeeds, the substrings of the subject string that match captures are stored (captured) for future use. Captures are numbered according to their left parentheses. For instance, in the pattern "(a*(.)%w(%s*))", the part of the string matching "a*(.)%w(%s*)" is stored as the first capture (and therefore has number 1); the character matching "." is captured with number 2, and the part matching "%s*" has number 3. 
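    In other words, the parts of a match enclosed in parentheses are saved, so you can extract or substitute just those parts. Captures are numbered in the order of their opening parentheses, from left to right.
    If the pattern contains captures, find returns them after the start and end indices.
    For example, to extract a key and a value from a string: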
    pair = "name = Anna"
    _, _, key, value = string.find(pair, "(%a+)%s*=%s*(%a+)")
    print(key, value)    --> name  Anna
    For example, to swap each pair of adjacent words:
    x = string.gsub("hello world from Lua", "(%w+)%s*(%w+)", "%2 %1") 
    --> x="world hello Lua from"
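    The nested pattern from the manual excerpt above can be checked directly. The subject "aaab1  x" is an arbitrary input chosen so that all three captures are non-trivial.

```lua
-- Captures are numbered by their opening parentheses, left to right
local whole, second, third = string.match("aaab1  x", "(a*(.)%w(%s*))")
-- whole  = "aaab1  "  capture 1: the outer parentheses
-- second = "b"        capture 2: the character matched by "."
-- third  = "  "       capture 3: the spaces matched by "%s*"

print("[" .. whole .. "]", second, "[" .. third .. "]")
```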
    
    
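    Back-references: a pattern can refer back to an earlier capture. '%n' (where n is a digit from 1 to 9) matches a copy of the nth capture.
    Suppose you want to find a substring enclosed in single or double quotes. You might try the pattern ["'].-["'], but it fails on a string like "it's all right" because the apostrophe ends the match early. The fix is a back-reference that uses the first captured quote as the closing quote:
    s = [[then he said: "it's all right"!]]
    a, b, c, quotedPart = string.find(s, "([\"'])(.-)%1")
    print(quotedPart)    --> it's all right
    print(c)             --> "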
     
    
    Question:
    How do you match a specific number of characters, for example a password of 8 to 16 letters or digits?
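    Lua patterns have no counted repetition such as {8,16}, so one possible answer is sketched below. It combines an anchored class check with a separate length check, and alternatively builds the repetition by hand with string.rep. is_valid_password and pw_pattern are hypothetical names, and "letters or digits" is taken to mean %w, which depends on the locale.

```lua
-- Hypothetical helper: 8 to 16 characters, letters or digits only
local function is_valid_password(pw)
  -- "^%w+$": every character from start to end must be a letter or digit
  if not string.find(pw, "^%w+$") then
    return false
  end
  -- patterns have no {m,n} counter, so check the length separately
  return #pw >= 8 and #pw <= 16
end

-- Alternative: spell out the repetition, 8 mandatory %w then 8 optional %w?
local pw_pattern = "^" .. string.rep("%w", 8) .. string.rep("%w?", 8) .. "$"

print(is_valid_password("abc12345"))               --> true
print(string.find("abc1234", pw_pattern) ~= nil)   --> false
```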
    
    
    
    Other pattern-matching functions
--------------------------------------------------------------------------------
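    string.match(s, pattern [, init])
    Returns the captures from the first match of pattern in s, or nil if there is no match. init specifies where to start the search. If pattern specifies no captures, the whole match is returned.
    It is very similar to find, except that match returns the matched text while find returns its position.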
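    A side-by-side sketch of match and find on the same input, plus captures and init. The dashes in the date pattern are escaped with % so that they cannot be read as the '-' quantifier. The sample strings are arbitrary.

```lua
local s = "hello 123"
local m = string.match(s, "%d+")        -- "123": the matched text
local i, j = string.find(s, "%d+")      -- 7, 9: its position

-- With captures, match returns each captured part
local y, mo, d = string.match("today is 2024-01-15", "(%d+)%-(%d+)%-(%d+)")

-- init = 8 starts the search at the '2', so only "23" is found
local tail = string.match(s, "%d+", 8)  -- "23"

print(m, i, j, y, mo, d, tail)
```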
    
    string.gmatch(s, pattern)
    string.find matches only once, while gmatch iterates over all matches.
    It returns an iterator function that, each time it is called, returns the next captures from pattern over string s. If pattern specifies no captures, the whole match is produced in each call.
    Example without captures, getting all the words in a string:
    s = "hello world from Lua" 
    for w in string.gmatch(s, "%a+") do 
        print(w) 
    end 
    Example with captures, collecting all key-value pairs in a string into a table:
    t = {} 
    s = "from=world, to=Lua" 
    for k, v in string.gmatch(s, "(%w+)=(%w+)") do 
        t[k] = v 
    end
    
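    string.gsub(s, pattern, repl [, n])
    Returns a copy of s in which every match of pattern has been replaced by repl, plus, as a second value, the total number of matches. n limits the number of substitutions: by default every match is replaced, and with n = 2 only the first two are.
    repl can be a string, a table or a function:
    repl is a string:   its value is used for the replacement. It may refer to captures: %n (n from 1 to 9) stands for the nth capture, %0 for the whole match, and %% for a single %. The examples below show this.
    repl is a table:    the table is queried for every match, using the first capture as the key (or the whole match if the pattern has no captures). The value found replaces the match.
    repl is a function: the function is called every time a match occurs, with all captured substrings passed as arguments in order (or the whole match if there are no captures). Its return value is used as the replacement.
    If the table lookup or the function returns nil or false, no replacement is made and the original match is kept.
    Examples: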
    x = string.gsub("hello world", "(%w+)", "%1 %1") 
    --> x="hello hello world world"
    x = string.gsub("hello world", "%w+", "%0 %0", 1) 
    --> x="hello hello world"
    x = string.gsub("hello world from Lua", "(%w+)%s*(%w+)", "%2 %1") 
    --> x="world hello Lua from"
    x = string.gsub("home = $HOME, user = $USER", "%$(%w+)", os.getenv) 
    --> x="home = /home/roberto, user = roberto"
    x = string.gsub("4+5 = $return 4+5$", "%$(.-)%$", function (s) 
        return loadstring(s)() 
    end) 
    --> x="4+5 = 9"
    local t = {name="lua", version="5.1"} 
    x = string.gsub("$name-$version.tar.gz", "%$(%w+)", t) 
    --> x="lua-5.1.tar.gz"
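    A few more gsub cases cover points from the description that the examples above do not exercise. These are the match count, %% in repl, a table lookup that returns nil, and a function receiving the whole match when there are no captures. All inputs are made-up samples.

```lua
-- gsub returns the new string and the number of matches
local s, n = string.gsub("hello world", "o", "0")       -- "hell0 w0rld", 2

-- %0 is the whole match; %% produces a literal percent sign
local pct = string.gsub("50", "%d+", "%0%%")             -- "50%"

-- A nil from the table keeps the original match ($unknown stays)
local vars = { name = "lua" }
local r, count = string.gsub("$name $unknown", "%$(%w+)", vars)   -- "lua $unknown", 2

-- No captures, so the function receives the whole match
local wrapped = string.gsub("a1b22", "%d+", function(digits)
  return "<" .. digits .. ">"
end)                                                     -- "a<1>b<22>"

print(s, n, pct, r, count, wrapped)
```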



An article on Lua pattern matching: http://www.cnblogs.com/whiteyun/archive/2009/09/02/1558934.html