Tokenization

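Tokenization is the process of breaking a sequence of bytes or characters into logical chunks such as words. Java provides the StreamTokenizer class for this, which is used as follows: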

    import java.io.*;

    public class token1 {
        public static void main(String args[]) {
            if (args.length != 1) {
                System.err.println("missing filename");
                System.exit(1);
            }
            try {
                FileReader fr = new FileReader(args[0]);
                BufferedReader br = new BufferedReader(fr);
                StreamTokenizer st = new StreamTokenizer(br);
                // Clear the default syntax table, then mark only a-z as word characters.
                st.resetSyntax();
                st.wordChars('a', 'z');
                int tok;
                while ((tok = st.nextToken()) != StreamTokenizer.TT_EOF) {
                    if (tok == StreamTokenizer.TT_WORD) {
                        // st.sval holds the current word token
                    }
                }
                br.close();
            } catch (IOException e) {
                System.err.println(e);
            }
        }
    }

This example tokenizes lowercase words (the letters a-z). If you implement the equivalent functionality yourself, it might look like this:

    import java.io.*;

    public class token2 {
        public static void main(String args[]) {
            if (args.length != 1) {
                System.err.println("missing filename");
                System.exit(1);
            }
            try {
                FileReader fr = new FileReader(args[0]);
                BufferedReader br = new BufferedReader(fr);
                int maxlen = 256;
                int currlen = 0;
                char wordbuf[] = new char[maxlen];
                int c;
                do {
                    c = br.read();
                    if (c >= 'a' && c <= 'z') {
                        if (currlen == maxlen) {
                            // Buffer is full: grow it by 50% and copy the word so far.
                            maxlen *= 1.5;
                            char xbuf[] = new char[maxlen];
                            System.arraycopy(wordbuf, 0, xbuf, 0, currlen);
                            wordbuf = xbuf;
                        }
                        wordbuf[currlen++] = (char) c;
                    } else if (currlen > 0) {
                        // A non-word character (or end of file, c == -1) ends the word.
                        String s = new String(wordbuf, 0, currlen);
                        // do something with s
                        currlen = 0;
                    }
                } while (c != -1);
                br.close();
            } catch (IOException e) {
                System.err.println(e);
            }
        }
    }

The second program runs about 20% faster than the first, at the cost of writing some tricky low-level code.
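Before trading StreamTokenizer for hand-written code, it helps to confirm that the two approaches really produce the same words. The following is a minimal sketch of such a check, not part of the original programs: the class name TokenCompare and its two helper methods are hypothetical, input comes from a StringReader rather than a file, and the hand-written scanner uses a StringBuilder instead of the manual char buffer for brevity, so it says nothing about the speed difference.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration only.
class TokenCompare {

    // Approach 1: StreamTokenizer with only a-z configured as word characters.
    static List<String> withStreamTokenizer(Reader in) throws IOException {
        StreamTokenizer st = new StreamTokenizer(in);
        st.resetSyntax();
        st.wordChars('a', 'z');
        List<String> words = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == StreamTokenizer.TT_WORD) {
                words.add(st.sval);
            }
        }
        return words;
    }

    // Approach 2: hand-written scan that collects runs of a-z.
    // Uses StringBuilder instead of a manually grown char array (a simplification).
    static List<String> byHand(Reader in) throws IOException {
        List<String> words = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        int c;
        do {
            c = in.read();
            if (c >= 'a' && c <= 'z') {
                word.append((char) c);
            } else if (word.length() > 0) {
                // Any other character, or end of input (-1), ends the word.
                words.add(word.toString());
                word.setLength(0);
            }
        } while (c != -1);
        return words;
    }

    public static void main(String[] args) throws IOException {
        String text = "the quick brown fox, 42 times; jumps OVER the lazy dog";
        List<String> a = withStreamTokenizer(new StringReader(text));
        List<String> b = byHand(new StringReader(text));
        System.out.println(a);
        System.out.println("same words: " + a.equals(b));
    }
}
```

On ASCII input like this the two lists match. They can differ once the input contains characters above 0xff, which StreamTokenizer always treats as alphabetic.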

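StreamTokenizer is something of a hybrid class: it reads from character-based streams (such as BufferedReader), yet it operates in terms of bytes and treats every character with a two-byte value (greater than 0xff) as if it were an alphabetic character. Such characters therefore end up in word tokens even when wordChars was never called for them.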
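The effect on characters above 0xff can be seen with a small experiment. The sketch below is illustrative only: the class name WideCharTokens and the method words are hypothetical, and the input is a string literal with two CJK characters written as Unicode escapes. The tokenizer is configured exactly as in token1, with only a-z declared as word characters.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class for illustration only.
class WideCharTokens {

    // Collects the TT_WORD tokens, with only a-z declared as word characters.
    static List<String> words(String text) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.resetSyntax();
        st.wordChars('a', 'z');
        List<String> result = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == StreamTokenizer.TT_WORD) {
                result.add(st.sval);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // \u4e2d and \u6587 are both greater than 0xff.
        // Separated by spaces, they form a word token of their own.
        System.out.println(words("abc \u4e2d\u6587 def"));
        // Embedded in a lowercase word, they do not split it.
        System.out.println(words("abc\u4e2ddef").size());
    }
}
```

The first call yields three words, the middle one consisting solely of the two wide characters, and the second call yields a single word. A hand-written scanner that tests `c >= 'a' && c <= 'z'` would instead skip those characters, so the two approaches are only interchangeable for input limited to values up to 0xff.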

 

Original article: https://www.cnblogs.com/borter/p/9434280.html