Scala for the Impatient (9) -- Files and Regular Expressions

Reading lines:

There are three ways to read all the lines of a file:

1. Using the iterator returned by getLines

import scala.io.Source

val source = Source.fromFile("myFile", "UTF-8")
val lineIterator = source.getLines()
for(l <- lineIterator) println(l)
source.close()  // release the underlying file handle

2. Calling toArray or toBuffer on the iterator

val source = Source.fromFile("myFile", "UTF-8")
val lines = source.getLines().toArray  // or .toBuffer
for(l <- lines) println(l)
source.close()

3. Reading the whole file into a single string

val source = Source.fromFile("myFile", "UTF-8")
val content = source.mkString
println(content)
source.close()
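All three variants above close the Source by hand. The sketch below shows one way to guarantee the close even when reading throws. It assumes Scala 2.13 or later, because it uses scala.util.Using. The helper name readLines and the temporary file are illustrative only.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import scala.io.Source
import scala.util.Using

// Hypothetical helper: reads every line of a file and closes the Source
// even if reading throws (scala.util.Using needs Scala 2.13 or later).
def readLines(path: String): Vector[String] =
  Using.resource(Source.fromFile(path, "UTF-8")) { source =>
    source.getLines().toVector // materialize the lines before the Source is closed
  }

// Write a small temporary file to read back; the file name prefix is arbitrary.
val tmp = Files.createTempFile("lines-demo", ".txt")
Files.write(tmp, "first\nsecond\nthird".getBytes(StandardCharsets.UTF_8))
val lines = readLines(tmp.toString)
println(lines)
```

Calling toVector inside the block matters here. If the lazy iterator escaped the block, it would be read only after the Source had been closed.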

Reading characters:

Again there are three approaches:

1. Treating the Source object as an iterator (Source is itself an Iterator[Char])

val source = Source.fromFile("myFile", "utf-8")
for(c <- source) print(c)

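2. Calling the buffered method of the Source object, which lets you peek at the next character with head without consuming it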

val iter = source.buffered
while(iter.hasNext) {
   if(iter.head == 'a') println(iter.head)
   iter.next()
}

This code prints a character whenever it is 'a'.

3. If the file is not very large, you can read it into a string and process that:

val contents = source.mkString
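Peeking with head is what makes the buffered variant useful. The sketch below collects a leading run of digits and stops just before the first non-digit, without losing that character. It reads from Source.fromString so that no file is needed. The function name readLeadingDigits is hypothetical.

```scala
import scala.io.Source

// Sketch: head peeks at the next character without consuming it, so the loop can
// stop right before the first non-digit. readLeadingDigits is a hypothetical name.
def readLeadingDigits(text: String): (String, String) = {
  val iter = Source.fromString(text).buffered
  val digits = new StringBuilder
  while (iter.hasNext && iter.head.isDigit) digits += iter.next()
  (digits.toString, iter.mkString) // the rest still starts with the non-digit we peeked at
}

val (num, rest) = readLeadingDigits("123abc")
println(s"$num | $rest")
```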

Reading tokens and numbers:

To read tokens separated by whitespace:

val source = Source.fromFile("myFile", "UTF-8")
val contents = source.mkString
source.close()
val tokens = contents.split("\\s+")  // "\s" alone is not a valid escape in a regular string literal

To convert the tokens to numbers, either of these works:

1. val numbers = for(w <- tokens) yield w.toDouble

2. val numbers = tokens.map(_.toDouble)
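The two steps can be combined into the sketch below, which reads from an in-memory string. One detail is easy to miss. If the text starts with whitespace, split("\\s+") returns an empty first token, and "".toDouble throws a NumberFormatException. For that reason the sketch trims the text first.

```scala
import scala.io.Source

// Sketch: split the text on runs of whitespace, then convert each token to a Double.
// trim matters: leading whitespace would otherwise produce an empty first token,
// and "".toDouble throws NumberFormatException.
val source = Source.fromString("  3.5  10\n-2 4e1\n")
val tokens = source.mkString.trim.split("\\s+")
source.close()
val numbers = tokens.map(_.toDouble)
println(numbers.sum)
```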

Reading from URLs and other sources:

val source1 = Source.fromURL("http://www.cnblogs.com/PaulingZhou/", "UTF-8")  // read from a URL
val source2 = Source.fromString("Hello world")  // read from the given string, useful for debugging
val source3 = Source.stdin  // read from standard input
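Source.fromString is handy for debugging because any code written against Source accepts it. In the sketch below, the hypothetical helper countNonEmptyLines works the same on a file, a URL, or stdin. Here it is exercised on a string literal, so nothing touches the disk or the network.

```scala
import scala.io.Source

// Sketch: code written against Source works the same for a file, a URL, stdin,
// or an in-memory string. countNonEmptyLines is a hypothetical helper.
def countNonEmptyLines(source: Source): Int =
  try source.getLines().count(_.trim.nonEmpty)
  finally source.close()

// Source.fromString makes the helper easy to try out without touching the disk.
val n = countNonEmptyLines(Source.fromString("alpha\n\nbeta\n"))
println(n)
```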

Reading binary files:

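import java.io.{File, FileInputStream}

val filename = "myFile"
val file = new File(filename)
val in = new FileInputStream(file)
val bytes = new Array[Byte](file.length.toInt)
in.read(bytes)  // read may fill fewer bytes than requested; its return value says how many
in.close()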

Suppose "myFile" contains "123" with no trailing newline. Then bytes is Array(49, 50, 51), the ASCII codes of '1', '2' and '3'.
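On Java 7 or later, java.nio.file.Files.readAllBytes is a shorter alternative that avoids the partial-read concern. It reads the whole file and closes it. The sketch below writes "123" to a temporary file (the name is arbitrary) and reads it back.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

// Sketch: Files.readAllBytes (Java 7+) reads the whole file and closes it,
// so there is no need to loop over InputStream.read until the array is full.
val path = Files.createTempFile("bytes-demo", ".bin") // arbitrary temp file
Files.write(path, "123".getBytes(StandardCharsets.US_ASCII))
val bytes: Array[Byte] = Files.readAllBytes(path)
println(bytes.mkString(", "))
```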

Writing text files:

Use java.io.PrintWriter:

import java.io.PrintWriter

val out = new PrintWriter("numbers.txt")
for(i <- 1 to 100) out.println(i)  // one number per line
out.close()

If you use printf to write to the file, the numeric arguments must be converted to AnyRef:

val quantity = 100
val price = 12.4212
out.printf("%6d %10.2f", quantity.asInstanceOf[AnyRef], price.asInstanceOf[AnyRef])

Or use the format method of the String class:

val quantity = 100
val price = 12.4212
out.print("%6d %10.2f".format(quantity, price))

Note: Console's printf does not have this problem, so you can use it as usual to print messages to the console.
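Scala's f string interpolator is a third option. It type-checks the format specifiers against the arguments at compile time, and no AnyRef conversion is needed. The sketch below writes one formatted line to an arbitrary temporary file and reads it back. The exact text of the decimal part depends on the default locale.

```scala
import java.io.PrintWriter
import java.nio.file.Files
import scala.io.Source

// Sketch: Scala's f interpolator checks the format against the argument types at
// compile time, and no AnyRef conversion is needed. The temp file is arbitrary.
val path = Files.createTempFile("prices", ".txt")
val out = new PrintWriter(path.toFile)
try {
  val quantity = 100
  val price = 12.4212
  out.println(f"$quantity%6d $price%10.2f")
} finally out.close()

// Read the file back to see what was written.
val src = Source.fromFile(path.toFile)
val written = try src.getLines().toList finally src.close()
println(written)
```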

Visiting directories:

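import java.io.File

// Lazily yields all subdirectories of dir, recursively
def subdir(dir: File): Iterator[File] = {
  val children = dir.listFiles.filter(_.isDirectory)
  children.iterator ++ children.iterator.flatMap(subdir _)  // toIterator is deprecated in Scala 2.13
}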

With this function, you can visit all subdirectories like this:

for(d <- subdir(dir)) println(d)  // process d (here it is just printed)
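To check that subdir really recurses and ignores plain files, the sketch below builds a small hypothetical tree in a temporary directory. The tree is root/a, root/a/b and root/c, plus one regular file. The function is repeated so the block is self-contained. The names are sorted because listFiles does not guarantee any order.

```scala
import java.io.File
import java.nio.file.Files

// The subdir function from the text, using .iterator instead of the deprecated .toIterator.
def subdir(dir: File): Iterator[File] = {
  val children = dir.listFiles.filter(_.isDirectory)
  children.iterator ++ children.iterator.flatMap(subdir)
}

// Build a small hypothetical tree: root/a, root/a/b, root/c, plus one plain file.
val root = Files.createTempDirectory("subdir-demo").toFile
new File(root, "a/b").mkdirs()
new File(root, "c").mkdirs()
new File(root, "a/notes.txt").createNewFile()

val names = subdir(root).map(_.getName).toList.sorted
println(names)
```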

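If you are using Java 7 or later, you can also use the walkFileTree method of the java.nio.file.Files class. It relies on the FileVisitor interface. In Scala we usually prefer to specify the work to be done with function objects rather than interfaces, so the following implicit conversion adapts a function to the interface: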

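import java.nio.file._
import java.nio.file.attribute.BasicFileAttributes
import scala.language.implicitConversions

// Turns a Path => Unit function into a FileVisitor that calls it for every file visited
implicit def makeFileVisitor(f: (Path) => Unit): FileVisitor[Path] = new SimpleFileVisitor[Path] {
  override def visitFile(p: Path, attrs: BasicFileAttributes): FileVisitResult = {
    f(p)
    FileVisitResult.CONTINUE
  }
}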

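With this in place, the following call prints every file in the directory tree. visitFile is invoked for files, not for directories. To see the directories, override preVisitDirectory instead.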

Files.walkFileTree(dir.toPath, (f: Path) => println(f))
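Because visitFile is called for files, the sketch below overrides preVisitDirectory so that it actually collects the directories. It uses an anonymous SimpleFileVisitor rather than the implicit conversion. allDirs is a hypothetical helper, and the small tree it walks is created in a temporary directory.

```scala
import java.nio.file.{FileVisitResult, Files, Path, SimpleFileVisitor}
import java.nio.file.attribute.BasicFileAttributes
import scala.collection.mutable.ListBuffer

// Sketch: overriding preVisitDirectory instead of visitFile visits the directories.
// allDirs is a hypothetical helper; the result includes root itself.
def allDirs(root: Path): List[Path] = {
  val found = ListBuffer.empty[Path]
  Files.walkFileTree(root, new SimpleFileVisitor[Path] {
    override def preVisitDirectory(dir: Path, attrs: BasicFileAttributes): FileVisitResult = {
      found += dir
      FileVisitResult.CONTINUE
    }
  })
  found.toList
}

// A small hypothetical tree: root/x/y plus one regular file in root/x.
val root = Files.createTempDirectory("walk-demo")
Files.createDirectories(root.resolve("x").resolve("y"))
Files.createFile(root.resolve("x").resolve("data.txt"))
val dirs = allDirs(root)
dirs.foreach(println)
```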

Serialization:

Java:

import java.io.Serializable;

public class Person1 implements Serializable{

    private static final long serialVersionUID = 42L;

}

Scala:

@SerialVersionUID(42L) class Person extends Serializable {

}

Serializing and deserializing an object:

import java.io._

@SerialVersionUID(42L) class Person(val name: String) extends Serializable

object SerializationDemo {
  def main(args: Array[String]): Unit = {
    val person = new Person("fred")
    val out = new ObjectOutputStream(new FileOutputStream("object.txt"))
    out.writeObject(person)
    out.close()
    val in = new ObjectInputStream(new FileInputStream("object.txt"))
    val savedFred = in.readObject().asInstanceOf[Person]
    in.close()
    println(savedFred.name)  // prints fred
  }
}

Note: Scala collection classes are all serializable, so you can use them as members of serializable classes.
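The sketch below illustrates the note. It serializes a Map of Lists into an in-memory byte array and reads it back, so no file is written. roundTrip is a hypothetical helper name.

```scala
import java.io._

// Sketch: serialize an object to an in-memory byte array and read it back.
// roundTrip is a hypothetical helper name; no file is involved.
def roundTrip[T](obj: T): T = {
  val buffer = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(buffer)
  try out.writeObject(obj) finally out.close()
  val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
  try in.readObject().asInstanceOf[T] finally in.close()
}

// Scala collections are serializable, so a Map of Lists survives the round trip.
val scores = Map("fred" -> List(90, 85), "wilma" -> List(77))
val copy = roundTrip(scores)
println(copy)
```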

Process control:

The scala.sys.process package provides tools for interacting with shell programs.

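1. The ! operator returns the exit value of the executed program. This is 0 if the program succeeded, and a nonzero value indicating the error otherwise.

The !! operator returns the program's output as a string.

The #| operator pipes the output of one program into the input of another.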

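import scala.sys.process._

def main(args: Array[String]): Unit = {
  val result1 = ("ls -al ." #| "grep myFile").!   // exit value of the pipeline
  val result2 = ("ls -al ." #| "grep myFile").!!  // output of the pipeline as a string
  printf("result1: %s\n", result1)
  printf("result2: %s\n", result2)
}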

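Here result1 is the exit value of grep, the last command in the pipeline. It is 0 if some line of the listing contains myFile and 1 otherwise. result2 holds the matching lines as a string. If grep finds nothing, !! throws an exception because the exit value is nonzero.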

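2. Redirection:

To redirect output to a file, use the #> operator:

("ls -al ." #> new File("output.txt")).!

To append to the end of the file instead of overwriting it, use the #>> operator:

("ls -al ." #>> new File("output.txt")).!

To use the contents of a file as input, use #<:

("grep myFile" #< new File("output.txt")).!

To redirect input from a URL:

import java.net.URL
("grep Scala" #< new URL("http://horstmann.com/index.html")).!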
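The sketch below puts several of these operators together. It assumes a Unix-like system with echo and ls on the PATH. It uses the Seq[String] form of a command, which passes each argument unchanged instead of splitting the string on spaces. A ProcessLogger discards the error text of the deliberately failing ls. The temporary output file is arbitrary.

```scala
import java.nio.file.Files
import scala.io.Source
import scala.sys.process._

// Sketch (assumes a Unix-like system with echo and ls on the PATH).
// A Seq passes each argument as-is, with no splitting on spaces.
val greeting = Seq("echo", "hello world").!!        // stdout captured as a String
// A failing command: ! returns its nonzero exit value; the ProcessLogger discards its output
val exitCode = Seq("ls", "/no/such/dir").!(ProcessLogger((_: String) => ()))

// #> redirects stdout to a file (overwriting it); #>> would append instead
val target = Files.createTempFile("proc-demo", ".txt").toFile
val status = (Seq("echo", "saved") #> target).!
val src = Source.fromFile(target)
val saved = try src.mkString.trim finally src.close()

println(s"greeting=${greeting.trim} exitCode=$exitCode status=$status saved=$saved")
```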

Regular expressions:

The scala.util.matching.Regex class lets you analyze strings with regular expressions. To construct a Regex object, call the r method of the String class:

val numPattern = "[0-9]+".r

If the regular expression contains backslashes or quotation marks, it is best to use the "raw" string syntax """...""":

val wsnumwsPattern = """\s+[0-9]+\s+""".r
// easier to read than "\\s+[0-9]+\\s+".r
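The raw-string form and the escaped form produce the same pattern text. The sketch below checks this through Regex.regex, which returns the source pattern. It also tries the pattern on a sample string.

```scala
import scala.util.matching.Regex

// Sketch: the raw-string and escaped forms compile to the same pattern text.
val raw: Regex = """\s+[0-9]+\s+""".r
val escaped: Regex = "\\s+[0-9]+\\s+".r
println(raw.regex == escaped.regex)

// The match includes the surrounding whitespace, because it is part of the pattern.
val hit = raw.findFirstIn("bottles 42 left")
println(hit)
```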

The findAllIn method returns an iterator over all matches. You can use it in a for loop, or convert the iterator to an array:

for(matchString <- numPattern.findAllIn("99 bottles, 98 bottles"))
    println(matchString)  // process matchString

val matches = numPattern.findAllIn("99 bottles, 98 bottles").toArray
    //Array(99, 98)

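Note: to find the first match in a string, use the findFirstIn method. To check whether the beginning of a string matches, use findPrefixOf. Both return an Option[String].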
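The sketch below shows the three lookup methods side by side on the same sample text. It also adds a second findPrefixOf call on a string that does not start with a number.

```scala
val numPattern = "[0-9]+".r
val text = "99 bottles, 98 bottles"

val all = numPattern.findAllIn(text).toList          // every match, in order
val first = numPattern.findFirstIn(text)             // Option with the first match
val prefix = numPattern.findPrefixOf(text)           // matches only at the very start
val noPrefix = numPattern.findPrefixOf("bottles 99") // None: the string starts with a letter
val total = all.map(_.toInt).sum
println(s"$all $first $prefix $noPrefix $total")
```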

Replacing matches:

numPattern.replaceFirstIn("99 bottles, 98 bottles", "XX")  // "XX bottles, 98 bottles"
numPattern.replaceAllIn("99 bottles, 98 bottles", "XX")    // "XX bottles, XX bottles"
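replaceAllIn also has an overload that takes a function from each Regex.Match to its replacement text, so the replacement can depend on what was matched. The replacement text is interpreted specially: $ refers to groups. A literal "$" must therefore be escaped with Regex.quoteReplacement. The sketch below shows both cases.

```scala
import scala.util.matching.Regex

val numPattern = "[0-9]+".r

// Sketch: compute each replacement from the matched text (here, double the number).
val doubled = numPattern.replaceAllIn("99 bottles, 98 bottles",
  (m: Regex.Match) => (m.matched.toInt * 2).toString)

// A literal "$" in replacement text must be quoted, or it is read as a group reference.
val priced = numPattern.replaceAllIn("costs 5", Regex.quoteReplacement("$5"))
println(doubled)
println(priced)
```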

Regular expression groups:

  def main(args: Array[String]): Unit = {
    val numitemPattern = "([0-9]+) ([a-z]+)".r
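    for(numitemPattern(num, item) <- numitemPattern.findAllIn("99 bottles, 98 bottles")){
      println(num)
      println(item)
    }
  }

The numitemPattern(num, item) extractor matches each string returned by findAllIn against the whole pattern and binds the two groups. The output is:

99
bottles
98
bottles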
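Two other ways to get at the groups are sketched below. findAllMatchIn yields Regex.Match objects whose group(i) returns the i-th group. A Regex can also be used directly as an extractor in a match expression, where it must match the entire string. The sample strings are made up for illustration.

```scala
val numitemPattern = "([0-9]+) ([a-z]+)".r
val text = "99 bottles, 98 cans"

// findAllMatchIn yields Match objects; group(i) gives the i-th capture group.
val pairs = numitemPattern.findAllMatchIn(text)
  .map(m => (m.group(1).toInt, m.group(2)))
  .toList

// A Regex is also an extractor, so it can be used directly in a match expression.
val described = "42 apples" match {
  case numitemPattern(num, item) => s"$item: $num"
  case _ => "no match"
}
println(pairs)
println(described)
```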