Hadoop Gleanings (2): File Patterns

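It is a common requirement to process a batch of files in a single operation. For example, a MapReduce job for log processing might need to analyze a month's worth of log files spread across a large number of directories. Rather than enumerating every file and directory to specify the input, it is convenient to use wildcard characters in a single expression to match multiple files. This operation is known as "globbing". Hadoop provides two FileSystem methods for processing globs: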

public FileStatus[] globStatus(Path pathPattern) throws IOException

public FileStatus[] globStatus(Path pathPattern, PathFilter filter) throws IOException

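The globStatus() methods return an array of FileStatus objects for all files whose paths match the supplied pattern, sorted by path. Hadoop supports the same set of glob characters as Unix bash.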
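To make the wildcard behaviour concrete, here is a minimal sketch that uses the JDK's own `java.nio.file.PathMatcher` as a stand-in for Hadoop's matcher, so that it runs without a cluster. The class name `GlobSketch`, the helper `match`, and the sample date directories are hypothetical. JDK glob syntax is only assumed to agree with Hadoop's for the simple `*`, `?` and `[ ]` patterns used here, and a Unix-like default file system is assumed.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: this is the JDK glob matcher, not Hadoop's globStatus().
class GlobSketch {

    // Returns the candidate paths that match the glob, in input order.
    static List<String> match(String glob, List<String> candidates) {
        PathMatcher matcher =
                FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> hits = new ArrayList<>();
        for (String candidate : candidates) {
            if (matcher.matches(Paths.get(candidate))) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical log directories laid out as /year/month/day.
        List<String> dirs = Arrays.asList(
                "/2007/12/30", "/2007/12/31", "/2008/01/01", "/2008/01/02");

        // '*' matches any run of characters within one path component.
        System.out.println(match("/2007/*/*", dirs));

        // '?' matches one character, '[12]' matches one character from the set.
        System.out.println(match("/200?/*/0[12]", dirs));
    }
}
```

With Hadoop itself, the same pattern strings would be wrapped in a `Path` and passed to `globStatus()`, which returns `FileStatus` objects rather than strings.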

PathFilter objects

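Glob patterns are not always powerful enough to describe the set of files we want to access. For example, it is generally not possible to exclude a particular file using a glob pattern. The listStatus() and globStatus() methods of FileSystem take an optional PathFilter, which lets us control the matching programmatically: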

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

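// A PathFilter that excludes every path matching a regular expression.
public class RegexExcludePathFilter implements PathFilter {

	private final String regex;
	
	public RegexExcludePathFilter(String regex)
	{
		this.regex = regex;
	}
	
	// Accept a path only if its full string form does NOT match the regex.
	@Override
	public boolean accept(Path path)
	{
		return !path.toString().matches(regex);
	}
}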

// Obtain the filtered paths as follows (fs is a FileSystem instance)

fs.globStatus(new Path("/2007/*/*") , new RegexExcludePathFilter("^.*/2007/12/31$"));
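The exclusion logic can be checked without a cluster. The following sketch, under stated assumptions, reproduces the filter's `accept()` test on plain strings. The class `RegexExcludeSketch`, its helpers, and the `hdfs://namenode` prefix are hypothetical. The prefix stands in for the scheme and authority that a path string may carry. Because `String.matches()` must match the whole string, the leading `.*` in the pattern is what allows such a prefix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: mirrors RegexExcludePathFilter on strings instead of Paths.
class RegexExcludeSketch {

    // Same test as accept(): keep the path unless the regex matches all of it.
    static boolean accept(String path, String regex) {
        return !path.matches(regex);
    }

    // Keeps only the paths that the filter accepts, in input order.
    static List<String> filter(List<String> paths, String regex) {
        List<String> kept = new ArrayList<>();
        for (String path : paths) {
            if (accept(path, regex)) {
                kept.add(path);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical results of expanding the glob /2007/*/*.
        List<String> expanded = Arrays.asList(
                "hdfs://namenode/2007/12/30",
                "hdfs://namenode/2007/12/31");

        // The pattern from the text drops the 31 December directory.
        System.out.println(filter(expanded, "^.*/2007/12/31$"));

        // Without the leading ".*" the whole string no longer matches,
        // so nothing is excluded.
        System.out.println(filter(expanded, "/2007/12/31"));
    }
}
```

Note that a filter of this kind only sees the path string. It cannot inspect file properties such as creation time.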