HBase Data Retrieval Bug

Problem description:

The HBase table contains the data, but the number of rows retrieved through the HBase client is smaller than the number of rows actually in the table, and the client reports no error at all.

Bug 1:

The data is fetched through a coprocessor. A word on what the coprocessor does here: it processes the fetched data on the server side (inside the RegionServer) and only then returns the result to the client, which reduces the amount of data transferred and speeds up the query.
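For context, here is a minimal sketch of the hook this kind of coprocessor relies on (the class name and the "aggr" attribute are illustrative only, not our project's actual code): a RegionObserver intercepts postScannerOpen on the RegionServer, and returning a wrapping scanner from that hook is what lets the aggregation run server-side before anything is shipped back (HBase 1.x API):

    import java.io.IOException;

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.regionserver.RegionScanner;

    // Illustrative observer: this code runs inside the RegionServer, not on the client.
    public class SketchAggrObserver extends BaseRegionObserver {
        @Override
        public RegionScanner postScannerOpen(ObserverContext<RegionCoprocessorEnvironment> ctx,
                                             Scan scan, RegionScanner scanner) throws IOException {
            // Only intercept scans the client has explicitly tagged for aggregation.
            if (scan.getAttribute("aggr") == null) {
                return scanner;
            }
            // The post's AggrRegionObserver returns a wrapping AggrRegionScanner here;
            // that wrapper aggregates rows server-side (and a memory-cap check like the
            // one in the error below lives inside it). Returning the original scanner
            // keeps this sketch self-contained.
            return scanner;
        }
    }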

The client showed no error, so I checked the server side; the error there was:

2019-03-01 14:46:04,924 ERROR [B.defaultRpcServer.handler=59,queue=5,port=16020] observer.AggrRegionObserver: tracker Coprocessor Error
java.lang.RuntimeException: tracker coprocess memory usage goes beyond cap, (40 + 4194304) * 50 > 209715200. Abord coprocessor.
        at com.tracker.coprocessor.observer.aggregate.handler.TopNHandler.checkMemoryUsage(TopNHandler.java:103)
        at com.tracker.coprocessor.observer.aggregate.AggrRegionScanner.buildAggrCache(AggrRegionScanner.java:61)
        at com.tracker.coprocessor.observer.aggregate.AggrRegionScanner.<init>(AggrRegionScanner.java:37)
        at com.tracker.coprocessor.observer.AggrRegionObserver.doPostScannerObserver(AggrRegionObserver.java:70)
        at com.tracker.coprocessor.observer.AggrRegionObserver.postScannerOpen(AggrRegionObserver.java:37)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$52.call(RegionCoprocessorHost.java:1334)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1673)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1749)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperationWithResult(RegionCoprocessorHost.java:1712)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1329)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2434)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33648)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2196)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
        at java.lang.Thread.run(Thread.java:748)

The key is the exception message: the coprocessor's memory usage went over the cap, (40 + 4194304) * 50 = 209,717,200 bytes, just past the 209,715,200-byte (200 MB) limit, so the coprocessor aborted instead of returning results.

When I reduced the amount of data fetched in one go, another problem appeared, namely Bug 2.

Bug 2:

Tracing through the code, I found the problem was already there in what the Scan returned: it would only fetch one row. It then turned out that once the call to scan.setBatch() was commented out, the data came back normally.
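Before reading the source, here is a minimal sketch of why batching alone can make the client count come up short (the table name "demo_table" and the batch value are placeholders, not our real configuration): with setBatch() on, a row that has more cells than the batch size comes back as several partial Result objects, so code that treats each Result as a full row, or only takes the first one, undercounts.

    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchScanSketch {
        public static void main(String[] args) throws Exception {
            try (Connection conn = ConnectionFactory.createConnection();
                 Table table = conn.getTable(TableName.valueOf("demo_table"))) { // placeholder table
                Scan scan = new Scan();
                scan.setBatch(2); // at most 2 cells per Result, so wide rows are split up

                int results = 0, rows = 0;
                byte[] lastRow = null;
                try (ResultScanner scanner = table.getScanner(scan)) {
                    for (Result r : scanner) {
                        results++;                                        // one per (partial) Result
                        if (lastRow == null || !Bytes.equals(lastRow, r.getRow())) {
                            rows++;                                       // one per distinct row key
                            lastRow = r.getRow();
                        }
                    }
                }
                // With batching on, results >= rows; treating each Result as a row
                // is exactly the kind of undercounting described in this post.
                System.out.println("results=" + results + ", rows=" + rows);
            }
        }
    }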

Take a look at the Scan source code:

  /**
   * Set the maximum number of values to return for each call to next().
   * Callers should be aware that invoking this method with any value
   * is equivalent to calling {@link #setAllowPartialResults(boolean)}
   * with a value of {@code true}; partial results may be returned if
   * this method is called. Use {@link #setMaxResultSize(long)}} to
   * limit the size of a Scan's Results instead.
   *
   * @param batch the maximum number of values
   */
  public Scan setBatch(int batch) {
    if (this.hasFilter() && this.filter.hasFilterRow()) {
      throw new IncompatibleFilterException(
        "Cannot set batch on a scan using a filter" +
        " that returns true for filter.hasFilterRow");
    }
    this.batch = batch;
    return this;
  }

What the code says: a filter that filters whole rows (filter.hasFilterRow() returning true) and batching cannot be used on the same scan; they conflict, and setBatch() throws an IncompatibleFilterException if such a filter is already attached.
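Note that this check lives only inside setBatch(); setFilter() never re-validates. A small sketch to make the order dependence concrete (PageFilter is just a convenient example of a filter whose hasFilterRow() returns true; it is not what our code uses):

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.IncompatibleFilterException;
    import org.apache.hadoop.hbase.filter.PageFilter;

    public class BatchFilterOrderSketch {
        public static void main(String[] args) {
            // Filter first, then batch: setBatch() sees a row-level filter and throws.
            Scan a = new Scan();
            a.setFilter(new PageFilter(10));   // PageFilter.hasFilterRow() == true
            try {
                a.setBatch(100);
            } catch (IncompatibleFilterException e) {
                System.out.println("rejected: " + e.getMessage());
            }

            // Batch first, then filter: nothing complains, and the scan silently runs
            // with a combination HBase refuses to build in the other order.
            Scan b = new Scan();
            b.setBatch(100);
            b.setFilter(new PageFilter(10));   // accepted without any error
        }
    }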

Some may wonder why no such exception was thrown. That is because another layer is wrapped around the HBase client; the relevant code is:

    protected Scan constructScanByRowRange(String startRowKey, String endRowKey, QueryExtInfo queryExtInfo, boolean isAggr, Class clsType){
        // Build the Scan
        Scan scan = constructScan(queryExtInfo, isAggr, clsType);
        scan.setStartRow(Bytes.toBytes(startRowKey));
        scan.setStopRow(Bytes.toBytes(endRowKey));
        // Attach the filter, if one was set on QueryExtInfo
        if(queryExtInfo != null && queryExtInfo.isFilterSet()) scan.setFilter(queryExtInfo.getFilterList());

        if(queryExtInfo != null && queryExtInfo.getScanCacheSize() != null) scan.setCaching(queryExtInfo.getScanCacheSize());
        else scan.setCaching(scanCachingSize);

        // Only set the batch size when the attached filter does not filter whole rows
        if (!(scan.hasFilter() && scan.getFilter().hasFilterRow())) {
            scan.setBatch(batchRead);
        }
        return scan;
    }

All of the filtering behavior is implemented in the QueryExtInfo class, but when a row-level filter is used this does not set the Scan's internal filter state the way the check expects, so scan.hasFilter() && scan.getFilter().hasFilterRow() evaluates to false and setBatch() still gets called.

So the incompatibility is silently swallowed right here, and that is why the bug never surfaced as an error.
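To make the guard's blind spot concrete, here is a small sketch (the filters below are purely my illustration; they are not necessarily what QueryExtInfo builds): FilterList.hasFilterRow() only reports true when one of its member filters filters whole rows, so a filter list whose members all answer false sails past the check above and setBatch() still gets applied.

    import org.apache.hadoop.hbase.filter.FilterList;
    import org.apache.hadoop.hbase.filter.PageFilter;
    import org.apache.hadoop.hbase.filter.PrefixFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class GuardBlindSpotSketch {
        public static void main(String[] args) {
            // Members never filter whole rows: the guard in constructScanByRowRange()
            // evaluates to false and setBatch() is applied on top of the filter.
            FilterList passesGuard = new FilterList(new PrefixFilter(Bytes.toBytes("row-")));
            System.out.println(passesGuard.hasFilterRow());  // false

            // A member that does filter whole rows: the guard fires and setBatch()
            // is skipped, which is the combination HBase itself would reject.
            FilterList tripsGuard = new FilterList(new PageFilter(10));
            System.out.println(tripsGuard.hasFilterRow());   // true
        }
    }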

Original post: https://www.cnblogs.com/parent-absent-son/p/10457956.html