Solr DataImport source code analysis (13)

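This post looks at EntityProcessor and its related classes, which we can call entity processors. Each kind of data source has its own entity processor, and the entity processor hides the differences between data sources.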

Only the entity processor for database data sources is covered here; the other entity processors work similarly.

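EntityProcessor is an abstract class. It declares the methods that return data from the data source as Map objects (covering rows to be added, modified, and deleted).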

/**
 * <p>
 * An instance of entity processor serves an entity. It is reused throughout the
 * import process.
 * </p>
 * <p/>
 * <p>
 * Implementations of this abstract class must provide a public no-args constructor.
 * </p>
 * <p/>
 * <p>
 * Refer to <a
 * href="http://wiki.apache.org/solr/DataImportHandler">http://wiki.apache.org/solr/DataImportHandler</a>
 * for more details.
 * </p>
 * <p/>
 * <b>This API is experimental and may change in the future.</b>
 *
 * @version $Id: EntityProcessor.java 824359 2009-10-12 14:31:54Z ehatcher $
 * @since solr 1.3
 */
public abstract class EntityProcessor {

  /**
   * This method is called when it starts processing an entity. When it comes
   * back to the entity it is called again. So it can reset anything at that point.
   * For a rootmost entity this is called only once for an ingestion. For sub-entities , this
   * is called multiple times, once for each row from its parent entity
   *
   * @param context The current context
   */
  public abstract void init(Context context);

  /**
   * This method helps streaming the data for each row . The implementation
   * would fetch as many rows as needed and gives one 'row' at a time. Only this
   * method is used during a full import
   *
   * @return A 'row'.  The 'key' for the map is the column name and the 'value'
   *         is the value of that column. If there are no more rows to be
   *         returned, return 'null'
   */
  public abstract Map<String, Object> nextRow();

  /**
   * This is used for delta-import. It gives the pks of the changed rows in this
   * entity
   *
   * @return the pk vs value of all changed rows
   */
  public abstract Map<String, Object> nextModifiedRowKey();

  /**
   * This is used during delta-import. It gives the primary keys of the rows
   * that are deleted from this entity. If this entity is the root entity, solr
   * document is deleted. If this is a sub-entity, the Solr document is
   * considered as 'changed' and will be recreated
   *
   * @return the pk vs value of all changed rows
   */
  public abstract Map<String, Object> nextDeletedRowKey();

  /**
   * This is used during delta-import. This gives the primary keys and their
   * values of all the rows changed in a parent entity due to changes in this
   * entity.
   *
   * @return the pk vs value of all changed rows in the parent entity
   */
  public abstract Map<String, Object> nextModifiedParentRowKey();

  /**
   * Invoked for each parent-row after the last row for this entity is processed. If this is the root-most
   * entity, it will be called only once in the import, at the very end.
   * 
   */
  public abstract void destroy();

  /**
   * Invoked after the transformers are invoked. EntityProcessors can add, remove or modify values
   * added by Transformers in this method.
   *
   * @param r The transformed row
   * @since solr 1.4
   */
  public void postTransform(Map<String, Object> r) {
  }

  /**
   * Invoked when the Entity processor is destroyed towards the end of import.
   *
   * @since solr 1.4
   */
  public void close() {
    //no-op
  }
}
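
To make the nextRow() contract concrete (return one row per call, return null when there are no more rows), here is a minimal sketch of a pull loop. SketchEntityProcessor, ListEntityProcessor, and NextRowLoopSketch are hypothetical stand-ins written for this post, not Solr classes, and the real import driver does considerably more than this.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the EntityProcessor contract; not the Solr class.
abstract class SketchEntityProcessor {
  abstract void init();
  abstract Map<String, Object> nextRow(); // null means "no more rows"
  abstract void destroy();
}

// Hypothetical processor that serves rows from an in-memory list.
class ListEntityProcessor extends SketchEntityProcessor {
  private final List<Map<String, Object>> rows;
  private Iterator<Map<String, Object>> it;

  ListEntityProcessor(List<Map<String, Object>> rows) {
    this.rows = rows;
  }

  @Override
  void init() {
    it = rows.iterator();
  }

  @Override
  Map<String, Object> nextRow() {
    return (it != null && it.hasNext()) ? it.next() : null;
  }

  @Override
  void destroy() {
    it = null;
  }
}

class NextRowLoopSketch {
  // Calls init, pulls rows until nextRow() returns null, then calls destroy.
  static List<Map<String, Object>> drain(SketchEntityProcessor p) {
    List<Map<String, Object>> out = new ArrayList<>();
    p.init();
    try {
      Map<String, Object> row;
      while ((row = p.nextRow()) != null) {
        out.add(row);
      }
    } finally {
      p.destroy();
    }
    return out;
  }

  // Two made-up rows keyed by column name.
  static List<Map<String, Object>> sampleRows() {
    List<Map<String, Object>> rows = new ArrayList<>();
    Map<String, Object> a = new LinkedHashMap<>();
    a.put("id", 1);
    a.put("name", "solr");
    rows.add(a);
    Map<String, Object> b = new LinkedHashMap<>();
    b.put("id", 2);
    b.put("name", "lucene");
    rows.add(b);
    return rows;
  }

  public static void main(String[] args) {
    System.out.println(drain(new ListEntityProcessor(sampleRows())));
  }
}
```

The caller never needs to know where the rows come from, and that is the sense in which an entity processor hides the data source.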

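The subclass EntityProcessorBase is the base class of all concrete entity processors and defines their shared methods. The most important one is Map<String, Object> getNext(), which fetches the next Map record from the row iterator Iterator<Map<String, Object>> rowIterator (the DIHCacheSupport cacheSupport object is used for caching).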

protected Map<String, Object> getNext() {
    if(cacheSupport==null) {
      try {
        if (rowIterator == null)
          return null;
        if (rowIterator.hasNext())
          return rowIterator.next();
        query = null;
        rowIterator = null;
        return null;
      } catch (Exception e) {
        SolrException.log(log, "getNext() failed for query '" + query + "'", e);
        query = null;
        rowIterator = null;
        wrapAndThrow(DataImportHandlerException.WARN, e);
        return null;
      }
    } else  {
      return cacheSupport.getCacheData(context, query, rowIterator);
    }      
  }
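
The non-cached branch of getNext() has one detail worth spelling out: once the iterator is exhausted, both rowIterator and query are reset to null, so a later call sees an uninitialized state again. The sketch below reproduces only that branch in a hypothetical class, GetNextSketch, leaving out the cache and the exception handling.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;

// Hypothetical, simplified copy of the non-cached getNext() branch.
class GetNextSketch {
  Iterator<Map<String, Object>> rowIterator;
  String query;

  Map<String, Object> getNext() {
    if (rowIterator == null) {
      return null;
    }
    if (rowIterator.hasNext()) {
      return rowIterator.next();
    }
    // Exhausted: drop the iterator and the query so the state is "not initialized".
    query = null;
    rowIterator = null;
    return null;
  }

  // Builds a sketch instance whose iterator holds exactly one made-up row.
  static GetNextSketch withOneRow() {
    GetNextSketch s = new GetNextSketch();
    s.query = "select id from item";
    Map<String, Object> row = Collections.<String, Object>singletonMap("id", 1);
    s.rowIterator = Collections.singletonList(row).iterator();
    return s;
  }

  public static void main(String[] args) {
    GetNextSketch s = withOneRow();
    System.out.println(s.getNext());           // the single row
    System.out.println(s.getNext());           // null, iterator exhausted
    System.out.println(s.rowIterator == null); // state has been reset
  }
}
```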

SqlEntityProcessor is the entity processor for database data sources.

/**
 * <p>
 * An {@link EntityProcessor} instance which provides support for reading from
 * databases. It is used in conjunction with {@link JdbcDataSource}. This is the default
 * {@link EntityProcessor} if none is specified explicitly in data-config.xml
 * </p>
 * <p/>
 * <p>
 * Refer to <a
 * href="http://wiki.apache.org/solr/DataImportHandler">http://wiki.apache.org/solr/DataImportHandler</a>
 * for more details.
 * </p>
 * <p/>
 * <b>This API is experimental and may change in the future.</b>
 *
 * @version $Id: SqlEntityProcessor.java 1065312 2011-01-30 16:08:25Z rmuir $
 * @since solr 1.3
 */
public class SqlEntityProcessor extends EntityProcessorBase {
  private static final Logger LOG = LoggerFactory.getLogger(SqlEntityProcessor.class);
  // the data source
  protected DataSource<Iterator<Map<String, Object>>> dataSource;
  // initialize the data source
  @Override
  @SuppressWarnings("unchecked")
  public void init(Context context) {
    super.init(context);
    dataSource = context.getDataSource();
  }
  // initialize the row iterator (fetched from the data source by running the query)
  protected void initQuery(String q) {
    try {
      DataImporter.QUERY_COUNT.get().incrementAndGet();
      rowIterator = dataSource.getData(q);
      this.query = q;
    } catch (DataImportHandlerException e) {
      throw e;
    } catch (Exception e) {
      LOG.error( "The query failed '" + q + "'", e);
      throw new DataImportHandlerException(DataImportHandlerException.SEVERE, e);
    }
  }

  @Override
  public Map<String, Object> nextRow() {    
    if (rowIterator == null) {
      String q = getQuery();
      initQuery(context.replaceTokens(q));
    }
    return getNext();
  }

  @Override
  public Map<String, Object> nextModifiedRowKey() {
    if (rowIterator == null) {
      String deltaQuery = context.getEntityAttribute(DELTA_QUERY);
      if (deltaQuery == null)
        return null;
      initQuery(context.replaceTokens(deltaQuery));
    }
    return getNext();
  }

  @Override
  public Map<String, Object> nextDeletedRowKey() {
    if (rowIterator == null) {
      String deletedPkQuery = context.getEntityAttribute(DEL_PK_QUERY);
      if (deletedPkQuery == null)
        return null;
      initQuery(context.replaceTokens(deletedPkQuery));
    }
    return getNext();
  }

  @Override
  public Map<String, Object> nextModifiedParentRowKey() {
    if (rowIterator == null) {
      String parentDeltaQuery = context.getEntityAttribute(PARENT_DELTA_QUERY);
      if (parentDeltaQuery == null)
        return null;
      LOG.info("Running parentDeltaQuery for Entity: "
              + context.getEntityAttribute("name"));
      initQuery(context.replaceTokens(parentDeltaQuery));
    }
    return getNext();
  }
  
  public String getQuery() {
    String queryString = context.getEntityAttribute(QUERY);
    if (Context.FULL_DUMP.equals(context.currentProcess())) {
      return queryString;
    }
    if (Context.DELTA_DUMP.equals(context.currentProcess())) {
      String deltaImportQuery = context.getEntityAttribute(DELTA_IMPORT_QUERY);
      if(deltaImportQuery != null) return deltaImportQuery;
    }
    LOG.warn("'deltaImportQuery' attribute is not specified for entity : "+ entityName);
    return getDeltaImportQuery(queryString);
  }

  public String getDeltaImportQuery(String queryString) {    
    StringBuilder sb = new StringBuilder(queryString);
    if (SELECT_WHERE_PATTERN.matcher(queryString).find()) {
      sb.append(" and ");
    } else {
      sb.append(" where ");
    }
    boolean first = true;
    String[] primaryKeys = context.getEntityAttribute("pk").split(",");
    for (String primaryKey : primaryKeys) {
      if (!first) {
        sb.append(" and ");
      }
      first = false;
      Object val = context.resolve("dataimporter.delta." + primaryKey);
      if (val == null) {
        Matcher m = DOT_PATTERN.matcher(primaryKey);
        if (m.find()) {
          val = context.resolve("dataimporter.delta." + m.group(1));
        }
      }
      sb.append(primaryKey).append(" = ");
      if (val instanceof Number) {
        sb.append(val.toString());
      } else {
        sb.append("'").append(val.toString()).append("'");
      }
    }
    return sb.toString();
  }

  private static Pattern SELECT_WHERE_PATTERN = Pattern.compile(
          "^\\s*(select\\b.*?\\b)(where).*", Pattern.CASE_INSENSITIVE);

  public static final String QUERY = "query";

  public static final String DELTA_QUERY = "deltaQuery";

  public static final String DELTA_IMPORT_QUERY = "deltaImportQuery";

  public static final String PARENT_DELTA_QUERY = "parentDeltaQuery";

  public static final String DEL_PK_QUERY = "deletedPkQuery";

  public static final Pattern DOT_PATTERN = Pattern.compile(".*?\\.(.*)$");
}
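
When no deltaImportQuery is configured, getDeltaImportQuery() builds one by appending a primary-key condition to the regular query. The sketch below is a simplified re-implementation for illustration: the hypothetical class DeltaImportQuerySketch takes a plain Map in place of context.resolve("dataimporter.delta." + key), and it adds an explicit guard for a missing value that the listing above does not have. The two regular expressions are copied from the listing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified re-implementation of the query-building logic.
class DeltaImportQuerySketch {
  private static final Pattern SELECT_WHERE_PATTERN = Pattern.compile(
      "^\\s*(select\\b.*?\\b)(where).*", Pattern.CASE_INSENSITIVE);
  private static final Pattern DOT_PATTERN = Pattern.compile(".*?\\.(.*)$");

  // 'delta' stands in for the values resolved under "dataimporter.delta.".
  static String deltaImportQuery(String query, String pk, Map<String, Object> delta) {
    StringBuilder sb = new StringBuilder(query);
    // Extend an existing WHERE clause with "and", otherwise start one.
    sb.append(SELECT_WHERE_PATTERN.matcher(query).find() ? " and " : " where ");
    boolean first = true;
    for (String key : pk.split(",")) {
      if (!first) {
        sb.append(" and ");
      }
      first = false;
      Object val = delta.get(key);
      if (val == null) {
        // For a qualified key such as "item.id", retry with the part after the dot.
        Matcher m = DOT_PATTERN.matcher(key);
        if (m.find()) {
          val = delta.get(m.group(1));
        }
      }
      if (val == null) {
        // Guard added in this sketch only; not part of the listing above.
        throw new IllegalStateException("no delta value for pk " + key);
      }
      sb.append(key).append(" = ");
      if (val instanceof Number) {
        sb.append(val);
      } else {
        sb.append('\'').append(val).append('\'');
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Map<String, Object> delta = new HashMap<>();
    delta.put("id", 42);
    System.out.println(deltaImportQuery("select * from item", "id", delta));
    System.out.println(deltaImportQuery("select * from item where active = 1", "item.id", delta));
  }
}
```

Numeric key values are appended as-is and everything else is wrapped in single quotes, the same as in the listing.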

Next we look at EntityProcessorWrapper. It extends the abstract class EntityProcessor and decorates a concrete entity processor (the decorator pattern).

Its key members are:

  // the decorated entity processor
  EntityProcessor delegate;
  private DocBuilder docBuilder;

  String onError;
  Context context;
  protected VariableResolverImpl resolver;
  String entityName;

  protected List<Transformer> transformers;

  protected List<Map<String, Object>> rowcache;

Its constructor initializes the decorated member object:

  public EntityProcessorWrapper(EntityProcessor delegate, DocBuilder docBuilder) {
    this.delegate = delegate;
    this.docBuilder = docBuilder;
  }

Its init method calls the init method of the decorated object (which obtains the data source):

@Override
  public void init(Context context) {
    rowcache = null;
    this.context = context;
    resolver = (VariableResolverImpl) context.getVariableResolver();
    //context has to be set correctly . keep the copy of the old one so that it can be restored in destroy
    if (entityName == null) {
      onError = resolver.replaceTokens(context.getEntityAttribute(ON_ERROR));
      if (onError == null) {
          onError = ABORT;
      }
      entityName = context.getEntityAttribute(DataConfig.NAME);
    }
    delegate.init(context);

  }

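The remaining methods all call the corresponding methods of the decorated entity processor and add features such as data transformation on top; they are not analyzed in detail here.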
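
As a rough illustration of the decorator idea, the sketch below wraps a row source and applies transformers to every row the delegate returns. Every name in it (RowSource, FixedRowSource, TransformingWrapper, WrapperSketch) is hypothetical, and a UnaryOperator stands in for a transformer; the real wrapper's transformer handling, error policy, and row cache are not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical minimal contract: one row per call, null when exhausted.
abstract class RowSource {
  abstract Map<String, Object> nextRow();
}

// Hypothetical concrete source backed by an in-memory list.
class FixedRowSource extends RowSource {
  private final Iterator<Map<String, Object>> it;

  FixedRowSource(List<Map<String, Object>> rows) {
    this.it = rows.iterator();
  }

  @Override
  Map<String, Object> nextRow() {
    return it.hasNext() ? it.next() : null;
  }
}

// Decorator: same contract as the delegate, plus a transformation step.
class TransformingWrapper extends RowSource {
  private final RowSource delegate;
  private final List<UnaryOperator<Map<String, Object>>> transformers;

  TransformingWrapper(RowSource delegate,
                      List<UnaryOperator<Map<String, Object>>> transformers) {
    this.delegate = delegate;
    this.transformers = transformers;
  }

  @Override
  Map<String, Object> nextRow() {
    Map<String, Object> row = delegate.nextRow();
    if (row == null) {
      return null; // nothing to transform once the delegate is exhausted
    }
    for (UnaryOperator<Map<String, Object>> t : transformers) {
      row = t.apply(row);
    }
    return row;
  }
}

class WrapperSketch {
  // Wraps a one-row source with a transformer that upper-cases the "name" column.
  static RowSource demo() {
    Map<String, Object> row = new HashMap<>();
    row.put("name", "solr");
    List<Map<String, Object>> rows = new ArrayList<>();
    rows.add(row);

    List<UnaryOperator<Map<String, Object>>> transformers = new ArrayList<>();
    transformers.add(r -> {
      Map<String, Object> copy = new HashMap<>(r);
      copy.put("name", String.valueOf(copy.get("name")).toUpperCase());
      return copy;
    });
    return new TransformingWrapper(new FixedRowSource(rows), transformers);
  }

  public static void main(String[] args) {
    RowSource wrapped = demo();
    System.out.println(wrapped.nextRow());
    System.out.println(wrapped.nextRow());
  }
}
```

Because the wrapper has the same interface as the object it wraps, calling code can use either one without changes.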

---------------------------------------------------------------------------

This series, Solr DataImport source code analysis, is my own original work.

If you repost it, please credit the source: 博客园 (cnblogs), 刺猬的温驯.

Link to this post: http://www.cnblogs.com/chenying99/archive/2013/05/04/3059397.html