A short note: how Flink computes the start position of a time window (source code analysis)

 
Source location:
 .timeWindow(Time.milliseconds(1000L)) 
timeWindow()
def timeWindow(size: Time): WindowedStream[T, K, TimeWindow] = {
  new WindowedStream(javaStream.timeWindow(size))
}

  

javaStream.timeWindow(size)
public WindowedStream<T, KEY, TimeWindow> timeWindow(Time size) {
   if (environment.getStreamTimeCharacteristic() == TimeCharacteristic.ProcessingTime) {
      return window(TumblingProcessingTimeWindows.of(size));
   } else {
      return window(TumblingEventTimeWindows.of(size));
   }
}

  

window(TumblingEventTimeWindows.of(size))
public Collection<TimeWindow> assignWindows(Object element, long timestamp, WindowAssignerContext context) {
   if (timestamp > Long.MIN_VALUE) {
      if (staggerOffset == null) {
         staggerOffset = windowStagger.getStaggerOffset(context.getCurrentProcessingTime(), size);
      }
      // Long.MIN_VALUE is currently assigned when no timestamp is present
      long start = TimeWindow.getWindowStartWithOffset(timestamp, (globalOffset + staggerOffset) % size, size);
      return Collections.singletonList(new TimeWindow(start, start + size));
   } else {
      throw new RuntimeException("Record has Long.MIN_VALUE timestamp (= no timestamp marker). " +
            "Is the time characteristic set to 'ProcessingTime', or did you forget to call " +
            "'DataStream.assignTimestampsAndWatermarks(...)'?");
   }
}

  

TimeWindow.getWindowStartWithOffset(timestamp, (globalOffset + staggerOffset) % size, size)
public static long getWindowStartWithOffset(long timestamp, long offset, long windowSize) {
   return timestamp - (timestamp - offset + windowSize) % windowSize;
}

  

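Tracing the call chain down to this point gives the formula for the start of a window (strictly speaking this is the window start, not the watermark):
timestamp - (timestamp - offset + windowSize) % windowSize
Here timestamp is the event-time timestamp carried by each element, and windowSize is the window length that we configured. offset defaults to 0 and is mainly used to shift window boundaries for a time zone; this analysis assumes offset = 0.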
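To make the role of offset concrete, here is a minimal standalone sketch that copies the arithmetic of getWindowStartWithOffset into plain Java, with no Flink dependency. The class name OffsetWindowDemo, the helper name windowStart, and the choice of a one-day window with an offset of -8 hours (to align days with UTC+8) are assumptions made for illustration; in a real job the offset would normally be passed as the second argument of TumblingEventTimeWindows.of(size, offset).

```java
import java.util.concurrent.TimeUnit;

// Standalone sketch: OffsetWindowDemo and windowStart are illustrative names,
// not Flink classes. Only the arithmetic is taken from the quoted source.
class OffsetWindowDemo {

    // Same expression as TimeWindow.getWindowStartWithOffset quoted above.
    static long windowStart(long timestamp, long offset, long windowSize) {
        return timestamp - (timestamp - offset + windowSize) % windowSize;
    }

    public static void main(String[] args) {
        long timestamp = 1547718199000L;             // 2019-01-17 09:43:19 UTC
        long oneDay = TimeUnit.DAYS.toMillis(1);
        long minus8h = -TimeUnit.HOURS.toMillis(8);  // assumed offset for UTC+8 local days

        // offset = 0: the day window starts at 00:00 UTC
        System.out.println(windowStart(timestamp, 0L, oneDay));      // 1547683200000
        // offset = -8h: the day window starts at 16:00 UTC, i.e. 00:00 in UTC+8
        System.out.println(windowStart(timestamp, minus8h, oneDay)); // 1547654400000
    }
}
```

With offset = 0 the one-day window is aligned to midnight UTC; with the assumed -8 hour offset the same element falls into a window that begins at local midnight in UTC+8.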
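With offset = 0 the formula simplifies to: timestamp - (timestamp + windowSize) % windowSize
Because windowSize % windowSize is always 0, adding windowSize does not change the remainder for a non-negative timestamp, so it simplifies further to: timestamp - (timestamp % windowSize)
That is, the timestamp minus its remainder modulo the window size, which is the largest integer multiple of windowSize that does not exceed timestamp.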
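The simplification can be checked numerically. The sketch below is a plain Java comparison of the full expression (with offset = 0) against the simplified one for a handful of non-negative timestamps; the class name SimplifiedFormulaCheck, the method names full and simplified, and the sample values are all hypothetical and chosen only for this check.

```java
// Standalone sketch: compares the quoted formula with its simplified form.
// Class and method names are illustrative, not part of Flink.
class SimplifiedFormulaCheck {

    // Full expression as quoted from getWindowStartWithOffset.
    static long full(long timestamp, long offset, long windowSize) {
        return timestamp - (timestamp - offset + windowSize) % windowSize;
    }

    // Simplified expression, valid for offset = 0 and non-negative timestamps.
    static long simplified(long timestamp, long windowSize) {
        return timestamp - (timestamp % windowSize);
    }

    public static void main(String[] args) {
        long windowSize = 15000L;
        long[] samples = {0L, 1L, 14999L, 15000L, 15001L, 1547718199000L};
        for (long t : samples) {
            long a = full(t, 0L, windowSize);
            long b = simplified(t, windowSize);
            // Each start should match in both forms and be a multiple of windowSize.
            System.out.println(t + " -> " + a
                    + " (same: " + (a == b)
                    + ", aligned: " + (a % windowSize == 0) + ")");
        }
    }
}
```

The check deliberately uses only non-negative timestamps: Java's % operator returns a negative remainder for a negative left operand, so the two forms are only guaranteed to agree in that range.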
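For example: suppose an element's timestamp is 1547718199000 and the window size is 15000, both in milliseconds.
window start = 1547718199000 - (1547718199000 - 0 + 15000) % 15000
             = 1547718199000 - 4000
             = 1547718195000
So the first time window is [1547718195000, 1547718210000), closed on the left and open on the right; subsequent windows follow the same pattern.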
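The worked example can be reproduced with a few lines of plain Java. This is only a sketch that reuses the quoted arithmetic outside of Flink; the class name TumblingWindowDemo and the helper windowStart are hypothetical names. It prints the first three windows and then probes the boundary to show the half-open interval.

```java
// Standalone sketch of the worked example; names are illustrative, not Flink classes.
class TumblingWindowDemo {

    // Same expression as TimeWindow.getWindowStartWithOffset quoted above.
    static long windowStart(long timestamp, long offset, long windowSize) {
        return timestamp - (timestamp - offset + windowSize) % windowSize;
    }

    public static void main(String[] args) {
        long size = 15000L;
        long first = windowStart(1547718199000L, 0L, size);

        // Print the first window and the two that follow it.
        for (int i = 0; i < 3; i++) {
            long start = first + i * size;
            System.out.println("[" + start + ", " + (start + size) + ")");
        }

        // The last millisecond before the end still belongs to the first window...
        System.out.println(windowStart(1547718209999L, 0L, size)); // 1547718195000
        // ...while the end timestamp itself opens the next window.
        System.out.println(windowStart(1547718210000L, 0L, size)); // 1547718210000
    }
}
```

An element stamped exactly 1547718210000 is therefore assigned to the second window, which is what "closed on the left, open on the right" means in practice.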
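Author: 于二黑
Copyright of this article is shared by the author and 博客园 (cnblogs). Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original article must be placed prominently on the page; otherwise the author reserves the right to pursue legal liability.
Original article: https://www.cnblogs.com/yzqyxq/p/15786484.html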