Using ThreadLocal in Java to Reduce the Scope of Thread Synchronization

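Multithreaded access to shared resources

When multiple threads access a shared resource, you typically run into thread-safety issues such as critical sections and race conditions. For example: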

@RestController
public class StatController {

  static Integer c = 0;

  @RequestMapping("/stat")
  public Integer stat() {
    return c;
  }

  @RequestMapping("/add")
  public Integer add() throws InterruptedException {
    // This is the critical section; sleep simulates waiting on IO
    Thread.sleep(100);
    c++;
    return 1;
  }
}

// ab -n 1000 -c 100 http://192.168.2.1:8080/add
// elapsed time: 1.398s
// curl http://192.168.2.1:8080/stat
// 864

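When multiple threads access the shared resource, a race condition occurs and the result becomes inconsistent. c++ is a read-modify-write sequence rather than an atomic operation, so concurrent increments can overwrite each other. The final stat value is 864 instead of 1000.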
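The lost update can also be reproduced outside a web server. The following is a minimal sketch, not the article's controller: the class name LostUpdateDemo and the thread counts are made up for illustration. Several threads increment one unsynchronized int, and the final value is usually lower than the number of increments performed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original controller.
class LostUpdateDemo {
    static int counter = 0; // shared and deliberately unsynchronized

    // Starts `threads` threads that each increment the shared counter
    // `perThread` times, waits for all of them, and returns the final value.
    static int run(int threads, int perThread) {
        counter = 0;
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter++; // read, add one, write back: three steps, not atomic
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        int expected = 8 * 100_000;
        System.out.println("expected " + expected + ", got " + run(8, 100_000));
    }
}
```

The exact shortfall varies from run to run, which is what makes this kind of bug hard to spot in testing.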

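Using a synchronized lock to guarantee thread safety

You can use the synchronized keyword to lock the critical-section code, turning concurrent execution into a queue, which fixes the inconsistency: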

@RestController
public class StatController {

  static Integer c = 0;
  
  // Add a synchronized lock around the critical section
  synchronized static void __add() throws InterruptedException {
    Thread.sleep(100);
    c++;
  }

  @RequestMapping("/stat")
  public Integer stat() {
    return c;
  }

  @RequestMapping("/add")
  public Integer add() throws InterruptedException {
    __add();
    return 1;
  }
}

// ab -n 1000 -c 100 http://192.168.2.1:8080/add
// elapsed time: 103.280s
// curl http://192.168.2.1:8080/stat
// 1000

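Locking is simple and blunt, but performance is poor. Every request has to queue, so 1000 requests at 100 ms each take roughly 100 seconds in total, and throughput drops dramatically.

So locking around the entire slow operation like this is strongly discouraged.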
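Most of the 103 seconds is spent sleeping while holding the lock. If the slow wait does not itself touch shared state (an assumption; the article treats the sleep as part of the critical section), one option is to hold the lock only for the increment. The sketch below uses a hypothetical NarrowLockDemo class and a plain lock object to illustrate this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: only the increment is serialized.
class NarrowLockDemo {
    private static final Object LOCK = new Object();
    private static int counter = 0;

    // Simulated request handler. The slow wait runs concurrently across
    // threads; only the short read-modify-write is done under the lock.
    static void add(long ioMillis) throws InterruptedException {
        Thread.sleep(ioMillis);   // outside the lock
        synchronized (LOCK) {
            counter++;            // inside the lock
        }
    }

    // Runs `threads` threads calling add() `perThread` times each and
    // returns the final counter value.
    static int run(int threads, int perThread, long ioMillis) {
        synchronized (LOCK) {
            counter = 0;
        }
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                try {
                    for (int j = 0; j < perThread; j++) {
                        add(ioMillis);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        synchronized (LOCK) {
            return counter;
        }
    }

    public static void main(String[] args) {
        System.out.println(run(100, 10, 100)); // 100 threads, 10 calls each
    }
}
```

With this shape the count stays exact, while the waits overlap instead of queuing behind one another.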

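Isolating per-thread resources to reduce synchronized writes to the "shared resource"

A synchronized lock severely hurts performance, especially when access to the shared resource involves heavy IO, that is, when the synchronized IO is slow.

So the frequency of synchronized writes to the shared resource should be kept as low as possible.

Ideally each thread operates only on its own resource and writes to the shared resource as rarely as possible.

Synchronization cannot be avoided entirely, because at some point the results computed by each thread have to be collected.

You can use a HashMap so that each thread operates only on its own entry, and compute the aggregate only when the statistics are requested.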

@RestController
public class StatController {

  static HashMap<Thread, Integer> map = new HashMap<>();
  
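  // Initialization still needs a synchronized write to the shared map,
  // because adding a new key-value pair may trigger a resize, and concurrent inserts could overwrite each other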
  synchronized static void __putIfAbsent() {
    map.putIfAbsent(Thread.currentThread(), 0);
  }

  @RequestMapping("/stat")
  public Integer stat() {
    // Reduce with an identity value so an empty map yields 0 instead of throwing
    return map.values().stream().reduce(0, Integer::sum);
  }

  @RequestMapping("/add")
  public Integer add() throws InterruptedException {
    Thread.sleep(100);
    __putIfAbsent();
    Integer v = map.get(Thread.currentThread());
    v++;
    map.put(Thread.currentThread(), v);
    return 1;
  }
}

// ab -n 1000 -c 100 http://192.168.2.1:8080/add
// elapsed time: 1.365s
// curl http://192.168.2.1:8080/stat
// 1000

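The approach above uses a HashMap to isolate per-thread resources and synchronizes only the inserts that might resize the map. Note that a plain HashMap is still not safe to read while another thread is resizing it, so this demonstrates the idea rather than production-ready code.

Isolating each thread's resources in a HashMap, so that each thread only touches its own data, is one strategy for reducing thread synchronization.

Thread unsafety essentially comes from the risk that the "shared resource" gets overwritten, that is, an earlier operation's result is overwritten by a later one.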
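The same per-thread-slot idea can be sketched with ConcurrentHashMap, which removes the need for the hand-written synchronized helper. This is an alternative illustration rather than the article's code; the class name PerThreadMapDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class: one counter slot per thread in a concurrent map.
class PerThreadMapDemo {
    static final ConcurrentHashMap<Thread, Integer> counts = new ConcurrentHashMap<>();

    // Each thread only ever updates its own key; merge() performs the
    // insert-or-increment atomically, so no external lock is needed.
    static void add() {
        counts.merge(Thread.currentThread(), 1, Integer::sum);
    }

    // Aggregation happens only when the total is requested.
    static int stat() {
        return counts.values().stream().reduce(0, Integer::sum);
    }

    // Runs `threads` threads calling add() `perThread` times each.
    static int run(int threads, int perThread) {
        counts.clear();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    add();
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return stat();
    }

    public static void main(String[] args) {
        System.out.println(run(8, 1000));
    }
}
```

One caveat of keying on Thread objects: the map keeps a strong reference to every thread that ever called add(), so entries for finished threads are never reclaimed unless they are removed explicitly.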

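Implementing it again with ThreadLocal

Using a HashMap directly as above has a number of problems, such as performance and resource reclamation.

Java provides ThreadLocal for this kind of isolation. Internally, ThreadLocal uses a ThreadLocalMap, held by each Thread, to keep every thread's resources separate.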

@RestController
public class StatController {
  
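  // Use a HashSet to collect the results
  // Integer is immutable, so define a Val class to act as a mutable reference
  static HashSet<Val<Integer>> set = new HashSet<>();
  
  // Adding to the Set may trigger a resize, which is a thread-safety risk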
  synchronized static void addSet(Val<Integer> v) {
    set.add(v);
  }
  
  static ThreadLocal<Val<Integer>> c = new ThreadLocal<Val<Integer>>(){
    @Override
    protected Val<Integer> initialValue() {
      // On initialization, add the value to both the Set collector and the ThreadLocalMap
      Val<Integer> v = new Val<>();
      v.set(0);
      addSet(v);
      return v;
    }
  };

  @RequestMapping("/stat")
  public Integer stat() {
    // Reduce with an identity value so an empty set yields 0 instead of throwing
    return set.stream().map(Val::get).reduce(0, Integer::sum);
  }

  @RequestMapping("/add")
  public void add() throws InterruptedException {
    Thread.sleep(100);
    // ThreadLocal automatically returns the current thread's value
    Val<Integer> v = c.get();
    v.set(v.get() + 1);
  }
}

// ab -n 1000 -c 100 http://192.168.2.1:8080/add
// elapsed time: 1.406s
// curl http://192.168.2.1:8080/stat
// 1000

ThreadLocal binds a resource to a thread, which isolates resources between threads.
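The article does not show the Val class, so here is a self-contained sketch of the same pattern with a guessed holder. The Val definition, the ThreadLocalCounterDemo name, and the use of CopyOnWriteArrayList as the collector are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class mirroring the ThreadLocal counter pattern.
class ThreadLocalCounterDemo {
    // Guess at the shape of Val: a tiny mutable holder. The field is
    // volatile so a reader thread sees the owner's latest write.
    static final class Val<T> {
        private volatile T value;
        T get() { return value; }
        void set(T v) { value = v; }
    }

    // Thread-safe collector; it is written once per thread, read on stat().
    static final List<Val<Integer>> all = new CopyOnWriteArrayList<>();

    // Each thread lazily gets its own holder and registers it once.
    static final ThreadLocal<Val<Integer>> local = ThreadLocal.withInitial(() -> {
        Val<Integer> v = new Val<>();
        v.set(0);
        all.add(v);
        return v;
    });

    // Only the owning thread writes its holder, so no lock is needed here.
    static void add() {
        Val<Integer> v = local.get();
        v.set(v.get() + 1);
    }

    static int stat() {
        return all.stream().map(Val::get).reduce(0, Integer::sum);
    }

    // Runs `threads` fresh threads calling add() `perThread` times each.
    static int run(int threads, int perThread) {
        all.clear();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    add();
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return stat();
    }

    public static void main(String[] args) {
        System.out.println(run(8, 1000));
    }
}
```

In a server with pooled threads the thread-local values live as long as the pool's threads do, so calling remove() may be worth considering when a value is no longer needed.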

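The example above is a bit special because it also has a collection step. In many cases no collection is needed, and then there is no synchronization at all.

If multiple threads each need a resource of the same type, you could declare a local variable inside each thread by hand, but with ThreadLocal you define it once and it is bound and allocated to each thread automatically, which is convenient.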
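A common case with no collection step is a helper object that is not thread-safe, such as SimpleDateFormat. The sketch below, with a hypothetical PerThreadFormatterDemo class, gives each thread its own formatter so no locking is involved.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class: one SimpleDateFormat per thread, defined once.
class PerThreadFormatterDemo {
    // SimpleDateFormat is not thread-safe, so each thread lazily creates
    // and then reuses its own instance.
    private static final ThreadLocal<SimpleDateFormat> FORMAT = ThreadLocal.withInitial(() -> {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    });

    // Safe to call from any thread without synchronization.
    static String format(long epochMillis) {
        return FORMAT.get().format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(format(0L)); // prints 1970-01-01 00:00:00
    }
}
```

Nothing is ever shared between threads here, so there is nothing to synchronize and nothing to aggregate.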

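Why can the code be optimized?

The code can be optimized because it contains unnecessary work.

The example above resembles the Map and Reduce phases of distributed computing: there is no need to aggregate the result of every single operation.

It is enough to wait until all threads have finished and aggregate once at the end.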
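The "compute locally, aggregate once" shape can be sketched with an ExecutorService. The LocalSumDemo class, the pool size, and the task sizes are assumptions for illustration: each task accumulates into a local variable and the partial results are summed a single time at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class: map (local work) followed by one reduce step.
class LocalSumDemo {
    static long run(int tasks, int perTask) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                // "Map" phase: each task touches only its own local variable.
                Callable<Long> task = () -> {
                    long local = 0;
                    for (int j = 0; j < perTask; j++) {
                        local++;
                    }
                    return local;
                };
                partials.add(pool.submit(task));
            }
            // "Reduce" phase: wait for every task and sum the results once.
            long total = 0;
            for (Future<Long> f : partials) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(10, 1000));
    }
}
```

The only coordination point is Future.get(), which is reached once per task instead of once per increment.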

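So which data structure to use depends on the specific problem. We need to understand what each data structure does and then apply it flexibly.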