java中BitSet简单使用

前言

BitSet实现了一个按需增长的位向量,每一位都是一个boolean值,可以对每一位进行设置或清除,常用场景就是判断一个数据在一个大数据集中是否存在。

使用

有1千万个随机数,随机数的范围在1到1亿之间。现在要求写出一种算法,将1到1亿之间没有在随机数中的数求出来?

import java.util.BitSet;
import java.util.Random;

public class Client2 {

  //一亿
  private static final int HUNDRED_MILLION = 100_000_000;
  //一千万
  private static final int TEN_MILLION = 10_000_000;

  public static void main(String[] args) {
    Random random = new Random();
    BitSet bitSet = new BitSet(HUNDRED_MILLION);
    for (int i = 0; i < TEN_MILLION; i++) {
      //将指定索引设置为true
      bitSet.set(random.nextInt(HUNDRED_MILLION));
    }
    for (int i = 0; i < HUNDRED_MILLION; i++) {
      //获取指定索引为false表示不存在
      if (!bitSet.get(i)) {
        System.out.println(i);
      }
    }
  }
}

原理

/**
 * This class implements a vector of bits that grows as needed. Each
 * component of the bit set has a {@code boolean} value. The
 * bits of a {@code BitSet} are indexed by nonnegative integers.
 * Individual indexed bits can be examined, set, or cleared. One
 * {@code BitSet} may be used to modify the contents of another
 * {@code BitSet} through logical AND, logical inclusive OR, and
 * logical exclusive OR operations.
 *
 * <p>By default, all bits in the set initially have the value
 * {@code false}.
 *
 * <p>Every bit set has a current size, which is the number of bits
 * of space currently in use by the bit set. Note that the size is
 * related to the implementation of a bit set, so it may change with
 * implementation. The length of a bit set relates to logical length
 * of a bit set and is defined independently of implementation.
 *
 * <p>Unless otherwise noted, passing a null parameter to any of the
 * methods in a {@code BitSet} will result in a
 * {@code NullPointerException}.
 *
 * <p>A {@code BitSet} is not safe for multithreaded use without
 * external synchronization.
 *
 * @author  Arthur van Hoff
 * @author  Michael McCloskey
 * @author  Martin Buchholz
 * @since   1.0
 */
public class BitSet implements Cloneable, java.io.Serializable {
    /**
     * 存储数据的数组
     */
    private long[] words;

    /**
     * 将位索引转换成数组索引
     */
    private static int wordIndex(int bitIndex) {
        return bitIndex >> ADDRESS_BITS_PER_WORD;
    }
     /**
     * 构造器
     */
    public BitSet() {
        initWords(BITS_PER_WORD);
        sizeIsSticky = false;
    }

    /**
     * 根据位长度构造
     */
    public BitSet(int nbits) {
        // nbits can't be negative; size 0 is OK
        if (nbits < 0)
            throw new NegativeArraySizeException("nbits < 0: " + nbits);

        initWords(nbits);
        sizeIsSticky = true;
    }

    /**
     * 根据位长度创建数组 可以看做求分页的总页数 页面大小64 64就是1页 65就是2页 
     */
    private void initWords(int nbits) {
        words = new long[wordIndex(nbits-1) + 1];
    }    
}

BitSet内部是使用long数组来存储所有数据的,相比HashMap还是很节省内存的。

可以看到ArrayList的removeIf()方法实现也是实现了一个简单的BitSet。


EnumSet的两种实现RegularEnumSet和JumboEnumSet也是通过BitSet的思想实现的。

总结

BitSet的原理主要就是将对一个int值或字符串的操作转换成位运算,相应的占用空间就少了。

参考

Java BitSet使用场景和示例

原文地址:https://www.cnblogs.com/strongmore/p/14191205.html