String source code (JDK 1.8)

1. A String stores its value in a char array.

/** The value is used for character storage. */
private final char value[];

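2. One constructor takes an int array as its argument. Each int is the Unicode code point of one character (usually written in hexadecimal). A single char holds at most 65535 (0xFFFF), while a code point can be as large as 0x10FFFF, as these constants from Character show: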

public static final int MIN_CODE_POINT = 0x000000;

public static final int MAX_CODE_POINT = 0X10FFFF;

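The basic unit of UTF-16 is a two-byte code unit. A single code unit covers the range 0x0000 - 0xFFFF, while UTF-16 as a whole maps the code points U+0000 to U+10FFFF.

When a rare character needs a code point above 0xFFFF, it is represented by two code units (4 bytes). The mapping rule is as follows:

First code unit (leading or high surrogate) range: 0xD800 - 0xDBFF

Second code unit (trailing or low surrogate) range: 0xDC00 - 0xDFFF

Counting the combinations: (0xDBFF - 0xD800 + 1) * (0xDFFF - 0xDC00 + 1) = 1024 * 1024 = 0x100000 = 0x10FFFF - 0xFFFF, so surrogate pairs map one-to-one (a bijection) onto the code points U+10000 to U+10FFFF.

Therefore a code unit in the range 0xD800 - 0xDBFF cannot represent a character on its own; it must be combined with a trailing surrogate to form a complete character. Likewise, a trailing surrogate is only meaningful after a leading one.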
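The mapping rule above can be sketched in a few lines of arithmetic. This is only an illustration, not the JDK's own code: the class name SurrogateDemo and the method toSurrogatePair are made up for this example, and the result is cross-checked against the real Character.toChars.

```java
import java.util.Arrays;

public class SurrogateDemo {
    // Hypothetical helper: splits a supplementary code point
    // (U+10000..U+10FFFF) into a UTF-16 surrogate pair.
    static char[] toSurrogatePair(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException(
                "not a supplementary code point: 0x" + Integer.toHexString(codePoint));
        }
        int offset = codePoint - 0x10000;               // 20-bit value
        char high = (char) (0xD800 + (offset >>> 10));  // top 10 bits -> leading surrogate
        char low  = (char) (0xDC00 + (offset & 0x3FF)); // low 10 bits -> trailing surrogate
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        char[] pair = toSurrogatePair(0x1F600);
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]); // D83D DE00
        // Cross-check against the JDK's own conversion.
        System.out.println(Arrays.equals(pair, Character.toChars(0x1F600))); // true
    }
}
```

Each surrogate carries 10 bits of the 20-bit offset, which is where the 1024 * 1024 count comes from.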

Reference: https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

    public String(int[] codePoints, int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= codePoints.length) {
                this.value = "".value;
                return;
            }
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > codePoints.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }

        final int end = offset + count;

        // Pass 1: Compute precise size of char[]
        int n = count;
        for (int i = offset; i < end; i++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                continue;
            else if (Character.isValidCodePoint(c))
                n++;
            else throw new IllegalArgumentException(Integer.toString(c));
        }

        // Pass 2: Allocate and fill in char[]
        final char[] v = new char[n];

        for (int i = offset, j = 0; i < end; i++, j++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                v[j] = (char)c;
            else
                Character.toSurrogates(c, v, j++);
        }

        this.value = v;
    }

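Character.isBmpCodePoint(c) checks whether the code point lies in the Basic Multilingual Plane, that is, whether it fits in a single code unit.
Character.isValidCodePoint(c) checks whether the value is within the valid code point range (at most 0x10FFFF). If the code point is valid but outside the BMP, n++ runs, because this int needs two chars.

Character.toSurrogates(c, v, j++) splits the int into two chars (a surrogate pair) and writes them to v[j] and v[j + 1]; the extra j++ skips past the second char.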
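A short usage sketch may make the two passes more concrete. The class CodePointConstructorDemo and its helper fromCodePoints are hypothetical names for this example; only the String(int[], int, int) constructor and the Character methods are real JDK API.

```java
public class CodePointConstructorDemo {
    // Hypothetical convenience wrapper around the constructor discussed above.
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        // U+4E2D is in the BMP (one char); U+1F600 is supplementary (two chars).
        String s = fromCodePoints(0x4E2D, 0x1F600);
        System.out.println(s.length());                            // 3
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
        System.out.println(Character.isLowSurrogate(s.charAt(2)));  // true

        // A value above MAX_CODE_POINT fails the isValidCodePoint check.
        try {
            fromCodePoints(0x110000);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```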

3. length() returns the number of char code units, not the number of characters; some characters occupy two chars.

    public int length() {
        return value.length;
    }
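To see the difference in practice, the sketch below compares length() with codePointCount, which counts code points instead of code units. The class LengthDemo and the helper countCodePoints are names invented for this example.

```java
public class LengthDemo {
    // Hypothetical helper: number of Unicode code points in the whole string.
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // "a" followed by U+1F600, which needs a surrogate pair.
        String s = "a" + new String(Character.toChars(0x1F600));
        System.out.println(s.length());         // 3 code units
        System.out.println(countCodePoints(s)); // 2 code points
    }
}
```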

4. String.join saves you from concatenating with a StringBuilder by hand and then removing the trailing delimiter.

public static String join(CharSequence delimiter, CharSequence... elements)
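For comparison, here is a rough sketch of the manual approach next to String.join. The class JoinDemo and the method joinManually are hypothetical and exist only to show the bookkeeping that String.join removes.

```java
import java.util.Arrays;
import java.util.List;

public class JoinDemo {
    // Hypothetical manual version: append every part plus the delimiter,
    // then cut off the delimiter left over after the last part.
    static String joinManually(String delimiter, List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part).append(delimiter);
        }
        if (!parts.isEmpty()) {
            sb.setLength(sb.length() - delimiter.length());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("a", "b", "c");
        System.out.println(joinManually(", ", parts)); // a, b, c
        System.out.println(String.join(", ", parts));  // a, b, c
    }
}
```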

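5. The native keyword marks a method whose implementation is written in another language (typically C or C++, called through JNI).

public native String intern();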
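The observable behavior of intern() can be checked with a small experiment. This sketch only demonstrates reference identity before and after interning; it does not show the native implementation. The class InternDemo and the helper sameAfterIntern are names made up for this example.

```java
public class InternDemo {
    // Hypothetical helper: true if both strings resolve to the same pooled instance.
    static boolean sameAfterIntern(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        String literal = "hello";           // string literals are interned automatically
        String heap = new String("hello");  // a distinct object with equal contents

        System.out.println(heap == literal);                // false
        System.out.println(heap.equals(literal));           // true
        System.out.println(heap.intern() == literal);       // true
        System.out.println(sameAfterIntern(heap, literal)); // true
    }
}
```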

In-depth analysis of String#intern:

https://tech.meituan.com/in_depth_understanding_string_intern.html