StringUtils

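A null-safe supplement to String's functionality, covering checks, search, substrings, stripping, splitting, replacement and more. Method names are built from a small set of concise keywords that extend each basic operation. Most methods are simple, and the coverage is very complete.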

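Keywords:

Not: boolean negation

Empty: the empty string "", which carries the meaning of null without null's bugs

NULL: null, the keyword involved in the vast majority of bugs

Start: the beginning of the string

End: the end of the string

Whitespace: whitespace, including full-width and half-width spaces and the various whitespace control characters (\t, etc.)

IgnoreCase: case-insensitive

Last: the last occurrence

Any: any one of the characters, after converting the String to a char[]

But: negates another keyword

Only: only

None: none (negation)

Between: wherever there is a left and a right, there may be a Between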

Code version: Apache Commons Lang 3.0 (package org.apache.commons.lang3)

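1. Checks

isEmpty: whether the string is null or empty ("")

isBlank: whether the string is null, empty, or whitespace only

countMatches(CharSequence str, CharSequence sub): number of occurrences of sub in str

isAlpha: contains only letters (Alpha means alphabetic, but after localization it can be read as "characters": Character.isLetter also accepts letters of other scripts, such as Chinese)

isAlphaSpace: letters, plus the half-width space

isAlphanumeric: letters, plus digits

isAlphanumericSpace, isAsciiPrintable, isNumeric, isNumericSpace, isWhitespace, isAllLowerCase, isAllUpperCase, etc.

contains, equals, etc.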

    public static boolean isEmpty(CharSequence cs) { // the parameter was a String before 3.0
        return cs == null || cs.length() == 0;
    } // implementations of CharSequence include CharBuffer, String, StringBuffer and StringBuilder

    public static boolean isBlank(CharSequence cs) {
        int strLen;
        if (cs == null || (strLen = cs.length()) == 0) {
            return true;
        }
        for (int i = 0; i < strLen; i++) {
            if (Character.isWhitespace(cs.charAt(i)) == false) {
                return false;
            }
        }
        return true;
    }

... // isAlpha
        for (int i = 0; i < sz; i++) {
            if (Character.isLetter(cs.charAt(i)) == false) { // roughly: letters of any script (e.g. Chinese)
                return false;
            }
        }
... // isAlphaSpace
            if (Character.isLetter(cs.charAt(i)) == false && cs.charAt(i) != ' ') { // half-width space only
                return false;
            }
... // isNumeric
        for (int i = 0; i < sz; i++) {
            if (Character.isDigit(cs.charAt(i)) == false) { // digit-by-digit check, so "+1", "-1" and "1.5" all fail;
                return false;                                // most checks work this way, hence the dedicated methods
            }
        }
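The list above mentions countMatches but does not show its source. Below is a minimal stand-alone sketch of its documented behavior: it counts non-overlapping occurrences, and null or empty input gives 0. The class name CountMatchesSketch is hypothetical, and this is not the library's code.

```java
// Hypothetical stand-alone sketch of StringUtils.countMatches (not the library source).
class CountMatchesSketch {

    // Counts non-overlapping occurrences of sub in str; null or empty arguments give 0.
    static int countMatches(CharSequence str, CharSequence sub) {
        if (str == null || sub == null || str.length() == 0 || sub.length() == 0) {
            return 0;
        }
        String text = str.toString();
        String target = sub.toString();
        int count = 0;
        int idx = 0;
        while ((idx = text.indexOf(target, idx)) != -1) {
            count++;
            idx += target.length(); // continue after this match, so matches never overlap
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("abba", "a"));  // 2
        System.out.println(countMatches("aaaa", "aa")); // 2
        System.out.println(countMatches(null, "a"));    // 0
    }
}
```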

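Note: the half-width and full-width spaces have different character codes (U+0020 vs U+3000). String.trim() compares directly against the half-width space (val[off + st] <= ' '), i.e. against ASCII code values.

Character.isWhitespace performs more checks. In summary:

1. Character.isWhitespace(): both full-width and half-width spaces count as spaces, i.e. it returns true
2. Character.isSpaceChar(): both full-width and half-width spaces count as spaces, i.e. it returns true
3. Character.isSpace(): only the half-width space counts, i.e. true for the half-width space and false for the full-width space; this method is deprecated
4. trim() strips only characters up to U+0020 (the half-width space and the ASCII control characters). Whether a full-width space exists depends on the encoding; presumably only the few ASCII space representations were considered at the time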
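This small sketch uses only the JDK to check conclusions 1, 2 and 4 against the half-width space (U+0020) and the full-width space U+3000 (IDEOGRAPHIC SPACE). The deprecated isSpace is left out. The class name WhitespaceDemo is made up for illustration.

```java
// Compares JDK whitespace checks on the half-width and the full-width space.
class WhitespaceDemo {

    static final char HALF_WIDTH = ' ';      // U+0020
    static final char FULL_WIDTH = '\u3000'; // U+3000 IDEOGRAPHIC SPACE

    public static void main(String[] args) {
        System.out.println(Character.isWhitespace(HALF_WIDTH)); // true
        System.out.println(Character.isWhitespace(FULL_WIDTH)); // true
        System.out.println(Character.isSpaceChar(HALF_WIDTH));  // true
        System.out.println(Character.isSpaceChar(FULL_WIDTH));  // true

        // trim() only removes chars <= U+0020, so the full-width spaces survive
        String padded = FULL_WIDTH + "abc" + FULL_WIDTH;
        System.out.println(padded.trim().length());  // 5
        System.out.println(" abc\t".trim().length()); // 3
    }
}
```

This is why StringUtils.strip(str, null), which relies on Character.isWhitespace, also removes full-width spaces, while trim does not.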

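2. Special removal (usually applies only to the two ends)

trim: null check, then removes leading and trailing spaces (half-width, via String.trim)

strip(String str, String stripChars): strips the given characters from both ends of the string (a null stripChars means whitespace)

stripStart / stripEnd: the same, for one end only

stripAccents: removes accents (diacritics); not very relevant here

chomp: removes one trailing newline (\n, \r or \r\n) from str; a special case of substring

chop: removes the last character; if the string ends with \r\n, both are removed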

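Background: ASCII (American Standard Code for Information Interchange), in its 1967 edition, is a 7-bit code with 95 printable characters and 33 control characters. It was used mainly on teleprinters and later became the most important character standard on computers. Being so US-centric, it fit poorly even in other English-speaking countries: neither the pound sign nor Western European accented letters could be represented. Fortunately computers store characters in 8-bit bytes, so the eighth bit was used for extensions, and later came Unicode. Unicode broke the one-byte-per-character relationship, so the size of a book or a piece of text is no longer determined only by its number of characters...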

    public static String trim(String str) {
        return str == null ? null : str.trim();
    } // null-safe

    public static String stripStart(String str, String stripChars) {
        int strLen;
        if (str == null || (strLen = str.length()) == 0) {
            return str;
        }
        int start = 0;
        if (stripChars == null) {
            while (start != strLen && Character.isWhitespace(str.charAt(start))) { // null means: strip whitespace
                start++;
            }
        } else if (stripChars.length() == 0) {
            return str;
        } else {
            // strips the leading run of characters that all appear in stripChars
            while (start != strLen && stripChars.indexOf(str.charAt(start)) != INDEX_NOT_FOUND) {
                start++;
            }
        }
        return str.substring(start);
    }
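stripEnd mirrors stripStart, and strip applies both. The sketch below assumes stripEnd simply walks backwards from the end under the same rules (null stripChars means whitespace, an empty stripChars changes nothing). StripSketch and its helper shouldStrip are hypothetical names, and this is not the library source.

```java
// Hypothetical sketch of stripStart / stripEnd / strip (not the library source).
class StripSketch {

    // Removes leading characters found in stripChars (whitespace when stripChars is null).
    static String stripStart(String str, String stripChars) {
        if (str == null || str.isEmpty() || "".equals(stripChars)) {
            return str;
        }
        int start = 0;
        while (start < str.length() && shouldStrip(str.charAt(start), stripChars)) {
            start++;
        }
        return str.substring(start);
    }

    // Mirror image of stripStart: walks backwards from the end.
    static String stripEnd(String str, String stripChars) {
        if (str == null || str.isEmpty() || "".equals(stripChars)) {
            return str;
        }
        int end = str.length();
        while (end > 0 && shouldStrip(str.charAt(end - 1), stripChars)) {
            end--;
        }
        return str.substring(0, end);
    }

    // strip = stripStart followed by stripEnd.
    static String strip(String str, String stripChars) {
        return stripEnd(stripStart(str, stripChars), stripChars);
    }

    private static boolean shouldStrip(char ch, String stripChars) {
        return stripChars == null ? Character.isWhitespace(ch) : stripChars.indexOf(ch) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(strip("  abc  ", null));  // abc
        System.out.println(stripEnd("120.00", ".0")); // 12
    }
}
```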

... // chomp
        int lastIdx = str.length() - 1;
        char last = str.charAt(lastIdx);
        if (last == CharUtils.LF) {
            if (str.charAt(lastIdx - 1) == CharUtils.CR) { // "\r\n": remove both characters
                lastIdx--;
            }
        } else if (last != CharUtils.CR) {
            lastIdx++; // no trailing newline: keep the whole string
        }
        return str.substring(0, lastIdx);

... // chop
        if (strLen < 2) {
            return EMPTY;
        }
        int lastIdx = strLen - 1;
        String ret = str.substring(0, lastIdx); // drop the last character
        char last = str.charAt(lastIdx);
        if (last == CharUtils.LF && ret.charAt(lastIdx - 1) == CharUtils.CR) {
            return ret.substring(0, lastIdx - 1); // a trailing "\r\n" is removed as a unit
        }
        return ret;

3. Search:

The indexOf family, including indexOfIgnoreCase, lastIndexOf, lastIndexOfIgnoreCase, lastIndexOfAny and indexOfAnyBut
ordinalIndexOf: an extension of indexOf that returns the index of the n-th occurrence of the search string
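A sketch of ordinalIndexOf, modeled on my reading of the 3.0 behavior: each search restarts one character after the previous match, so overlapping matches count, and an empty search string is found at index 0. The class name OrdinalIndexOfSketch is hypothetical, and this is not the library source.

```java
// Hypothetical sketch of StringUtils.ordinalIndexOf (not the library source).
class OrdinalIndexOfSketch {

    // Index of the ordinal-th (1-based) occurrence of searchStr in str, or -1 if there is none.
    static int ordinalIndexOf(String str, String searchStr, int ordinal) {
        if (str == null || searchStr == null || ordinal <= 0) {
            return -1;
        }
        if (searchStr.isEmpty()) {
            return 0; // the empty string "occurs" at index 0
        }
        int found = 0;
        int index = -1;
        do {
            // resume one char after the previous match, so overlapping matches count
            index = str.indexOf(searchStr, index + 1);
            if (index < 0) {
                return -1;
            }
            found++;
        } while (found < ordinal);
        return index;
    }

    public static void main(String[] args) {
        System.out.println(ordinalIndexOf("aabaabaa", "a", 2));  // 1
        System.out.println(ordinalIndexOf("aabaabaa", "ab", 2)); // 4
        System.out.println(ordinalIndexOf("aabaabaa", "c", 1));  // -1
    }
}
```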

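4. Substrings

substring(String str, int start, int end): extracts a substring; the parameters are indices (negative values count from the end)

Also substringBefore, substringAfter, substringBeforeLast and substringAfterLast

left / right / mid(String str, int pos, int len): the parameter is a length (len)

// Note: the JDK implementation (the second method below, String.substring from an older JDK with offset/count fields)
// throws an exception on invalid arguments, whereas callers going through the intermediate layer (the utility class)
// have those errors filtered out first. Also, beginIndex is an index while endIndex is exclusive (index + 1):
// the start is included, the end is not.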
   public static String substring(String str, int start) {
        if (str == null) {
            return null;
        }

        // handle negatives, which means last n characters
        if (start < 0) {
            start = str.length() + start; // remember start is negative
        }

        if (start < 0) {
            start = 0;
        }
        if (start > str.length()) {
            return EMPTY;
        }

        return str.substring(start);
    }

    public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > count) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    if (beginIndex > endIndex) {
        throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    }
    return ((beginIndex == 0) && (endIndex == count)) ? this :
        new String(offset + beginIndex, endIndex - beginIndex, value);
    }

//mid left right
str.substring(str.length() - len);//right
str.substring(pos, pos + len);//mid
str.substring(0, len);//left
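The three lines above show only the core of each method. Below is a null-safe sketch of left / right / mid, assuming the documented edge cases: null stays null, a negative len gives "", a negative pos is treated as 0, and asking for more characters than exist returns what is there. The class name LeftRightMid is hypothetical, and this is not the library source.

```java
// Hypothetical null-safe sketch of left / right / mid (not the library source).
class LeftRightMid {

    static String left(String str, int len) {
        if (str == null) {
            return null;
        }
        if (len < 0) {
            return "";
        }
        return str.length() <= len ? str : str.substring(0, len);
    }

    static String right(String str, int len) {
        if (str == null) {
            return null;
        }
        if (len < 0) {
            return "";
        }
        return str.length() <= len ? str : str.substring(str.length() - len);
    }

    static String mid(String str, int pos, int len) {
        if (str == null) {
            return null;
        }
        if (len < 0 || pos > str.length()) {
            return "";
        }
        if (pos < 0) {
            pos = 0; // a negative position is treated as 0
        }
        // written as a subtraction to avoid int overflow in pos + len
        if (len >= str.length() - pos) {
            return str.substring(pos);
        }
        return str.substring(pos, pos + len);
    }

    public static void main(String[] args) {
        System.out.println(left("abc", 2));    // ab
        System.out.println(right("abc", 2));   // bc
        System.out.println(mid("abc", 1, 5));  // bc
        System.out.println(mid("abc", -2, 2)); // ab
    }
}
```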

   public static String substringBefore(String str, String separator) {
        if (isEmpty(str) || separator == null) {
            return str;
        }
        if (separator.length() == 0) {
            return EMPTY;
        }
        int pos = str.indexOf(separator); // everything before the first occurrence of separator
        if (pos == INDEX_NOT_FOUND) {
            return str;
        }
        return str.substring(0, pos);
    }
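For comparison, here is a sketch of the sibling methods substringAfter and substringBetween. It assumes the documented asymmetry: when the separator is missing, substringBefore returns the whole string but substringAfter returns "". substringBetween returns null if either delimiter is missing. The class name SubstringSketch is hypothetical, and this is not the library source.

```java
// Hypothetical sketch of substringAfter and substringBetween (not the library source).
class SubstringSketch {

    // Everything after the first occurrence of separator; "" if separator is not found.
    static String substringAfter(String str, String separator) {
        if (str == null || str.isEmpty()) {
            return str;
        }
        if (separator == null) {
            return "";
        }
        int pos = str.indexOf(separator);
        if (pos == -1) {
            return ""; // unlike substringBefore, which returns str here
        }
        return str.substring(pos + separator.length());
    }

    // Text between the first open and the next close after it; null if either is missing.
    static String substringBetween(String str, String open, String close) {
        if (str == null || open == null || close == null) {
            return null;
        }
        int start = str.indexOf(open);
        if (start != -1) {
            int end = str.indexOf(close, start + open.length());
            if (end != -1) {
                return str.substring(start + open.length(), end);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(substringAfter("abcba", "b"));         // cba
        System.out.println(substringBetween("wx[b]yz", "[", "]")); // b
    }
}
```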

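5. Splitting

split: splits on any of the given separator characters; adjacent separators are merged (unlike String.split)

splitByWholeSeparator: the whole separator string must match; adjacent separators are merged

splitPreserveAllTokens: like split, but adjacent separators are not merged, so the empty tokens between them are kept

splitByCharacterType: splits wherever the Character type (Character.getType) changes

splitByCharacterTypeCamelCase: the same, but keeps camel-case words together

String.split: implemented with Pattern (the compiled representation of a regular expression)

Examples: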

StringUtils.split("split","pi");//{s,l,t}
StringUtils.splitByWholeSeparator("1   2  "," ");//{1,2,}
StringUtils.splitPreserveAllTokens("1   2  "," ");//{1,,,2,,}
StringUtils.splitByCharacterType("splitByCharacterTypeCamelCase");//{split,B,y,C,haracter,T,ype,C,amel,C,ase}    
StringUtils.splitByCharacterTypeCamelCase("splitByCharacterTypeCamelCase");//{split,By,Character,Type,Camel,Case}

Key source code:

... // split
if (separatorChars.indexOf(str.charAt(i)) >= 0) { // is the character str.charAt(i) one of separatorChars?
...

... // splitByWholeSeparator / splitByWholeSeparatorPreserveAllTokens (shared worker)
        while (end < len) {
            end = str.indexOf(separator, beg); // unlike split, the whole separator must match
            if (end > -1) {
                if (end > beg) {
                    numberOfSubstrings += 1;
                    if (numberOfSubstrings == max) { // max limits the number of substrings
                        end = len;
                        substrings.add(str.substring(beg));
                    } else { // normal case: add the token
                        substrings.add(str.substring(beg, end));
                        beg = end + separatorLength;
                    }
                } else {
                    if (preserveAllTokens) { // adjacent separators: keep an empty token
                        numberOfSubstrings += 1;
                        if (numberOfSubstrings == max) {
                            end = len;
                            substrings.add(str.substring(beg));
                        } else {
                            substrings.add(EMPTY);
                        }
                    }
                    beg = end + separatorLength;
                }
            } else { // last token
                substrings.add(str.substring(beg));
                end = len;
            }
        }

... // splitByCharacterType / splitByCharacterTypeCamelCase
            int type = Character.getType(c[pos]); // character category, e.g. UPPERCASE_LETTER
            ...
            if (camelCase && type == Character.LOWERCASE_LETTER && currentType == Character.UPPERCASE_LETTER) {
                // camel case: the uppercase letter before this lowercase one starts the new token
                int newTokenStart = pos - 1;
                if (newTokenStart != tokenStart) {
                    list.add(new String(c, tokenStart, newTokenStart - tokenStart));
                    tokenStart = newTokenStart;
                }
            } else {
                list.add(new String(c, tokenStart, pos - tokenStart)); // type changed: close the current token
                tokenStart = pos;
            }
            currentType = type;
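Below is a stand-alone sketch of the split behavior described above. It splits on any character in separatorChars, uses whitespace when that is null, and merges adjacent separators so no empty tokens appear. The last line of main contrasts this with the regex-based String.split. The class name SplitSketch is hypothetical, and this is not the library's splitWorker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of StringUtils.split(str, separatorChars) (not the library source).
class SplitSketch {

    // Splits on any char in separatorChars (whitespace if null); adjacent separators
    // are merged, so no empty tokens are produced.
    static String[] split(String str, String separatorChars) {
        if (str == null) {
            return null;
        }
        List<String> tokens = new ArrayList<String>();
        int start = 0;           // start of the current token
        boolean inToken = false; // are we inside a token?
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            boolean isSeparator = separatorChars == null
                    ? Character.isWhitespace(ch)
                    : separatorChars.indexOf(ch) >= 0;
            if (isSeparator) {
                if (inToken) {
                    tokens.add(str.substring(start, i));
                    inToken = false;
                }
                start = i + 1;
            } else {
                inToken = true;
            }
        }
        if (inToken) {
            tokens.add(str.substring(start));
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("split", "pi")));   // [s, l, t]
        System.out.println(Arrays.toString(split("1   2  ", " "))); // [1, 2]
        System.out.println(Arrays.toString("1   2  ".split(" ")));   // [1, , , 2]
    }
}
```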

6. Joining

join: the inverse of splitting

System.out.println(StringUtils.join(new String[]{"j","o","i","n"},"-"));//j-o-i-n
System.out.println(StringUtils.join(Arrays.asList(null,"j",null,"o","i","n",null),"-"));//-j--o-i-n-

...
        for (int i = startIndex; i < endIndex; i++) {
            if (i > startIndex) {
                buf.append(separator);
            }
            if (array[i] != null) { // null is treated as ""
                buf.append(array[i]);
            }
        }
        return buf.toString();
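The same loop written for an Iterable reproduces both join examples above. This is a sketch, not the library's code. It assumes the documented rules that null elements contribute "" and a null separator is treated as "". The class name JoinSketch is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of StringUtils.join(Iterable, String) (not the library source).
class JoinSketch {

    // Joins the elements with separator; null elements contribute "",
    // and a null separator is treated as "".
    static String join(Iterable<?> items, String separator) {
        if (items == null) {
            return null;
        }
        String sep = separator == null ? "" : separator;
        StringBuilder buf = new StringBuilder();
        boolean first = true;
        for (Object item : items) {
            if (!first) {
                buf.append(sep); // separator goes before every element except the first
            }
            first = false;
            if (item != null) {
                buf.append(item);
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("j", "o", "i", "n"), "-"));                   // j-o-i-n
        System.out.println(join(Arrays.asList(null, "j", null, "o", "i", "n", null), "-")); // -j--o-i-n-
    }
}
```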

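7. Removal: find the match first, then cut the string

deleteWhitespace: removes all kinds of whitespace

removeStart (removeStartIgnoreCase): removes the given string if the input starts with it

removeEnd (removeEndIgnoreCase): removes the given string if the input ends with it

remove: removes every occurrence of the given string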

System.out.println(StringUtils.deleteWhitespace("A　B    c a　b    C "));//ABcabC
System.out.println(StringUtils.removeStart("ABcacbC","a"));//ABcacbC
System.out.println(StringUtils.removeStartIgnoreCase("ABcacbC","a"));//BcacbC
System.out.println(StringUtils.removeEnd("ABcacbC","c"));//ABcacbC
System.out.println(StringUtils.removeEndIgnoreCase("ABcacbC","c"));//ABcacb
System.out.println(StringUtils.remove("ABcacbC","c"));//ABabC
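removeStart and removeEnd follow the "find, then cut" pattern: test the prefix or suffix, then take a substring. In the sketch below, a boolean ignoreCase flag folds each pair (removeStart/removeStartIgnoreCase, removeEnd/removeEndIgnoreCase) into one method. That flag is a hypothetical design choice; the library has separate methods. String.regionMatches does the optionally case-insensitive comparison. RemoveSketch is a made-up name.

```java
// Hypothetical sketch of removeStart / removeEnd and their IgnoreCase variants (not the library source).
class RemoveSketch {

    static String removeStart(String str, String remove, boolean ignoreCase) {
        if (str == null || str.isEmpty() || remove == null || remove.isEmpty()) {
            return str;
        }
        // compare remove against the first remove.length() chars of str
        if (str.regionMatches(ignoreCase, 0, remove, 0, remove.length())) {
            return str.substring(remove.length());
        }
        return str;
    }

    static String removeEnd(String str, String remove, boolean ignoreCase) {
        if (str == null || str.isEmpty() || remove == null || remove.isEmpty()) {
            return str;
        }
        // negative if remove is longer than str; regionMatches then simply returns false
        int offset = str.length() - remove.length();
        if (str.regionMatches(ignoreCase, offset, remove, 0, remove.length())) {
            return str.substring(0, offset);
        }
        return str;
    }

    public static void main(String[] args) {
        System.out.println(removeStart("ABcacbC", "a", true)); // BcacbC
        System.out.println(removeEnd("ABcacbC", "c", true));   // ABcacb
    }
}
```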

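8. Replacement

replaceOnce: replaces the first match

replace: replaces all matches

replaceEach: takes two arrays whose elements correspond one-to-one; throws an exception if their lengths differ

replaceEachRepeatedly: applies replaceEach repeatedly until none of the search strings match any more

replaceChars: a character-level version of replaceEach

overlay(String str, String overlay, int start, int end): takes positions instead of search strings; with start == end it becomes an insertion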

 System.out.println(StringUtils.replaceOnce("abABab", "a","1"));//1bABab
 System.out.println(StringUtils.replace("abABab", "a","1"));//1bAB1b
 StringUtils.replaceEach("ABB", new String[]{"AB"},new String[]{"A"});//AB
 StringUtils.replaceEachRepeatedly("ABb", new String[]{"AB","b"},new String[]{"A","B"});//A
 System.out.println(StringUtils.replaceChars("abABab",'a','1'));//1bAB1b
 System.out.println(StringUtils.replaceChars("abABab ","ab","12"));//12AB12 
 System.out.println(StringUtils.overlay("abcdefg","一二三",1,2));//a一二三cdefg

... // replaceOnce and replace
        while (end != INDEX_NOT_FOUND) {
            buf.append(text.substring(start, end)).append(replacement);
            start = end + replLength;
            if (--max == 0) { // remaining replacements: 1 for replaceOnce, -1 (unlimited) for replace
                break;
            }
            end = text.indexOf(searchString, start);
        }
        buf.append(text.substring(start));
...
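Here is the loop above wrapped into a runnable method. It is a sketch under stated assumptions, not the library source. max counts down the remaining replacements: replaceOnce passes 1, and a negative max never reaches 0, so replace does not stop early. ReplaceSketch is a hypothetical name.

```java
// Hypothetical runnable version of the replace loop above (not the library source).
class ReplaceSketch {

    // Replaces up to max occurrences of searchString; a negative max means no limit.
    static String replace(String text, String searchString, String replacement, int max) {
        if (text == null || text.isEmpty() || searchString == null || searchString.isEmpty()
                || replacement == null || max == 0) {
            return text;
        }
        int start = 0;
        int end = text.indexOf(searchString, start);
        if (end == -1) {
            return text; // nothing to replace
        }
        StringBuilder buf = new StringBuilder(text.length());
        while (end != -1) {
            buf.append(text, start, end).append(replacement);
            start = end + searchString.length();
            if (--max == 0) { // 1 -> 0 after the first replacement; -1 keeps going
                break;
            }
            end = text.indexOf(searchString, start);
        }
        buf.append(text.substring(start)); // the tail after the last replacement
        return buf.toString();
    }

    static String replaceOnce(String text, String searchString, String replacement) {
        return replace(text, searchString, replacement, 1);
    }

    public static void main(String[] args) {
        System.out.println(replaceOnce("abABab", "a", "1"));  // 1bABab
        System.out.println(replace("abABab", "a", "1", -1)); // 1bAB1b
    }
}
```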

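9. Adding

repeat: repeats a string a given number of times

leftPad / rightPad / center: pad to a minimum length. A Chinese character and an English letter each count as one char, but their display widths differ, so that needs special handling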

System.out.println(StringUtils.rightPad("xx", 5, "阿"));//xx阿阿阿
System.out.println(StringUtils.rightPad("xx", 5, "a"));//xxaaa
System.out.println(StringUtils.center ("xx", 5, "a"));//axxaa
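The sketch below implements the padding rules behind the examples above. Padding cycles through padStr, an empty or null padStr falls back to a space, and center puts half of the missing chars (rounded down) on the left before filling up the right. Lengths are counted in chars, not display width, which is the caveat mentioned above. PadSketch and its padding helper are hypothetical names, and this is not the library source.

```java
// Hypothetical sketch of leftPad / rightPad / center (not the library source).
class PadSketch {

    // Builds `count` padding chars by cycling through padStr.
    private static String padding(int count, String padStr) {
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append(padStr.charAt(i % padStr.length()));
        }
        return sb.toString();
    }

    static String leftPad(String str, int size, String padStr) {
        if (str == null) {
            return null;
        }
        if (padStr == null || padStr.isEmpty()) {
            padStr = " ";
        }
        int pads = size - str.length(); // counts chars, not display width
        return pads <= 0 ? str : padding(pads, padStr) + str;
    }

    static String rightPad(String str, int size, String padStr) {
        if (str == null) {
            return null;
        }
        if (padStr == null || padStr.isEmpty()) {
            padStr = " ";
        }
        int pads = size - str.length();
        return pads <= 0 ? str : str + padding(pads, padStr);
    }

    // Left side gets half of the missing chars (rounded down), the right side the rest.
    static String center(String str, int size, String padStr) {
        if (str == null || size <= 0) {
            return str;
        }
        int pads = size - str.length();
        if (pads <= 0) {
            return str;
        }
        String leftPadded = leftPad(str, str.length() + pads / 2, padStr);
        return rightPad(leftPadded, size, padStr);
    }

    public static void main(String[] args) {
        System.out.println(rightPad("xx", 5, "a"));  // xxaaa
        System.out.println(center("xx", 5, "a"));    // axxaa
        System.out.println(leftPad("bat", 8, "yz")); // yzyzybat
    }
}
```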

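10. Conversion: mostly a null check followed by a direct call to String

upperCase: to upper case

lowerCase: to lower case

capitalize: first character to upper (title) case

uncapitalize: first character to lower case

swapCase: swaps upper and lower case

reverse: reverses the string

reverseDelimited: splits on a delimiter first, then reverses the order of the parts

abbreviate(String str, int offset, int maxWidth): abbreviates, replacing the overflow with "..." when the string exceeds maxWidth; very useful, though it has the same display-width drawback as padding

abbreviateMiddle(String str, String middle, int length): replaces the middle of the string with middle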

   public static String capitalize(String str) {
        int strLen;
        if (str == null || (strLen = str.length()) == 0) {
            return str;
        }
        return new StringBuilder(strLen)
            .append(Character.toTitleCase(str.charAt(0))) // only the first character changes
            .append(str.substring(1))
            .toString();
    }
... // uncapitalize
        return new StringBuilder(strLen)
            .append(Character.toLowerCase(str.charAt(0))) // upper-casing needs care, lower-casing less so; rarely a concern for us
            .append(str.substring(1))
            .toString();
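Here is a sketch of abbreviate (the offset-free form, i.e. offset 0) and abbreviateMiddle, based on the documented behavior. The result is cut to maxWidth chars, the last three being "...", and maxWidth below 4 is rejected. abbreviateMiddle keeps the extra character on the left when the kept length is odd. The class name AbbreviateSketch is hypothetical, and this is not the library source.

```java
// Hypothetical sketch of abbreviate(str, maxWidth) and abbreviateMiddle (not the library source).
class AbbreviateSketch {

    // Cuts str to maxWidth chars, the last three being "...".
    static String abbreviate(String str, int maxWidth) {
        if (str == null) {
            return null;
        }
        if (maxWidth < 4) {
            throw new IllegalArgumentException("Minimum abbreviation width is 4");
        }
        if (str.length() <= maxWidth) {
            return str;
        }
        return str.substring(0, maxWidth - 3) + "...";
    }

    // Replaces the middle of str with middle so that the result is length chars long.
    static String abbreviateMiddle(String str, String middle, int length) {
        if (str == null || str.isEmpty() || middle == null || middle.isEmpty()) {
            return str;
        }
        if (length >= str.length() || length < middle.length() + 2) {
            return str;
        }
        int keep = length - middle.length();  // chars kept from the original
        int startOffset = keep / 2 + keep % 2; // the front gets the extra char when keep is odd
        int endOffset = str.length() - keep / 2;
        return str.substring(0, startOffset) + middle + str.substring(endOffset);
    }

    public static void main(String[] args) {
        System.out.println(abbreviate("abcdefg", 6));            // abc...
        System.out.println(abbreviateMiddle("abcdef", ".", 4)); // ab.f
    }
}
```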

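Special

--Difference

difference(String str1, String str2): returns the rest of str2 from the first position where it differs from str1 (i.e. drops the prefix str2 shares with str1)

indexOfDifference(CharSequence cs1, CharSequence cs2): index of the first position where the two differ

indexOfDifference(CharSequence... css): array version

getCommonPrefix(String... strs): the common prefix of all strings, via indexOfDifference plus substring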
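This sketch covers the three difference helpers. It assumes the documented conventions: -1 means "no difference", a null argument differs at index 0, and getCommonPrefix returns "" when given no input or any null element. Unlike the library, getCommonPrefix here uses a direct loop instead of the array version of indexOfDifference. The class name DifferenceSketch is hypothetical.

```java
// Hypothetical sketch of indexOfDifference / difference / getCommonPrefix (not the library source).
class DifferenceSketch {

    // First index where the two differ; -1 if they are equal.
    static int indexOfDifference(CharSequence cs1, CharSequence cs2) {
        if (cs1 == cs2) {
            return -1;
        }
        if (cs1 == null || cs2 == null) {
            return 0;
        }
        int i = 0;
        while (i < cs1.length() && i < cs2.length() && cs1.charAt(i) == cs2.charAt(i)) {
            i++;
        }
        // a length mismatch also counts as a difference
        return (i < cs1.length() || i < cs2.length()) ? i : -1;
    }

    // The rest of str2 from the first position where it differs from str1.
    static String difference(String str1, String str2) {
        if (str1 == null) {
            return str2;
        }
        if (str2 == null) {
            return str1;
        }
        int at = indexOfDifference(str1, str2);
        return at == -1 ? "" : str2.substring(at);
    }

    // Longest prefix shared by all strings; "" for no input or any null element.
    static String getCommonPrefix(String... strs) {
        if (strs == null || strs.length == 0) {
            return "";
        }
        for (String s : strs) {
            if (s == null) {
                return "";
            }
        }
        int end = strs[0].length(); // shrinks as more strings are compared
        for (int k = 1; k < strs.length; k++) {
            int i = 0;
            while (i < end && i < strs[k].length() && strs[0].charAt(i) == strs[k].charAt(i)) {
                i++;
            }
            end = i;
        }
        return strs[0].substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(difference("abcde", "abxyz"));      // xyz
        System.out.println(getCommonPrefix("abcde", "abxyz")); // ab
    }
}
```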

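--Other

int getLevenshteinDistance(CharSequence s, CharSequence t): measures similarity as an edit distance (smaller means more similar). Time for some theory - -

Levenshtein distance is an algorithm proposed by the Russian scientist Vladimir Levenshtein in 1965. It measures string similarity by edit distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other.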
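The definition above translates into the classic dynamic-programming recurrence. d[i][j] is the distance between the first i chars of s and the first j chars of t. Each cell takes the minimum of an insertion, a deletion, or a substitution (free when the chars match). Below is a minimal two-row sketch of that textbook algorithm. It is not necessarily how the library implements it. The class name LevenshteinSketch is hypothetical, and rejecting null inputs with IllegalArgumentException is an assumption made to mirror the library.

```java
// Hypothetical two-row dynamic-programming sketch of Levenshtein distance (not the library source).
class LevenshteinSketch {

    // Minimum number of single-char insertions, deletions and substitutions turning s into t.
    static int distance(CharSequence s, CharSequence t) {
        if (s == null || t == null) {
            throw new IllegalArgumentException("Strings must not be null");
        }
        int n = s.length();
        int m = t.length();
        int[] prev = new int[m + 1]; // row i-1 of the DP table
        int[] curr = new int[m + 1]; // row i
        for (int j = 0; j <= m; j++) {
            prev[j] = j; // "" -> first j chars of t takes j insertions
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i; // first i chars of s -> "" takes i deletions
            for (int j = 1; j <= m; j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(
                        curr[j - 1] + 1,     // insertion
                        prev[j] + 1),        // deletion
                        prev[j - 1] + cost); // substitution (free on a match)
            }
            int[] tmp = prev; // the current row becomes the previous one
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // 3
        System.out.println(distance("frog", "fog"));       // 1
    }
}
```

Keeping only two rows brings memory down from a full (n+1) x (m+1) table to O(m), while the running time stays O(n·m).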