Java转义emoji等特殊符号

写在前面

网上找了很多转emoji等方法,大多有两种方法

  1. 更改数据库编码格式为utf8mb4
  2. 过滤字符串中的emoji

都不是很优雅

  1. 更改数据库编码,势必影响其他数据库
  2. 过滤emoj效率比较低

处理Emoji方式

这里推荐使用org.apache.commons.lang3.StringEscapeUtils工具类,简单等两行代码实现特殊符号和emoji表情的转义存储,和读取反转;

转义存储

  1. StringEscapeUtils.escapeXXX(content)

它有几种转码方式,可以根据个人格式进行选择:

  1. public static final String escapeCsv(final String input);
  2. public static final String escapeEcmaScript(final String input);
  3. public static final String escapeHtml3(final String input);
  4. public static final String escapeHtml4(final String input);
  5. public static final String escapeJava(final String input);
  6. public static final String escapeJson(final String input);
  7. public static final String escapeXml(final String input);
  8. public static String escapeXml10(final String input);
  9. public static String escapeXml11(final String input)

读取反转义

读取后,根据个人格式进行反转义,即可还原emoji值,供前端展示;

  1. public static final String unescapeCsv(final String input) ;
  2. public static final String unescapeEcmaScript(final String input);
  3. public static final String unescapeHtml3(final String input);
  4. public static final String unescapeHtml4(final String input);
  5. public static final String unescapeJava(final String input);
  6. public static final String unescapeJson(final String input);
  7. public static final String unescapeXml(final String input);

附加一段手打的复杂代码:

package utils;

import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.commons.lang.StringUtils;

public class EmojiUtils {
    /**
     * emoji表情替换
     *
     * @param source 原字符串
     * 
     * @param slipStr emoji表情替换成的字符串
     * 
     * @return 过滤后的字符串
     */
    public static String filterEmoji(String source, String slipStr) {
        if (StringUtils.isNotBlank(source)) {
            return source.replaceAll("[\ud800\udc00-\udbff\udfff\ud800-\udfff]", slipStr);
        } else {
            return source;
        }
    }

    /**
     * @Description 将字符串中的emoji表情转换成可以在utf-8字符集数据库中保存的格式(表情占4个字节,需要utf8mb4字符集)
     * @param str
     *            待转换字符串
     * @return 转换后字符串
     * @throws UnsupportedEncodingException
     *             exception
     */
    public static String emojiConvert(String str) throws UnsupportedEncodingException {
        String patternString = "([\x{10000}-\x{10ffff}ud800-udfff])";

        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(str);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            try {
                matcher.appendReplacement(sb, "[[" + URLEncoder.encode(matcher.group(1), "UTF-8") + "]]");
            } catch (UnsupportedEncodingException e) {
                throw e;
            }
        }
        matcher.appendTail(sb);
        return sb.toString();
    }

    /**
     * @Description 还原utf8数据库中保存的含转换后emoji表情的字符串
     * @param str
     *            转换后的字符串
     * @return 转换前的字符串
     * @throws UnsupportedEncodingException
     *             exception
     */
    public static String emojiRecovery2(String str) throws UnsupportedEncodingException {
        String patternString = "\[\[(.*?)\]\]";

        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(str);

        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            try {
                matcher.appendReplacement(sb, URLDecoder.decode(matcher.group(1), "UTF-8"));
            } catch (UnsupportedEncodingException e) {
                throw e;
            }
        }
        matcher.appendTail(sb);
        return sb.toString();
    }
}
原文地址:https://www.cnblogs.com/linnuo/p/8777454.html