json_parse_exception when POSTing Chinese data to the Elasticsearch RESTful API

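Our client program calls the Elasticsearch RESTful API directly and runs queries by POSTing JSON. When the POST body contains Chinese text, some Chinese characters trigger an exception while others do not: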

{"error":{"root_cause":[{"type":"json_parse_exception","reason":"Invalid UTF-8 middle byte 0x5c at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@58cf272c; line: 1, column: 238]"}],"type":"json_parse_exception","reason":"Invalid UTF-8 middle byte 0x5c at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@58cf272c; line: 1, column: 238]"},"status":500}

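POSTing the same JSON through the es head plugin works fine, so my initial guess was that the problem lay in how the client writes the request body. Here is the code: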

    

URL url = new URL(esURL);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setDoOutput(true);
connection.setDoInput(true);
connection.setRequestMethod("POST");
connection.setUseCaches(false);
//connection.setConnectTimeout(30000); // set the timeout to 30 seconds
connection.setInstanceFollowRedirects(true);
connection.setRequestProperty("Charsert", "UTF-8");
connection.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
connection.setRequestProperty("Accept-Language", "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3");

connection.connect();

// POST request
DataOutputStream out = new DataOutputStream(connection.getOutputStream());
out.writeBytes(query);

The problem is in the writeBytes() method. Here is the JDK source:

public final void writeBytes(String s) throws IOException {
    int len = s.length();
    for (int i = 0 ; i < len ; i++) {
        out.write((byte)s.charAt(i));
    }
    incCount(len);
}

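In UTF-8, a Chinese character is typically stored as 3 bytes, but writeBytes() casts each char directly to a single byte and keeps only its low 8 bits, so the bytes sent are not valid UTF-8. This also explains why only some characters fail. When the low byte is 0x80 or above, it is not a valid UTF-8 sequence on its own; in the error above, such a byte was most likely read as the start of a multi-byte sequence and the ASCII byte after it (0x5c, a backslash) was rejected as a middle byte. When the low byte is below 0x80, the character is silently replaced by an unrelated ASCII character, which usually raises no parse error but still sends the wrong query.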
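To make the difference concrete, here is a minimal sketch that compares the bytes produced by writeBytes() with a real UTF-8 encoding of the same string. The class name WriteBytesDemo and its helper methods are made up for this illustration; only DataOutputStream.writeBytes and String.getBytes come from the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original client code.
public class WriteBytesDemo {

    // Bytes produced by DataOutputStream.writeBytes: one byte per char (low 8 bits only).
    static byte[] viaWriteBytes(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeBytes(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked to keep the demo short.
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Bytes produced by encoding the same string as UTF-8.
    static byte[] viaUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Format bytes as space-separated lowercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = "\u4e2d\u6587"; // the two characters U+4E2D and U+6587
        System.out.println("writeBytes: " + hex(viaWriteBytes(s))); // 2d 87
        System.out.println("UTF-8:      " + hex(viaUtf8(s)));       // e4 b8 ad e6 96 87
    }
}
```

The first character (U+4E2D) collapses to 0x2d, an ASCII hyphen, which a JSON parser accepts silently. The second (U+6587) collapses to 0x87, which is not a legal way to start a UTF-8 sequence, so the parser rejects the body.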

Change the code to:

out.write(query.getBytes("UTF-8"));

Problem solved.
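For reference, the fix can be put back into the context of the whole request. This is only a sketch under assumptions: the class name EsPostClient, the encodeBody and post helpers, and the response-reading logic are not from the original client, and a plain OutputStream is used instead of DataOutputStream since only raw bytes are written.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class illustrating the corrected POST.
public class EsPostClient {

    // Encode the JSON query as UTF-8 so multi-byte characters are preserved.
    static byte[] encodeBody(String query) {
        return query.getBytes(StandardCharsets.UTF_8);
    }

    // POST the query to esURL and return the response body as a string.
    static String post(String esURL, String query) throws IOException {
        byte[] body = encodeBody(query);
        HttpURLConnection connection = (HttpURLConnection) new URL(esURL).openConnection();
        try {
            connection.setDoOutput(true);
            connection.setRequestMethod("POST");
            connection.setUseCaches(false);
            connection.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
            // Content-Length is the byte count, not the character count.
            connection.setFixedLengthStreamingMode(body.length);

            try (OutputStream out = connection.getOutputStream()) {
                out.write(body);
            }

            // Error responses (such as the 500 above) are delivered on the error stream.
            int status = connection.getResponseCode();
            InputStream in = status >= 400 ? connection.getErrorStream() : connection.getInputStream();
            if (in == null) {
                return "";
            }
            try (InputStream stream = in) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = stream.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            connection.disconnect();
        }
    }
}
```

A call would look like EsPostClient.post(esURL, query), with esURL and query being the same variables as in the original snippet. Using StandardCharsets.UTF_8 instead of the string "UTF-8" avoids the checked UnsupportedEncodingException.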


           

Original article: https://www.cnblogs.com/devilwind/p/7488243.html