SqlException with message "Caught java.io.CharConversionException." and ERRORCODE=-4220

Technote (troubleshooting)

Problem (Abstract)

An application that uses the IBM Data Server Driver for JDBC and SQLJ (also known as the JCC driver) and is connected to a database with code set UTF-8 (code page 1208) receives an SqlException whose message includes "Caught java.io.CharConversionException" and ERRORCODE=-4220 when it queries a character column that contains a sequence of bytes that is not a valid UTF-8 string.

Symptom

An exception is thrown similar to this:

com.ibm.db2.jcc.am.SqlException: [jcc][t4][1065][12306][XXX.XXX.XXX] Caught java.io.CharConversionException. See attached Throwable for details. ERRORCODE=-4220, SQLSTATE=null
[...]
Caused by: java.nio.charset.MalformedInputException: Input length = XXX
[...]
Caused by: sun.io.MalformedInputException
    at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:XXX)
[...]
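An application may want to recognize this failure separately from other SQL errors. The following is a minimal sketch, not part of the driver: the class name Jcc4220Check and the method isCharConversionFailure are hypothetical, and it assumes that the driver's SqlException is a java.sql.SQLException whose getErrorCode() returns the ERRORCODE value shown in the message. As a fallback, it also walks the cause chain looking for the character-conversion exceptions seen in the trace above.

```java
import java.io.CharConversionException;
import java.nio.charset.CharacterCodingException;
import java.sql.SQLException;

class Jcc4220Check {
    // ERRORCODE value taken from the exception message in this technote.
    static final int CHAR_CONVERSION_ERROR_CODE = -4220;

    /**
     * Returns true if the exception looks like the character-conversion
     * failure described here. Assumption: getErrorCode() carries ERRORCODE.
     */
    static boolean isCharConversionFailure(SQLException e) {
        if (e.getErrorCode() == CHAR_CONVERSION_ERROR_CODE) {
            return true;
        }
        // Fallback: look for a conversion exception among the causes.
        // MalformedInputException is a subclass of CharacterCodingException.
        for (Throwable t = e.getCause(); t != null; t = t.getCause()) {
            if (t instanceof CharConversionException
                    || t instanceof CharacterCodingException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SQLException sample = new SQLException(
                "Caught java.io.CharConversionException.", null, -4220);
        System.out.println(isCharConversionFailure(sample)); // true
    }
}
```

A caller could use such a check in a catch block to log the affected query, so that the offending rows can be inspected later.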

Cause

The JCC driver throws the exception when data in a character column that it queries is not a valid string in the database code page.
Invalid data can get into the database in the following ways:

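By running an SQL statement that writes byte values directly to the database. For example: INSERT INTO tab1 VALUES (X'C3'). The byte X'C3' is the first byte of a two-byte UTF-8 sequence, so on its own it is not a valid UTF-8 string.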

By running the IMPORT or LOAD command on a file that contains character data that is not in the code page of the client machine, so that the appropriate code page conversion is not done. To make sure that the appropriate code page conversion is done when running IMPORT or LOAD, specify the code page of the input file by including the "codepage=x" file type modifier, where x is the code page of the file.
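To see why a value such as X'C3' is rejected, the following sketch decodes bytes with a strict UTF-8 decoder from the standard Java library. It illustrates UTF-8 validity only and is not the driver's own code; the class name StrictUtf8Demo and the method isValidUtf8 are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictUtf8Demo {
    /** Returns true if the bytes form a complete, well-formed UTF-8 string. */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] loneLeadByte = {(byte) 0xC3};            // as in X'C3'
        byte[] eAcute = {(byte) 0xC3, (byte) 0xA9};     // U+00E9 in UTF-8
        System.out.println(isValidUtf8(loneLeadByte));  // false
        System.out.println(isValidUtf8(eAcute));        // true
    }
}
```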

Diagnosing the problem

You can use the HEX function to display the byte values stored in a character column.

For example, to find out the byte values in column COL1 in table TAB1, run: SELECT HEX(col1) FROM tab1
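Once you have the hexadecimal output, you still need to locate the invalid bytes in it. The following is a minimal sketch, assuming the HEX output is a string with an even number of hexadecimal digits; the class name Utf8HexCheck and the method firstInvalidOffset are hypothetical. It uses the standard Java UTF-8 decoder to report the byte offset at which the first malformed sequence starts.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8HexCheck {
    /** Converts a string of hex digit pairs (as produced by HEX) to bytes. */
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /**
     * Returns the byte offset of the first malformed UTF-8 sequence,
     * or -1 if the whole value is valid UTF-8.
     */
    static int firstInvalidOffset(String hex) {
        byte[] bytes = hexToBytes(hex);
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // One char per input byte is always enough room for UTF-8 input.
        CharBuffer out = CharBuffer.allocate(bytes.length);
        CoderResult result = decoder.decode(in, out, true);
        // On error, the input position marks the start of the bad bytes.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstInvalidOffset("41C342")); // 1
        System.out.println(firstInvalidOffset("41C3A9")); // -1
    }
}
```

In the first call, the byte C3 at offset 1 is followed by 42, which is not a continuation byte, so the value is not valid UTF-8.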

Resolving the problem

Update the invalid data with a valid UTF-8 character string.

Alternatively, for JCC driver versions that support it, you can set the JCC configuration property db2.jcc.charsetDecoderEncoder=3. With this setting, instead of throwing an exception, the JCC driver returns the Unicode REPLACEMENT CHARACTER (U+FFFD) in place of a sequence of bytes that is not a valid UTF-8 string. The JCC configuration property db2.jcc.charsetDecoderEncoder is supported in versions of the JCC driver that come with DB2 LUW 9.5 Fix Pack 8 and later (APAR IC74896), DB2 LUW 9.7 Fix Pack 5 and later (APAR IC74895), and all DB2 LUW releases from DB2 10.1 onwards.
For example, suppose you have a Java program MyApp.java that executes an SQL query of a database column that contains a sequence of bytes that is not a valid string. When you run:
java MyApp
it throws an exception. But when you run:
java -Ddb2.jcc.charsetDecoderEncoder=3 MyApp
it returns the string with any invalid sequence of bytes replaced by the Unicode REPLACEMENT CHARACTER.
The Unicode REPLACEMENT CHARACTER often appears like this (a diamond with a question mark inside it) :
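The effect of replacing invalid bytes can be illustrated with the standard Java library alone. The sketch below is an analogy for the behavior described above, not the driver's implementation; the class name ReplacementDemo and the method decodeWithReplacement are hypothetical. It relies on the documented behavior of the String(byte[], Charset) constructor, which substitutes the charset's replacement string for malformed input.

```java
import java.nio.charset.StandardCharsets;

class ReplacementDemo {
    /**
     * Decodes bytes as UTF-8, substituting U+FFFD for malformed sequences.
     * The String constructor used here never throws on malformed input.
     */
    static String decodeWithReplacement(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 'A', a lone lead byte 0xC3, then 'B'.
        byte[] data = {0x41, (byte) 0xC3, 0x42};
        String decoded = decodeWithReplacement(data);
        System.out.println(decoded.equals("A\uFFFDB")); // true
    }
}
```

Note that the replacement is lossy: the original bytes cannot be recovered from the returned string, so updating the invalid data remains the more thorough fix.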

Related information

APAR IC78495
APAR IC74896
Making the setting for IBM Data Studio
Making the setting for IBM Content Collector
UDF to identify invalid data in a UTF-8 character column
How invalid data can get into a UTF-8 character column
LOAD command
IMPORT command
HEX function
