(Repost) Converting Chinese Characters to Pinyin with HanziToPinyin

This article is reposted from: http://blog.csdn.net/zhangphil/article/details/47164665

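Android itself ships a class for converting Chinese characters (Hanzi) into Latin-alphabet pinyin: HanziToPinyin.java. Android's built-in Contacts uses HanziToPinyin.java to organize Chinese contacts into groups. Hanzi-to-pinyin conversion is essential in some applications. Take contact grouping: if your address book holds several contacts with the surname Zhang (张, ZHANG), all of them should be grouped under "Z". Likewise, social apps such as WeChat and QQ, and in general any feature that sorts or groups contacts or friends, need to convert Chinese characters to pinyin and then sort and bucket the entries by first letter.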
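As a rough illustration of that first-letter grouping, here is a plain-Java sketch that needs no Android APIs. It assumes some function that turns a name into uppercase pinyin (in an app, something like the getPinYin() helper in the MainActivity at the end of this article). The class name ContactGrouper and the '#' bucket for names whose pinyin does not start with a letter are illustrative choices, not part of HanziToPinyin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical helper: groups display names by the first letter of their pinyin.
final class ContactGrouper {

    // Returns buckets keyed 'A'..'Z', plus '#' for names whose pinyin is empty
    // or does not start with a Latin letter. TreeMap keeps keys sorted, and
    // '#' sorts before 'A'. Names keep their input order inside each bucket.
    static Map<Character, List<String>> group(List<String> names,
                                              Function<String, String> toPinyin) {
        Map<Character, List<String>> groups = new TreeMap<>();
        for (String name : names) {
            String pinyin = toPinyin.apply(name);
            char key = '#';
            if (pinyin != null && !pinyin.isEmpty()) {
                char first = Character.toUpperCase(pinyin.charAt(0));
                if (first >= 'A' && first <= 'Z') {
                    key = first;
                }
            }
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(name);
        }
        return groups;
    }
}
```

For example, with a stub lookup that maps 张三 to ZHANGSAN, 张伟 to ZHANGWEI and 安娜 to ANNA, grouping the list [张三, 安娜, 张伟] yields {A=[安娜], Z=[张三, 张伟]}.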
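HanziToPinyin.java is not a public class. It is a private helper that Google uses internally to implement Android's Contacts, so we cannot call it the way we call an ordinary Android SDK API. That is not a problem, though: we can simply copy the class file into our own project and use it directly.
The HanziToPinyin.java source file lives in Google's official Contacts code, at: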

packages/providers/ContactsProvider/src/com/android/providers/contacts/HanziToPinyin.java

Copies of HanziToPinyin.java can also be found in projects online. However, using the class as-is does not work properly; it fails with:

"There is no Chinese collator, HanziToPinyin is disabled"

The error comes from this method in HanziToPinyin.java:
public static HanziToPinyin getInstance();
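The cause: some customized (non-stock) Android builds describe the Chinese locale differently, so the check locale[i].equals(Locale.CHINA) in the original code returns false and the Chinese collator is not recognized. The singleton is then created in its disabled state, and from that point on every conversion returns an empty result.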
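The failure comes down to how java.util.Locale.equals() works: two locales are equal only if language, country, variant, script and extensions all match, and Locale.CHINA is specifically zh_CN. The small plain-Java check below (the class name LocaleCheck and the sample tags are illustrative; which tags a particular ROM actually reports is not stated in the article) shows that a bare "zh" locale, or a script-tagged one such as zh-Hans-CN, is not equal to Locale.CHINA even though both denote Chinese.

```java
import java.util.Locale;

// Illustrative only: shows why locale[i].equals(Locale.CHINA) can be false
// even when a Chinese collator is present.
final class LocaleCheck {

    // The test used by the original AOSP getInstance().
    static boolean passesOriginalCheck(Locale l) {
        return l.equals(Locale.CHINA);
    }

    public static void main(String[] args) {
        Locale[] samples = {
            Locale.CHINA,                        // zh_CN
            Locale.forLanguageTag("zh"),         // language only, same as new Locale("zh")
            Locale.forLanguageTag("zh-Hans-CN"), // has a script subtag
            Locale.TAIWAN                        // zh_TW
        };
        for (Locale l : samples) {
            System.out.println(l.toLanguageTag() + " -> " + passesOriginalCheck(l));
        }
    }
}
```

Running main() prints true only for zh-CN; the other three lines print false.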

Fixing the problem

I strengthened the condition by adding a bit of code:
final Locale chinaAddition = new Locale("zh");
and then added chinaAddition to the check as an additional condition:

if (locale[i].equals(Locale.CHINA) || locale[i].equals(chinaAddition)) {
    // ...
}

Here is the complete improved getInstance() method:

public static HanziToPinyin getInstance() {
    synchronized (HanziToPinyin.class) {
        if (sInstance != null) {
            return sInstance;
        }
        // Check if zh_CN collation data is available
        final Locale locale[] = Collator.getAvailableLocales();

        // Added (enhancement): also accept a bare "zh" locale.
        final Locale chinaAddition = new Locale("zh");

        for (int i = 0; i < locale.length; i++) {
            if (locale[i].equals(Locale.CHINA)
                    || locale[i].equals(chinaAddition)) {
                // Do self validation just once.
                if (DEBUG) {
                    Log.d(TAG, "Self validation. Result: "
                            + doSelfValidation());
                }
                sInstance = new HanziToPinyin(true);
                return sInstance;
            }
        }
        Log.w(TAG,
                "There is no Chinese collator, HanziToPinyin is disabled");
        sInstance = new HanziToPinyin(false);
        return sInstance;
    }
}
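The chinaAddition check still matches only a bare "zh" locale exactly, so a build that lists, for example, zh-Hans-CN (with a script subtag) would still be rejected. One possible variation, which is an assumption of this sketch rather than the author's fix, is to compare the locale's parts instead of the whole object: accept language "zh" when the country is empty or CN and the script is empty or Hans. The class and method names below are hypothetical. Note that Locale.getScript() is only available on Android API 21 and later, and that, like the original, this only checks that such a locale is listed; the COLLATOR field still uses Locale.CHINA.

```java
import java.util.Locale;

// Hypothetical helper (not part of the original HanziToPinyin): decides whether
// a collator locale looks like Simplified Chinese, ignoring variants/extensions.
final class ChineseLocaleMatcher {

    // Accepts zh, zh_CN, zh-Hans and zh-Hans-CN.
    // Rejects zh_TW, zh_HK, any zh-Hant-* locale and non-Chinese locales.
    static boolean isSimplifiedChineseLocale(Locale l) {
        if (!"zh".equals(l.getLanguage())) {
            return false;
        }
        String country = l.getCountry();
        String script = l.getScript(); // on Android: API 21+
        boolean countryOk = country.isEmpty() || "CN".equals(country);
        boolean scriptOk = script.isEmpty() || "Hans".equals(script);
        return countryOk && scriptOk;
    }

    // Same role as the loop in getInstance(): is any listed locale acceptable?
    static boolean hasChineseCollatorLocale(Locale[] available) {
        for (Locale l : available) {
            if (isSimplifiedChineseLocale(l)) {
                return true;
            }
        }
        return false;
    }
}
```

Inside getInstance(), the loop condition would then become ChineseLocaleMatcher.isSimplifiedChineseLocale(locale[i]). Traditional-Chinese locales are deliberately excluded, since their default collation order may not be pinyin-based.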

With this enhancement, the complete source of HanziToPinyin.java is as follows (it can be copied into your own project and used as-is):

/*
 * Copyright (C) 2011 The Android Open Source Project
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package zhangphil.hanyupinyin;

import android.text.TextUtils;
import android.util.Log;

import java.text.Collator;
import java.util.ArrayList;
import java.util.Locale;

/**
 * An object to convert Chinese character to its corresponding pinyin string.
 * For characters with multiple possible pinyin string, only one is selected
 * according to collator. Polyphone is not supported in this implementation.
 * This class is implemented to achieve the best runtime performance and minimum
 * runtime resources with tolerable sacrifice of accuracy. This implementation
 * highly depends on zh_CN ICU collation data and must be always synchronized
 * with ICU.
 *
 * Currently this file is aligned to zh.txt in ICU 4.6 (taken from the Android 4.2 source).
 */
public class HanziToPinyin {
    private static final String TAG = "HanziToPinyin";

    // Turn on this flag when we want to check internal data structure.
    private static final boolean DEBUG = false;

    /**
     * Unihans array.
     *
     * Each unihans is the first one within same pinyin when collator is zh_CN.
     */
    public static final char[] UNIHANS = { '\u963f', '\u54ce', '\u5b89',
            '\u80ae', '\u51f9', '\u516b', '\u6300', '\u6273', '\u90a6',
            '\u52f9', '\u9642', '\u5954', '\u4f3b', '\u5c44', '\u8fb9',
            '\u706c', '\u618b', '\u6c43', '\u51ab', '\u7676', '\u5cec',
            '\u5693', '\u5072', '\u53c2', '\u4ed3', '\u64a1', '\u518a',
            '\u5d7e', '\u66fd', '\u66fe', '\u5c64', '\u53c9', '\u8286',
            '\u8fbf', '\u4f25', '\u6284', '\u8f66', '\u62bb', '\u6c88',
            '\u6c89', '\u9637', '\u5403', '\u5145', '\u62bd', '\u51fa',
            '\u6b3b', '\u63e3', '\u5ddb', '\u5205', '\u5439', '\u65fe',
            '\u9034', '\u5472', '\u5306', '\u51d1', '\u7c97', '\u6c46',
            '\u5d14', '\u90a8', '\u6413', '\u5491', '\u5446', '\u4e39',
            '\u5f53', '\u5200', '\u561a', '\u6265', '\u706f', '\u6c10',
            '\u55f2', '\u7538', '\u5201', '\u7239', '\u4e01', '\u4e1f',
            '\u4e1c', '\u543a', '\u53be', '\u8011', '\u8968', '\u5428',
            '\u591a', '\u59b8', '\u8bf6', '\u5940', '\u97a5', '\u513f',
            '\u53d1', '\u5e06', '\u531a', '\u98de', '\u5206', '\u4e30',
            '\u8985', '\u4ecf', '\u7d11', '\u4f15', '\u65ee', '\u4f85',
            '\u7518', '\u5188', '\u768b', '\u6208', '\u7ed9', '\u6839',
            '\u522f', '\u5de5', '\u52fe', '\u4f30', '\u74dc', '\u4e56',
            '\u5173', '\u5149', '\u5f52', '\u4e28', '\u5459', '\u54c8',
            '\u548d', '\u4f44', '\u592f', '\u8320', '\u8bc3', '\u9ed2',
            '\u62eb', '\u4ea8', '\u5677', '\u53ff', '\u9f41', '\u4e6f',
            '\u82b1', '\u6000', '\u72bf', '\u5ddf', '\u7070', '\u660f',
            '\u5419', '\u4e0c', '\u52a0', '\u620b', '\u6c5f', '\u827d',
            '\u9636', '\u5dfe', '\u5755', '\u5182', '\u4e29', '\u51e5',
            '\u59e2', '\u5658', '\u519b', '\u5494', '\u5f00', '\u520a',
            '\u5ffc', '\u5c3b', '\u533c', '\u808e', '\u52a5', '\u7a7a',
            '\u62a0', '\u625d', '\u5938', '\u84af', '\u5bbd', '\u5321',
            '\u4e8f', '\u5764', '\u6269', '\u5783', '\u6765', '\u5170',
            '\u5577', '\u635e', '\u808b', '\u52d2', '\u5d1a', '\u5215',
            '\u4fe9', '\u5941', '\u826f', '\u64a9', '\u5217', '\u62ce',
            '\u5222', '\u6e9c', '\u56d6', '\u9f99', '\u779c', '\u565c',
            '\u5a08', '\u7567', '\u62a1', '\u7f57', '\u5463', '\u5988',
            '\u57cb', '\u5ada', '\u7264', '\u732b', '\u4e48', '\u5445',
            '\u95e8', '\u753f', '\u54aa', '\u5b80', '\u55b5', '\u4e5c',
            '\u6c11', '\u540d', '\u8c2c', '\u6478', '\u54de', '\u6bea',
            '\u55ef', '\u62cf', '\u8149', '\u56e1', '\u56d4', '\u5b6c',
            '\u7592', '\u5a1e', '\u6041', '\u80fd', '\u59ae', '\u62c8',
            '\u5b22', '\u9e1f', '\u634f', '\u56dc', '\u5b81', '\u599e',
            '\u519c', '\u7fba', '\u5974', '\u597b', '\u759f', '\u9ec1',
            '\u90cd', '\u5594', '\u8bb4', '\u5991', '\u62cd', '\u7705',
            '\u4e53', '\u629b', '\u5478', '\u55b7', '\u5309', '\u4e15',
            '\u56e8', '\u527d', '\u6c15', '\u59d8', '\u4e52', '\u948b',
            '\u5256', '\u4ec6', '\u4e03', '\u6390', '\u5343', '\u545b',
            '\u6084', '\u767f', '\u4eb2', '\u72c5', '\u828e', '\u4e18',
            '\u533a', '\u5cd1', '\u7f3a', '\u590b', '\u5465', '\u7a63',
            '\u5a06', '\u60f9', '\u4eba', '\u6254', '\u65e5', '\u8338',
            '\u53b9', '\u909a', '\u633c', '\u5827', '\u5a51', '\u77a4',
            '\u637c', '\u4ee8', '\u6be2', '\u4e09', '\u6852', '\u63bb',
            '\u95aa', '\u68ee', '\u50e7', '\u6740', '\u7b5b', '\u5c71',
            '\u4f24', '\u5f30', '\u5962', '\u7533', '\u8398', '\u6552',
            '\u5347', '\u5c38', '\u53ce', '\u4e66', '\u5237', '\u8870',
            '\u95e9', '\u53cc', '\u8c01', '\u542e', '\u8bf4', '\u53b6',
            '\u5fea', '\u635c', '\u82cf', '\u72fb', '\u590a', '\u5b59',
            '\u5506', '\u4ed6', '\u56fc', '\u574d', '\u6c64', '\u5932',
            '\u5fd1', '\u71a5', '\u5254', '\u5929', '\u65eb', '\u5e16',
            '\u5385', '\u56f2', '\u5077', '\u51f8', '\u6e4d', '\u63a8',
            '\u541e', '\u4e47', '\u7a75', '\u6b6a', '\u5f2f', '\u5c23',
            '\u5371', '\u6637', '\u7fc1', '\u631d', '\u4e4c', '\u5915',
            '\u8672', '\u4eda', '\u4e61', '\u7071', '\u4e9b', '\u5fc3',
            '\u661f', '\u51f6', '\u4f11', '\u5401', '\u5405', '\u524a',
            '\u5743', '\u4e2b', '\u6079', '\u592e', '\u5e7a', '\u503b',
            '\u4e00', '\u56d9', '\u5e94', '\u54df', '\u4f63', '\u4f18',
            '\u625c', '\u56e6', '\u66f0', '\u6655', '\u7b60', '\u7b7c',
            '\u5e00', '\u707d', '\u5142', '\u5328', '\u50ae', '\u5219',
            '\u8d3c', '\u600e', '\u5897', '\u624e', '\u635a', '\u6cbe',
            '\u5f20', '\u957f', '\u9577', '\u4f4b', '\u8707', '\u8d1e',
            '\u4e89', '\u4e4b', '\u5cd9', '\u5ea2', '\u4e2d', '\u5dde',
            '\u6731', '\u6293', '\u62fd', '\u4e13', '\u5986', '\u96b9',
            '\u5b92', '\u5353', '\u4e72', '\u5b97', '\u90b9', '\u79df',
            '\u94bb', '\u539c', '\u5c0a', '\u6628', '\u5159', '\u9fc3',
            '\u9fc4', };

    /**
     * Pinyin array.
     *
     * Each pinyin is corresponding to unihans of same offset in the unihans
     * array.
     */
    public static final byte[][] PINYINS = { { 65, 0, 0, 0, 0, 0 },
            { 65, 73, 0, 0, 0, 0 }, { 65, 78, 0, 0, 0, 0 },
            { 65, 78, 71, 0, 0, 0 }, { 65, 79, 0, 0, 0, 0 },
            { 66, 65, 0, 0, 0, 0 }, { 66, 65, 73, 0, 0, 0 },
            { 66, 65, 78, 0, 0, 0 }, { 66, 65, 78, 71, 0, 0 },
            { 66, 65, 79, 0, 0, 0 }, { 66, 69, 73, 0, 0, 0 },
            { 66, 69, 78, 0, 0, 0 }, { 66, 69, 78, 71, 0, 0 },
            { 66, 73, 0, 0, 0, 0 }, { 66, 73, 65, 78, 0, 0 },
            { 66, 73, 65, 79, 0, 0 }, { 66, 73, 69, 0, 0, 0 },
            { 66, 73, 78, 0, 0, 0 }, { 66, 73, 78, 71, 0, 0 },
            { 66, 79, 0, 0, 0, 0 }, { 66, 85, 0, 0, 0, 0 },
            { 67, 65, 0, 0, 0, 0 }, { 67, 65, 73, 0, 0, 0 },
            { 67, 65, 78, 0, 0, 0 }, { 67, 65, 78, 71, 0, 0 },
            { 67, 65, 79, 0, 0, 0 }, { 67, 69, 0, 0, 0, 0 },
            { 67, 69, 78, 0, 0, 0 }, { 67, 69, 78, 71, 0, 0 },
            { 90, 69, 78, 71, 0, 0 }, { 67, 69, 78, 71, 0, 0 },
            { 67, 72, 65, 0, 0, 0 }, { 67, 72, 65, 73, 0, 0 },
            { 67, 72, 65, 78, 0, 0 }, { 67, 72, 65, 78, 71, 0 },
            { 67, 72, 65, 79, 0, 0 }, { 67, 72, 69, 0, 0, 0 },
            { 67, 72, 69, 78, 0, 0 }, { 83, 72, 69, 78, 0, 0 },
            { 67, 72, 69, 78, 0, 0 }, { 67, 72, 69, 78, 71, 0 },
            { 67, 72, 73, 0, 0, 0 }, { 67, 72, 79, 78, 71, 0 },
            { 67, 72, 79, 85, 0, 0 }, { 67, 72, 85, 0, 0, 0 },
            { 67, 72, 85, 65, 0, 0 }, { 67, 72, 85, 65, 73, 0 },
            { 67, 72, 85, 65, 78, 0 }, { 67, 72, 85, 65, 78, 71 },
            { 67, 72, 85, 73, 0, 0 }, { 67, 72, 85, 78, 0, 0 },
            { 67, 72, 85, 79, 0, 0 }, { 67, 73, 0, 0, 0, 0 },
            { 67, 79, 78, 71, 0, 0 }, { 67, 79, 85, 0, 0, 0 },
            { 67, 85, 0, 0, 0, 0 }, { 67, 85, 65, 78, 0, 0 },
            { 67, 85, 73, 0, 0, 0 }, { 67, 85, 78, 0, 0, 0 },
            { 67, 85, 79, 0, 0, 0 }, { 68, 65, 0, 0, 0, 0 },
            { 68, 65, 73, 0, 0, 0 }, { 68, 65, 78, 0, 0, 0 },
            { 68, 65, 78, 71, 0, 0 }, { 68, 65, 79, 0, 0, 0 },
            { 68, 69, 0, 0, 0, 0 }, { 68, 69, 78, 0, 0, 0 },
            { 68, 69, 78, 71, 0, 0 }, { 68, 73, 0, 0, 0, 0 },
            { 68, 73, 65, 0, 0, 0 }, { 68, 73, 65, 78, 0, 0 },
            { 68, 73, 65, 79, 0, 0 }, { 68, 73, 69, 0, 0, 0 },
            { 68, 73, 78, 71, 0, 0 }, { 68, 73, 85, 0, 0, 0 },
            { 68, 79, 78, 71, 0, 0 }, { 68, 79, 85, 0, 0, 0 },
            { 68, 85, 0, 0, 0, 0 }, { 68, 85, 65, 78, 0, 0 },
            { 68, 85, 73, 0, 0, 0 }, { 68, 85, 78, 0, 0, 0 },
            { 68, 85, 79, 0, 0, 0 }, { 69, 0, 0, 0, 0, 0 },
            { 69, 73, 0, 0, 0, 0 }, { 69, 78, 0, 0, 0, 0 },
            { 69, 78, 71, 0, 0, 0 }, { 69, 82, 0, 0, 0, 0 },
            { 70, 65, 0, 0, 0, 0 }, { 70, 65, 78, 0, 0, 0 },
            { 70, 65, 78, 71, 0, 0 }, { 70, 69, 73, 0, 0, 0 },
            { 70, 69, 78, 0, 0, 0 }, { 70, 69, 78, 71, 0, 0 },
            { 70, 73, 65, 79, 0, 0 }, { 70, 79, 0, 0, 0, 0 },
            { 70, 79, 85, 0, 0, 0 }, { 70, 85, 0, 0, 0, 0 },
            { 71, 65, 0, 0, 0, 0 }, { 71, 65, 73, 0, 0, 0 },
            { 71, 65, 78, 0, 0, 0 }, { 71, 65, 78, 71, 0, 0 },
            { 71, 65, 79, 0, 0, 0 }, { 71, 69, 0, 0, 0, 0 },
            { 71, 69, 73, 0, 0, 0 }, { 71, 69, 78, 0, 0, 0 },
            { 71, 69, 78, 71, 0, 0 }, { 71, 79, 78, 71, 0, 0 },
            { 71, 79, 85, 0, 0, 0 }, { 71, 85, 0, 0, 0, 0 },
            { 71, 85, 65, 0, 0, 0 }, { 71, 85, 65, 73, 0, 0 },
            { 71, 85, 65, 78, 0, 0 }, { 71, 85, 65, 78, 71, 0 },
            { 71, 85, 73, 0, 0, 0 }, { 71, 85, 78, 0, 0, 0 },
            { 71, 85, 79, 0, 0, 0 }, { 72, 65, 0, 0, 0, 0 },
            { 72, 65, 73, 0, 0, 0 }, { 72, 65, 78, 0, 0, 0 },
            { 72, 65, 78, 71, 0, 0 }, { 72, 65, 79, 0, 0, 0 },
            { 72, 69, 0, 0, 0, 0 }, { 72, 69, 73, 0, 0, 0 },
            { 72, 69, 78, 0, 0, 0 }, { 72, 69, 78, 71, 0, 0 },
            { 72, 77, 0, 0, 0, 0 }, { 72, 79, 78, 71, 0, 0 },
            { 72, 79, 85, 0, 0, 0 }, { 72, 85, 0, 0, 0, 0 },
            { 72, 85, 65, 0, 0, 0 }, { 72, 85, 65, 73, 0, 0 },
            { 72, 85, 65, 78, 0, 0 }, { 72, 85, 65, 78, 71, 0 },
            { 72, 85, 73, 0, 0, 0 }, { 72, 85, 78, 0, 0, 0 },
            { 72, 85, 79, 0, 0, 0 }, { 74, 73, 0, 0, 0, 0 },
            { 74, 73, 65, 0, 0, 0 }, { 74, 73, 65, 78, 0, 0 },
            { 74, 73, 65, 78, 71, 0 }, { 74, 73, 65, 79, 0, 0 },
            { 74, 73, 69, 0, 0, 0 }, { 74, 73, 78, 0, 0, 0 },
            { 74, 73, 78, 71, 0, 0 }, { 74, 73, 79, 78, 71, 0 },
            { 74, 73, 85, 0, 0, 0 }, { 74, 85, 0, 0, 0, 0 },
            { 74, 85, 65, 78, 0, 0 }, { 74, 85, 69, 0, 0, 0 },
            { 74, 85, 78, 0, 0, 0 }, { 75, 65, 0, 0, 0, 0 },
            { 75, 65, 73, 0, 0, 0 }, { 75, 65, 78, 0, 0, 0 },
            { 75, 65, 78, 71, 0, 0 }, { 75, 65, 79, 0, 0, 0 },
            { 75, 69, 0, 0, 0, 0 }, { 75, 69, 78, 0, 0, 0 },
            { 75, 69, 78, 71, 0, 0 }, { 75, 79, 78, 71, 0, 0 },
            { 75, 79, 85, 0, 0, 0 }, { 75, 85, 0, 0, 0, 0 },
            { 75, 85, 65, 0, 0, 0 }, { 75, 85, 65, 73, 0, 0 },
            { 75, 85, 65, 78, 0, 0 }, { 75, 85, 65, 78, 71, 0 },
            { 75, 85, 73, 0, 0, 0 }, { 75, 85, 78, 0, 0, 0 },
            { 75, 85, 79, 0, 0, 0 }, { 76, 65, 0, 0, 0, 0 },
            { 76, 65, 73, 0, 0, 0 }, { 76, 65, 78, 0, 0, 0 },
            { 76, 65, 78, 71, 0, 0 }, { 76, 65, 79, 0, 0, 0 },
            { 76, 69, 0, 0, 0, 0 }, { 76, 69, 73, 0, 0, 0 },
            { 76, 69, 78, 71, 0, 0 }, { 76, 73, 0, 0, 0, 0 },
            { 76, 73, 65, 0, 0, 0 }, { 76, 73, 65, 78, 0, 0 },
            { 76, 73, 65, 78, 71, 0 }, { 76, 73, 65, 79, 0, 0 },
            { 76, 73, 69, 0, 0, 0 }, { 76, 73, 78, 0, 0, 0 },
            { 76, 73, 78, 71, 0, 0 }, { 76, 73, 85, 0, 0, 0 },
            { 76, 79, 0, 0, 0, 0 }, { 76, 79, 78, 71, 0, 0 },
            { 76, 79, 85, 0, 0, 0 }, { 76, 85, 0, 0, 0, 0 },
            { 76, 85, 65, 78, 0, 0 }, { 76, 85, 69, 0, 0, 0 },
            { 76, 85, 78, 0, 0, 0 }, { 76, 85, 79, 0, 0, 0 },
            { 77, 0, 0, 0, 0, 0 }, { 77, 65, 0, 0, 0, 0 },
            { 77, 65, 73, 0, 0, 0 }, { 77, 65, 78, 0, 0, 0 },
            { 77, 65, 78, 71, 0, 0 }, { 77, 65, 79, 0, 0, 0 },
            { 77, 69, 0, 0, 0, 0 }, { 77, 69, 73, 0, 0, 0 },
            { 77, 69, 78, 0, 0, 0 }, { 77, 69, 78, 71, 0, 0 },
            { 77, 73, 0, 0, 0, 0 }, { 77, 73, 65, 78, 0, 0 },
            { 77, 73, 65, 79, 0, 0 }, { 77, 73, 69, 0, 0, 0 },
            { 77, 73, 78, 0, 0, 0 }, { 77, 73, 78, 71, 0, 0 },
            { 77, 73, 85, 0, 0, 0 }, { 77, 79, 0, 0, 0, 0 },
            { 77, 79, 85, 0, 0, 0 }, { 77, 85, 0, 0, 0, 0 },
            { 78, 0, 0, 0, 0, 0 }, { 78, 65, 0, 0, 0, 0 },
            { 78, 65, 73, 0, 0, 0 }, { 78, 65, 78, 0, 0, 0 },
            { 78, 65, 78, 71, 0, 0 }, { 78, 65, 79, 0, 0, 0 },
            { 78, 69, 0, 0, 0, 0 }, { 78, 69, 73, 0, 0, 0 },
            { 78, 69, 78, 0, 0, 0 }, { 78, 69, 78, 71, 0, 0 },
            { 78, 73, 0, 0, 0, 0 }, { 78, 73, 65, 78, 0, 0 },
            { 78, 73, 65, 78, 71, 0 }, { 78, 73, 65, 79, 0, 0 },
            { 78, 73, 69, 0, 0, 0 }, { 78, 73, 78, 0, 0, 0 },
            { 78, 73, 78, 71, 0, 0 }, { 78, 73, 85, 0, 0, 0 },
            { 78, 79, 78, 71, 0, 0 }, { 78, 79, 85, 0, 0, 0 },
            { 78, 85, 0, 0, 0, 0 }, { 78, 85, 65, 78, 0, 0 },
            { 78, 85, 69, 0, 0, 0 }, { 78, 85, 78, 0, 0, 0 },
            { 78, 85, 79, 0, 0, 0 }, { 79, 0, 0, 0, 0, 0 },
            { 79, 85, 0, 0, 0, 0 }, { 80, 65, 0, 0, 0, 0 },
            { 80, 65, 73, 0, 0, 0 }, { 80, 65, 78, 0, 0, 0 },
            { 80, 65, 78, 71, 0, 0 }, { 80, 65, 79, 0, 0, 0 },
            { 80, 69, 73, 0, 0, 0 }, { 80, 69, 78, 0, 0, 0 },
            { 80, 69, 78, 71, 0, 0 }, { 80, 73, 0, 0, 0, 0 },
            { 80, 73, 65, 78, 0, 0 }, { 80, 73, 65, 79, 0, 0 },
            { 80, 73, 69, 0, 0, 0 }, { 80, 73, 78, 0, 0, 0 },
            { 80, 73, 78, 71, 0, 0 }, { 80, 79, 0, 0, 0, 0 },
            { 80, 79, 85, 0, 0, 0 }, { 80, 85, 0, 0, 0, 0 },
            { 81, 73, 0, 0, 0, 0 }, { 81, 73, 65, 0, 0, 0 },
            { 81, 73, 65, 78, 0, 0 }, { 81, 73, 65, 78, 71, 0 },
            { 81, 73, 65, 79, 0, 0 }, { 81, 73, 69, 0, 0, 0 },
            { 81, 73, 78, 0, 0, 0 }, { 81, 73, 78, 71, 0, 0 },
            { 81, 73, 79, 78, 71, 0 }, { 81, 73, 85, 0, 0, 0 },
            { 81, 85, 0, 0, 0, 0 }, { 81, 85, 65, 78, 0, 0 },
            { 81, 85, 69, 0, 0, 0 }, { 81, 85, 78, 0, 0, 0 },
            { 82, 65, 78, 0, 0, 0 }, { 82, 65, 78, 71, 0, 0 },
            { 82, 65, 79, 0, 0, 0 }, { 82, 69, 0, 0, 0, 0 },
            { 82, 69, 78, 0, 0, 0 }, { 82, 69, 78, 71, 0, 0 },
            { 82, 73, 0, 0, 0, 0 }, { 82, 79, 78, 71, 0, 0 },
            { 82, 79, 85, 0, 0, 0 }, { 82, 85, 0, 0, 0, 0 },
            { 82, 85, 65, 0, 0, 0 }, { 82, 85, 65, 78, 0, 0 },
            { 82, 85, 73, 0, 0, 0 }, { 82, 85, 78, 0, 0, 0 },
            { 82, 85, 79, 0, 0, 0 }, { 83, 65, 0, 0, 0, 0 },
            { 83, 65, 73, 0, 0, 0 }, { 83, 65, 78, 0, 0, 0 },
            { 83, 65, 78, 71, 0, 0 }, { 83, 65, 79, 0, 0, 0 },
            { 83, 69, 0, 0, 0, 0 }, { 83, 69, 78, 0, 0, 0 },
            { 83, 69, 78, 71, 0, 0 }, { 83, 72, 65, 0, 0, 0 },
            { 83, 72, 65, 73, 0, 0 }, { 83, 72, 65, 78, 0, 0 },
            { 83, 72, 65, 78, 71, 0 }, { 83, 72, 65, 79, 0, 0 },
            { 83, 72, 69, 0, 0, 0 }, { 83, 72, 69, 78, 0, 0 },
            { 88, 73, 78, 0, 0, 0 }, { 83, 72, 69, 78, 0, 0 },
            { 83, 72, 69, 78, 71, 0 }, { 83, 72, 73, 0, 0, 0 },
            { 83, 72, 79, 85, 0, 0 }, { 83, 72, 85, 0, 0, 0 },
            { 83, 72, 85, 65, 0, 0 }, { 83, 72, 85, 65, 73, 0 },
            { 83, 72, 85, 65, 78, 0 }, { 83, 72, 85, 65, 78, 71 },
            { 83, 72, 85, 73, 0, 0 }, { 83, 72, 85, 78, 0, 0 },
            { 83, 72, 85, 79, 0, 0 }, { 83, 73, 0, 0, 0, 0 },
            { 83, 79, 78, 71, 0, 0 }, { 83, 79, 85, 0, 0, 0 },
            { 83, 85, 0, 0, 0, 0 }, { 83, 85, 65, 78, 0, 0 },
            { 83, 85, 73, 0, 0, 0 }, { 83, 85, 78, 0, 0, 0 },
            { 83, 85, 79, 0, 0, 0 }, { 84, 65, 0, 0, 0, 0 },
            { 84, 65, 73, 0, 0, 0 }, { 84, 65, 78, 0, 0, 0 },
            { 84, 65, 78, 71, 0, 0 }, { 84, 65, 79, 0, 0, 0 },
            { 84, 69, 0, 0, 0, 0 }, { 84, 69, 78, 71, 0, 0 },
            { 84, 73, 0, 0, 0, 0 }, { 84, 73, 65, 78, 0, 0 },
            { 84, 73, 65, 79, 0, 0 }, { 84, 73, 69, 0, 0, 0 },
            { 84, 73, 78, 71, 0, 0 }, { 84, 79, 78, 71, 0, 0 },
            { 84, 79, 85, 0, 0, 0 }, { 84, 85, 0, 0, 0, 0 },
            { 84, 85, 65, 78, 0, 0 }, { 84, 85, 73, 0, 0, 0 },
            { 84, 85, 78, 0, 0, 0 }, { 84, 85, 79, 0, 0, 0 },
            { 87, 65, 0, 0, 0, 0 }, { 87, 65, 73, 0, 0, 0 },
            { 87, 65, 78, 0, 0, 0 }, { 87, 65, 78, 71, 0, 0 },
            { 87, 69, 73, 0, 0, 0 }, { 87, 69, 78, 0, 0, 0 },
            { 87, 69, 78, 71, 0, 0 }, { 87, 79, 0, 0, 0, 0 },
            { 87, 85, 0, 0, 0, 0 }, { 88, 73, 0, 0, 0, 0 },
            { 88, 73, 65, 0, 0, 0 }, { 88, 73, 65, 78, 0, 0 },
            { 88, 73, 65, 78, 71, 0 }, { 88, 73, 65, 79, 0, 0 },
            { 88, 73, 69, 0, 0, 0 }, { 88, 73, 78, 0, 0, 0 },
            { 88, 73, 78, 71, 0, 0 }, { 88, 73, 79, 78, 71, 0 },
            { 88, 73, 85, 0, 0, 0 }, { 88, 85, 0, 0, 0, 0 },
            { 88, 85, 65, 78, 0, 0 }, { 88, 85, 69, 0, 0, 0 },
            { 88, 85, 78, 0, 0, 0 }, { 89, 65, 0, 0, 0, 0 },
            { 89, 65, 78, 0, 0, 0 }, { 89, 65, 78, 71, 0, 0 },
            { 89, 65, 79, 0, 0, 0 }, { 89, 69, 0, 0, 0, 0 },
            { 89, 73, 0, 0, 0, 0 }, { 89, 73, 78, 0, 0, 0 },
            { 89, 73, 78, 71, 0, 0 }, { 89, 79, 0, 0, 0, 0 },
            { 89, 79, 78, 71, 0, 0 }, { 89, 79, 85, 0, 0, 0 },
            { 89, 85, 0, 0, 0, 0 }, { 89, 85, 65, 78, 0, 0 },
            { 89, 85, 69, 0, 0, 0 }, { 89, 85, 78, 0, 0, 0 },
            { 74, 85, 78, 0, 0, 0 }, { 89, 85, 78, 0, 0, 0 },
            { 90, 65, 0, 0, 0, 0 }, { 90, 65, 73, 0, 0, 0 },
            { 90, 65, 78, 0, 0, 0 }, { 90, 65, 78, 71, 0, 0 },
            { 90, 65, 79, 0, 0, 0 }, { 90, 69, 0, 0, 0, 0 },
            { 90, 69, 73, 0, 0, 0 }, { 90, 69, 78, 0, 0, 0 },
            { 90, 69, 78, 71, 0, 0 }, { 90, 72, 65, 0, 0, 0 },
            { 90, 72, 65, 73, 0, 0 }, { 90, 72, 65, 78, 0, 0 },
            { 90, 72, 65, 78, 71, 0 }, { 67, 72, 65, 78, 71, 0 },
            { 90, 72, 65, 78, 71, 0 }, { 90, 72, 65, 79, 0, 0 },
            { 90, 72, 69, 0, 0, 0 }, { 90, 72, 69, 78, 0, 0 },
            { 90, 72, 69, 78, 71, 0 }, { 90, 72, 73, 0, 0, 0 },
            { 83, 72, 73, 0, 0, 0 }, { 90, 72, 73, 0, 0, 0 },
            { 90, 72, 79, 78, 71, 0 }, { 90, 72, 79, 85, 0, 0 },
            { 90, 72, 85, 0, 0, 0 }, { 90, 72, 85, 65, 0, 0 },
            { 90, 72, 85, 65, 73, 0 }, { 90, 72, 85, 65, 78, 0 },
            { 90, 72, 85, 65, 78, 71 }, { 90, 72, 85, 73, 0, 0 },
            { 90, 72, 85, 78, 0, 0 }, { 90, 72, 85, 79, 0, 0 },
            { 90, 73, 0, 0, 0, 0 }, { 90, 79, 78, 71, 0, 0 },
            { 90, 79, 85, 0, 0, 0 }, { 90, 85, 0, 0, 0, 0 },
            { 90, 85, 65, 78, 0, 0 }, { 90, 85, 73, 0, 0, 0 },
            { 90, 85, 78, 0, 0, 0 }, { 90, 85, 79, 0, 0, 0 },
            { 0, 0, 0, 0, 0, 0 }, { 83, 72, 65, 78, 0, 0 },
            { 0, 0, 0, 0, 0, 0 }, };

    /**
     * First and last Chinese character with known Pinyin according to zh
     * collation
     */
    private static final String FIRST_PINYIN_UNIHAN = "\u963F";
    private static final String LAST_PINYIN_UNIHAN = "\u9FFF";

    private static final Collator COLLATOR = Collator.getInstance(Locale.CHINA);

    private static HanziToPinyin sInstance;
    private final boolean mHasChinaCollator;

    public static class Token {
        /**
         * Separator between target string for each source char
         */
        public static final String SEPARATOR = " ";

        public static final int LATIN = 1;
        public static final int PINYIN = 2;
        public static final int UNKNOWN = 3;

        public Token() {
        }

        public Token(int type, String source, String target) {
            this.type = type;
            this.source = source;
            this.target = target;
        }

        /**
         * Type of this token, LATIN, PINYIN or UNKNOWN.
         */
        public int type;
        /**
         * Original string before translation.
         */
        public String source;
        /**
         * Translated string of source. For Han, target is corresponding Pinyin.
         * Otherwise target is original string in source.
         */
        public String target;
    }

    protected HanziToPinyin(boolean hasChinaCollator) {
        mHasChinaCollator = hasChinaCollator;
    }

    public static HanziToPinyin getInstance() {
        synchronized (HanziToPinyin.class) {
            if (sInstance != null) {
                return sInstance;
            }
            // Check if zh_CN collation data is available
            final Locale locale[] = Collator.getAvailableLocales();

            // Added (enhancement): also accept a bare "zh" locale.
            final Locale chinaAddition = new Locale("zh");

            for (int i = 0; i < locale.length; i++) {
                if (locale[i].equals(Locale.CHINA)
                        || locale[i].equals(chinaAddition)) {
                    // Do self validation just once.
                    if (DEBUG) {
                        Log.d(TAG, "Self validation. Result: "
                                + doSelfValidation());
                    }
                    sInstance = new HanziToPinyin(true);
                    return sInstance;
                }
            }
            Log.w(TAG,
                    "There is no Chinese collator, HanziToPinyin is disabled");
            sInstance = new HanziToPinyin(false);
            return sInstance;
        }
    }

    /**
     * Validate if our internal table has some wrong value.
     *
     * @return true when the table looks correct.
     */
    private static boolean doSelfValidation() {
        char lastChar = UNIHANS[0];
        String lastString = Character.toString(lastChar);
        for (char c : UNIHANS) {
            if (lastChar == c) {
                continue;
            }
            final String curString = Character.toString(c);
            int cmp = COLLATOR.compare(lastString, curString);
            if (cmp >= 0) {
                Log.e(TAG, "Internal error in Unihan table. "
                        + "The last string \"" + lastString
                        + "\" is greater than current string \"" + curString
                        + "\".");
                return false;
            }
            lastString = curString;
        }
        return true;
    }

    private Token getToken(char character) {
        Token token = new Token();
        final String letter = Character.toString(character);
        token.source = letter;
        int offset = -1;
        int cmp;
        if (character < 256) {
            token.type = Token.LATIN;
            token.target = letter;
            return token;
        } else {
            cmp = COLLATOR.compare(letter, FIRST_PINYIN_UNIHAN);
            if (cmp < 0) {
                token.type = Token.UNKNOWN;
                token.target = letter;
                return token;
            } else if (cmp == 0) {
                token.type = Token.PINYIN;
                offset = 0;
            } else {
                cmp = COLLATOR.compare(letter, LAST_PINYIN_UNIHAN);
                if (cmp > 0) {
                    token.type = Token.UNKNOWN;
                    token.target = letter;
                    return token;
                } else if (cmp == 0) {
                    token.type = Token.PINYIN;
                    offset = UNIHANS.length - 1;
                }
            }
        }

        token.type = Token.PINYIN;
        if (offset < 0) {
            int begin = 0;
            int end = UNIHANS.length - 1;
            while (begin <= end) {
                offset = (begin + end) / 2;
                final String unihan = Character.toString(UNIHANS[offset]);
                cmp = COLLATOR.compare(letter, unihan);
                if (cmp == 0) {
                    break;
                } else if (cmp > 0) {
                    begin = offset + 1;
                } else {
                    end = offset - 1;
                }
            }
        }
        if (cmp < 0) {
            offset--;
        }
        StringBuilder pinyin = new StringBuilder();
        for (int j = 0; j < PINYINS[offset].length && PINYINS[offset][j] != 0; j++) {
            pinyin.append((char) PINYINS[offset][j]);
        }
        token.target = pinyin.toString();
        if (TextUtils.isEmpty(token.target)) {
            token.type = Token.UNKNOWN;
            token.target = token.source;
        }
        return token;
    }

    /**
     * Convert the input to an array of tokens. The sequence of ASCII or Unknown
     * characters without space will be put into a Token, One Hanzi character
     * which has pinyin will be treated as a Token. If there is no China
     * collator, the empty token array is returned.
     */
    public ArrayList<Token> get(final String input) {
        ArrayList<Token> tokens = new ArrayList<Token>();
        if (!mHasChinaCollator || TextUtils.isEmpty(input)) {
            // return empty tokens.
            return tokens;
        }
        final int inputLength = input.length();
        final StringBuilder sb = new StringBuilder();
        int tokenType = Token.LATIN;
        // Go through the input, create a new token when
        // a. Token type changed
        // b. Get the Pinyin of current character.
        // c. current character is space.
        for (int i = 0; i < inputLength; i++) {
            final char character = input.charAt(i);
            if (character == ' ') {
                if (sb.length() > 0) {
                    addToken(sb, tokens, tokenType);
                }
            } else if (character < 256) {
                if (tokenType != Token.LATIN && sb.length() > 0) {
                    addToken(sb, tokens, tokenType);
                }
                tokenType = Token.LATIN;
                sb.append(character);
            } else {
                Token t = getToken(character);
                if (t.type == Token.PINYIN) {
                    if (sb.length() > 0) {
                        addToken(sb, tokens, tokenType);
                    }
                    tokens.add(t);
                    tokenType = Token.PINYIN;
                } else {
                    if (tokenType != t.type && sb.length() > 0) {
                        addToken(sb, tokens, tokenType);
                    }
                    tokenType = t.type;
                    sb.append(character);
                }
            }
        }
        if (sb.length() > 0) {
            addToken(sb, tokens, tokenType);
        }
        return tokens;
    }

    private void addToken(final StringBuilder sb,
            final ArrayList<Token> tokens, final int tokenType) {
        String str = sb.toString();
        tokens.add(new Token(tokenType, str, str));
        sb.setLength(0);
    }
}
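The core of getToken() is a "floor" lookup: UNIHANS lists, in collation order, the first character of each pinyin group, so a character's pinyin is the PINYINS entry for the last UNIHANS element that sorts at or before it, found by binary search with COLLATOR. The sketch below isolates that technique in plain Java with a generic Comparator, so it runs without Android or Chinese collation data. The class name FloorLookup and the ASCII sample keys are illustrative assumptions, not part of the original. The original reaches the same result through its trailing `if (cmp < 0) offset--;`, and it rejects characters outside the FIRST_PINYIN_UNIHAN to LAST_PINYIN_UNIHAN range before searching.

```java
import java.util.Comparator;

// Illustrative restatement of the search in HanziToPinyin.getToken() (names are
// hypothetical). sortedKeys must be ascending under cmp; the result is the index
// of the last key that is <= query, or -1 if query sorts before every key.
final class FloorLookup {

    static <T> int floorIndex(T[] sortedKeys, T query, Comparator<? super T> cmp) {
        int begin = 0;
        int end = sortedKeys.length - 1;
        int floor = -1;
        while (begin <= end) {
            int mid = (begin + end) >>> 1;
            int c = cmp.compare(query, sortedKeys[mid]);
            if (c == 0) {
                return mid;      // query is exactly the first key of a group
            } else if (c > 0) {
                floor = mid;     // key at mid sorts before query: best candidate so far
                begin = mid + 1;
            } else {
                end = mid - 1;   // key at mid sorts after query: search the left half
            }
        }
        return floor;
    }
}
```

With keys {"a", "ba", "ca"} and natural string order, "bz" maps to index 1, "a0" to index 0, and "0" to -1. On Android the Collator itself could be passed as cmp, since java.text.Collator implements Comparator<Object>.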

Here is a MainActivity.java to test converting Chinese characters to pinyin:

package zhangphil.hanyupinyin;

import java.util.ArrayList;

import zhangphil.hanyupinyin.HanziToPinyin.Token;
import android.app.Activity;
import android.os.Bundle;

public class MainActivity extends Activity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        String s = "安卓";
        System.out.println("Hanzi-to-pinyin output: " + getPinYin(s));
    }

    // General-purpose helper: takes Chinese text and returns its pinyin.
    // Returns "" if HanziToPinyin is disabled (no Chinese collator found).
    public static String getPinYin(String hanzi) {
        ArrayList<Token> tokens = HanziToPinyin.getInstance().get(hanzi);
        StringBuilder sb = new StringBuilder();
        if (tokens != null && tokens.size() > 0) {
            for (Token token : tokens) {
                if (Token.PINYIN == token.type) {
                    sb.append(token.target);
                } else {
                    sb.append(token.source);
                }
            }
        }

        return sb.toString().toUpperCase();
    }
}

The output is shown in the figure:

Original article: https://www.cnblogs.com/zzw1994/p/4997784.html