Python character set conversion: very slow

Code

import chardet

def toUni(text):
    """Decode a byte string to Unicode using the encoding chardet guesses."""
    # Use `result`, not `str`, so the builtin str is not shadowed.
    result = text
    try:
        charstyle = chardet.detect(text)
        # print('confidence: %s' % charstyle['confidence'])  # confidence of the guess
        encoding = charstyle['encoding']
        # chardet reports None when it cannot guess; keep the input unchanged then.
        if encoding:
            # GB2312, gbk, utf-8 and everything else are decoded the same way,
            # so a single call covers all cases.
            result = text.decode(encoding, 'replace')
    except Exception as e:
        print('[changeToUni.except] %s' % e)
        result = text
    return result

One more thing: this is very time-consuming. A typical web page takes 1 to 3 seconds, which is really not worth it.
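If the pages come from a small set of known encodings, one possible way around the cost is to skip detection entirely and just try those encodings in order. The sketch below is not the method from the code above but a swapped-in alternative: the function name `to_unicode_fast` and the candidate list `("utf-8", "gb18030")` are assumptions chosen for illustration (gb18030 is picked because it is a superset of GB2312 and GBK). It relies only on the standard `bytes.decode` method.

```python
def to_unicode_fast(data, candidates=("utf-8", "gb18030")):
    """Decode bytes by trying a short list of candidate encodings in order.

    `candidates` is a hypothetical default; adjust it to the encodings
    you actually expect to see.
    """
    for encoding in candidates:
        try:
            # Strict decoding raises on any invalid byte sequence, so a
            # successful decode is a reasonable sign that the guess fits.
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # No candidate decoded cleanly: fall back to the first one and
    # replace the undecodable bytes.
    return data.decode(candidates[0], "replace")
```

The order matters: UTF-8 goes first because strict UTF-8 decoding rejects most GBK-encoded Chinese text, whereas a more permissive encoding placed first could accept bytes it should not. This only helps when the set of likely encodings is known in advance; for arbitrary input, detection is still needed.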

Original post: https://www.cnblogs.com/mmix2009/p/3229294.html