Automatically detecting encodings in Python with the chardet package

chardet is short for charset detection.
Once the encoding has been detected automatically, the bytes can be decoded.
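The detect-then-decode step can be sketched roughly as follows. This is a minimal sketch, assuming chardet is installed; `decode_auto` and its `fallback` parameter are hypothetical names introduced for illustration, not part of chardet.

```python
import chardet


def decode_auto(data, fallback='utf-8'):
    """Decode bytes with chardet's guess (hypothetical helper)."""
    # chardet.detect returns a dict with 'encoding' and 'confidence' keys
    guess = chardet.detect(data)
    # 'encoding' is None when chardet cannot guess, so fall back
    encoding = guess['encoding'] or fallback
    # errors='replace' keeps going if the guess turns out to be wrong
    return data.decode(encoding, errors='replace')


text = decode_auto(b'plain ASCII text')
print(text)
```

The guess is statistical, so it tends to be less reliable on short inputs; the fallback and the `replace` handler are only one possible way to cope with that.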

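Eight file open modes

  • w: write; the file's contents are cleared as soon as it is opened
  • r: read-only
  • a: append
  • r+: read and write (the existing contents are kept)
    Adding b to any of the four modes above opens the file in binary mode, which gives eight modes in total.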
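The modes listed above can be tried out on a throwaway file. The sketch below uses only the standard library; the temporary file and the variable names are purely illustrative.

```python
import os
import tempfile

# Work on a throwaway file so nothing real is touched
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

with open(path, 'w') as f:      # 'w' clears the file, then writes
    f.write('hello')
with open(path, 'a') as f:      # 'a' appends at the end
    f.write(' world')
with open(path, 'r') as f:      # 'r' is read-only
    after_append = f.read()

with open(path, 'r+') as f:     # 'r+' reads and writes without clearing
    f.write('J')                # overwrites the first character in place
with open(path, 'rb') as f:     # adding 'b' yields bytes instead of str
    raw = f.read()

with open(path, 'w'):           # merely opening with 'w' empties the file
    pass
size_after_w = os.path.getsize(path)
os.remove(path)

print(after_append)   # hello world
print(raw)            # b'Jello world'
print(size_after_w)   # 0
```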

bytes.decode(encoding, errors='strict')

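There are three ways to handle errors encountered while decoding

  • strict: the default; raises an exception (UnicodeDecodeError)
  • replace: substitutes the replacement character U+FFFD for the bytes that cannot be decoded
  • ignore: silently drops the bytes that cannot be decoded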
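A short illustration of the three handlers, using a byte string chosen for this example because it ends in a byte that is never valid in UTF-8:

```python
data = b'abc\xff'   # 0xff can never appear in valid UTF-8

try:
    data.decode('utf-8')            # errors='strict' is the default
    strict_result = 'decoded'
except UnicodeDecodeError:
    strict_result = 'UnicodeDecodeError'

replaced = data.decode('utf-8', errors='replace')
ignored = data.decode('utf-8', errors='ignore')

print(strict_result)    # UnicodeDecodeError
print(repr(replaced))   # 'abc\ufffd'
print(repr(ignored))    # 'abc'
```

Note that both replace and ignore lose information, so strict is usually the safer choice when the encoding is known for certain.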

utf.py

import chardet
import os
import sys


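def utf(path, recursive=False):
    """Convert a file, or every .txt file in a folder, to UTF-8 in place."""
    print(path)
    if os.path.isfile(path):
        with open(path, 'rb+') as f:
            content = f.read()
            encoding = chardet.detect(content)['encoding']
            # chardet reports None when it cannot guess (e.g. an empty file)
            if encoding and encoding.lower() != 'utf-8':
                s = content.decode(encoding, errors='ignore')
                # After read() the position is at the end of the file, so
                # rewind and truncate; otherwise the write would append.
                f.seek(0)
                f.truncate()
                f.write(s.encode('utf8', errors='ignore'))
    else:
        for i in os.listdir(path):
            now_path = os.path.join(path, i)
            if os.path.isdir(now_path):
                # Descend into subfolders only when asked to
                if recursive:
                    utf(now_path, recursive)
            elif os.path.splitext(i)[1] == '.txt':
                utf(now_path)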


usage = """
        python utf.py haha.txt        # convert a single file
        python utf.py haha            # convert every text file (.txt) in the folder
        python utf.py haha recursive  # also convert text files in all subfolders
        """
if __name__ == '__main__':
    # sys.argv = ['main', r'C:\Users\weidiao\Desktop\电子书', 'recursive']
    if len(sys.argv) == 1:
        print(usage)
        sys.exit()
    if len(sys.argv) > 3:
        print(usage)
        print('too many arguments')
        sys.exit(1)
    path = sys.argv[1]
    if not os.path.exists(path):
        print(usage)
        print('no such file or folder')
        sys.exit(1)
    recursive = (len(sys.argv) == 3 and sys.argv[2] == 'recursive')
    utf(path, recursive)
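
One detail of converting a file in place with 'rb+' is worth spelling out: after read() the file position sits at the end, so a write at that point appends rather than overwrites. A minimal sketch of the rewind-and-truncate pattern follows. The helper name `rewrite_in_place` is hypothetical, and the source encoding is passed in explicitly here (an assumption made to keep the example independent of chardet).

```python
import os
import tempfile


def rewrite_in_place(path, src_encoding, dst_encoding='utf-8'):
    """Re-encode a file in place (hypothetical helper)."""
    with open(path, 'rb+') as f:
        text = f.read().decode(src_encoding)
        f.seek(0)       # go back to the start of the file
        f.truncate()    # drop the old bytes from the current position on
        f.write(text.encode(dst_encoding))


# Try it on a throwaway GBK-encoded file
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with open(path, 'wb') as f:
    f.write('你好'.encode('gbk'))

rewrite_in_place(path, 'gbk')

with open(path, 'rb') as f:
    result = f.read()
os.remove(path)

print(result == '你好'.encode('utf-8'))   # True
```

Because the original bytes are overwritten, it may be worth keeping a backup copy when the detected encoding is uncertain.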


Original post: https://www.cnblogs.com/weiyinfu/p/7271978.html