(C#) Encoding.

File.ReadAllText("D:\\test.txt", Encoding.GetEncoding(936)).Contains(@"这是简体中文")  // "这是简体中文" = "This is Simplified Chinese"

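In the .NET world a string is always Unicode (UTF-16 internally). When you read a TXT file line by line and then test its content, the file's bytes must first be decoded with the encoding the file was actually saved in, for example code page 936 (GBK) for Simplified Chinese text.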
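To make the decoding point concrete, here is a minimal sketch (the class and method names are hypothetical) that simulates the bytes of a GBK-encoded file in memory and checks for a Chinese phrase after decoding those bytes two different ways. It assumes .NET Core / .NET 5+, where code page 936 is only available after registering CodePagesEncodingProvider; on .NET Framework the registration line can be dropped.

```csharp
using System;
using System.Text;

static class GbkDecodingDemo
{
    // Assumption: .NET Core / .NET 5+ needs the code-pages provider for code page 936.
    public static Encoding Gbk()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(936);
    }

    // Decodes raw file bytes with the given encoding, then searches the resulting string.
    public static bool ContainsText(byte[] rawBytes, Encoding encoding, string target)
    {
        string decoded = encoding.GetString(rawBytes); // bytes -> UTF-16 string
        return decoded.Contains(target);
    }

    static void Main()
    {
        Encoding gbk = Gbk();
        // Simulates the content of a TXT file saved as GBK.
        byte[] fileBytes = gbk.GetBytes("这是简体中文");

        Console.WriteLine(ContainsText(fileBytes, gbk, "简体中文"));          // True
        Console.WriteLine(ContainsText(fileBytes, Encoding.UTF8, "简体中文")); // False
    }
}
```

Decoding the same bytes as UTF-8 produces replacement characters, so the Contains check fails even though the file really holds the phrase.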

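using System;
using System.IO;
using System.Text;
// On .NET Core / .NET 5+, call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first
foreach (string line in File.ReadAllLines("D:\\test.txt", Encoding.GetEncoding(936)))
{
  Console.WriteLine(" {0}", line);
}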
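File.ReadAllLines loads the whole file into memory at once. For large files, File.ReadLines(path, encoding) yields lines lazily instead. The sketch below is illustrative only (FindLines and the temporary sample file are hypothetical, standing in for D:\test.txt): it returns the 1-based numbers of the lines that contain a phrase, and it again assumes .NET Core / .NET 5+ for the provider registration.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

static class GbkLineSearch
{
    // Returns 1-based line numbers of lines containing `target`, decoding with `encoding`.
    // File.ReadLines streams the file; ToArray() finishes the enumeration and closes it.
    public static int[] FindLines(string path, Encoding encoding, string target)
    {
        return File.ReadLines(path, encoding)
                   .Select((line, index) => (line, number: index + 1))
                   .Where(x => x.line.Contains(target))
                   .Select(x => x.number)
                   .ToArray();
    }

    static void Main()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); // .NET Core / .NET 5+ only
        Encoding gbk = Encoding.GetEncoding(936);

        // Hypothetical sample file written as GBK.
        string path = Path.Combine(Path.GetTempPath(), "gbk-sample.txt");
        File.WriteAllLines(path, new[] { "hello", "这是简体中文", "world" }, gbk);

        foreach (int n in FindLines(path, gbk, "简体中文"))
            Console.WriteLine("Found on line {0}", n); // Found on line 2

        File.Delete(path);
    }
}
```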

For the details of each encoding, see the MSDN documentation for the Encoding class:

http://msdn.microsoft.com/zh-cn/library/system.text.encoding(v=vs.100).aspx

Windows Locale Codes Sorted by Codepage 

As defined by Microsoft, a locale is either a language or a language in combination with a country. See Microsoft's definitions of locale.


Language (Locale)  LCID (Decimal)  LCID (Hex)  Codepage  Country code
Telugu 1098 044a 0 IND
Gujarati 1095 0447 0 IND
Punjabi 1094 0446 0 IND
Sanskrit 1103 044f 0 IND
Konkani 1111 0457 0 IND
Syriac 1114 045a 0 SYR
Kannada 1099 044b 0 IND
Marathi 1102 044e 0 IND
Divehi 1125 0465 0 MDV
Armenian 1067 042b 0 ARM
Hindi 1081 0439 0 IND
Georgian 1079 0437 0 GEO
Tamil 1097 0449 0 IND
Thai 1054 041e 874 THA
Japanese 1041 0411 932 JPN
Chinese (PRC) 2052 0804 936 CHN
Chinese (Singapore) 4100 1004 936 SGP
Korean 1042 0412 949 KOR
Chinese (Macau S.A.R.) 5124 1404 950 MCO
Chinese (Hong Kong S.A.R.) 3076 0c04 950 HKG
Chinese (Taiwan) 1028 0404 950 TWN
Romanian 1048 0418 1250 ROM
Slovenian 1060 0424 1250 SVN
Hungarian 1038 040e 1250 HUN
Slovak 1051 041b 1250 SVK
Polish 1045 0415 1250 POL
Albanian 1052 041c 1250 ALB
Serbian (Latin) 2074 081a 1250 SPB
Croatian 1050 041a 1250 HRV
Czech 1029 0405 1250 CZE
Mongolian (Cyrillic) 1104 0450 1251 MNG
FYRO Macedonian 1071 042f 1251 MKD
Uzbek (Cyrillic) 2115 0843 1251 UZB
Ukrainian 1058 0422 1251 UKR
Azeri (Cyrillic) 2092 082c 1251 AZE
Tatar 1092 0444 1251 RUS
Kazakh 1087 043f 1251 KAZ
Belarusian 1059 0423 1251 BLR
Kyrgyz (Cyrillic) 1088 0440 1251 KGZ
Bulgarian 1026 0402 1251 BGR
Serbian (Cyrillic) 3098 0c1a 1251 SPB
Russian 1049 0419 1251 RUS
English (Jamaica) 8201 2009 1252 JAM
French (Canada) 3084 0c0c 1252 CAN
French (France) 1036 040c 1252 FRA
French (Luxembourg) 5132 140c 1252 LUX
English (New Zealand) 5129 1409 1252 NZL
English (Ireland) 6153 1809 1252 IRL
Dutch (Netherlands) 1043 0413 1252 NLD
English (Caribbean) 9225 2409 1252 CAR
French (Switzerland) 4108 100c 1252 CHE
English (Canada) 4105 1009 1252 CAN
Galician 1110 0456 1252 ESP
English (Belize) 10249 2809 1252 BLZ
German (Austria) 3079 0c07 1252 AUT
French (Monaco) 6156 180c 1252 MCO
English (Zimbabwe) 12297 3009 1252 ZWE
Basque 1069 042d 1252 ESP
Dutch (Belgium) 2067 0813 1252 BEL
French (Belgium) 2060 080c 1252 BEL
Finnish 1035 040b 1252 FIN
Faroese 1080 0438 1252 FRO
German (Germany) 1031 0407 1252 DEU
English (Australia) 3081 0c09 1252 AUS
English (United States) 1033 0409 1252 USA
English (United Kingdom) 2057 0809 1252 GBR
Catalan 1027 0403 1252 ESP
English (Trinidad) 11273 2c09 1252 TTO
English (South Africa) 7177 1c09 1252 ZAF
Danish 1030 0406 1252 DNK
English (Philippines) 13321 3409 1252 PHL
Spanish (Paraguay) 15370 3c0a 1252 PRY
Spanish (Colombia) 9226 240a 1252 COL
Spanish (Costa Rica) 5130 140a 1252 CRI
Spanish (Dominican Republic) 7178 1c0a 1252 DOM
Spanish (Ecuador) 12298 300a 1252 ECU
Spanish (El Salvador) 17418 440a 1252 SLV
Spanish (Guatemala) 4106 100a 1252 GTM
Spanish (Honduras) 18442 480a 1252 HND
Spanish (International Sort) 3082 0c0a 1252 ESP
Spanish (Chile) 13322 340a 1252 CHL
Spanish (Nicaragua) 19466 4c0a 1252 NIC
Spanish (Mexico) 2058 080a 1252 MEX
Spanish (Peru) 10250 280a 1252 PER
Spanish (Puerto Rico) 20490 500a 1252 PRI
Spanish (Traditional Sort) 1034 040a 1252 ESP
Spanish (Uruguay) 14346 380a 1252 URY
Spanish (Venezuela) 8202 200a 1252 VEN
Swahili 1089 0441 1252 KEN
Swedish 1053 041d 1252 SWE
Swedish (Finland) 2077 081d 1252 FIN
German (Liechtenstein) 5127 1407 1252 LIE
Afrikaans 1078 0436 1252 ZAF
Spanish (Panama) 6154 180a 1252 PAN
German (Luxembourg) 4103 1007 1252 LUX
Spanish (Bolivia) 16394 400a 1252 BOL
German (Switzerland) 2055 0807 1252 CHE
Icelandic 1039 040f 1252 ISL
Indonesian 1057 0421 1252 IDN
Italian (Italy) 1040 0410 1252 ITA
Italian (Switzerland) 2064 0810 1252 CHE
Norwegian (Nynorsk) 2068 0814 1252 NOR
Spanish (Argentina) 11274 2c0a 1252 ARG
Portuguese (Brazil) 1046 0416 1252 BRA
Norwegian (Bokmal) 1044 0414 1252 NOR
Malay (Malaysia) 1086 043e 1252 MYS
Malay (Brunei Darussalam) 2110 083e 1252 BRN
Portuguese (Portugal) 2070 0816 1252 PRT
Greek 1032 0408 1253 GRC
Uzbek (Latin) 1091 0443 1254 UZB
Azeri (Latin) 1068 042c 1254 AZE
Turkish 1055 041f 1254 TUR
Hebrew 1037 040d 1255 ISR
Arabic (Algeria) 5121 1401 1256 DZA
Arabic (Bahrain) 15361 3c01 1256 BHR
Arabic (Yemen) 9217 2401 1256 YEM
Arabic (Egypt) 3073 0c01 1256 EGY
Arabic (Iraq) 2049 0801 1256 IRQ
Arabic (Jordan) 11265 2c01 1256 JOR
Arabic (Kuwait) 13313 3401 1256 KWT
Arabic (Lebanon) 12289 3001 1256 LBN
Arabic (Libya) 4097 1001 1256 LBY
Arabic (Morocco) 6145 1801 1256 MAR
Arabic (Oman) 8193 2001 1256 OMN
Arabic (Qatar) 16385 4001 1256 QAT
Arabic (Saudi Arabia) 1025 0401 1256 SAU
Arabic (Syria) 10241 2801 1256 SYR
Arabic (U.A.E.) 14337 3801 1256 ARE
Farsi 1065 0429 1256 IRN
Urdu 1056 0420 1256 PAK
Arabic (Tunisia) 7169 1c01 1256 TUN
Estonian 1061 0425 1257 EST
Latvian 1062 0426 1257 LVA
Lithuanian 1063 0427 1257 LTU
Vietnamese 1066 042a 1258 VNM

This table was generated from the information in List of Locale IDs and Language Groups for Microsoft Windows 2000. A codepage of 0 means the locale has no ANSI code page and is supported through Unicode only.
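The Codepage column can also be looked up at run time through CultureInfo.TextInfo.ANSICodePage. The following is a sketch under assumptions: the runtime must have full culture data (it fails in globalization-invariant mode), and results may vary by platform and .NET version, so treat the table as the reference. The LCIDs are taken from the decimal column above; the class name is hypothetical.

```csharp
using System;
using System.Globalization;

static class LocaleCodePages
{
    // Resolves a Windows LCID to its culture name and ANSI code page.
    // Assumes full globalization data is available (not invariant mode).
    public static (string Name, int AnsiCodePage) Describe(int lcid)
    {
        var culture = new CultureInfo(lcid);
        return (culture.Name, culture.TextInfo.ANSICodePage);
    }

    static void Main()
    {
        // Chinese (PRC), Japanese, Russian, English (United States)
        foreach (int lcid in new[] { 2052, 1041, 1049, 1033 })
        {
            var (name, codePage) = Describe(lcid);
            Console.WriteLine("{0} (0x{0:x4}) -> {1}, code page {2}", lcid, name, codePage);
        }
    }
}
```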

Definitions

Locale: A collection of language-related, user-preference information represented as a list of values.

Locale ID (LCID): A 32-bit value defined by Microsoft Windows that consists of a language ID, sort ID, and reserved bits that identify a particular language.
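Because an LCID is a packed bit field, its parts can be extracted with shifts and masks. This sketch follows Microsoft's documented layout (bits 0-15 language ID, bits 16-19 sort ID, bits 20-31 reserved; within the language ID, bits 0-9 are the primary language and bits 10-15 the sublanguage). The Lcid class name is only for illustration.

```csharp
using System;

static class Lcid
{
    // Splits an LCID into language ID, sort ID, primary language and sublanguage.
    public static (int LanguageId, int SortId, int Primary, int Sub) Split(int lcid)
    {
        int languageId = lcid & 0xFFFF;       // bits 0-15
        int sortId = (lcid >> 16) & 0xF;      // bits 16-19
        int primary = languageId & 0x3FF;     // bits 0-9 of the language ID
        int sub = languageId >> 10;           // bits 10-15 of the language ID
        return (languageId, sortId, primary, sub);
    }

    static void Main()
    {
        // Chinese (PRC), Spanish (Traditional Sort), Spanish (International Sort)
        foreach (int lcid in new[] { 0x0804, 0x040A, 0x0C0A })
        {
            var p = Split(lcid);
            Console.WriteLine("0x{0:x4}: primary=0x{1:x2} sub=0x{2:x2} sort={3}",
                lcid, p.Primary, p.Sub, p.SortId);
        }
    }
}
```

For example, the two Spanish entries 0x040A and 0x0C0A share primary language 0x0A and differ only in sublanguage (0x01 vs 0x03); every LCID in the table above has a sort ID of 0.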

Codepage: "An ordered set of characters in which a numeric index (code point value) is associated with each character. The first 128 characters of each codepage are functionally the same and include all characters needed to type English text. The upper 128 characters of OEM and ANSI codepages contain characters used in a language or group of languages." In double-byte codepages such as 932, 936, 949 and 950, some bytes in the upper range act as lead bytes that combine with the following byte to form a single character.
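Both halves of this definition can be seen by encoding a few characters and inspecting the bytes. This sketch (the Hex helper is hypothetical) assumes .NET Core / .NET 5+ for the provider registration; the byte values in the comments are the standard Windows-1252 and GBK values.

```csharp
using System;
using System.Text;

static class CodePageBytes
{
    // Encodes `text` with a Windows code page and returns the bytes as hex, e.g. "D6-D0".
    public static string Hex(int codePage, string text)
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); // .NET Core / .NET 5+ only
        byte[] bytes = Encoding.GetEncoding(codePage).GetBytes(text);
        return BitConverter.ToString(bytes);
    }

    static void Main()
    {
        Console.WriteLine(Hex(1252, "A"));  // 41    (lower 128: same as ASCII)
        Console.WriteLine(Hex(936, "A"));   // 41    (lower 128: same as ASCII)
        Console.WriteLine(Hex(1252, "é"));  // E9    (one byte from the upper 128)
        Console.WriteLine(Hex(936, "中"));  // D6-D0 (lead byte + trail byte)
    }
}
```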

Character Encoding Recommendation for Language

IANA encoding | Java canonical name | Language | Comment
UTF-8 | UTF8 | 8-bit Universal character set |
UTF-16 | UTF-16 | 16-bit Universal character set |
US-ASCII | ASCII | American Standard Code for Information Interchange |
windows-1250 | Cp1250 | Eastern European (Albanian, Croatian, Czech, English, German, Hungarian, Latin, Polish, Romanian, Slovak, Slovenian, Serbian) | Windows encoding
windows-1251 | Cp1251 | Eastern European (Cyrillic-based: Bulgarian, Byelorussian, Macedonian, Russian, Serbian, Ukrainian) | Windows encoding
windows-1252 | Cp1252 | Western European (Albanian, Basque, Breton, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, German, Greenlandic, Icelandic, Irish Gaelic, Italian, Latin, Luxemburgish, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swedish) | Windows encoding
windows-1253 | Cp1253 | Greek | Windows encoding
windows-1254 | Cp1254 | Turkish | Windows encoding
windows-1255 | Cp1255 | Hebrew | Windows encoding
windows-1256 | Cp1256 | Arabic | Windows encoding
windows-1257 | Cp1257 | Baltic | Windows encoding
windows-1258 | Cp1258 | Vietnamese | Windows encoding
ISO-8859-1 | ISO8859_1 | Western European (Albanian, Basque, Breton, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, German, Greenlandic, Icelandic, Irish Gaelic, Italian, Latin, Luxemburgish, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swedish) | Euro symbol is not supported
ISO-8859-2 | ISO8859_2 | Eastern European (Albanian, Croatian, Czech, English, German, Hungarian, Latin, Polish, Romanian, Slovak, Slovenian, Serbian) | Euro symbol is not supported
ISO-8859-3 | ISO8859_3 | Southeastern European (Afrikaans, Catalan, Dutch, English, Esperanto, German, Italian, Maltese, Spanish, Turkish) |
ISO-8859-4 | ISO8859_4 | Northern European (Danish, English, Estonian, Finnish, German, Greenlandic, Latin, Latvian, Lithuanian, Norwegian, Sámi, Slovenian, Swedish) |
ISO-8859-5 | ISO8859_5 | Eastern European (Cyrillic-based: Bulgarian, Byelorussian, Macedonian, Russian, Serbian, Ukrainian) |
ISO-8859-6 | ISO8859_6 | Arabic |
ISO-8859-7 | ISO8859_7 | Greek |
ISO-8859-8 | ISO8859_8 | Hebrew |
ISO-8859-9 | ISO8859_9 | Western European (Albanian, Basque, Breton, Catalan, Cornish, Danish, Dutch, English, Finnish, French, Frisian, Galician, German, Greenlandic, Irish Gaelic, Italian, Latin, Luxemburgish, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swedish, Turkish) |
ISO-8859-13 | ISO8859_13 | Baltic Rim (English, Estonian, Finnish, Latin, Latvian, Norwegian) |
ISO-8859-15 | ISO8859_15 | Western European (Albanian, Basque, Breton, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, German, Greenlandic, Icelandic, Irish Gaelic, Italian, Latin, Luxemburgish, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swedish) | ISO-8859-1 with Euro symbol support
windows-31j | MS932 | Japanese | Windows encoding
EUC-JP | EUC_JP | Japanese | EUC encoding used on Unix platforms
Shift_JIS | SJIS | Japanese | Shift JIS; does not support MS external characters
ISO-2022-JP | ISO2022JP | Japanese | JIS X 0201 and 0208 in ISO 2022 form; used for e-mail
x-mswin-936 | MS936 | Simplified Chinese | Windows encoding; not registered with IANA
GB18030 | GB18030 | Simplified Chinese | PRC standard
x-EUC-CN | EUC_CN | Simplified Chinese | GB2312, EUC encoding
GBK | GBK | Simplified Chinese |
x-windows-949 | MS949 | Korean | Windows encoding; not registered with IANA
EUC-KR | EUC_KR | Korean | KS C 5601, EUC encoding
x-windows-950 | MS950 | Traditional Chinese | Windows encoding; not registered with IANA
x-MS950-HKSCS | MS950_HKSCS | Traditional Chinese with Hong Kong extensions | Windows encoding; not registered with IANA
x-EUC-TW | EUC_TW | Traditional Chinese | CNS 11643 (planes 1-3), EUC encoding; not registered with IANA
Big5 | Big5 | Traditional Chinese |
Big5-HKSCS | Big5_HKSCS | Traditional Chinese | Big5 with Hong Kong extensions
TIS-620 | TIS620 | Thai |
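The first column lists IANA names, while .NET mostly identifies encodings by code page number. Encoding.GetEncoding(string) accepts many IANA names, so an entry in this table can be mapped to its .NET code page roughly as sketched below. The class and helper names are illustrative, the provider registration again assumes .NET Core / .NET 5+, and the set of recognized names may differ between runtimes. The round-trip helper shows that a character outside a code page's repertoire is lost (here 中 in windows-1252), while GB18030 can encode even supplementary characters such as U+1F600.

```csharp
using System;
using System.Text;

static class IanaNames
{
    static IanaNames()
    {
        // Needed on .NET Core / .NET 5+ for windows-125x, Shift_JIS, Big5, GB18030, etc.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Resolves an IANA name from the table to a .NET Encoding and returns its code page.
    public static int CodePageOf(string ianaName) => Encoding.GetEncoding(ianaName).CodePage;

    // True if `text` survives an encode/decode round trip (unmappable chars become '?').
    public static bool RoundTrips(string ianaName, string text)
    {
        Encoding enc = Encoding.GetEncoding(ianaName);
        return enc.GetString(enc.GetBytes(text)) == text;
    }

    static void Main()
    {
        foreach (string name in new[] { "windows-1252", "ISO-8859-1", "Shift_JIS", "EUC-KR", "Big5", "GB18030", "UTF-8" })
            Console.WriteLine("{0,-12} -> code page {1}", name, CodePageOf(name));

        Console.WriteLine(RoundTrips("GB18030", "\U0001F600"));
        Console.WriteLine(RoundTrips("windows-1252", "中"));
    }
}
```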
Original article: https://www.cnblogs.com/fdyang/p/3032171.html