Differences between ANSI, ISO-8859-1 and MacRoman character sets

Of the three main 8-bit character sets, only ISO-8859-1 is produced by a standards organization. The three sets are identical for the 95 characters from 32 to 126, the ASCII character set. The ANSI character set, also known as Windows-1252, has become a Microsoft proprietary character set; it is a superset of ISO-8859-1 with the addition of 27 characters in locations that ISO designates for control codes. Apple’s proprietary MacRoman character set contains a similar variety of characters from 128 to 255, but with very few of them assigned the same numbers, and also assigns characters to the control-code positions.
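The claims in the paragraph above can be checked with a short Python sketch. It assumes that Python's built-in codecs `cp1252`, `latin_1` and `mac_roman` are faithful stand-ins for ANSI (Windows-1252), ISO-8859-1 and MacRoman; the helper name `decode_table` is made up for this illustration.

```python
def decode_table(codec):
    """Map every byte value 0..255 to its character, or None if the codec leaves it undefined."""
    table = {}
    for value in range(256):
        try:
            table[value] = bytes([value]).decode(codec)
        except UnicodeDecodeError:
            table[value] = None
    return table


ansi = decode_table("cp1252")       # ANSI / Windows-1252
latin1 = decode_table("latin_1")    # ISO-8859-1
macroman = decode_table("mac_roman")

# Positions 32..126 (printable ASCII) should be the same in all three sets.
ascii_identical = all(
    ansi[b] == latin1[b] == macroman[b] == chr(b) for b in range(32, 127)
)

# Printable characters that Windows-1252 places in the 0x80..0x9F control-code range.
ansi_extras = [b for b in range(0x80, 0xA0) if ansi[b] is not None]

# From 0xA0 upward, Windows-1252 and ISO-8859-1 should agree everywhere.
ansi_matches_latin1 = all(ansi[b] == latin1[b] for b in range(0xA0, 0x100))

# Upper-half positions where MacRoman happens to use the same number as ISO-8859-1.
mac_same = [b for b in range(0x80, 0x100) if macroman[b] == latin1[b]]

print("ASCII range identical:", ascii_identical)
print("Windows-1252 extras in 0x80-0x9F:", len(ansi_extras))
print("Windows-1252 == ISO-8859-1 from 0xA0:", ansi_matches_latin1)
print("MacRoman positions shared with ISO-8859-1:", [hex(b) for b in mac_same])
```

Printing the shared MacRoman positions shows how few of the upper 128 code points line up with ISO-8859-1.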

The characters that appear in the first column of the following tables are generated from Unicode numeric character references, and so they should appear correctly in any Web browser that supports Unicode and that has suitable fonts available, regardless of the operating system.

  1. ANSI characters not present in ISO-8859-1
  2. ANSI characters not present in MacRoman
  3. ISO-8859-1 characters not present in ANSI
  4. ISO-8859-1 characters not present in MacRoman
  5. MacRoman characters not present in ANSI
  6. MacRoman characters not present in ISO-8859-1

Here, "ANSI" means Windows-1252, whose code page entry is:

1252 windows-1252 ANSI Latin 1; Western European (Windows)

This character encoding is a superset of ISO 8859-1 in terms of printable characters, but differs from the IANA's ISO-8859-1 by using displayable characters rather than control characters in the 80 to 9F (hex) range. Notable additional characters include curly quotation marks and all the printable characters that are in ISO 8859-15 (at different places than ISO 8859-15). It is known to Windows by the code page number 1252, and by the IANA-approved name "windows-1252". 
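To illustrate "at different places than ISO 8859-15", the following Python sketch prints the byte each encoding uses for the eight characters that ISO 8859-15 has and ISO 8859-1 lacks. It assumes Python's `cp1252` and `iso8859_15` codecs; the character list is written out by hand for this example.

```python
# The eight printable characters ISO 8859-15 has that ISO 8859-1 lacks.
CHARS = "\u20ac\u0160\u0161\u017d\u017e\u0152\u0153\u0178"  # € Š š Ž ž Œ œ Ÿ

positions = {}
for ch in CHARS:
    win = ch.encode("cp1252")[0]        # byte value in Windows-1252
    iso15 = ch.encode("iso8859_15")[0]  # byte value in ISO 8859-15
    positions[ch] = (win, iso15)
    print(f"U+{ord(ch):04X}  windows-1252: 0x{win:02X}  iso-8859-15: 0x{iso15:02X}")

# Every Windows-1252 byte falls in 0x80..0x9F, the range ISO reserves for control codes.
all_in_c1_range = all(0x80 <= win <= 0x9F for win, _ in positions.values())
```

The euro sign, for example, is 0x80 in Windows-1252 but 0xA4 in ISO 8859-15.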

It is very common to mislabel Windows-1252 text with the charset label ISO-8859-1. A common result was that all the quotes and apostrophes (produced by "smart quotes" in word-processing software) were replaced with question marks or boxes on non-Windows operating systems, making text difficult to read. Most modern web browsers and e-mail clients treat the media type charset ISO-8859-1 as Windows-1252 to accommodate such mislabeling. This is now standard behavior in the HTML5 specification, which requires that documents advertised as ISO-8859-1 actually be parsed with the Windows-1252 encoding.[5]
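The mislabeling problem is easy to reproduce. The sketch below is only an illustration with a made-up sample string: it encodes text containing smart quotes as Windows-1252, then decodes the same bytes once as strict ISO-8859-1 and once as Windows-1252.

```python
# Sample text with curly quotes and a curly apostrophe (hypothetical example string).
original = "\u201cIt\u2019s here\u201d"

# A Windows word processor would store the text as Windows-1252 bytes.
raw = original.encode("cp1252")

# A strict ISO-8859-1 reader turns bytes 0x93, 0x92, 0x94 into invisible C1 control codes.
as_latin1 = raw.decode("latin_1")

# Treating the "ISO-8859-1" label as Windows-1252, as browsers do, recovers the text.
as_cp1252 = raw.decode("cp1252")

print(raw)
print(repr(as_latin1))
print(as_cp1252)
```

The strict ISO-8859-1 result contains control characters where the quotes should be, which is what renders as boxes or question marks.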

Historically, the phrase "ANSI Code Page" was used in Windows to refer to non-DOS encodings; the intention was that most of these would be ANSI standards such as ISO-8859-1. Even though Windows-1252 was the first and by far most popular code page named so in Microsoft Windows parlance, the code page has never been an ANSI standard. Microsoft explains, "The term ANSI as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community."

help chcp
Displays or sets the active code page number.

CHCP [nnn]

nnn Specifies a code page number.

Type CHCP without a parameter to display the active code page number.

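ANSI (Windows-1252) does contain €, but the local code page on this machine is 936, so after pasting the character into Notepad++ it does not display correctly. "ANSI" in Notepad++ therefore means the current system code page, not Windows-1252 specifically.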

Finding out the default character encoding in Windows

You can check with PowerShell:

[System.Text.Encoding]::Default

which even enables you to check that across several machines at once.

Here, however, the system default encoding is reported as UTF-8, whose code page is 65001:

Preamble :
BodyName : utf-8
EncodingName : Unicode (UTF-8)
HeaderName : utf-8
WebName : utf-8
WindowsCodePage : 1200
IsBrowserDisplay : True
IsBrowserSave : True
IsMailNewsDisplay : True
IsMailNewsSave : True
IsSingleByte : False
EncoderFallback : System.Text.EncoderReplacementFallback
DecoderFallback : System.Text.DecoderReplacementFallback
IsReadOnly : True
CodePage : 65001

 65001 utf-8 Unicode (UTF-8)
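For comparison, roughly the same information can be gathered from Python. This is a sketch, not an equivalent of the PowerShell property: the function name `describe_default_encodings` is made up here, and the Windows-only part assumes the `GetACP` and `GetConsoleOutputCP` functions in `kernel32`, reached through `ctypes`.

```python
import locale
import sys


def describe_default_encodings():
    """Collect the encodings Python sees as defaults on this machine."""
    info = {
        # Encoding the locale prefers for text files (a code page name on Windows).
        "preferred": locale.getpreferredencoding(False),
        # Encoding Python uses internally for str <-> bytes conversions.
        "python_default": sys.getdefaultencoding(),
        # Encoding of the current standard output stream, if it has one.
        "stdout": getattr(sys.stdout, "encoding", None),
    }
    if sys.platform == "win32":
        import ctypes

        # Active "ANSI" code page and the console output code page (what CHCP reports).
        info["ansi_code_page"] = ctypes.windll.kernel32.GetACP()
        info["console_output_code_page"] = ctypes.windll.kernel32.GetConsoleOutputCP()
    return info


encodings = describe_default_encodings()
for name, value in encodings.items():
    print(f"{name}: {value}")
```

On a machine whose ANSI code page is 936, `ansi_code_page` would be expected to show 936 even when other entries report UTF-8.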

Original article: https://www.cnblogs.com/chucklu/p/14654158.html