Windows-1252

Not to be confused with ASCII


Windows-1252 ("ANSI" character encoding) is a character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows in English and some other Western languages. It is one version within the group of Windows code pages. In LaTeX packages, it is referred to as ansinew. The encoding is a superset of ISO 8859-1, but differs from the IANA's ISO-8859-1 by using displayable characters rather than control characters in the 0x80 to 0x9F range. It is known to Windows by the code page number 1252, and by the IANA-approved name "windows-1252". This code page also contains all the printable characters that are in ISO 8859-15 (though some are mapped to different code points).
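
The relationship to ISO 8859-15 can be checked mechanically. The following is a minimal sketch that assumes Python's built-in `cp1252` and `iso8859_15` codecs match the published mappings; the variable names are illustrative only. It collects the printable ISO 8859-15 characters that sit at a different code point in Windows-1252.

```python
# Compare the upper half (0xA0-0xFF) of ISO 8859-15 with Windows-1252.
moved = {}
for byte in range(0xA0, 0x100):
    char = bytes([byte]).decode("iso8859_15")
    # Would raise UnicodeEncodeError if Windows-1252 lacked the character.
    cp1252_byte = char.encode("cp1252")[0]
    if cp1252_byte != byte:
        moved[char] = (byte, cp1252_byte)

# Report each relocated character by its Unicode code point.
for char, (iso_byte, win_byte) in moved.items():
    print(f"U+{ord(char):04X}: ISO 8859-15 0x{iso_byte:02X} -> Windows-1252 0x{win_byte:02X}")
```

If the loop finishes without raising, every character in that range of ISO 8859-15 is available in Windows-1252, and `moved` holds the ones that were remapped (the euro sign, for instance, moves from 0xA4 to 0x80).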

The use of Unicode (often in UTF-8 form) is slowly replacing use of 8-bit "code pages" such as Windows-1252.

Many web browsers and email clients treat the MIME charset ISO-8859-1 as Windows-1252, so Windows-1252 codes are often seen in web pages that erroneously declare their encoding as ISO-8859-1. Such characters can still cause difficulties, particularly when the recipient is using a non-Windows system or a system that attempts to implement the standards fully. The latest draft of the HTML 5 specification codifies the substitution practice, requiring that documents labeled ISO-8859-1 be parsed as Windows-1252.
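
The substitution practice can be illustrated with a short sketch. The helper `decode_as_browser` and its small alias set are hypothetical and written only for this example; real browsers recognize many more labels and differ in details.

```python
def decode_as_browser(data: bytes, declared_charset: str) -> str:
    """Decode bytes, treating an ISO-8859-1 label as Windows-1252."""
    label = declared_charset.strip().lower()
    # Hypothetical, deliberately short alias list for illustration.
    if label in {"iso-8859-1", "iso8859-1", "latin1", "l1"}:
        label = "cp1252"
    return data.decode(label)


raw = b"\x93quoted\x94"  # curly quotation marks as Windows-1252 bytes

# Strict ISO-8859-1 decoding yields the C1 controls U+0093 and U+0094.
strict = raw.decode("iso-8859-1")

# The substitution yields U+201C and U+201D, the intended quotation marks.
lenient = decode_as_browser(raw, "ISO-8859-1")
```

Only bytes in the 0x80 to 0x9F range decode differently; everything else is identical under both interpretations.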

The term "ANSI code page" is also used for the code pages used in Windows, such as Windows-1252. Although Windows-1252 is considered an ANSI code page in Microsoft Windows parlance, it has never been standardized by ANSI. The name was taken from an early ANSI draft that was later modified and became ISO-8859-1. Thus, Windows-1252 is a non-standard code page that is called an ANSI code page for historical reasons. Microsoft has stated that "The term ANSI as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community."

Codepage layout

The following table shows Windows-1252, with differences from ISO-8859-1 marked with thick borders and asterisks (*). Each character is shown with its Unicode equivalent right below and its decimal code at the bottom.

Microsoft publishes a "best fit" Unicode mapping of Windows-1252, which includes a few code points not listed below.



     
   


Legend: yellow cells are control characters, blue cells are punctuation, purple cells are numbers, green cells are ASCII letters, and tan cells are international letters.

According to Microsoft's and the Unicode Consortium's websites, positions 0x81, 0x8D, 0x8F, 0x90, and 0x9D are unused. However, the Windows API call for converting from code pages to Unicode maps these to the corresponding C1 control codes. The euro sign at position 0x80 was not present in earlier versions of this code page, nor were the S and Z with caron (háček).
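
The two treatments of the unused positions can be compared in a short sketch. It assumes Python's built-in `cp1252` codec, which rejects the unused bytes; the function `decode_like_windows_api` is a hypothetical name that only imitates the behaviour described above and does not call any Windows API.

```python
# Find the bytes in 0x80-0x9F that Python's cp1252 codec refuses to decode.
UNUSED = []
for byte in range(0x80, 0xA0):
    try:
        bytes([byte]).decode("cp1252")
    except UnicodeDecodeError:
        UNUSED.append(byte)


def decode_like_windows_api(data: bytes) -> str:
    """Decode Windows-1252, mapping unused bytes to the same-numbered C1 code."""
    return "".join(
        chr(b) if b in UNUSED else bytes([b]).decode("cp1252") for b in data
    )


print([f"0x{b:02X}" for b in UNUSED])
```

With this fallback, a byte such as 0x81 becomes U+0081 instead of raising an error, while defined bytes such as 0x80 still map to their Windows-1252 characters.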

In US English Windows, a Windows-1252 character can be inserted by holding down the Alt key and typing a zero followed by the character's three-digit decimal code on the numeric keypad. (Omitting the zero enters characters from the older code page 437 instead.)

In other Western European versions of Windows (e.g. Italian, British English, Dutch, French, German, Portuguese, Spanish), Windows-1252 characters are entered in the same way, but omitting the leading zero usually enters characters from code page 850 rather than 437.
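
The effect of the leading zero can be modelled with a small sketch. The function `alt_code_char` is hypothetical and uses Python's built-in `cp1252`, `cp437`, and `cp850` codecs as stand-ins for the tables Windows consults; it covers only codes 128 to 255, where the code pages are plain byte-to-character tables.

```python
def alt_code_char(digits: str, oem_codec: str = "cp437") -> str:
    """Return the character an Alt+numpad sequence would select (128-255 only)."""
    value = int(digits)
    if not 128 <= value <= 255:
        raise ValueError("this sketch only covers codes 128-255")
    # A leading zero selects Windows-1252; otherwise the older OEM code page.
    # Note: the five unused Windows-1252 positions raise UnicodeDecodeError here.
    codec = "cp1252" if digits.startswith("0") else oem_codec
    return bytes([value]).decode(codec)


euro = alt_code_char("0128")               # Windows-1252 position 128
cent = alt_code_char("155")                # code page 437 position 155
o_slash = alt_code_char("155", "cp850")    # code page 850 position 155
```

The same digits without the zero give different results depending on the OEM code page: position 155 is the cent sign in code page 437 but the letter ø in code page 850.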

In other versions of Windows, it is also possible to enter characters from the Windows-1252 code page: add one of the Western European languages as an input language, then switch to it before entering the code. Many Microsoft programs, such as Word, create Windows-1252 characters automatically, even when none were typed.

Mozilla software and Windows-1252

Web pages and e-mails encoded in Windows-1252 are often displayed incorrectly if the page or message does not correctly indicate the code page in its headers (for example on Macintosh computers, which use a different mapping of the 0x80-0x9F code points for text marked as ISO-8859-1). The code page can be specified for an individual web page or e-mail by selecting "View --> Character Encoding --> Western (Windows-1252)" from the menu in Mozilla Firefox, Mozilla Thunderbird, and other Mozilla Foundation products. The problem originates from a decision by the Mozilla developers to conform to broad standards (such as MIME and ISO-8859-1) in this case and to disregard the influence Microsoft has had on many actual web pages.

This problem is often apparent when a large number of this symbol is present in a web-page: . However, these symbols don't always disappear with the solution presented here, as it could be caused by any kind of character encoding mismatch. Encoding mismatches for Windows-1252 are by far the most common for English language web-pages, though.
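
One way software can cope with unlabeled or mislabeled text is to try UTF-8 first and fall back to Windows-1252. This heuristic is an assumption added for illustration, not something described in the article, and `decode_with_fallback` is a hypothetical name; it works reasonably well because Windows-1252 text containing non-ASCII bytes is rarely valid UTF-8.

```python
def decode_with_fallback(data: bytes) -> str:
    """Try UTF-8 first; fall back to Windows-1252 if the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # errors="replace" keeps the five unused Windows-1252 bytes from raising.
        return data.decode("cp1252", errors="replace")


windows_bytes = b"\x93Hi\x94"  # curly quotes around "Hi" in Windows-1252

# Decoding as UTF-8 without a fallback turns each stray byte into U+FFFD.
mismatched = windows_bytes.decode("utf-8", errors="replace")

# The fallback recovers the intended quotation marks.
recovered = decode_with_fallback(windows_bytes)
```

The heuristic cannot distinguish Windows-1252 from other 8-bit code pages, so a correct charset declaration remains the more reliable fix.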

References

  1. HTML 5 Draft Recommendation, last version, 2.8 Character encoding, retrieved on 5 June 2009.
  2. http://blogs.msdn.com/oldnewthing/archive/2004/05/31/144893.aspx
  3. Unicode mappings of windows 1252 with "best fit"


