ISO 8859


ISO 8859, more formally ISO/IEC 8859, is a joint ISO and IEC standard for 8-bit character encodings used in computer text processing. The standard is divided into numbered, separately published parts, such as ISO/IEC 8859-1, ISO/IEC 8859-2, etc., each of which may be informally referred to as a standard in its own right. The standard currently comprises 15 parts.

Introduction

While the 95 printable ASCII characters are sufficient for exchanging information in everyday English, most other languages that use the Latin alphabet need additional symbols not covered by ASCII, such as ß (German) and å (Swedish and other Nordic languages). ISO 8859 sought to remedy this problem by using the eighth bit, left unused by 7-bit ASCII, to make room for 128 additional code positions. However, more characters were needed than a single 8-bit character encoding could hold, so several mapping tables were developed, including at least 10 tables to cover the Latin script alone for modern official languages.
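The role of the eighth bit can be illustrated with a minimal Python sketch. It assumes Python's built-in `ascii` and `iso8859_1` codecs as stand-ins for the two encodings; the helper names `fits_ascii` and `high_bytes` are hypothetical and chosen only for this example.

```python
def fits_ascii(text: str) -> bool:
    """Return True if every character of text can be encoded in 7-bit ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def high_bytes(text: str, codec: str = "iso8859_1") -> list:
    """Hex values of the encoded bytes that have the eighth bit set."""
    return [hex(b) for b in text.encode(codec) if b & 0x80]


# Plain English stays within 7 bits; ß and å each need one byte above 0x7F.
for word in ["hello", "Straße", "Skåne"]:
    print(word, fits_ascii(word), high_bytes(word))
```

Each accented or special letter still occupies exactly one byte, which is the property that made the 8-bit approach attractive.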

The ISO 8859-n standard is not entirely identical to the well-known ISO-8859-n character encodings approved by the IANA for use on the Internet. Apart from the added hyphen in the IANA-approved name, the encodings differ in that each part of the ISO standard assigns at most 191 characters, in the byte ranges 32 to 126 and 160 to 255, whereas the corresponding IANA-approved character encoding merges these mapping tables with the C0 control set (control characters mapped to bytes 0 to 31) and the C1 control set (control characters mapped to bytes 128 to 159), which yields a complete 8-bit character table with most, if not all, byte values assigned.
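The figure of 191 follows directly from the two byte ranges. The sketch below checks the arithmetic and, assuming that Python's `iso8859_1` codec behaves like the IANA-style charset (it decodes all 256 byte values), counts how many of those values are control characters. The variable names are illustrative only.

```python
import unicodedata

# Graphic ranges a part of the ISO standard may assign (inclusive bounds
# 32..126 and 160..255, written here as half-open Python ranges).
LOW_GRAPHIC = range(32, 127)
HIGH_GRAPHIC = range(160, 256)
max_graphic = len(LOW_GRAPHIC) + len(HIGH_GRAPHIC)  # 95 + 96

# Python's codec decodes every byte value, control characters included.
decoded = bytes(range(256)).decode("iso8859_1")
control_bytes = [b for b, ch in enumerate(decoded)
                 if unicodedata.category(ch) == "Cc"]

print(max_graphic)         # 191
print(len(control_bytes))  # 65: bytes 0..31, 127 (DEL) and 128..159
```

The 65 control positions and the 191 graphic positions together account for all 256 byte values.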

Characters

The ISO 8859 standard is designed for reliable information exchange, not for typography; it omits symbols needed for high-quality typography, such as optional ligatures, curly quotation marks, dashes, etc. As a result, advanced typesetting systems often use proprietary or idiosyncratic extensions on top of the ASCII and ISO 8859 standards, or use Unicode instead.

As a rule of thumb, if a character or symbol was not already part of a widely used data-processing character set and was also not usually provided on typewriter keyboards for a national language, it didn't get in. Hence the directional double quotation marks « and » used for some European languages were included, but not the directional double quotation marks “ and ” used for English and some other languages. French didn't get its œ and Œ ligatures because French speakers had not previously needed them enough to demand them on their keyboards; nor did it get Ÿ, because this character is only used in French in all caps text. These characters were, however, included later with ISO 8859-15, which also introduced the new euro sign €. Likewise Dutch did not get the 'ij' and 'IJ' letters, because Dutch speakers had gotten used to typing these as two letters instead. Romanian did not initially get its 'Ș/ș' and 'Ț/ț' letters, because these letters were initially unified with 'Ş/ş' and 'Ţ/ţ' by the Unicode Consortium, considering the shapes with comma beneath to be glyph variants of the shapes with cedilla. However, the letters with explicit comma below were later added to the Unicode standard and are also in ISO 8859-16.
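The inclusions and omissions described above can be checked against the codec tables shipped with Python, on the assumption that those tables match the corresponding parts of the standard. The helper `encodable` and the constant names are hypothetical, introduced only for this sketch.

```python
def encodable(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


GUILLEMETS = "\u00ab\u00bb"               # « »
CURLY_QUOTES = "\u201c\u201d"             # “ ”
FRENCH_EXTRAS = "\u0153\u0152\u0178"      # œ Œ Ÿ
EURO = "\u20ac"                           # €
COMMA_BELOW = "\u0218\u0219\u021a\u021b"  # Ș ș Ț ț
CEDILLA = "\u015e\u015f\u0162\u0163"      # Ş ş Ţ ţ

checks = [
    ("guillemets in part 1", GUILLEMETS, "iso8859_1"),
    ("curly quotes in part 1", CURLY_QUOTES, "iso8859_1"),
    ("French extras and euro in part 1", FRENCH_EXTRAS + EURO, "iso8859_1"),
    ("French extras and euro in part 15", FRENCH_EXTRAS + EURO, "iso8859_15"),
    ("cedilla forms in part 2", CEDILLA, "iso8859_2"),
    ("comma-below forms in part 2", COMMA_BELOW, "iso8859_2"),
    ("comma-below forms in part 16", COMMA_BELOW, "iso8859_16"),
]
for label, text, codec in checks:
    print(f"{label}: {encodable(text, codec)}")
```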

Most of the ISO 8859 encodings provide diacritic marks required for various European languages. Others provide non-Roman alphabets: Greek, Cyrillic, Hebrew, Arabic and Thai. However, the standard makes no provision for the scripts of East Asian languages (CJK), as their ideographic writing systems require many thousands of code points. Although it uses Latin-based characters, Vietnamese does not fit into 96 positions either; the Japanese syllabic Kana scripts, on the other hand, might, but like several other alphabets of the world they are not encoded.

The parts of ISO 8859

ISO 8859 currently consists of the following parts:

* ISO 8859-6 (Arabic): covers the most common Arabic glyphs, although not nearly all of them.

¹: only the IJ/ij (Dutch Y) is missing, which can be represented as IJ
²: missing characters are in ISO 8859-15

Each part of ISO 8859 is designed to support languages that often borrow from each other, so the characters needed by each language are usually accommodated by a single part. However, some characters and language combinations cannot be accommodated without transcription. Efforts were made to make conversions as smooth as possible. For example, German has all seven of its special characters at the same positions in all Latin variants (1-4, 9-10, 13-16), and in many positions the characters differ only in their diacritics between the sets. In particular, variants 1-4 were designed jointly, and have the property that every encoded character appears either at a given position or not at all.
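The claim about the German letters is easy to verify with a short Python sketch, assuming Python's `iso8859_N` codecs reflect the parts of the standard. The names `LATIN_PARTS`, `GERMAN` and `encoded` are illustrative.

```python
# Latin variants named in the text: parts 1-4, 9-10 and 13-16.
LATIN_PARTS = [1, 2, 3, 4, 9, 10, 13, 14, 15, 16]
GERMAN = "ÄÖÜäöüß"  # the seven German special characters

# Encode the same string with every Latin part and compare the bytes.
encoded = {part: GERMAN.encode(f"iso8859_{part}") for part in LATIN_PARTS}
distinct = set(encoded.values())

print(distinct)  # a single byte string if all positions agree
```

Because all ten parts produce the same bytes, German text containing only these letters and ASCII reads identically whichever Latin part the receiver assumes.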

Table

Comparison of the various parts of ISO 8859
Binary Oct Dec Hex 1 2 3 4 5 6 7 8 9 10 11 13 14 15 16
10100000 240 160 A0 NBSP
10100001 241 161 A1 ¡ Ą Ħ Ą Ё ʽ ¡ Ą " ¡ Ą
10100010 242 162 A2 ¢ ˘ ˘ ĸ Ђ ʼ ¢ ¢ Ē ¢ ¢ ą
10100011 243 163 A3 £ Ł £ Ŗ Ѓ £ £ £ Ģ £ £ £ Ł
10100100 244 164 A4 ¤ ¤ ¤ ¤ Є ¤ ¤ ¤ Ī ¤ Ċ
10100101 245 165 A5 ¥ Ľ Ĩ Ѕ ¥ ¥ Ĩ " ċ ¥ "
10100110 246 166 A6 ¦ Ś Ĥ Ļ І ¦ ¦ ¦ Ķ ¦ Š Š
10100111 247 167 A7 § § § § Ї § § § § § § § §
10101000 250 168 A8 ¨ ¨ ¨ ¨ Ј ¨ ¨ ¨ Ļ Ø š š
10101001 251 169 A9 © Š İ Š Љ © © © Đ © © © ©
10101010 252 170 AA ª Ş Ş Ē Њ × ª Š Ŗ ª Ș
10101011 253 171 AB « Ť Ğ Ģ Ћ « « « Ŧ « « «
10101100 254 172 AC ¬ Ź Ĵ Ŧ Ќ ، ¬ ¬ ¬ Ž ¬ ¬ Ź
10101101 255 173 AD ­ ­ ­ ­ ­ ­ ­ ­ ­ ­ ­ ­ ­ ­
10101110 256 174 AE ® Ž Ž Ў ® ® Ū ® ® ® ź
10101111 257 175 AF ¯ Ż Ż ¯ Џ ¯ Ŋ Æ Ÿ ¯ Ż
10110000 260 176 B0 ° ° ° ° А ° ° ° ° ° ° °
10110001 261 177 B1 ± ą ħ ą Б ± ± ± ą ± ± ±
10110010 262 178 B2 ² ˛ ² ˛ В ² ² ² ē ² Ġ ² Č
10110011 263 179 B3 ³ ł ³ ŗ Г ³ ³ ³ ģ ³ ġ ³ ł
10110100 264 180 B4 ´ ´ ´ ´ Д ΄ ´ ´ ī " Ž Ž
10110101 265 181 B5 µ ľ µ ĩ Е ΅ µ µ ĩ µ µ "
10110110 266 182 B6 ś ĥ ļ Ж Ά ķ
10110111 267 183 B7 · ˇ · ˇ З · · · · · · ·
10111000 270 184 B8 ¸ ¸ ¸ ¸ И Έ ¸ ¸ ļ ø ž ž
10111001 271 185 B9 ¹ š ı š Й Ή ¹ ¹ đ ¹ ¹ č
10111010 272 186 BA º ş ş ē К Ί ÷ º š ŗ º ș
10111011 273 187 BB » ť ğ ģ Л ؛ » » » ŧ » » »
10111100 274 188 BC ¼ ź ĵ ŧ М Ό ¼ ¼ ž ¼ Œ Œ
10111101 275 189 BD ½ ˝ ½ Ŋ Н ½ ½ ½ ½ œ œ
10111110 276 190 BE ¾ ž ž О Ύ ¾ ¾ ū ¾ Ÿ Ÿ
10111111 277 191 BF ¿ ż ż ŋ П ؟ Ώ ¿ ŋ æ ¿ ż
11000000 300 192 C0 À Ŕ À Ā Р ΐ À Ā Ą À À À
11000001 301 193 C1 Á Á Á Á С ء Α Á Á Į Á Á Á
11000010 302 194 C2 Â Â Â Â Т آ Β Â Â Ā Â Â Â
11000011 303 195 C3 Ã Ă Ã У أ Γ Ã Ã Ć Ã Ã Ă
11000100 304 196 C4 Ä Ä Ä Ä Ф ؤ Δ Ä Ä Ä Ä Ä Ä
11000101 305 197 C5 Å Ĺ Ċ Å Х إ Ε Å Å Å Å Å Ć
11000110 306 198 C6 Æ Ć Ĉ Æ Ц ئ Ζ Æ Æ Ę Æ Æ Æ
11000111 307 199 C7 Ç Ç Ç Į Ч ا Η Ç Į Ē Ç Ç Ç
11001000 310 200 C8 È Č È Č Ш ب Θ È Č Č È È È
11001001 311 201 C9 É É É É Щ ة Ι É É É É É É
11001010 312 202 CA Ê Ę Ê Ę Ъ ت Κ Ê Ę Ź Ê Ê Ê
11001011 313 203 CB Ë Ë Ë Ë Ы ث Λ Ë Ë Ė Ë Ë Ë
11001100 314 204 CC Ì Ě Ì Ė Ь ج Μ Ì Ė Ģ Ì Ì Ì
11001101 315 205 CD Í Í Í Í Э ح Ν Í Í Ķ Í Í Í
11001110 316 206 CE Î Î Î Î Ю خ Ξ Î Î Ī Î Î Î
11001111 317 207 CF Ï Ď Ï Ī Я د Ο Ï Ï Ļ Ï Ï Ï
11010000 320 208 D0 Ð Đ Đ а ذ Π Ğ Ð Š Ŵ Ð Ð
11010001 321 209 D1 Ñ Ń Ñ Ņ б ر Ρ Ñ Ņ Ń Ñ Ñ Ń
11010010 322 210 D2 Ò Ň Ò Ō в ز Ò Ō Ņ Ò Ò Ò
11010011 323 211 D3 Ó Ó Ó Ķ г س Σ Ó Ó Ó Ó Ó Ó
11010100 324 212 D4 Ô Ô Ô Ô д ش Τ Ô Ô Ō Ô Ô Ô
11010101 325 213 D5 Õ Ő Ġ Õ е ص Υ Õ Õ Õ Õ Õ Ő
11010110 326 214 D6 Ö Ö Ö Ö ж ض Φ Ö Ö Ö Ö Ö Ö
11010111 327 215 D7 × × × × з ط Χ × Ũ × × Ś
11011000 330 216 D8 Ø Ř Ĝ Ø и ظ Ψ Ø Ø Ų Ø Ø Ű
11011001 331 217 D9 Ù Ů Ù Ų й ع Ω Ù Ų Ł Ù Ù Ù
11011010 332 218 DA Ú Ú Ú Ú к غ Ϊ Ú Ú Ś Ú Ú Ú
11011011 333 219 DB Û Ű Û Û л Ϋ Û Û Ū Û Û Û
11011100 334 220 DC Ü Ü Ü Ü м ά Ü Ü Ü Ü Ü Ü
11011101 335 221 DD Ý Ý Ŭ Ũ н έ İ Ý Ż Ý Ý Ę
11011110 336 222 DE Þ Ţ Ŝ Ū о ή Ş Þ Ž Ŷ Þ Ț
11011111 337 223 DF ß ß ß ß п ί ß ß ฿ ß ß ß ß
11100000 340 224 E0 à ŕ à ā р ـ ΰ א à ā ą à à à
11100001 341 225 E1 á á á á с ف α ב á á į á á á
11100010 342 226 E2 â â â â т ق β ג â â ā â â â
11100011 343 227 E3 ã ă ã у ك γ ד ã ã ć ã ã ă
11100100 344 228 E4 ä ä ä ä ф ل δ ה ä ä ä ä ä ä
11100101 345 229 E5 å ĺ ċ å х م ε ו å å å å å ć
11100110 346 230 E6 æ ć ĉ æ ц ن ζ ז æ æ ę æ æ æ
11100111 347 231 E7 ç ç ç į ч ه η ח ç į ē ç ç ç
11101000 350 232 E8 è č è č ш و θ ט è č č è è è
11101001 351 233 E9 é é é é щ ى ι י é é é é é é
11101010 352 234 EA ê ę ê ę ъ ي κ ך ê ę ź ê ê ê
11101011 353 235 EB ë ë ë ë ы ً λ כ ë ë ė ë ë ë
11101100 354 236 EC ì ě ì ė ь ٌ μ ל ì ė ģ ì ì ì
11101101 355 237 ED í í í í э ٍ ν ם í í ķ í í í
11101110 356 238 EE î î î î ю َ ξ מ î î ī î î î
11101111 357 239 EF ï ď ï ī я ُ ο ן ï ï ļ ï ï ï
11110000 360 240 F0 ð đ đ ِ π נ ğ ð š ŵ ð đ
11110001 361 241 F1 ñ ń ñ ņ ё ّ ρ ס ñ ņ ń ñ ñ ń
11110010 362 242 F2 ò ň ò ō ђ ْ ς ע ò ō ņ ò ò ò
11110011 363 243 F3 ó ó ó ķ ѓ σ ף ó ó ó ó ó ó
11110100 364 244 F4 ô ô ô ô є τ פ ô ô ō ô ô ô
11110101 365 245 F5 õ ő ġ õ ѕ υ ץ õ õ õ õ õ ő
11110110 366 246 F6 ö ö ö ö і φ צ ö ö ö ö ö ö
11110111 367 247 F7 ÷ ÷ ÷ ÷ ї χ ק ÷ ũ ÷ ÷ ś
11111000 370 248 F8 ø ř ĝ ø ј ψ ר ø ø ų ø ø ű
11111001 371 249 F9 ù ů ù ų љ ω ש ù ų ł ù ù ù
11111010 372 250 FA ú ú ú ú њ ϊ ת ú ú ś ú ú ú
11111011 373 251 FB û ű û û ћ ϋ û û ū û û û
11111100 374 252 FC ü ü ü ü ќ ό ü ü ü ü ü ü
11111101 375 253 FD ý ý ŭ ũ § ύ ı ý ż ý ý ę
11111110 376 254 FE þ ţ ŝ ū ў ώ ş þ ž ŷ þ ț
11111111 377 255 FF ÿ ˙ ˙ ˙ џ ÿ ĸ ÿ ÿ ÿ

Position 0xA0 always holds the non-breaking space, and 0xAD is in most parts the soft hyphen, which is only displayed at line breaks. Other empty fields are either unassigned or could not be displayed by the system used.
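The two fixed positions can be inspected programmatically. The following sketch assumes Python's `iso8859_N` codecs as the source of the mapping tables; `char_at` is a hypothetical helper, and the list of exceptions is computed from those tables rather than taken from the standard itself.

```python
# The 15 published parts (there is no part 12).
PARTS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16]


def char_at(part: int, byte: int) -> str:
    """Decode a single byte value with the codec for the given part."""
    return bytes([byte]).decode(f"iso8859_{part}")


# 0xA0 should decode to U+00A0 NO-BREAK SPACE in every part.
nbsp_everywhere = all(char_at(p, 0xA0) == "\u00a0" for p in PARTS)

# Parts in which 0xAD is something other than U+00AD SOFT HYPHEN.
shy_exceptions = [p for p in PARTS if char_at(p, 0xAD) != "\u00ad"]

print(nbsp_everywhere)
print(shy_exceptions)
```

With Python's tables the only exception is part 11 (Thai), where 0xAD holds a Thai letter.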

Relationship to Unicode and the UCS

Since 1991, the Unicode Consortium has been working with ISO to develop the Unicode Standard and ISO/IEC 10646: the Universal Character Set (UCS) in tandem. This pair of standards was created to unify the ISO 8859 character repertoire, among others, by assigning each character, initially, to a 16-bit code value, with some code values left unassigned. Over time, their models adapted to map characters to abstract numeric code points rather than fixed bit-width values, so that more code points and encoding methods could be supported.

Unicode and ISO/IEC 10646 currently assign about 100,000 characters to a code space consisting of over a million code points, and they define several standard encodings that are capable of representing every available code point. The standard encodings of Unicode and the UCS use sequences of one to four 8-bit code values (UTF-8), sequences of one or two 16-bit code values (UTF-16), or one 32-bit code value (UTF-32 or UCS-4). There is also an older encoding that uses one 16-bit code value (UCS-2), capable of representing one-seventeenth of the available code points. Of these encoding forms, only UTF-8's byte sequences are in a fixed order; the others are subject to platform-dependent byte ordering issues that may be addressed via special codes or indicated via out-of-band means.
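The code-unit counts and the byte-order issue can be made concrete with a small Python sketch. The sample characters and the helper `code_units` are choices made for this illustration; the `-le` and `-be` codec variants are used so that no byte order mark is added to the output.

```python
from fractions import Fraction


def code_units(ch: str) -> dict:
    """Number of code units needed for one character in each encoding form."""
    return {
        "utf-8": len(ch.encode("utf-8")),           # 8-bit units
        "utf-16": len(ch.encode("utf-16-le")) // 2,  # 16-bit units
        "utf-32": len(ch.encode("utf-32-le")) // 4,  # 32-bit units
    }


samples = {
    "LATIN CAPITAL A": "A",
    "SHARP S": "\u00df",
    "EURO SIGN": "\u20ac",
    "MUSICAL G CLEF": "\U0001d11e",  # outside the 16-bit range
}
for name, ch in samples.items():
    print(name, code_units(ch))

# The same 16-bit code unit serialized in the two possible byte orders.
little = "\u00df".encode("utf-16-le")
big = "\u00df".encode("utf-16-be")
print(little, big)

# UCS-2 reaches 0x10000 of the 0x110000 code points.
ucs2_share = Fraction(0x10000, 0x110000)
print(ucs2_share)
```

UTF-8 needs no such byte-order distinction because its code units are single bytes.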

Newer editions of ISO 8859 express characters in terms of their Unicode/UCS names and the U+nnnn notation, effectively causing each part of ISO 8859 to be a Unicode/UCS character encoding scheme that maps a very small subset of the UCS to single 8-bit bytes. The first 256 characters in Unicode and the UCS are identical to those in ISO-8859-1.
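A brief Python sketch of both statements, assuming Python's `iso8859_N` codecs as the mapping tables: the first part checks that byte values and Unicode code points coincide for ISO-8859-1, and the hypothetical helper `unicode_notation` prints the U+nnnn form and character name for a byte in any part.

```python
import unicodedata

# Every byte value decodes to the code point with the same number.
latin1_text = bytes(range(256)).decode("iso8859_1")
same_as_unicode = [ord(ch) for ch in latin1_text] == list(range(256))
print(same_as_unicode)


def unicode_notation(byte: int, part: int) -> str:
    """Describe a byte of a given part as 'U+nnnn NAME'.

    Intended for graphic characters only: unicodedata.name() raises
    ValueError for control characters, which have no name.
    """
    ch = bytes([byte]).decode(f"iso8859_{part}")
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# The same byte maps to different UCS characters in different parts.
print(unicode_notation(0xA4, 1))
print(unicode_notation(0xA4, 15))
```

In other parts the byte value and the code point generally differ, as the 0xA4 example in part 15 shows.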

ISO 8859 was favored throughout the 1990s, having the advantages of being well-established and more easily implemented in software: the equation of one byte to one character is simple and adequate for most single-language applications, and there are no combining characters or variant forms.

As the relative cost, in computing resources, of using more than one byte per character began to diminish, programming languages and operating systems added native support for Unicode alongside their system of code pages. As Unicode-enabled operating systems became more widespread, ISO 8859 and other legacy encodings became less popular. While remnants of ISO 8859 and single-byte character models remain entrenched in many operating systems, programming languages, data storage systems, networking applications, display hardware, and end-user application software, most modern computing applications use Unicode internally, and rely on conversion tables to map to and from the simpler encodings, when necessary.
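The pattern of working in Unicode internally and converting at the boundaries can be sketched as follows. This is a minimal illustration that relies on Python's built-in codecs as the conversion tables; the function `transcode` and the sample word are assumptions made for the example.

```python
def transcode(data: bytes, source: str, target: str,
              errors: str = "strict") -> bytes:
    """Convert bytes between encodings via an intermediate Unicode string."""
    text = data.decode(source)                  # legacy bytes -> Unicode
    return text.encode(target, errors=errors)   # Unicode -> target bytes


# "Łódź" as it would be stored in ISO 8859-2.
legacy = "\u0141\u00f3d\u017a".encode("iso8859_2")
print(legacy)

# Lossless conversion to UTF-8.
print(transcode(legacy, "iso8859_2", "utf-8"))

# Part 1 lacks Ł and ź, so a lossy conversion substitutes "?".
print(transcode(legacy, "iso8859_2", "iso8859_1", errors="replace"))

# Decoding with the wrong part silently yields different characters.
print(legacy.decode("iso8859_1"))
```

The last line shows why the encoding of legacy data has to be known out of band: the bytes themselves carry no indication of which part they belong to.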

Development status

The ISO/IEC 8859 standard was maintained by ISO/IEC Joint Technical Committee 1, Subcommittee 2, Working Group 3 (ISO/IEC JTC 1/SC 2/WG 3). In June 2004, WG 3 disbanded, and maintenance duties were transferred to SC 2. The standard is not currently being updated, as the Subcommittee's only remaining Working Group, WG 2, is concentrating on development of ISO/IEC 10646.


This article originally comes from Wikipedia. All text is available under the terms of the GNU Free Documentation License.