Transcript Unicode 3.0
Unicode 3.0.1
Mark Davis
www.macchiato.com

New 3.0 Characters
Category                   V 2.1     V 3.0
Alphabetics, Symbols       6,511    10,236
CJK Ideographs            21,204    27,786
Hangul Syllables          11,172    11,172
Assigned characters       38,887    49,194
Unassigned code values    18,134     7,827
Sync’ed with ISO/IEC 10646, 2nd edition

New 3.0 Blocks
   80  Syriac
  176  Mongolian
  192  Thaana
  256  Braille
  128  Sinhala
  128  CJK Radicals Supplement
  160  Myanmar
  224  Kangxi Radicals
  384  Ethiopic
   16  Ideographic Description Characters
   96  Cherokee
  640  Unified Canadian Aboriginal Syllabics
   32  Ogham
   32  Bopomofo Extended
6,582  CJK Unified Ideographs Extension A
1,168  Yi Syllables
   96  Runic
   64  Yi Radicals
  128  Khmer

Property Updates (1)
• Bidirectional properties
• Byte order mark
• Capital letters with iota adscript
• Case
• Combining classes
• Decompositions

Property Updates (2)
• Identifier syntax
• Layout controls
• Line break properties
• East Asian width properties
• Miscellaneous characters: Figure Space, Tilde,…
• Ligature control
• Unassigned code points

Conformance
• Unicode Transformation Formats: UTF-16BE, UTF-16LE, UTF-16, UTF-8
• Unicode bidirectional behavior
• Other normative character property values
• Clause numbering maintained!
• Stability policies
• Clarification of noncharacters
• Normalization conformance test

Unicode Standard Annexes (UAX)
Integral part of the 3.0.1 Standard
• UAX #09: BIDI
• UAX #11: East Asian Width
• UAX #13: Newline Guidelines
• UAX #14: Line Breaking
• UAX #15: Normalization
• Included in any reference to version 3.0 or later

Unicode Technical Standards (UTS)
• UTS #06: Compression
  – IANA name: SCSU
• UTS #10: Collation
  – Note: defined over all Unicode code points
  – Values will be updated soon for better ordering

Technical Reports
• UTR #07: Language Tags
• UTR #16: UTF-EBCDIC
• UTR #17: Character Encoding Model
• UTR #18: Regular Expressions
• UTR #19: UTF-32
• UTR #21: Case Mappings

Draft Technical Reports
• UTR #20: Unicode in XML…
• UTR #22: Character Mapping Tables
• UTR #24: Script Names
• Open for public comment

Unicode Character Database
• More documentation, more data
  – UnicodeData, Blocks
  – ArabicShaping, Jamo
  – CompositionExclusions, SpecialCasing
  – EastAsianWidth, LineBreak
  – Unihan, BidiMirroring
  – CaseFolding, NormalizationTest

Website Changes
• New look and feel
• New navigation
• Enhanced FAQ
• Glossary
• What is Unicode?
• Where is my character?

Beyond 3.0
• Characters
  – CJK characters, symbols, music systems, ancient scripts, extra characters, etc.
  – First allocated surrogate pairs
• Properties
  – Essential for Unicode enablement

Unicode 3.0
• Major new version
• Over 10,000 new characters
• Enhanced character data for implementations
• Reorganized text for better reference
• The version for normalization
• Unicode Character Database 3.0.0
• Available now!
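The conformance slide names the UTF-8 and UTF-16 family of transformation formats and the byte order mark, and the "Beyond 3.0" slide anticipates the first characters reached through surrogate pairs. The following is a minimal Python sketch, not taken from the talk, of how those pieces fit together. It relies on Python's built-in `utf-8`, `utf-16-be`, `utf-16-le`, and `utf-16` codecs and applies the standard UTF-16 surrogate arithmetic by hand; the helper names (`encode_forms`, `utf16_byte_order`, `to_surrogates`, `from_surrogates`) are hypothetical, chosen only for illustration.

```python
"""Sketch: Unicode transformation formats, the BOM, and surrogate pairs.

Uses only Python's built-in codecs plus the standard UTF-16 surrogate
arithmetic; the helper names are illustrative, not part of any Unicode API.
"""

HIGH_SURROGATE_BASE = 0xD800  # first high (leading) surrogate
LOW_SURROGATE_BASE = 0xDC00   # first low (trailing) surrogate


def encode_forms(text: str) -> dict[str, bytes]:
    """Serialize `text` in each transformation format named on the slide."""
    return {
        "UTF-8": text.encode("utf-8"),
        "UTF-16BE": text.encode("utf-16-be"),  # big-endian, never a BOM
        "UTF-16LE": text.encode("utf-16-le"),  # little-endian, never a BOM
        "UTF-16": text.encode("utf-16"),       # BOM U+FEFF, then platform byte order
    }


def utf16_byte_order(data: bytes) -> str:
    """Read the byte order of UTF-16 data from its byte order mark, if any."""
    if data.startswith(b"\xfe\xff"):
        return "big-endian"
    if data.startswith(b"\xff\xfe"):
        return "little-endian"
    # No BOM and (assumed) no higher-level protocol: UTF-16 defaults to big-endian.
    return "big-endian"


def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary code point")
    offset = cp - 0x10000  # a 20-bit value
    return (HIGH_SURROGATE_BASE + (offset >> 10),   # top 10 bits
            LOW_SURROGATE_BASE + (offset & 0x3FF))  # bottom 10 bits


def from_surrogates(high: int, low: int) -> int:
    """Recombine a surrogate pair into the scalar value UTF-32 would store."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - HIGH_SURROGATE_BASE) << 10) + (low - LOW_SURROGATE_BASE)


if __name__ == "__main__":
    for name, data in encode_forms("A\u00e9").items():
        print(f"{name:9} {data.hex(' ')}")
    high, low = to_surrogates(0x10000)
    print(f"U+10000 -> {high:04X} {low:04X}")  # D800 DC00
    assert from_surrogates(high, low) == 0x10000
```

Since 1,024 high and 1,024 low surrogates (the 2,048 surrogate code values counted in the backup slides) combine into 1,024 × 1,024 = 1,048,576 pairs, surrogate pairs cover exactly the 16 supplementary planes U+10000..U+10FFFF.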
Q&A

Backup Slides

ICU: Paid Advertisement
• Open Source Unicode Enablement Library
  – ICU: C/C++ and Java versions
  – IBM Public License
  – Friday, 10:00, Helena Shih
• http://oss.software.ibm.com/icu

Enumerated Versions
• Unicode 1.0.0, Unicode 1.0.1
• Unicode 1.1.0, Unicode 1.1.5
• Unicode 2.0.0
• Unicode 2.1.2, Unicode 2.1.5, Unicode 2.1.8, Unicode 2.1.9
• Unicode 3.0.0
  – www.unicode.org

Editorial Committee
• Joan Aliprand
• Julie Allen (editor)
• Joe Becker
• Mark Davis
• Asmus Freytag
• John Jenkins
• Mike Ksar
• Rick McGowan
• Lisa Moore
• Ken Whistler

New Characters (2)
Category                   V 2.1     V 3.0
Private Use                6,400     6,400
Surrogates                 2,048     2,048
Controls                      65        65
Noncharacters                  2         2
Assigned code values      47,402    57,709
Unassigned code values    18,134     7,827

Reference to Versions
• Open repertoire, but backwards compatible
• Characters only added, not removed
  – Two early exceptions: ISO synchronization and Korean
• Don’t overspecify the version:
  – “Version 2.1.0” vs. “Version 2.1” vs. “Version 2 or later”
• Includes Technical Reports!!

Versions of the Standard
• major: significant additions
  – published as a book
• minor: character additions or more significant normative changes
  – published as a Technical Report
• update: any other changes
  – on the website in /standard/versions/
• Example: 2.1.9

Unicode 3.0
• Versioning
• Technical Reports
• Characters
• Properties
• Unicode Character Database
• Conformance
• Future

Reorganized Text
6: Punctuation
7: European Alphabetics
8: Middle Eastern
9: South Asian
10: East Asian
11: Other (Mongolian, etc.)
12: Symbols
13: Formatting, Controls, Specials

Additionally
• Shift-JIS index
• Full radical-stroke index
  – CJK split in several blocks
• Improved charts
  – Especially for CJK ideographs
• Improved implementation guidelines
• General clarifications
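The two character-count tables can be cross-checked by simple addition. Below is a small Python sketch that does so, assuming (the slides do not say this explicitly) that the rows of the two tables together partition the full 16-bit space of 65,536 code values. The figures are transcribed from the slides; the dictionary layout and the `tally` helper are illustrative only.

```python
"""Cross-check the code-value tallies from the two "New Characters" slides.

The figures are transcribed from the slides; the added assumption is that
the rows cover all 65,536 values of the 16-bit code space.
"""

# (V 2.1, V 3.0) counts per category, as shown on the slides
CHARACTERS = {
    "Alphabetics, Symbols": (6_511, 10_236),
    "CJK Ideographs": (21_204, 27_786),
    "Hangul Syllables": (11_172, 11_172),
}
OTHER_ASSIGNED = {
    "Private Use": (6_400, 6_400),
    "Surrogates": (2_048, 2_048),
    "Controls": (65, 65),
    "Noncharacters": (2, 2),
}
UNASSIGNED = (18_134, 7_827)
CODE_SPACE = 0x10000  # 65,536 sixteen-bit code values (assumed total)


def tally(column: int) -> tuple[int, int, int]:
    """Return (assigned characters, assigned code values, grand total) for a column."""
    characters = sum(row[column] for row in CHARACTERS.values())
    assigned = characters + sum(row[column] for row in OTHER_ASSIGNED.values())
    return characters, assigned, assigned + UNASSIGNED[column]


if __name__ == "__main__":
    for column, version in enumerate(("2.1", "3.0")):
        chars, assigned, total = tally(column)
        print(f"V {version}: {chars:,} characters, {assigned:,} assigned, "
              f"total {total:,} (full space: {total == CODE_SPACE})")
    print(f"new characters: {tally(1)[0] - tally(0)[0]:,}")
```

Run as a script, it reports 38,887 and 49,194 characters, 47,402 and 57,709 assigned code values, and a total of 65,536 for both versions; the difference of 10,307 characters is consistent with the "Over 10,000 new characters" summary.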