CJK is a collective term for Chinese, Japanese, and Korean, which
constitute the main East Asian languages. The term is used in the
field of software and communications internationalization.
The term CJKV means CJK plus Vietnamese, which was historically
written with Chinese characters (Hán tự) and Chữ Nôm before the
adoption of the Latin-based Quốc Ngữ alphabet.
These languages share one characteristic: their writing systems use
Chinese characters, either exclusively or in part. The characters are
called hànzì in Chinese, kanji in Japanese, and hanja in Korean.
Chinese is written in Chinese characters only and requires about
4,000 characters for general literacy, although reasonably complete
coverage takes up to 40,000 characters. Japanese uses fewer
characters (general literacy in Japan can be expected with about
2,000) together with two syllabaries, hiragana and katakana. The use
of Chinese characters in Korea is becoming increasingly rare,
although their idiosyncratic use in proper names requires knowledge
(and therefore availability) of many more characters. The number of
characters required to cover all these languages' needs cannot fit in
the 256-character code space of an 8-bit character encoding; it
requires at least a 16-bit fixed-width encoding or a multi-byte
variable-length encoding. The 16-bit fixed-width encodings, such as
Unicode up to and including version 2.0, are now deprecated for two
reasons: more characters must be encoded than a 16-bit encoding can
accommodate (Unicode 5.0 has some 70,000 Han characters), and the
Chinese government requires software in China to support the GB18030
character set.
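
The size argument above can be checked with a short Python sketch. It
is illustrative only: the helper names `fits_in_8_bit` and
`encoded_lengths` and the two sample characters are assumptions chosen
for this example. It counts the bytes that three encodings need for
one ideograph inside the 16-bit range and one outside it.

```python
# Sample characters (chosen for illustration):
#   U+4E2D  lies inside the 16-bit range (Basic Multilingual Plane)
#   U+20000 lies outside it (CJK Unified Ideographs Extension B)
SAMPLES = {
    "U+4E2D": "\u4e2d",
    "U+20000": "\U00020000",
}


def fits_in_8_bit(ch):
    """Return True if `ch` can be stored in the 8-bit Latin-1 encoding."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def encoded_lengths(ch):
    """Return {encoding name: byte count} for a single character."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-be", "gb18030")}


for label, ch in SAMPLES.items():
    print(label, "fits in 8 bits:", fits_in_8_bit(ch), encoded_lengths(ch))
```

The first character needs 2 bytes in UTF-16 and GB18030 and 3 bytes in
UTF-8. The second needs 4 bytes in all three, because it lies beyond
the 65,536 values that a single 16-bit unit can address.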
Although the CJK encodings cover largely the same characters, they
were developed separately by different East Asian governments and
software companies and are mutually incompatible. Unicode has
attempted, with some controversy, to unify the character sets in a
process known as Han unification.
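
The incompatibility can be seen by encoding the same two characters
with several legacy codecs that ship with Python. This is a minimal
sketch: the helper `encode_all`, the sample text, and the choice of
codecs are assumptions made for the example, and other codecs would
illustrate the point equally well.

```python
# The two characters U+6F22 U+5B57 ("Chinese characters"), which are
# present in all four legacy character sets used below.
TEXT = "\u6f22\u5b57"

# Legacy encodings associated with Taiwan, mainland China, Japan, and Korea.
CODECS = ("big5", "gbk", "shift_jis", "euc_kr")


def encode_all(text, codecs=CODECS):
    """Return {codec name: encoded bytes} for the same text."""
    return {name: text.encode(name) for name in codecs}


encoded = encode_all(TEXT)
for name, raw in encoded.items():
    print(f"{name:10} {raw.hex()}")

# Reading Shift JIS bytes as if they were GBK does not recover the text.
wrong = encoded["shift_jis"].decode("gbk", errors="replace")
print("misread text equals original:", wrong == TEXT)
```

Each codec produces a different byte sequence for the same text, so
software must know which encoding a file uses before it can interpret
the bytes.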
CJK character encodings should consist minimally of Han characters
plus language-specific phonetic scripts such as pinyin, bopomofo,
hiragana, katakana, and hangul.
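
As a rough illustration of these scripts, the sketch below labels a
character by the prefix of its Unicode character name, using Python's
standard `unicodedata` module. The function `script_of` and its prefix
table are assumptions made for this example; a name-prefix check is a
simplification and not a full script classifier.

```python
import unicodedata

# (name prefix, label) pairs; pinyin is written with Latin letters.
PREFIXES = (
    ("CJK UNIFIED IDEOGRAPH", "Han"),
    ("HIRAGANA", "hiragana"),
    ("KATAKANA", "katakana"),
    ("HANGUL", "hangul"),
    ("BOPOMOFO", "bopomofo"),
    ("LATIN", "Latin"),
)


def script_of(ch):
    """Return a script label for one character, or "other" if none matches."""
    name = unicodedata.name(ch, "")  # "" when the character has no name
    for prefix, label in PREFIXES:
        if name.startswith(prefix):
            return label
    return "other"


for ch in ("\u6f22", "\u3042", "\u30a2", "\ud55c", "\u3105", "\u0101"):
    print(f"U+{ord(ch):04X}", script_of(ch))
```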
CJK character encodings include:
The CJK character sets account for the bulk of the characters
assigned in Unicode. There is much controversy among Japanese experts
on Chinese characters about the desirability and technical merit of
the Han unification process used to map multiple Chinese and Japanese
character sets into a single set of unified characters.
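
To give a sense of scale, the sketch below adds up the sizes of four
Unicode blocks that hold CJK characters. The block ranges are those
defined by the Unicode Standard; the selection of blocks and the names
`BLOCKS` and `block_size` are assumptions made for this example. A
block's size counts reserved code points too, so it slightly overstates
the number of assigned characters.

```python
# (first, last) code points of selected blocks, inclusive.
BLOCKS = {
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
    "CJK Unified Ideographs Extension A": (0x3400, 0x4DBF),
    "CJK Unified Ideographs Extension B": (0x20000, 0x2A6DF),
    "Hangul Syllables": (0xAC00, 0xD7AF),
}

BMP_SIZE = 0x10000  # code points addressable with a single 16-bit unit


def block_size(first, last):
    """Number of code points in an inclusive range."""
    return last - first + 1


sizes = {name: block_size(*bounds) for name, bounds in BLOCKS.items()}
total = sum(sizes.values())

# Blocks that lie inside the 16-bit range (last code point below 0x10000).
in_bmp = sum(sizes[name] for name, (_, last) in BLOCKS.items() if last < BMP_SIZE)

for name, size in sizes.items():
    print(f"{name:40} {size:6}")
print("total:", total)
print(f"share of the 16-bit range: {in_bmp / BMP_SIZE:.1%}")
```

These four blocks alone span 81,488 code points, more than a 16-bit
encoding can address, and the three that lie inside the 16-bit range
occupy over half of it.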
All three languages can be written either horizontally, left to
right, or vertically, top to bottom, but they are usually treated as
left-to-right scripts when discussing encoding issues.