Uyghur, formerly known as Eastern Turki, is a Turkic language spoken in the Xinjiang Uyghur Autonomous Region, a Central Asian region administered by China, mainly by the Uyghur ethnic group. It is spoken by 10 million people (2007) in China, mostly in Xinjiang. Uyghur is also spoken by some 300,000 people in Kazakhstan, and there are Uyghur-speaking communities in Afghanistan, Albania, Australia, Belgium, Canada, Germany, Indonesia, Kyrgyzstan, Mongolia, Pakistan, Saudi Arabia, Sweden, Taiwan, Tajikistan, Turkey, the United Kingdom, the USA, and Uzbekistan.

Like many other Turkic languages, Uyghur displays vowel harmony and agglutination, lacks noun classes or grammatical gender, and is a left-branching language with Subject Object Verb word order.


The Uyghur language belongs to the Uyghuric or Southeastern group of the Turkic language family, which is controversially a branch of the Altaic language family.

The languages most closely related to it include Uzbek, Ili Turki, and Aini. Some linguists consider the Turkic languages to be part of the larger Altaic language family, but others believe there is not enough evidence to support this.

Early scholarly studies of Uyghur include Julius Klaproth's 1812 Dissertation on the Language and Script of the Uighurs (Abhandlung über die Sprache und Schrift der Uiguren), which was disputed by Isaak Jakob Schmidt. In this period, Klaproth correctly asserted that Uyghur was a Turkic language, while Schmidt believed that Uyghur should be classified with the Tangut languages.


Old Uyghur or Old Turkic is an ancient form of Turkic used from the 7th to the 13th centuries in Mongolia and the Uyghurstan/East Turkestan region, in particular in the Orkhon inscriptions and Turpan texts. It is the direct ancestor of the Southeastern Turkic, or Uyghur-Chaghatai, family of languages, including the modern Uyghur and Uzbek languages. By contrast, Yugur, although in geographic proximity, is more closely related to the northeastern Turkic languages in Siberia.

During the 11th century, Mahmud al-Kashgari (مەھمۇد قەشقىرى Mehmud Qeshqiri), a scholar of the Turkic languages from Kashgar in modern-day Xinjiang, published the first Turkic language dictionary and a description of the geographic distribution of many Turkic languages, the Compendium of the Turkic Dialects (Divān-ul Lughat-ul Turk).

Old Uyghur, through the influence of Perso-Arabic after the 13th century, developed into the Chaghatai language, a literary language used all across Central Asia until the early 20th century. After Chaghatai fell into extinction, the standard versions of Uyghur and Uzbek were developed from dialects in the Chaghatai-speaking region, showing abundant Chaghatai influence. Uyghur today shows considerable Persian influence as a result of Chaghatai, including numerous Persian loanwords. Modern Uyghur uses the Urumchi dialect in Xinjiang as its standard, while the similar Ili dialect is used in the former Soviet Union. Russian sources cite the central dialect of Ghulja (Ili Kazakh Autonomous Prefecture) as the pronunciation norm for modern Standard Uyghur. The similar pronunciation of Zhetysu and Fergana Uyghurs is considered standard for Uyghurs living in the CIS countries.

Official status

The Uyghurs are one of the 56 official nationalities in China, and Uyghur is an official language of Xinjiang Uyghur Autonomous Region, along with Mandarin Chinese. Mandarin Chinese is not spoken widely in southern Xinjiang. However, due to the policy of mandatory Mandarin-language education for all of Xinjiang, knowledge of Mandarin is increasing, and it is becoming the predominant language of Xinjiang, displacing Uyghur as the lingua franca among the Turkic peoples of Xinjiang. About 80 newspapers and magazines are available in Uyghur; five TV channels and ten publishers serve as the Uyghur media. Outside of China, Radio Free Asia and TRT provide news in Uyghur.


The dialects of Uyghur include Central Uyghur, Hotan (Hetian), Lop (Luobu), and Akto Türkmen. In addition, each locality in the Xinjiang region speaks a dialect slightly different from the next, such as Kashgar, Qumul, Turpan, Ghulja, etc. Vocalic assimilation or r-deletion is not common in the Kashgar dialect, while it is common elsewhere. Similarly, vowel reduction is not common in the southern dialects.



Consonant places of articulation: labial, dental, post-alveolar, velar, uvular, glottal.

In Uyghur, any consonant phoneme can occur as the syllable onset or coda, except for /ʔ/, which occurs only in the onset, and /ŋ/, which never occurs word-initially. Uyghur syllable structure is usually CV or CVC, but CVCC can occur if the third element is a sonorant. In general, Uyghur phonology tends to simplify phonemic consonant clusters by means of elision and epenthesis.

Uyghur voiceless stops are aspirated word-initially and intervocalically. The pairs /p, b/, /t, d/, /k, g/, and /q, ʁ/ alternate, with the voiced member devoicing in syllable-final position, except in word-initial syllables. This devoicing process is usually reflected in the official orthography, but an exception has recently been made for certain Perso-Arabic loans. Voiceless phonemes do not become voiced in standard Uyghur.

Suffixes display a slightly different type of consonant alternation. The phonemes /g/ and /ʁ/ anywhere in a suffix alternate as governed by vowel harmony, where /g/ occurs with front vowels and /ʁ/ with back ones. Devoicing of a suffix-initial consonant can occur only in the cases of /d/ → [t], /g/ → [k], and /ʁ/ → [q], when the preceding consonant is voiceless. Lastly, the rule that /g/ must occur with front vowels and /ʁ/ with back vowels can be broken when either [k] or [q] in suffix-initial position becomes assimilated by the other due to the preceding consonant being such.
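The choice among the four suffix-initial variants described above can be summarised as a small lookup. The following Python sketch is illustrative only: the function name `suffix_initial` and its two boolean parameters are assumptions made for this example, the output uses ULY spellings (g, gh, k, q), and the final assimilation rule for stems ending in [k] or [q] is deliberately left out.

```python
def suffix_initial(front_harmony: bool, after_voiceless: bool) -> str:
    """Pick the surface form of a suffix-initial /g/ ~ /ʁ/ consonant (ULY spelling).

    front_harmony   -- True if vowel harmony selects the front variant
    after_voiceless -- True if the preceding consonant is voiceless
    """
    # Vowel harmony picks the place: /g/ with front vowels, /ʁ/ (gh) with back vowels.
    voiced = "g" if front_harmony else "gh"
    if not after_voiceless:
        return voiced
    # After a voiceless consonant the suffix-initial consonant devoices:
    # /g/ -> [k] and /ʁ/ -> [q].
    return {"g": "k", "gh": "q"}[voiced]


if __name__ == "__main__":
    for front in (True, False):
        for voiceless in (False, True):
            print(front, voiceless, suffix_initial(front, voiceless))
```

A back-harmony stem ending in a voiceless consonant therefore takes q, while a front-harmony stem ending in a vowel or voiced consonant takes g.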

Stops and affricates lenite when preceding a dissimilar consonant: /t͡ʃ/ goes to [ʃ], /d͡ʒ/ to [ʒ], /k/ to [ç], and /q/ to [χ]. /g/ goes to [ɣ] in word-initial syllables, but in non-initial syllables, /g/ and /ʁ/ behave like their unvoiced equivalents and go to [ç] and [χ] respectively. These changes are not reflected in orthography. Similarly, /h/ tends to become [χ] before another consonant. Lenition also occurs in certain intervocalic contexts, e.g. /b/ lenites to [β] and /g/ to [ɣ] (not marked). The only lenition process represented in orthography is /b/ to [β].

Uyghur displays vocalic assimilation, which is atypical among Turkic languages. Syllable-final /r/, /l/, and /j/ are optionally assimilated to the preceding vowel, which is lengthened and, in the case of e and u, made lower and less tense; e.g., xelqlar ‘the nations’. However, this never occurs when /l/ and /j/ are word-final. This phenomenon occurs most commonly in colloquial speech, but is often avoided when reciting, reading, or singing. As a result, Uyghur speakers often hypercorrect by inserting an [r] after a long vowel where there is no phonemic /r/, especially after attaching a vowel-initial suffix (e.g. bina 'building', binarim or binayim 'my building'). In addition, although this is not represented orthographically, a few cases of "r-deletion" have been lexicalized, such as تۆت töt "four".

Loan phonemes have influenced Uyghur to various degrees. /d͡ʒ/ and /x/ were borrowed from Arabic and have been nativized, while /ʒ/ from Persian less so. /f/ only exists in very recent Russian and Chinese loans, since Perso-Arabic (and older Russian and Chinese) /f/ became Uyghur /p/. Perso-Arabic loans have also made the contrast between /k, g/ and /q, ʁ/ phonemic, as they occur as allophones in native words, the former set near front vowels and the latter near back vowels. Some speakers of Uyghur distinguish /v/ from /w/ in Russian loans, but this is not represented in most orthographies. Other phonemes occur natively only in limited contexts, i.e. /h/ only in a few interjections, /d/, /g/, and /ʁ/ rarely initially, and /z/ only morpheme-finally. Therefore, the pairs */t͡ʃ, d͡ʒ/, */ʃ, ʒ/, and */s, z/ do not alternate.


        Front                   Back
        Unrounded   Rounded     Unrounded   Rounded
High    i /i/       ü /y/       -           u /u/
Mid     é /e/       ö /ø/       -           o /o/
Low     e /ɛ/       -           a /a/       -

Uyghur vowels are by default short, but some phonologists have argued that long vowels also exist because of historical vowel assimilation (above) and through loanwords. Underlyingly long vowels would resist vowel reduction and devoicing, introduce non-final stress, and be analyzed as |Vj| or |Vr| before a few suffixes. However, the conditions in which they are actually pronounced as distinct from their short counterparts have not been fully researched.

Official Uyghur orthographies do not mark vowel length, and also do not distinguish between front [i] (e.g., بىلىم /bilim/ "knowledge") and back [ɯ] (e.g., تىلىم /tɯlɯm/ "my language"); these two sounds are in complementary distribution, but phonological analyses claim that they play a role in vowel harmony and are separate phonemes.

Uyghur has systematic vowel reduction (or vowel raising), in which unrounded non-high vowels (/a/ and /ɛ/) in initial open syllables followed by /i/ are changed to [e], and unrounded vowels in other non-final open syllables are changed to [i]. The former process is applied before the latter. As with other phenomena, long vowels are exempt. For example:
al + -ing → éling (cf. Turkish alın) ‘take!’
ata + -lar + -imiz → atilirimiz (cf. Turkish atalarımız) ‘our fathers’ (not *etilirimiz in Uyghur, because reduction to /e/ can only be applied before reduction to /i/ in a word)
at + -im → étim (cf. Turkish atım) ‘my horse’
→ 'my feather'
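The ordering of the two processes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a full phonological model: the word is passed in already split into syllables (the hypothetical `reduce_vowels` function does no syllabification), spellings are ULY (a and e for /a/ and /ɛ/, é for the raised vowel), the trigger for raising is taken to be an i in the following syllable, and long vowels and other exceptions are ignored.

```python
def reduce_vowels(syllables):
    """Apply raising, then reduction, to a word given as a list of ULY syllables."""
    out = list(syllables)

    # Step 1 (raising): a/e ending the initial open syllable becomes é
    # when the next syllable contains i.
    if len(out) > 1 and out[0][-1] in "ae" and "i" in out[1]:
        out[0] = out[0][:-1] + "é"

    # Step 2 (reduction): a/e ending any other non-final open syllable becomes i.
    # Because step 1 has already run, an i created here cannot trigger raising.
    for k in range(1, len(out) - 1):
        if out[k][-1] in "ae":
            out[k] = out[k][:-1] + "i"

    return "".join(out)


if __name__ == "__main__":
    print(reduce_vowels(["a", "ling"]))                   # al + -ing
    print(reduce_vowels(["a", "ta", "la", "ri", "miz"]))  # ata + -lar + -imiz
    print(reduce_vowels(["a", "tim"]))                    # at + -im
```

Running raising first is what keeps the initial a of the second example intact: at that point the following syllable is still ta, so nothing triggers raising, and only the medial syllables are reduced afterwards.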

The high vowels /i/, /u/, and /y/ are devoiced in non-stressed positions when they occur between two voiceless consonants, or in word-initial position before a voiceless consonant: e.g. uka 'younger brother', pütün 'entire', ikki 'two'.

/e/ only occurs in loanwords and as the result of vowel raising.

Vowel harmony

Uyghur, like other Turkic languages, displays vowel harmony. Words usually agree in vowel backness, but compounds, loans, and some other exceptions often break vowel harmony. Suffixes surface with the rightmost [back] value in the stem, and /e, i/ are transparent (as they don't contrast for backness).
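The "rightmost [back] value" rule can be illustrated with a short Python sketch. It is a minimal example under assumptions, not a complete treatment: it works on ULY spellings, treats a, o, u as back and e, ö, ü as front, skips the transparent vowels é and i, and uses the plural suffix -lar/-ler as a test case. The helper names `stem_is_back` and `plural` are made up for this illustration, and vowel reduction, loans, and compounds are not handled.

```python
BACK_VOWELS = set("aou")
FRONT_VOWELS = set("eöü")  # ULY e stands for /ɛ/; é and i are transparent


def stem_is_back(stem):
    """Return True/False for the rightmost harmonic vowel, or None if there is none."""
    for ch in reversed(stem.lower()):
        if ch in BACK_VOWELS:
            return True
        if ch in FRONT_VOWELS:
            return False
        # Consonants and the transparent vowels é, i are skipped.
    return None


def plural(stem):
    """Attach -lar after back-harmony stems and -ler after front-harmony stems."""
    back = stem_is_back(stem)
    if back is None:
        raise ValueError(f"no harmonic vowel in {stem!r}; backness must be listed lexically")
    return stem + ("lar" if back else "ler")


if __name__ == "__main__":
    for word in ("yol", "köl", "sheher", "esir"):
        print(word, "->", plural(word))
```

In esir the final i is skipped and the earlier e decides the suffix, which is the sense in which /e, i/ are transparent.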

Uyghur also has rounding harmony.


Syllable structure is CV(C)(C).


The Uyghur language has used numerous orthographies from the time of its ancestor Old Turkic, but the most historically prominent were the Sogdian, Orkhon-Yenisei, Old Uyghur, and Arabic scripts. Uyghur was also written in the Syriac alphabet in some Christian documents.

The Sogdian-derived Old Uyghur alphabet was used from the 9th to 13th centuries and is the predecessor of both the Mongolian and Manchu scripts. However, as the Persian-influenced Chaghatai language developed, a Perso-Arabic derived script was used, and remained in continuous use until the 1920s. In the 20th century, various orthographies have been imposed on Uyghur by the Soviet and Chinese governments.

Today, Uyghur uses four orthographies: the Uyghur Arabic script, the Cyrillic script, the Uyghur Pinyin Script, and the Uyghur Latin script.

IPA a ɛ b p t d͡ʒ t͡ʃ x d r z ʒ s ʃ f ʁ q k g ŋ l m n h o u ø y w e i j ʔ
Kona Yéziq ا ە ب پ ت ج چ خ د ر ز ژ س ش ف غ ق ك گ ڭ ل م ن ھ و ۇ ۆ ۈ ۋ ې ى ي
ULY a e b p t j ch x d r z j/zh s sh f gh q k g ng l m n h o u ö ü w é i y (')
Yéngi Yéziq a ə b p t j q h d r z s x f ƣ k k g ng l m n h o u ɵ ü w e i y (')
Cyrillic a ə б п t җ ч x д р з ж с ш ф ғ қ к г ң л м н һ о у ө ү в e и й (’)
  • In the Arabic orthography, /ʔ/ is always represented, even word-initially. The Pinyin- and Cyrillic-based orthographies only represent it word-medially using an apostrophe, but its use is not consistent. The new Latin orthography uses an apostrophe to represent /ʔ/ between a consonant and a vowel (the only case where confusion can result) as well as to avoid confusion when the sequences sh, ng, and gh represent two phonemes (Is'haq 'Isaac', bashlan'ghuch 'initial', 'beginning').
  • The Cyrillic orthography uses ю and я to represent /ju/ and /ja/. Additional letters are used for Russian loanwords, which are spelled exactly as in Russian.
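Because the orthographies correspond almost letter for letter, conversion between them is largely a table lookup. The Python sketch below converts ULY to Cyrillic using the correspondences in the table and notes above. It is an illustration under assumptions: the names `ULY_TO_CYRILLIC` and `to_cyrillic` are invented here, lookalike Latin letters in the Cyrillic row are taken to be their Cyrillic counterparts, j is mapped to җ and zh to ж, input is lowercased, and an apostrophe is dropped only when it separates two letters that would otherwise read as a ULY digraph.

```python
ULY_TO_CYRILLIC = {
    # Two-letter sequences are matched before single letters.
    "ch": "ч", "sh": "ш", "zh": "ж", "gh": "ғ", "ng": "ң",
    "yu": "ю", "ya": "я",
    "a": "а", "e": "ә", "b": "б", "p": "п", "t": "т", "j": "җ",
    "x": "х", "d": "д", "r": "р", "z": "з", "s": "с", "f": "ф",
    "q": "қ", "k": "к", "g": "г", "l": "л", "m": "м", "n": "н",
    "h": "һ", "o": "о", "u": "у", "ö": "ө", "ü": "ү", "w": "в",
    "é": "е", "i": "и", "y": "й",
}
ULY_DIGRAPHS = {"ch", "sh", "zh", "gh", "ng"}


def to_cyrillic(text):
    """Convert a ULY string to Cyrillic by greedy longest-match lookup."""
    s = text.lower()
    out = []
    i = 0
    while i < len(s):
        if s[i] == "'":
            # An apostrophe that only keeps two letters from reading as a
            # digraph (s'h, n'g, ...) is not needed in Cyrillic; otherwise keep it.
            splits_digraph = 0 < i < len(s) - 1 and s[i - 1] + s[i + 1] in ULY_DIGRAPHS
            if not splits_digraph:
                out.append("’")
            i += 1
        elif s[i:i + 2] in ULY_TO_CYRILLIC:
            out.append(ULY_TO_CYRILLIC[s[i:i + 2]])
            i += 2
        else:
            # Unknown characters (spaces, punctuation) pass through unchanged.
            out.append(ULY_TO_CYRILLIC.get(s[i], s[i]))
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for word in ("uyghur", "wijdan'gha", "is'haq"):
        print(word, "->", to_cyrillic(word))
```

The apostrophe check matters for words such as wijdan'gha, where n followed by gh must not be read as ng followed by h.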


The Arabic-derived script (Kona Yéziq 'Old Script') is actually a true alphabet, with all vowels fully represented. It was used in its old form from the late Middle Ages until the 1920s. Letters specific to Arabic, which had been used in the Perso-Arabic Chaghatai script, were removed in 1937, and an expanded vowel system was instituted in 1954. It was officially replaced by the Pinyin script in Xinjiang during the Cultural Revolution, but was still used privately. A new version of Kona Yéziq with separate vowels representing /ø/ and /y/ was reinstated in the 1980s and is still widely used.


The Pinyin-derived Latinate script (Yéngi Yéziq 'New Script') was instituted by China in the 1960s and '70s during the Cultural Revolution, but it never gained widespread acceptance, due to cultural resistance, feelings of estrangement from other Turkic peoples, fear of linguistic assimilation, and "reform fatigue". However, it remains an "authorized option" according to the Chinese government.

One notable feature of Yéngi Yéziq was its inclusion of the letter ƣ for /ʁ/, which has been erroneously named LATIN LETTER OI in Unicode (it is correctly referred to as gha).
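The formal character names can be checked with Python's standard `unicodedata` module. The snippet below is a small illustration; the helper name `describe` is invented for this example, and the two code points are those of the capital and small forms of the letter.

```python
import unicodedata


def describe(ch):
    """Return the code point and formal Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for ch in ("\u01a2", "\u01a3"):  # capital and small forms of the letter gha
        print(ch, describe(ch))
```

Formal Unicode names are never changed once assigned, which is why the name still reads OI rather than GHA.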


By 2000, a number of unofficial Latin-based orthographies were in use online, based on various influences such as Turkish, German, English, and Chinese.

The new Latin-derived script (Uyghur Latin Yéziqi 'Uyghur Latin Script', ULY) was the result of several conferences in 2000 and 2001 at Xinjiang University in Ürümchi. It was not intended to replace the Uyghur Arabic script but to be used only in "computer related fields". Despite cautious media reporting, the ULY is still surrounded by controversy and disagreement. Even so, it has become a popular alternative to the Uyghur Arabic script online.

The ULY generally has a one-to-one correspondence with the Arabic script, but after many opposing arguments regarding the representation of /ʒ/, it was decided that both j and zh would be acceptable variants.


The Cyrillic-derived script (Siril Yéziqi 'Cyrillic Script') was instituted by the Soviet Union and briefly by China, and remains in use today mainly in the former Soviet Union, i.e. Kazakhstan and Uzbekistan. Soviet Uyghurs were made to adopt a Latin script in the 1920s, in a decision motivated by Stalin's Korenizatsiya policy, but the Soviet Union shifted its attitude after Turkey adopted the Latin alphabet, fearing a "pan-Turkic threat", and switched to imposing Cyrillic, which is still used in Russian-dominated areas. In its early days, China was heavily influenced by the USSR's language policy, and made a short-lived attempt to impose a version of the Soviet Cyrillic Uyghur script, but when relations with the USSR became strained, China decided not to create too many links between Uyghurs straddling the border.


Uyghur has subject-object-verb word order, places genitives, adjectives, numerals, and relative clauses before noun heads, and uses postpositions and initial question words. Uyghur uses suffixes and displays agglutination, but has a few Persian-derived prefixes. Word order distinguishes subjects and indirect objects, topic and comment. There are eight noun cases marked by suffixes. Verb suffixes mark person and number; the second person also marks plural and three levels of respect. Types of verbs include passive, reflexive, reciprocal and causative.


The core of Uyghur vocabulary is of Turkic stock. However, Uyghur does have numerous borrowings. Due to the influence of the literary Chaghatai language, Uyghur, like Uzbek, has borrowed a large number of words from Persian and Arabic. Such words are less phonologically assimilated into Uyghur than Perso-Arabic borrowings in other Turkic languages such as Kyrgyz, and do not usually follow vowel harmony (e.g. 'sin': Persian gunāh, Uyghur günah, Kyrgyz günö).

In addition to Perso-Arabic loans, Uyghur has Russian and Chinese loanwords, mostly in the areas of modern vocabulary, technology, and government. Russian loans are more common in the former Soviet Union, where, due to Soviet language policy, they are spelled exactly as in Russian; in Xinjiang, Russian loanwords are often somewhat phonologically assimilated, depending on when the word was borrowed. A few older Chinese loans were borrowed before 1949, but the majority of Chinese loanwords today are recent borrowings, as the younger generation becomes more fluent in Mandarin and as recent language policy has reduced the status of Uyghur.
Example loanwords
Language   Source word         Uyghur (Latin)   Uyghur (Arabic)   Uyghur IPA   English
Persian    افسوس               epsus            ئەپسۈس            -            pity
Persian    گوشت                gösh             گۆش               -            meat
Arabic     ساعة                saet             سائەت             -            hour
Russian    велосипед           wélsipit         ۋېلسىپىت          -            bicycle
Russian    доктор              doxtur           دوختۇر            -            doctor (medical)
Russian    поезд               poyiz            پويىز             -            train
Russian    область             oblast           ئوبلاست           /oblast/     oblast, region
Russian    телевизор           téléwizor        تېلېۋىسور         -            television set
Chinese    电视 (diànshì)      dyenshi          ديەنشى            -            television
Chinese    桌子 (zhuōzi)       joza             جوزا              -            table

Text sample

Here is a sample text, Article 1 of the Universal Declaration of Human Rights, in Uyghur:
ھەممە ئادەم زانىدىنلا ئەركىن، ئىززەت-ھۆرمەت ۋە ھوقۇقتا باپباراۋەر بولۇپ تۇغۇلغان. ئۇلار ئەقىلغە ۋە ۋىجدانغا ئىگە ھەمدە بىر-بىرىگە قېرىنداشلىق مۇناسىۋىتىگە خاس روھ بىلەن موئامىلە قىلىشى كېرەك.
ULY Hemme adem zatidinla erkin, izzet-hörmet we hoquqta babbarawer bolup tughulghan. Ular eqilghe we wijdan'gha ige hemde bir-birige qérindashliq munasiwitige xas roh bilen muamile qilishi kérek.
Cyrillic Uyghur
English All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.

