Today you’ll learn about the standards and peculiarities of word count in oriental languages. I decided to write about them separately because they differ greatly from other languages.
The basic writing unit in Chinese is the character (hanzi), often loosely called a hieroglyph. The main difficulty for word count is that Chinese characters are not separated by spaces. As a result, if a word count tool counts words based on the spaces between them, the Chinese sentence «这是鸟» (“This is a bird,” 3 words) is counted as a single word.
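A minimal Python sketch of the problem. It assumes a tool that, like many simple counters, splits text on whitespace; `str.split()` stands in for such a tool, and the function name is hypothetical:

```python
def count_words_by_spaces(text: str) -> int:
    """Count words the way a space-based tool does: split on runs of whitespace."""
    return len(text.split())

english = "This is a bird"
chinese = "这是鸟"  # three words: 这 (this), 是 (is), 鸟 (bird)

print(count_words_by_spaces(english))  # 4
print(count_words_by_spaces(chinese))  # 1, since there are no spaces to split on
```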
But you cannot simply treat each character as a word either: the three characters «工业化» are not three words but one, “industrialization.” So the most logical way to measure the volume of a Chinese text is character count. In the experience of professional translators, a 1000-word English text translated into Chinese will be 1300-1800 characters long, and a 1000-character Chinese text translated into English will be 650-750 words long.
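The translators’ ratios quoted above can be turned into a rough estimator. This is only a sketch: the function names are hypothetical, and it simply scales the quoted ranges (1300-1800 characters per 1000 English words, 650-750 English words per 1000 Chinese characters) linearly, rounding down.

```python
def estimate_chinese_chars(english_words: int) -> tuple[int, int]:
    """Estimate the (low, high) Chinese character count for an English source text."""
    # 1000 English words -> 1300-1800 Chinese characters (integer math, rounded down)
    return english_words * 1300 // 1000, english_words * 1800 // 1000

def estimate_english_words(chinese_chars: int) -> tuple[int, int]:
    """Estimate the (low, high) English word count for a Chinese source text."""
    # 1000 Chinese characters -> 650-750 English words (integer math, rounded down)
    return chinese_chars * 650 // 1000, chinese_chars * 750 // 1000

print(estimate_chinese_chars(2500))  # (3250, 4500)
print(estimate_english_words(4000))  # (2600, 3000)
```

Real ratios vary with subject matter and style, so treat the output as a range for quoting, not a prediction.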
Japanese mixes three main writing systems: kanji (Chinese characters) and two syllabaries, hiragana and katakana. This makes word count even more complicated than in Chinese, so the usual scheme for Japanese is to count characters excluding spaces, which is quite logical.
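A minimal sketch of a “characters without spaces” count in Python. The function name is hypothetical, and it assumes punctuation counts as characters (tools differ on this). Python’s `str.isspace()` also treats the full-width ideographic space U+3000 as whitespace, so it is excluded too.

```python
def count_chars_without_spaces(text: str) -> int:
    """Count every character except whitespace (spaces, tabs, newlines, U+3000)."""
    # Kanji, hiragana, and katakana each count as one character.
    return sum(1 for ch in text if not ch.isspace())

# "I bought a camera": kanji (私, 買), hiragana (は, を, っ, た), katakana (カメラ)
print(count_chars_without_spaces("私はカメラを買った"))    # 9
print(count_chars_without_spaces("私は カメラを 買った"))  # 9, spaces are ignored
```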
Modern Korean, unlike Chinese or Japanese, is written with spaces between words. Traditionally, Korean was written in columns from top to bottom, right to left, but today it is usually written in rows from left to right, top to bottom. Because words are separated by spaces, the traditional space-based word count scheme can be applied.
Of the remaining Asian languages, Thai is the one written without spaces between words, so Thai jobs are also estimated by character count. The other languages, including all the Indian languages (Bengali, Gujarati, Marathi, Urdu, Oriya, Tamil, etc.), Indonesian, Farsi, Arabic, Turkish, and Hebrew, use spaces, which means their words can easily be counted with a word count tool.
To sum up: Chinese, Japanese, and Thai have no spaces between words and require character count. The rest of the oriental languages use spaces, so ordinary word count applies to them.
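The summary above can be sketched as a counter that picks the method by language. The language codes (ISO 639-1 `zh`, `ja`, `th`), the set name, and the function name are illustrative assumptions, not the settings of any particular CAT tool.

```python
# Languages written without spaces between words, per the summary
# (ISO 639-1 codes for Chinese, Japanese, Thai).
CHARACTER_COUNT_LANGUAGES = {"zh", "ja", "th"}

def count_units(text: str, lang: str) -> tuple[int, str]:
    """Return (count, unit), choosing character count or space-based word count."""
    if lang in CHARACTER_COUNT_LANGUAGES:
        # No word separators: count characters, ignoring whitespace.
        return sum(1 for ch in text if not ch.isspace()), "characters"
    # Korean, the Indian languages, Arabic, Turkish, etc. separate words with spaces.
    return len(text.split()), "words"

print(count_units("这是鸟", "zh"))          # (3, 'characters')
print(count_units("이것은 새입니다", "ko"))  # (2, 'words')
print(count_units("This is a bird", "en"))  # (4, 'words')
```

This sketch counts Unicode code points, so Thai vowel and tone marks are counted separately from the consonants they attach to; a production tool might instead count grapheme clusters or use dictionary-based word segmentation.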