Word Count Blog

June 28, 2009

Word Count in Oriental Languages

Today you’ll learn about the standards and peculiarities of word count in oriental languages. I decided to write about them separately, since they differ greatly from other languages.

Chinese. The writing unit in Chinese is the character. The main difficulty for word count is that characters are not separated by spaces. This means that the Chinese sentence «这是鸟» (“This is a bird”, 3 words) is counted as a single word if the word count tool counts words based on the spaces between them (there was even a related query on the WordPress support page).

But if you think that the 3 characters «工业化» also form a separate sentence, you are wrong: together they are just one word, “industrialization”. So the most logical way to measure text volume in Chinese is the character count. For example, a 1,000-word English text translated into Chinese will be 1,300 to 1,800 characters long. You may read more about the English->Chinese word count ratio here.
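To see the difference in practice, here is a minimal Python sketch that compares a naive space-based word count with a character count on the two examples above. The function names are my own, chosen for illustration; this is not how any particular word count tool is implemented.

```python
def count_words_by_spaces(text: str) -> int:
    """Count whitespace-separated tokens, as a simple word counter would."""
    return len(text.split())


def count_characters(text: str) -> int:
    """Count all characters except whitespace."""
    return sum(1 for ch in text if not ch.isspace())


sentence = "这是鸟"  # "This is a bird"
term = "工业化"      # "industrialization"

# Both strings contain no spaces, so a space-based counter sees one "word" each.
print(count_words_by_spaces(sentence), count_characters(sentence))  # 1 3
print(count_words_by_spaces(term), count_characters(term))          # 1 3
```

Both strings get the same two numbers, although one is a sentence and the other a single word. The space-based count is useless here, while the character count at least measures volume consistently.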

Japanese. Japanese is written in a mixture of three main systems: kanji (characters borrowed from Chinese) and two syllabaries, hiragana and katakana. Like Chinese, it is written without spaces between words, and the mix of scripts makes word count even more complicated than in Chinese. So the usual scheme in Japanese is based on the character count (not counting spaces), which seems quite logical.

Korean. Modern Korean is written with spaces between words (unlike Chinese or Japanese). Traditionally, Korean was written in columns from top to bottom, right to left, but it is now usually written in rows from left to right, top to bottom. This means that the traditional word count scheme, in which words are counted on the basis of spacing, can be applied.

Other. Among the other languages covered here, Thai is also written without spaces between words, so job estimates are based on the character count. The remaining languages, including all the Indian languages (Bengali, Gujarati, Marathi, Urdu, Oriya, Tamil etc.), Indonesian, Farsi, Arabic, Turkish and Hebrew, use spacing, which means that words can easily be counted with a word count tool.

To sum up: the languages that have no spacing and require a character count are Chinese, Japanese and Thai. The remaining oriental languages use spacing and are measured by word count rather than character count.
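The rule from this summary can be sketched in a few lines of Python. The language set comes straight from the post; the function name `text_volume` and the language labels are hypothetical, chosen only for this example.

```python
from typing import Tuple

# Languages from the summary that are measured in characters.
CHARACTER_COUNTED = {"chinese", "japanese", "thai"}


def text_volume(text: str, language: str) -> Tuple[int, str]:
    """Return (amount, unit), picking the unit according to the language."""
    if language.strip().lower() in CHARACTER_COUNTED:
        # No spacing between words: count characters, ignoring whitespace.
        return sum(1 for ch in text if not ch.isspace()), "characters"
    # Spacing is used: count whitespace-separated words.
    return len(text.split()), "words"


print(text_volume("这是鸟", "Chinese"))        # (3, 'characters')
print(text_volume("나는 학생입니다", "Korean"))  # (2, 'words')
```

A real tool would detect the script instead of trusting a language label, but the branching logic stays the same.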

June 23, 2009

A Free Browser Word Count Add-in for Firefox

Filed under: tips and tricks, by Thomas Vysokos @ 11:01 am

Have you ever needed to count the number of words on a web page? Have you ever solved this task by copying and pasting the content into a word processor and running its statistics tool? What if there were a free browser add-in that could provide the statistics right in the browser window?

Firefox claims to be one of the most extensible browsers, and even web humor confirms this. Today I’m reviewing a free word count add-in for Firefox called Word Count Plus. It may be of great benefit to you, so let’s get started.

Step 1. Install a Firefox browser.

If you don’t have Firefox installed, just download it here and run the installation with the default options (I had not a single problem, even on Vista).

Step 2. Install Word Count Plus add-in.

Visit the Word Count Plus web page, then click the “Install version 1.3.0” button (the version number may differ).

download word count plus

Firefox will prompt you to allow the add-in installation. Do so.

allow mozilla to install word count plus

Click “Install now” to install the add-in.

start word count plus installation

Restart Firefox.

restart firefox after word count plus installation

Step 3. Start counting.

You can either press the word count button

getting word count statistics in browser by pressing a button

or right-click it to get some shortcuts that make counting much easier

word count plus shortcuts for faster work

Summary

Pros: 1) free; 2) flexible word count (you can count the words in, say, the first and the last paragraphs of a page without copying and pasting); 3) supports adding to the running count and undoing the last action.

Cons: 1) it is a browser add-in (to count words, you need to open a browser); 2) no bulk file processing (getting statistics for 10 files becomes a time-consuming task); 3) incomplete statistics (no count of alt tags, page title or keywords, since these sit in the page code rather than in the visible text).

A good tool for ad hoc use when you need to count the number of words or characters on a web page. Occasional word counters should thank Sam Waters, who built this fine app.

But professionals who need accurate and complete word count statistics for HTML files, including the page title and alt tag text, should turn their attention to professional word count software.
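To show what “complete” statistics for an HTML file involve, here is a rough Python sketch built on the standard library’s `html.parser`. It counts the visible text plus the page title, image alt text and the keywords meta tag. The class and function names are my own, and this is only an illustration under simple assumptions (well-formed markup, space-separated words), not how any commercial tool works.

```python
from html.parser import HTMLParser


class WordCountParser(HTMLParser):
    """Counts words in text nodes (title included), alt text and meta keywords."""

    def __init__(self):
        super().__init__()
        self.words = 0
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        attr_map = dict(attrs)
        if tag == "img" and attr_map.get("alt"):
            self.words += len(attr_map["alt"].split())
        if tag == "meta" and (attr_map.get("name") or "").lower() == "keywords":
            content = attr_map.get("content") or ""
            self.words += len(content.replace(",", " ").split())

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text inside <title> arrives here too, so the title is counted.
        if not self._skip_depth:
            self.words += len(data.split())


def count_html_words(html: str) -> int:
    parser = WordCountParser()
    parser.feed(html)
    parser.close()
    return parser.words


page = (
    '<html><head><title>Bird guide</title>'
    '<meta name="keywords" content="birds, guide"></head>'
    '<body><p>This is a bird.</p>'
    '<img src="b.png" alt="a small bird"></body></html>'
)
print(count_html_words(page))  # 11 = 2 (title) + 2 (keywords) + 4 (text) + 3 (alt)
```

Counting only the visible paragraph would give 4 words here, which is the kind of gap described in the cons above.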

June 22, 2009

The History of Word Count Metrics

Filed under: more than just history, by Thomas Vysokos @ 7:48 am

There are a number of jobs where people are paid based on how much text they produce, proofread, type or otherwise process. And there are a number of standards on which that pay is based. Anyone who has needed a word count has come across several of them: 250 words, 300 words, 1800 signs or even 3500 signs. But why not simply pay on a per-word basis?

Paying on a per-word basis looks simpler only at first glance. Every group of languages is special and has its own word count traditions. Size matters too: some words are long, some are short. So, years ago, two standard methods were developed to count the words in a text. I call them the Western and the Soviet methods.

In the Western method one word consists of six characters including spaces (an average English word is 5.1 characters long). “Antiautomorphism” is 2 and 2/3 words long, which is roughly the length of the phrase “during the dinner” (17 characters, or just under 3 words). This model is fair, because it would be a bit unfair to count articles the same as meaningful words, which are usually at least twice as long as articles.
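The six-character rule is easy to check with a short Python sketch. The constant and the function name `standard_words` are made up for this example; `Fraction` is used only to keep the thirds exact.

```python
from fractions import Fraction

CHARS_PER_STANDARD_WORD = 6  # one Western "word", spaces included


def standard_words(text: str) -> Fraction:
    """Length of the text in six-character standard words."""
    return Fraction(len(text), CHARS_PER_STANDARD_WORD)


print(standard_words("Antiautomorphism"))   # 8/3, i.e. 2 and 2/3 words
print(standard_words("during the dinner"))  # 17/6, just under 3 words
```

One long word and a three-word phrase come out at nearly the same volume, which is the whole point of the method.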

The Western word count method, in turn, has 2 industry standards. In earlier times, when most manuscripts were prepared on typewriters with fixed-pitch (monospace) fonts, 250 words per page was generally considered standard, and many editors still use it. But in the PC era an average manuscript page in 12-point Times Roman contains about 23 lines of type and about 13 words per line, or roughly 300 words per manuscript page (23 × 13 = 299).

In the Soviet Union the dominant language was Russian. As you may know, Russian has no articles, and an average Russian word is 6.36 characters long.

In the early 1920s a new industry standard called the “author’s sheet” was created. It consisted of an unbelievable 40,000 signs (including spaces, numbers and all punctuation). Unlike under Western standards, manuscripts in the Soviet Union were submitted with double spacing, so an average typewritten page was 1800 characters long (paradoxically, that is 300 words by the Western six-character standard, although an average Russian word is about 1.25 times as long as an average English one). And if printed on a PC in 12-point Times Roman with single spacing, an average page in Russian holds 3500 signs (about 583 Western words).
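The arithmetic behind these figures can be reproduced with a small Python sketch. The unit sizes and average word lengths are the ones quoted in this post; the names in the code are hypothetical and serve only this example.

```python
CHARS_PER_WESTERN_WORD = 6  # Western standard word, spaces included

# Sign-based units mentioned in the post.
SIGN_UNITS = {
    "40,000-sign author's standard": 40_000,
    "typewritten page, double spacing": 1_800,
    "PC page, single spacing": 3_500,
}


def signs_to_western_words(signs: int) -> float:
    """Convert a sign count into six-character Western words."""
    return signs / CHARS_PER_WESTERN_WORD


for name, signs in SIGN_UNITS.items():
    words = signs_to_western_words(signs)
    print(f"{name}: {signs} signs = about {words:.0f} Western words")

# Ratio of the average Russian word length to the average English one.
print(f"{6.36 / 5.1:.2f}")  # 1.25
```

The loop prints about 6667, 300 and 583 Western words respectively, so the 1800-sign page and the 300-word page are the same volume measured in two different ways.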

After the Soviet Union collapsed, its word count standards, like a great many other standards, remained in wide use in the former republics. So if you are paid in units of 250 or 300 words, your client is most probably in Western Europe or America. But if your work is measured in units of 1800 or 3500 signs, I bet your order came from somewhere in the Commonwealth of Independent States.

I still have to explore the specifics of word count in oriental languages. An article on this topic will follow soon.

P.S. You can easily get word count statistics for almost any document format using word count software.
