First, my research methodology.
I went to Nua Internet's "How Many Online?" page to find out which countries the people on-line were from.
The top 12 were USA (148 million), Japan (27 million), UK (19.5 million),
Germany (18 million), China (16.9 million), South Korea (15.3 million),
Canada (13.2 million), Italy (11.6 million), Russia (9.2 million), France (9 million),
Brazil (8.6 million), Australia (7.8 million).
Already I was surprised - I didn't expect quite so many from Asian countries with difficult
Next, I went to Ethnologue to see what languages were spoken in those countries.
As this is an amazingly complex question, and the Ethnologue data is not easily downloadable
into a spreadsheet, I had to make some gross generalisations and approximations, e.g. all
UnitedStatesOfAmericans read the Internet in English.
I was also distressingly ignorant about many countries.
For example, Ethnologue lists the primary languages of Italy as being Italian (55%),
Lombard (15%), Neapolitan (12%), Sicilian (8%), and so on.
I don't know whether these are only spoken languages, or have widely used written forms as well.
For all I know, Sicilians are taught to read and write Italian at school, and most written
communication in Italy is in Italian.
Nevertheless, I soldiered on.
I did at least do some extra research, and discovered that although there are 7 main languages
used in China (including Mandarin, Cantonese, Wu, Xiang, Jinyu and Min Nan), the modern written form
was only standardised this (20th) century, and is based on Mandarin with Wu influence.
So, I can presume that Chinese internet readers read the web in Mandarin.
Similarly, there are many Arabic dialects, but the written form is pretty much the same as
in the Koran.
So I presumed Arabic internet users all use the same form of Arabic.
Other countries which were particularly difficult to categorise were South Africa (22% Zulu,
15% Afrikaans, 18% Xhosa), where I still suspect most internet usage is amongst the
English and Afrikaans speakers; India (where no single language accounts for more than 10%); and countries with small
internet populations like Nigeria, Morocco, United Arab Emirates, Iran, Thailand and the
Philippines, where I had never even heard of many of the languages.
Finally, I combined the data on internet population and spoken languages, to form the
combined approximate numbers of internet users by language.
No doubt the numbers are wrong, but I at least hope the order is right.
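The combining step can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the spreadsheet actually used: the function name `users_by_language` and the data layout are made up for the example, Italy's percentages are the Ethnologue figures quoted earlier, and treating Japan as 100% Japanese is the same kind of gross generalisation described above.

```python
from collections import defaultdict

# Internet users per country, in thousands (figures quoted earlier in the article).
users_by_country = {"Italy": 11600, "Japan": 27000}

# Share of each country's population per language.
# Italy: Ethnologue percentages quoted earlier.
# Japan: assumed to be 100% Japanese (a simplifying assumption).
language_shares = {
    "Italy": {"Italian": 0.55, "Lombard": 0.15, "Neapolitan": 0.12, "Sicilian": 0.08},
    "Japan": {"Japanese": 1.0},
}


def users_by_language(users, shares):
    """Multiply each country's users by its language shares and sum per language."""
    totals = defaultdict(float)
    for country, count in users.items():
        for language, share in shares.get(country, {}).items():
            totals[language] += count * share
    # Round to whole thousands; the inputs are far too rough for more precision.
    return {language: round(total) for language, total in totals.items()}


totals = users_by_language(users_by_country, language_shares)

# Rank languages from most to fewest estimated readers.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
for language, thousands in ranked:
    print(f"{language}: {thousands}")
```

A language spoken in several countries (French in France, Canada and elsewhere, say) simply accumulates a contribution from each country, which is why a language's total can exceed the figure for any single country.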
The numbers of readers of various languages on the internet are (in thousands):
English (184053), Japanese (27000), Mandarin (22954), German (20600), Korean (15300),
French (13176), Russian (9524), Portuguese (9300), Spanish (8893), Dutch (7428),
Italian (6452), Swedish (4185), Polish (2804), Cantonese (2733), Finnish (2425),
Danish (2300), Norwegian (2200), Lombard (1860), Turkish (1818), Swiss German (1536),
Neapolitan (1392), Greek (1300), Arabic (1285).
After that, the numbers become even more dubious.
I find the prominence of Korean most surprising.
I expected a lot of European languages, but the Asian languages are right up there.
So what does this mean to you?
Well, not much, if you are a reader of the Internet; but it's probably helpful if you are
writing for an internet audience.
The use that particularly interests me is internationalisation of open-source software.
If I am to maximise the circulation of my software, of course I will write it in English,
but then I will try to get it translated into as many of these languages as possible.
With the prominence of Asian languages in my audience, that's an interesting problem.
I hope this article has made you think about the Internet as a global phenomenon rather
than as an American thing.
Given the huge number of Chinese readers on the Internet already, it could be the case that
in a very few years, the predominant language on the Internet will be Mandarin.
That's something to think about, and potentially something to make a career of.
Please post comments telling me of any errors in my analysis.