Each Chinese ideogram counts as a word
You'd need to say what a word is. There are at least two or three potentially relevant notions here: morphological word, syntactic word, and lexeme. (Some people might argue that the first two are one and the same notion.) This is no easy matter, and it certainly can't be settled on the basis of an orthography. English and German noun-noun compounds work in essentially the same way, but English orthography frequently insists on a space between the components, while German orthography insists on none.
does a Chinese speaker know as many words as an English speaker say?
This question is extremely vague, and attempts to make it precise show it to be pretty pointless. Let's take "word" to mean "lexeme" in your statement: the unit that characterizes a "family" of words with the same "basic meaning", e.g. we take "dog" and "dogs", "run" and "runs" to be the same lexeme (though these are easy cases; this procedure in general is far from trivial). Now the task might be defined as counting how many underived lexemes the speaker of each language masters (lexemes that do not result from the application of a rule that forms complex lexemes; e.g. the verb "wash" is basic, the noun "washer" is derived). Each of these we call a listed unit; their crucial feature is that they are associations of sound and meaning that are not predictable from anything else in the grammar of the language (in the way the meaning of "washer" can be predicted from that of "wash").
This would be one way of making part of your proposal precise, but it is still problematic. The concept of a listed unit doesn't match the notion of a lexeme, given that plenty of listed units are complex phrases with meanings unpredictable from their parts (e.g. the classic example "kick the bucket", meaning "to die"). If you want to get at the minimal number of things that the speaker of a language needs to memorize, it will be a lot more involved than simply counting "words".
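To make the counting procedure concrete, here is a toy Python sketch. Everything in it is an assumption for illustration: the `LEXEME_OF` table, the `DERIVED` set and the `LISTED_PHRASES` set are hand-made, and `count_units` is a hypothetical helper, not the output of any real morphological analyser. It only shows how the answer shifts depending on which unit you decide to count.

```python
# Toy illustration only: all three tables are hand-made assumptions.

# Hypothetical map from word form to the lexeme it belongs to.
LEXEME_OF = {
    "dog": "dog", "dogs": "dog",
    "run": "run", "runs": "run",
    "wash": "wash", "washer": "washer",
    "kick": "kick", "the": "the", "bucket": "bucket",
}

# Hypothetical set of lexemes formed by rule from another lexeme.
DERIVED = {"washer"}

# Hypothetical listed phrases with meanings unpredictable from their parts.
LISTED_PHRASES = {("kick", "the", "bucket")}


def count_units(text):
    """Count the same text under four different notions of 'word'."""
    tokens = text.lower().split()

    # 1. Distinct strings occurring between spaces.
    forms = set(tokens)

    # 2. Distinct lexemes (unknown forms are treated as their own lexeme).
    lexemes = {LEXEME_OF.get(form, form) for form in forms}

    # 3. Lexemes not derivable by rule from another lexeme.
    underived = lexemes - DERIVED

    # 4. Listed phrases that actually occur in the text.
    phrases = {
        phrase
        for phrase in LISTED_PHRASES
        if any(
            tuple(tokens[i:i + len(phrase)]) == phrase
            for i in range(len(tokens))
        )
    }

    return {
        "forms": len(forms),
        "lexemes": len(lexemes),
        "underived_lexemes": len(underived),
        "listed_units": len(underived) + len(phrases),
    }


sample = "dog dogs run runs wash washer kick the bucket"
print(count_units(sample))
```

On this sample the four counts come out as 9 forms, 7 lexemes, 6 underived lexemes and 7 listed units, so even on nine tokens the "number of words" depends entirely on the definition you pick, and the hard part (filling in those tables for a real language) is exactly what the sketch assumes away.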
And I don't believe in this classic notion of listed unit anyway, and could spend *hours* criticising it. To cut it short, the idea that a listed unit has to be unpredictable from others is, in my mind, psychologically wrong, which makes the whole enterprise pointless.
Here are some statistics: the Oxford English Dictionary contains some 290,000 entries with some 616,500 word forms. Shakespeare used anywhere from 16,000 to 30,000 words in his work. An educated English speaker knows about 20,000 words, but uses about 2,000 in a week's conversation. Chinese has 120,000+ words, but a typical newspaper may have from 2,000 to 4,000 only.
And these statistics are pointless. They tell you nothing about Shakespeare, for example, because it is doubtful that "occurring between spaces" is an interesting linguistic notion. Such an approach will undercount the linguistic units the speaker knows, and more severely so for a language like Chinese, which has very little morphology (and thus depends more heavily on multi-word units).
And, in addition, different dictionaries are produced with different criteria, and you simply can't compare the number of words in them straightforwardly (if at all), even for dictionaries of the same language in the same historical period, designed for the same purpose.
"How many words a speaker of language X knows" is a very vague question that is conceivably refinable to something like "what is the extent of the linguistic knowledge of a given speaker that is explicitly stored as fixed units in the brain, as opposed to constructed online". "How many words does language X have", on the other hand, is just hopeless and pointless; it involves deciding what "a language" is, as opposed to what the linguistic knowledge of some speaker is.
I'm rambling by now, but the lesson is that these things are arcane and far from common sense.