One thing I always found fascinating about the statistics of opinion polls is that margin of error has everything to do with sample size, and nothing to do with the size of the total population. This little counterintuitive formula means that a poll of 1600 randomly selected people has a margin of error of 2.5%, regardless of whether you're conducting your poll for Peoria or for the entire United States.
This is almost correct, but not quite. You are referring to the distribution of the sample mean for an infinite population; under fairly mild conditions, this is asymptotically normal with a variance which depends on the sample size, but not on the size of the population (which is good, because the population is by assumption infinite!). For random samples from a finite population, however, the variance of the estimator is indeed affected by the population size. The intuition for this is straightforward: if I were to repeatedly take a census of the entire population (for some strange reason), the sample mean would equal the population mean every time. Thus, the variance of the sample mean would have to be 0, which is most certainly not what the infinite population model would predict.
For those who are interested, the standard deviation (sqrt(variance)) of the sample proportion (i.e., the mean of a dichotomous variable) from a finite population is
sigma = sqrt(p(1-p)/n * (N-n)/(N-1))
where p is the true proportion, n is the sample size, and N is the size of the population. Three things should be noted. First, as you would expect, the (N-n)/(N-1) term goes to 1 as N goes to infinity. So, for large populations, you are correct that the population size is essentially irrelevant. Second, even when this factor does matter, its effect is to deflate the variance of the estimator -- this means that neglecting it can only make our assessments of the standard error more conservative, not less. Third, and finally, observe that the standard error here depends on the true proportion, p. Thus, the mean for a random sample of size 100 will have a different standard deviation if 5% of the population agrees with a poll item than if the agreement rate is 50%.
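To make those three points concrete, here is a minimal Python sketch of the formula. The function name proportion_se and the population of 10,000 are my own illustrative choices, not anything from the post above; passing N=None stands for the infinite population model.

```python
import math


def proportion_se(p, n, N=None):
    """Standard deviation of a sample proportion.

    p: true proportion, n: sample size, N: population size.
    N=None means the infinite population model (no correction).
    """
    variance = p * (1 - p) / n
    if N is not None:
        # Finite population correction: (N - n) / (N - 1)
        variance *= (N - n) / (N - 1)
    return math.sqrt(variance)


# Poll of 1600 with p = 0.5, infinite population
se_infinite = proportion_se(0.5, 1600)
# Same poll drawn from a hypothetical town of 10,000 people
se_small_town = proportion_se(0.5, 1600, N=10_000)
# A census: the sample is the whole population
se_census = proportion_se(0.5, 1600, N=1600)
# Dependence on p, for a sample of 100
se_rare = proportion_se(0.05, 100)
se_even = proportion_se(0.5, 100)

print(se_infinite, se_small_town, se_census, se_rare, se_even)
```

The correction only shrinks the result, the census case comes out as exactly 0, and the p = 0.05 case has a smaller standard deviation than the p = 0.5 case.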
In practice, one usually ignores the finite population correction unless the sample size is roughly comparable to the total population (e.g., n/N>5% or so). (If it really affects your results otherwise, you're probably up to no good anyway! :-))
That was far more than anyone probably wanted to know about this, but here's my consolation prize: you can use the above to construct your very own confidence intervals! By the central limit theorem, the distribution of the sample mean (for a proportion) is asymptotically normal with mean given by the true mean and standard deviation given by the expression for sigma above. Thus, if p is the observed proportion in the sample, and sigma is computed by plugging that observed p in for the true one, p +/- 1.96*sigma is an (approximate) 95% confidence interval for the population proportion. Great for impressing your friends and neighbors.
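For anyone who wants to try it, a rough sketch of that interval in Python follows. The helper name proportion_ci is hypothetical, and clipping the bounds to [0, 1] is my own addition; the observed proportion is plugged in for the unknown true one when computing sigma.

```python
import math


def proportion_ci(p_hat, n, N=None, z=1.96):
    """Approximate normal-theory confidence interval for a proportion.

    p_hat: observed sample proportion, n: sample size,
    N: population size (None for the infinite population model),
    z: normal quantile (1.96 for roughly 95%).
    """
    variance = p_hat * (1 - p_hat) / n
    if N is not None:
        # Finite population correction
        variance *= (N - n) / (N - 1)
    half_width = z * math.sqrt(variance)
    # Clip to [0, 1], since a proportion cannot fall outside that range
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# The 1600-person poll from the parent comment, with 50% agreement
low, high = proportion_ci(0.5, 1600)
print(low, high)
```

With p_hat = 0.5 and n = 1600 the half width is 1.96 * 0.0125 = 0.0245, which is where the quoted "2.5%" margin of error comes from.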
PS. Of course, if you actually wanted a 95% probability interval for an unknown proportion -- like a good Bayesian -- you'd want to use the 2.5% and 97.5% quantiles of a Beta(np+0.5,n(1-p)+0.5) distribution (assuming a Jeffreys prior), where p is the observed proportion. This is for the infinite population case.
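A quick way to get a feel for those quantiles without any extra libraries is to simulate them. The sketch below is a Monte Carlo approximation, not an exact computation: it draws from the Beta distribution with the standard library's random.betavariate and reads off empirical quantiles. The function name, the number of draws, and the seed are all arbitrary choices of mine. If SciPy is installed, scipy.stats.beta.ppf should give the quantiles directly.

```python
import random


def jeffreys_interval(p_hat, n, draws=100_000, seed=0):
    """Approximate 95% interval from Beta(n*p + 0.5, n*(1-p) + 0.5).

    Uses simulation, so the result carries a small Monte Carlo error.
    p_hat: observed proportion, n: sample size.
    """
    rng = random.Random(seed)
    a = n * p_hat + 0.5        # successes plus the prior's 0.5
    b = n * (1 - p_hat) + 0.5  # failures plus the prior's 0.5
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    # Empirical 2.5% and 97.5% quantiles of the simulated draws
    low = samples[int(0.025 * draws)]
    high = samples[int(0.975 * draws)]
    return low, high


bayes_low, bayes_high = jeffreys_interval(0.5, 1600)
print(bayes_low, bayes_high)
```

For a sample this large, the result should land very close to the normal-theory interval above; the two approaches differ more noticeably for small n or for p near 0 or 1.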