Tomas:
If you have the entire population, then significance levels are meaningless. Significance testing assumes that you are uncertain whether your estimates equal the population values because of sampling variability; since you have the whole population rather than a sample, you are no longer uncertain about the population value. Whatever parameter you estimate is the parameter that occurs in the population.

There are other sources of uncertainty: you are obviously uncertain about the proper model, and the variables could be (and probably are) measured with error. However, frequentist inference (what you do if you look at "p-values") does not take this uncertainty into account. In other words, it hopes that this uncertainty is swamped by the uncertainty due to sampling variation, which is obviously not the case for you.

In essence you have two options: 1) take a frequentist stance and claim you are certain, i.e. choose a model solely based on theory and just report the parameters without significance levels, standard errors, or confidence intervals, or 2) go Bayesian. Good and accessible places to start learning about Bayesian stats are Bolstad 2004 and Lancaster 2004. Unfortunately, you will probably have to use another stats package if you go Bayesian. R (http://www.r-project.org/) and WinBugs/OpenBugs (http://mathstat.helsinki.fi/openbugs/) are particularly popular among Bayesians.
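
To make the point about sampling variability concrete, here is a minimal sketch in Python (not Stata, and not taken from your analysis). It uses the textbook finite population correction for the standard error of a mean under simple random sampling without replacement, SE = (sigma / sqrt(n)) * sqrt((N - n) / (N - 1)). The function name se_mean_fpc and all the numbers are made up for illustration. It covers only sampling uncertainty, not model uncertainty or measurement error.

```python
import math


def se_mean_fpc(sigma, n, N):
    """Standard error of a sample mean with the finite population correction.

    Assumes simple random sampling without replacement of n units from a
    population of N units whose standard deviation is sigma (divisor N).
    """
    if N < 2 or not 1 <= n <= N:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    return sigma / math.sqrt(n) * math.sqrt((N - n) / (N - 1))


# Hypothetical population: 100,000 units, standard deviation 15.
N = 100_000
sigma = 15.0

# The standard error shrinks as the sample covers more of the population
# and is exactly zero once the "sample" is the whole population.
for n in (100, 10_000, 90_000, N):
    print(f"n = {n:>7}: SE = {se_mean_fpc(sigma, n, N):.4f}")
```

When n equals N the correction factor is zero, so the sampling standard error is zero. That is the sense in which a p-value has nothing left to measure once you observe everybody.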
HTH,
Maarten
William M. Bolstad (2004), "Introduction to Bayesian Statistics", Hoboken, NJ: Wiley
Tony Lancaster (2004), "An Introduction to Modern Bayesian Econometrics", Malden, MA: Blackwell Publishing
-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands
visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z214
+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------
On Tuesday 20 December 2005 11:03 Thomas wrote:
> <snip> The problem is that if I add each new variable
> (or each new interaction between two variables) to the
> model, it always significantly contributes to the response
> variable and the fit of each complex model is always
> better than the previous (more parsimonious) one.
> (BIC is always lower, LR is always higher and D is
> always lower). I think that the problem is in large N.
> My data come from the whole population. <snip>