From | rgutierrez@stata.com (Roberto G. Gutierrez, StataCorp) |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: xtmepoisson out of sample prediction error |
Date | Wed, 09 Feb 2011 17:45:24 -0600 |
Jibonayan Raychaudhuri <jibonayanrc@yahoo.com> asks:

> I have estimated a random intercept Poisson regression model using
> xtmepoisson in Stata :
>
>   xtmepoisson y x1 x2 x3 exposure(expvar) || district:, irr
>
> I now want to carry out an out-of-sample prediction for y using the
> estimated parameters of the above model.
>
> This is what I have done:
>
> Step 1: Estimate the model
> Step 2: predict b*,reffects
> Step 3: preserve
> Step 4: use newdata.dta,clear (new.dta has data on x1,x2 and x3 and the
>         exposure variable expvar only--this is the out of sample data)
> Step 5: predict n (this is to predict mean count of y for newdata.dta)
>
> However Stata gives me an error message which says "variable y not found
> r(111)"
>
> The reason why I have used xtmepoisson instead of xtpoisson,normal is
> because I want predicted mean count to be based on both fixed and random
> effects. This is easy for an in-sample prediction. However as I
> mentioned this is not working for an out-of-sample prediction. I know
> that this works if I set random effect=0 but this is not what I want.
>
> Can someone tell me why I am getting this error message? Any help would
> be greatly appreciated.

If you wish to make predictions after -xtmepoisson- that incorporate
random effects, then you need to keep the estimation data in memory. The
estimated random effects are calculated from the estimation data, and
this takes place both when you fit the model and when you make
predictions.

As such, rather than replace your estimation data with new prediction
data, you want to append the prediction data to the estimation data.

Your prediction data must contain values for all the covariates in your
model AND a value for the group variable (-district- in your example)
that is represented in the estimation data. Because your prediction will
include a random effect, Stata needs to know which group's random effect
to use and cannot infer one for a group not represented at estimation.
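Applied to the steps in the question, the workflow might look roughly
like the sketch below. It assumes that newdata.dta holds x1, x2, x3,
expvar, and also -district-, with every district value already present in
the estimation data. The marker variable -newobs- is a hypothetical name
introduced here for illustration, and the generate() option of -append-
is assumed to be available in your version of Stata. Note also that
exposure() is written after a comma, as an option of the fixed-effects
equation.

```stata
* Sketch only: variable and file names are taken from the question;
* -newobs- is a hypothetical marker for the appended rows.

* Step 1: fit the model on the estimation data
xtmepoisson y x1 x2 x3, exposure(expvar) || district:, irr

* Step 2: keep the estimation data in memory and add the new rows to it
* (newdata.dta must include -district- so a random effect can be matched)
append using newdata, generate(newobs)

* Step 3: predict the mean count for all rows, old and new
predict n

* Step 4: inspect the out-of-sample predictions only
list district x1 x2 x3 expvar n if newobs == 1
```

Here -y- is simply missing in the appended rows; per the explanation
above, the random effects are still computed from the estimation
observations that remain in memory.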
Here is an example using the Bangladesh Fertility Survey, where I fit an
-xtmepoisson- model then append one new observation to predict on:

    . webuse bangladesh, clear
    . gen children = child1 + 2*child2 + 3*child3    // No. of children
    . xtmepoisson children c_use age urban || district:

There are 1,934 observations in these data and so I add one more

    . set obs 1935

I then set the covariates and district for this new observation

    . replace c_use = 0 in 1935
    . replace age = 0 in 1935          // age is mean-centered
    . replace urban = 1 in 1935

Don't forget the group variable if you want to incorporate random effects

    . replace district = 40 in 1935

Now you can predict for both the estimation data and for the new
observation

    . predict n

Profit

    . list c_use age urban district n in 1935

--Bobby
rgutierrez@stata.com

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/