Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Martin Weiss" <martin.weiss1@gmx.de> |
To | <statalist@hsphsun2.harvard.edu> |
Subject | RE: AW: st: RE: AW: RE: AW: Regressing and storing residuals in one line. |
Date | Mon, 28 Jun 2010 20:30:16 +0200 |
<>

NJC's solution is much better than mine. Mine leaves behind residuals for all observations in your dataset, even those that never entered the estimation sample. Their meaning is hence dubious. So stick to Nick's code, I would say...

HTH
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Dani Tilley
Sent: Monday, 28 June 2010 19:51
To: statalist@hsphsun2.harvard.edu
Subject: Re: AW: st: RE: AW: RE: AW: Regressing and storing residuals in one line.

Thanks for your response. I also think they should match, and the number in the Obs column should be the number of observations used in the regression, not the total number of observations. The MW snippet is missing a condition in the "predict res`lev', res" line. If you compare the residuals from these two, you'll notice the discrepancy.

///MW
    sysuse auto, clear
    qui levelsof rep78
    foreach lev in `r(levels)'{
        qui regress price weight length if rep78==`lev'
        predict res`lev', res
    }

///NJC
    sysuse auto, clear
    qui levelsof rep78
    gen residual = .
    foreach lev in `r(levels)'{
        tempvar foo
        qui regress price weight length if rep78==`lev'
        predict `foo', res
        replace residual = `foo' if rep78 == `lev'
        drop `foo'
    }

If, in MW, we say "predict res`lev' if rep78==`lev', res", the problem is fixed. This is all I meant. Thanks a lot to Martin and you for the help.

Best,
DF Tilley

----- Original Message ----
From: Nick Cox <n.j.cox@durham.ac.uk>
To: statalist@hsphsun2.harvard.edu
Sent: Mon, June 28, 2010 1:35:54 PM
Subject: RE: AW: st: RE: AW: RE: AW: Regressing and storing residuals in one line.

I can answer one of these questions; otherwise I am not clear what you are puzzled about, as I can't see any problem with the code suggested.

The number of observations for the composite residuals variable should be the sum of the numbers of observations included in the separate regressions.
If any observation was excluded from a regression, the corresponding residual should be missing. That would be a consequence of your data, which we can't see.

Minima and maxima should match, as I understand it.

Nick
n.j.cox@durham.ac.uk

Dani Tilley

Sorry, I completely missed that. I also tried a loop structurally similar to the one you suggested, but noticed that the summarize res* output is different from the summarize residuals output from NJC's suggestion. I understand that your loop stores the residuals in separate variables (one for each category), while NJC creates an empty variable and populates it on the fly, but shouldn't, say, the minimum or maximum residuals from the two outputs match? Shouldn't the smallest value in the Min column of the summarize res* (MW) output be the same as the Min from summarize residuals (NJC)? In addition, shouldn't the sum of the Obs column from the summarize res* (MW) output be _N?

I'm very new to Stata, so I don't really know if this makes sense at all, but I think this is the correct way to get the residuals using the loop you suggested:

    predict res`lev' if country == `lev', res

From: Martin Weiss <martin.weiss1@gmx.de>

Having -drop-ped it, you cannot access it anymore. But NJC's strategy is that the results you are interested in are gathered inside the permanent "residual" variable, so this is not a drawback.

Dani Tilley

If I define a tempvar and drop it at the end of the loop, can I still refer to it elsewhere in the program (i.e. outside the loop)?

From: Nick Cox <n.j.cox@durham.ac.uk>

If you are doing this lots of times for real, you could end up with storage problems with dozens of temporary variables. If that doesn't bite, then OK.

Martin Weiss

The

*************
drop `foo'
*************

line could be safely omitted, btw. Stata just makes up new tempnames, and discards them all at the conclusion of the do-file.

Nick Cox

Such residuals have rather poorly defined properties, but let's set that on one side.
A single variable can be obtained through a minor variation on Martin's recipe:

    sysuse auto, clear
    qui levelsof rep78
    gen residual = .
    foreach lev in `r(levels)'{
        tempvar foo
        qui regress price weight length if rep78==`lev'
        predict `foo', res
        replace residual = `foo' if rep78 == `lev'
        drop `foo'
    }

Nick
n.j.cox@durham.ac.uk

Martin Weiss

Just loop through the thing:

*************
sysuse auto, clear
qui levelsof rep78
foreach lev in `r(levels)'{
    qui regress price weight length if rep78==`lev'
    predict res`lev', res
}
*************

Dani Tilley

I'm trying to run several regressions (one for each level of a categorical variable) and store the residuals from each regression in a local macro or new variable I could later manipulate. I figured I could use:

    bysort category: regress y x1 x2

to run the regressions, but I need a second line of code (predict name, residuals) to get the residuals, whereas bysort allows only one. Is there a way around this?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
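
The point settled in this thread, that residuals should be kept only for the observations that actually entered each regression, can also be expressed with e(sample), which regress leaves behind to mark its estimation sample. What follows is only a sketch under stated assumptions, not code posted by any of the participants: it reuses the auto dataset and the variable name residual from the examples above, while the local(levels) option, the if e(sample) restriction, and double storage are additions for illustration.

```stata
* Sketch: one residual variable, filled group by group, restricted to
* the observations each regression actually used (assumption: e(sample)
* is used here instead of repeating the -if rep78 == `lev'- condition).
sysuse auto, clear

* Store the distinct nonmissing values of rep78 in the local macro levels
quietly levelsof rep78, local(levels)

* Composite variable; starts out missing for every observation
generate double residual = .

foreach lev of local levels {
    * Fit the model for this group only
    quietly regress price weight length if rep78 == `lev'

    * Residuals for the estimation sample only; other observations stay missing
    tempvar r
    quietly predict double `r' if e(sample), residuals
    quietly replace residual = `r' if e(sample)
    drop `r'
}

* Obs here should equal the sum of the group-wise regression sample sizes
summarize residual
```

Because e(sample) also excludes observations dropped for missing values in price, weight, or length, the count of nonmissing values in residual should then agree with the sum of the observations used across the separate regressions, in line with Nick's remark above.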