Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Martin Weiss" <martin.weiss1@gmx.de> |
To | <statalist@hsphsun2.harvard.edu> |
Subject | st: AW: RE: AW: RE: RE: Delete missing |
Date | Sun, 9 May 2010 20:12:01 +0200 |
"For one, it is easy to imagine instances in which that strategy automatically leads to many more indicators (dummies, if you will) than directly available predictors"

Just to be sure, I advocated creating _one_ indicator that you can condition on via -if-, instead of -drop-ping. How can this lead to "many" ... indicators?

HTH
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Sunday, 9 May 2010 20:02
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: AW: RE: RE: Delete missing

I think that's highly contentious. For one, it is easy to imagine instances in which that strategy automatically leads to many more indicators (dummies, if you will) than directly available predictors and to a kind of model that would not strike anybody in its target audience as scientifically interesting or useful. Besides, modelling with predictor variables on the RHS is far from the only kind of statistical analysis possible.

Nick
n.j.cox@durham.ac.uk

Martin Weiss

Overall, -drop-ping is an inferior strategy to using a dummy for inclusion in the analysis:

http://www.stata.com/statalist/archive/2009-12/msg00511.html

The only reason not to go for the latter strategy is the fear that the -if- qualifier will be forgotten at some stage - which cannot happen after the -drop- command...

Nick Cox

Tony's comment seems a bit more severe than the facts warrant. If you have missings Stata will just ignore them, so -drop-ping them from the dataset is not going to make much difference to that.

My impression as its author is that many of the uses of -dropmiss- (SJ) in particular and many of the reasons for this request arise from innocuous missings. For example, spreadsheet people often leave blank rows and/or columns in their worksheets just as ways of making their data more readable.
Import into Stata will usually take such rows and columns literally but they have no content and are best -drop-ped straight away. There is no statistical issue in those situations raised by -drop-ping missings, as the missings do not correspond to potential data even in principle.

Where it gets more complicated is that some people are tempted to -drop- variables and/or observations in which _any_ values are missing. That's usually going to lead to loss of information. That may be Tony's main point.

Nick
n.j.cox@durham.ac.uk

Lachenbruch, Peter

Generally, this is a very bad idea. You will get biased estimates of any parameters you estimate unless the data is missing at random. Check the multiple imputation manual. Also note that Stata is not capitalized as you have done.

Patricia Yu [pyu1@wisc.edu]

I have a question about deleting missing data. I would like to delete cases if they have missing values in any variables in my dataset. How can I do in STATA to delete these cases with any missing data? Could you please share STATA codes with me?

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
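
For readers who want to see the strategies debated in this thread side by side, here is a minimal Stata sketch. It is not code posted by any of the participants. The toy data and the variable names x, y, nmiss, nvalid and complete are hypothetical, and only numeric variables are used. It first removes observations with no content at all (the innocuous blank-row case), then flags complete cases with one indicator that can be used with -if-, and finally shows the -drop- alternative as a commented-out line.

```stata
* Toy data (hypothetical): x and y stand in for the analysis variables
clear
input x y
1.5 10
.   20
2.5 .
.   .
4.0 50
end

* Number of missing values in each observation
egen nmiss = rowmiss(x y)

* Observations with no content at all (e.g. blank spreadsheet rows)
* can be dropped without losing any information
egen nvalid = rownonmiss(x y)
drop if nvalid == 0

* Indicator strategy: flag complete cases once, then condition on the flag
generate byte complete = (nmiss == 0)
summarize x y if complete

* Drop strategy: also discards the partial information held in
* observations that are missing on only some variables
* drop if !complete
```

With the indicator, partially observed rows stay in the dataset and remain available to commands that need only x or only y. Whether that outweighs the risk of forgetting the -if- qualifier is the point on which the posters disagree.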