Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | David Bai <db555@mail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: RE: St: Panel data imputation |
Date | Tue, 21 Sep 2010 09:26:30 -0400 |
-----Original Message-----
From: Nick Cox <n.j.cox@durham.ac.uk>
To: 'statalist@hsphsun2.harvard.edu' <statalist@hsphsun2.harvard.edu>
Sent: Tue, Sep 21, 2010 6:53 am
Subject: st: RE: St: Panel data imputation

The straight answer to this question is that -- as the help for -ipolate- makes clear -- there is an -epolate- option which you can use at your peril to fill in values at the ends of your series. This will work with panel data too, in the sense that you will get what you ask for. Note that -ipolate- is a command, not a function.

On the larger issue, raised by Maarten Buis, I hope we could all agree that interpolation, which has a centuries-old history, is not quite a kind of imputation, which is currently so fashionable as a species of statistical white magic. (Naturally, your definition of imputation might be so wide that interpolation is a special case; I would want to suggest that such a wide definition will only lead to misunderstanding.)

I can see various advantages and disadvantages:

1. Interpolation is usually relatively simple to define. The linear interpolation offered by -ipolate- certainly qualifies.

2. Interpolation is in various senses unstatistical, as

a. it takes account of at most local structure and works with data one response variable at a time.

b. it typically reduces variability, which distorts statistical analysis to an unknown extent.

c. it is deterministic so is not accompanied by any estimate of error.

Clearly, this isn't a complete characterisation. Also it simplifies some larger issues.

I am at an extreme position within this list, as I have never used imputation, but I have often used interpolation for gappy time series or spatial series with no covariates. Such work has had as side-effects programs -cipolate- and -csipolate- on SSC.

If you are using interpolation I have some hackneyed pieces of advice:

* Get a feeling of how interpolation treats data like yours by artificially introducing gaps in good quality data and seeing how successful interpolation is at reproducing known values.

* Try different kinds of interpolation to get a sense of how far they agree.

* Go very easy on the extrapolation.

This commentary steals one cogent remark made by Patrick Royston in a conversation at the recent London users' meeting.

Nick
n.j.cox@durham.ac.uk

Maarten Buis
============

-ipolate- is generally not a good imputation method. Look at -help mi- and
-findit ice- instead.

David Bai
=========

I have a panel data (year and revenue) and would like to use ipolate function to impute the missing values for some years. What kind of data will not be imputed if I use this method? It looks like that, when I have missing values for the beginning year or the end of the year, this method will not impute the missing values in these years. Is there a way to deal with this problem?
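As a rough illustration of the points in this thread, here is a minimal Python sketch. It is not Stata code and not a description of how -ipolate- is implemented. It shows plain linear interpolation leaving gaps at the start and end of a series missing unless extrapolation is requested, and the check Nick suggests of hiding known values to see how well interpolation reproduces them. The names linear_fill and holdout_errors and all data values are hypothetical. Extending the line through the two nearest known points is an assumption about what linear extrapolation means here.

```python
from typing import Dict, List, Optional, Sequence, Set


def linear_fill(x: Sequence[float],
                y: Sequence[Optional[float]],
                extrapolate: bool = False) -> List[Optional[float]]:
    """Fill None entries of y by linear interpolation on x.

    Assumes x is strictly increasing. With extrapolate=False, gaps before
    the first or after the last known value stay None. With
    extrapolate=True, those gaps are filled by extending the line through
    the two nearest known points (an assumption of this sketch).
    """
    known = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    out = list(y)
    if len(known) < 2:
        return out  # not enough information to draw any line
    for i, (xi, yi) in enumerate(zip(x, y)):
        if yi is not None:
            continue
        left = [k for k in known if k[0] < xi]
        right = [k for k in known if k[0] > xi]
        if left and right:            # interior gap: interpolate
            (x0, y0), (x1, y1) = left[-1], right[0]
        elif extrapolate and right:   # leading gap: extrapolate backwards
            (x0, y0), (x1, y1) = right[0], right[1]
        elif extrapolate and left:    # trailing gap: extrapolate forwards
            (x0, y0), (x1, y1) = left[-2], left[-1]
        else:
            continue                  # end gap, extrapolation not requested
        out[i] = y0 + (y1 - y0) * (xi - x0) / (x1 - x0)
    return out


def holdout_errors(x: Sequence[float],
                   y: Sequence[float],
                   hide: Set[int]) -> Dict[int, float]:
    """Blank the positions in `hide`, interpolate, and return the absolute
    error at each hidden position that interpolation managed to fill."""
    gappy = [None if i in hide else yi for i, yi in enumerate(y)]
    filled = linear_fill(x, gappy)
    return {i: abs(filled[i] - y[i]) for i in sorted(hide)
            if filled[i] is not None}


# Hypothetical panel: one series of (years, revenue) per panel identifier.
panel = {
    "firm_a": ([2000, 2001, 2002, 2003, 2004], [None, 10.0, None, 14.0, None]),
}
for firm, (years, revenue) in panel.items():
    print(firm, linear_fill(years, revenue))                    # ends stay missing
    print(firm, linear_fill(years, revenue, extrapolate=True))  # ends filled

# Hide a known interior value of a curved series and measure the miss.
print(holdout_errors([1, 2, 3, 4, 5], [1.0, 4.0, 9.0, 16.0, 25.0], {2}))
```

Each panel is handled separately, so values from one panel never enter another panel's interpolation. The hold-out check only says something about interior gaps. It gives no reassurance about extrapolated end values, which is one reason to go easy on them.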
* * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/