Re: st: RE: St: Panel data imputation

From	David Bai <[email protected]>
To	[email protected]
Subject	Re: st: RE: St: Panel data imputation
Date	Tue, 21 Sep 2010 09:26:30 -0400

Thank you, Nick and Maarten, for the very detailed responses. Very helpful. Given the limitations of this command, it looks like multiple imputation would be the best approach to dealing with the missing values. Am I understanding this correctly?



-----Original Message-----
From: Nick Cox <[email protected]>
To: '[email protected]' <[email protected]>
Sent: Tue, Sep 21, 2010 6:53 am
Subject: st: RE: St: Panel data imputation

The straight answer to this question is that -- as the help for -ipolate- makes clear -- there is an -epolate- option which you can use at your peril to fill in values at the ends of your series. This will work with panel data too, in the sense that you will get what you ask for.

Note that -ipolate- is a command, not a function.
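
A minimal sketch of what that looks like on a panel. The toy data and the variable names id, year and revenue are assumptions made up for illustration, not the original poster's dataset:

```stata
* Toy panel: id, year and revenue are assumed names
clear
input id year revenue
1 2001 .
1 2002 10
1 2003 .
1 2004 14
1 2005 .
2 2001 5
2 2002 .
2 2003 9
end

* Linear interpolation within each panel, interior gaps only:
* the 2001 and 2005 values for id 1 stay missing
bysort id (year): ipolate revenue year, generate(rev_ip)

* Adding -epolate- also extrapolates linearly at the ends of each series
bysort id (year): ipolate revenue year, generate(rev_ep) epolate

list, sepby(id)
```

Keeping both new variables side by side makes it easy to see exactly which values come from extrapolation rather than interpolation.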

On the larger issue, raised by Maarten Buis, I hope we could all agree that interpolation, which has a centuries-old history, is not quite a kind of imputation, which is currently so fashionable as a species of statistical white magic. (Naturally, your definition of imputation might be so wide that interpolation is a special case; I would want to suggest that such a wide definition will only lead to misunderstanding.)

I can see various advantages and disadvantages:

1. Interpolation is usually relatively simple to define. The linear
interpolation offered by -ipolate- certainly qualifies.

2. Interpolation is in various senses unstatistical, as

a. it takes account of at most local structure and works with data one response variable at a time.

b. it typically reduces variability, which distorts statistical analysis to an unknown extent.

c. it is deterministic so is not accompanied by any estimate of error.

Clearly, this isn't a complete characterisation. Also it simplifies some larger issues.

I am at an extreme position within this list, as I have never used imputation, but I have often used interpolation for gappy time series or spatial series with no covariates. Such work has had as side-effects programs -cipolate- and -csipolate- on SSC.

If you are using interpolation I have some hackneyed pieces of advice:

* Get a feeling of how interpolation treats data like yours by artificially introducing gaps in good quality data and seeing how successful interpolation is at reproducing known values.

* Try different kinds of interpolation to get a sense of how far they agree.


* Go very easy on the extrapolation.
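
The first piece of advice can be sketched roughly as follows. Everything here is an assumption for illustration: the simulated panel, the variable names, the seed and the 20% knock-out rate are all made up, and real data would replace the simulated block:

```stata
* Simulated complete panel: 10 units, 20 years each (all names hypothetical)
clear
set seed 2010
set obs 200
generate id = ceil(_n/20)
bysort id: generate year = 1990 + _n
generate double revenue = 100 + 5*(year - 1990) + rnormal(0, 3)

* Knock out roughly a fifth of the known values at random
generate double rev_gappy = revenue
replace rev_gappy = . if runiform() < 0.2

* Interpolate linearly within panels (no extrapolation)
bysort id (year): ipolate rev_gappy year, generate(rev_lin)

* Error of interpolation where a known value was removed and then filled in
generate double err = rev_lin - revenue if missing(rev_gappy) & !missing(rev_lin)
summarize err, detail

* Knocked-out values left unfilled, that is, gaps at the ends of a series
count if missing(rev_gappy) & missing(rev_lin)
```

Repeating the interpolation step with another method, such as -cipolate- or -csipolate- from SSC, and comparing the error summaries gives a sense of how far the methods agree.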

This commentary steals one cogent remark made by Patrick Royston in a
conversation at the recent London users' meeting.

Nick
[email protected]

Maarten Buis
============

-ipolate- is generally not a good imputation method. Look at -help mi- and -findit ice- instead.

David Bai
=========

I have panel data (year and revenue) and would like to use the
ipolate function to impute the missing values for some years. Which
missing values will not be imputed if I use this method? It looks like,
when values are missing for the first or the last year of a series,
this method will not impute the missing values in these years. Is
there a way to deal with this problem?


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
