Re: st: Multiple imputation with panel data
From: Veronica Galassi <[email protected]>
To: <[email protected]>
Subject: Re: st: Multiple imputation with panel data
Date: Fri, 06 Jul 2012 10:33:48 +0100
Dear all,
You cannot even imagine how much I appreciate your advice!!! This is my
very first quantitative research project using Stata, and my dataset is not
exactly what a researcher would hope for.
The example reported by Lance describes perfectly my situation.
But apart from this variable, which is missing for one entire year, I also
have many other values missing at random in other variables.
So maybe I should first estimate the coefficients of x2003 = b_0 +
b_1*y2003 and then use them to predict x2007, as Oliver was suggesting.
But isn't this way of proceeding the same as extrapolating values for
x2007 by exploiting the linear relationship between x and y? If so, maybe I
could simply use extrapolation.
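To make the regression-prediction idea concrete, here is a minimal sketch in Python rather than Stata. The data values and the helper name fit_ols are made up for illustration and do not come from the thread; the only assumption carried over is Oliver's, namely that b_0 and b_1 are the same in 2003 and 2007.

```python
def fit_ols(z, w):
    """Least-squares fit of w = b0 + b1*z; returns (b0, b1)."""
    n = len(z)
    z_bar = sum(z) / n
    w_bar = sum(w) / n
    s_zw = sum((zi - z_bar) * (wi - w_bar) for zi, wi in zip(z, w))
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    b1 = s_zw / s_zz
    return w_bar - b1 * z_bar, b1


# Hypothetical wide-format data: x is observed in 2003 only.
y2003 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2003 = [2.1, 3.9, 6.2, 7.8, 10.0]
y2007 = [2.0, 4.0, 6.0, 3.0, 5.0]

# Step 1: estimate x2003 = b_0 + b_1*y2003.
b0, b1 = fit_ols(y2003, x2003)

# Step 2: assume the coefficients did not change and predict x2007.
x2007_hat = [b0 + b1 * y for y in y2007]
```

Because every predicted x2007 is an exact linear function of y2007, the predictions add no information beyond y2007 itself, which is why they cannot help in explaining y2007.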
I have also read that, to get closer to what Stata does when performing
multiple imputation, I could compute the variance of the residuals from the
first regression and then predict x2007. I could then randomly draw m
numbers, multiply each of them by the standard deviation of the residuals,
and add the result to the predicted values of x2007, which would give me m
imputations from my original dataset. Using Rubin's rules I would then
obtain one single value from my m imputations. Do you think this makes
sense?
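A rough Python sketch of this procedure, under several assumptions that are not from the thread: the data are invented, the noise is drawn from a normal distribution separately for each observation, and the quantity being pooled is the mean of x2007. The names fit_ols, impute_once and pool_rubin are hypothetical helpers. Note that Rubin's rules are normally applied to the estimates obtained from each of the m completed datasets, not to the imputed values themselves, and that this sketch treats b_0 and b_1 as known, whereas Stata's mi impute regress also accounts for the uncertainty in the coefficients.

```python
import random
import statistics


def fit_ols(z, w):
    """Least-squares fit of w = b0 + b1*z; returns (b0, b1)."""
    n = len(z)
    z_bar = sum(z) / n
    w_bar = sum(w) / n
    s_zw = sum((zi - z_bar) * (wi - w_bar) for zi, wi in zip(z, w))
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    b1 = s_zw / s_zz
    return w_bar - b1 * z_bar, b1


def impute_once(y_new, b0, b1, resid_sd, rng):
    """Prediction plus one normal noise draw per observation."""
    return [b0 + b1 * y + rng.gauss(0.0, resid_sd) for y in y_new]


def pool_rubin(estimates, variances):
    """Combine m estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)       # pooled point estimate
    within = statistics.fmean(variances)      # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical data: x is observed in 2003 only.
y2003 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2003 = [2.1, 3.9, 6.2, 7.8, 10.0]
y2007 = [2.0, 4.0, 6.0, 3.0, 5.0]

b0, b1 = fit_ols(y2003, x2003)
resid = [x - (b0 + b1 * y) for x, y in zip(x2003, y2003)]
# Residual standard deviation with n - 2 degrees of freedom.
resid_sd = (sum(r * r for r in resid) / (len(resid) - 2)) ** 0.5

rng = random.Random(2012)  # fixed seed so the draws are reproducible
m = 5
estimates, variances = [], []
for _ in range(m):
    x2007_imp = impute_once(y2007, b0, b1, resid_sd, rng)
    # Analysis step: estimate the mean of x2007 and its sampling variance.
    estimates.append(statistics.fmean(x2007_imp))
    variances.append(statistics.variance(x2007_imp) / len(x2007_imp))

pooled_mean, total_var = pool_rubin(estimates, variances)
```

The total variance is never smaller than the average within-imputation variance; the extra term reflects how much the estimate moves from one imputed dataset to the next.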
Once I have done that, I should try again to perform multiple imputation
in Stata to impute the rest of the dataset following what Wes was
suggesting.
Cheers,
Veronica
On Fri, 06 Jul 2012 01:42:50 +0200, Oliver Jones
<[email protected]> wrote:
> Hi Veronica,
>
> if the little data example Lance gave is describing your situation, then I
> agree with his conclusion that you cannot impute the missing values.
>
> To be precise, there is a way to get reasonable values for x2007, but the
> result will not help in explaining y2007! The way I'm talking about is to
> estimate x2003 = b_0 + b_1*y2003, then assume that the parameters didn't
> change over time and calculate b_0 + b_1*y2007, which is your estimate for
> x2007...
>
> But as Lance said, others might come up with something more helpful...
>
> Best Oliver
>
> On 06.07.2012 00:42, Lance Erickson wrote:
>> Veronica,
>>
>> Perhaps I'm misunderstanding your problem, but if you have wide format
>> data and there are no values for any of the observations in 2007 for one
>> of the variables in the imputation model, with data like...
>>
>> Id  x2003  x2007  y2003  y2007
>> 1   5      .      8      9
>> 2   4      .      3      3
>> 3   3      .      8      5
>>
>> then I don't think that multiple imputation is an option for you. My
>> understanding of MI is more intuitive than technical, but I believe that
>> to impute values for a given variable, there has to be some information
>> about how the variable is distributed. But if, in the example above,
>> x2007 is all missing, then there is no existing information that can
>> inform the estimation of missing values. In other words, MI can't create
>> data that you don't have. (Even though I think people sometimes seem to
>> prefer listwise deletion to MI because it feels like that's exactly what
>> MI is doing.) It can only give you estimates of what the data might be
>> based on existing values and the relationship of those existing values
>> to other variables in the imputation model. There are many others on
>> Statalist who are substantially better credentialed than I to answer
>> your question, but that's my take.
>>
>> Best,
>> Lance
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Veronica
>> Galassi
>> Sent: Thursday, July 05, 2012 1:00 PM
>> To: [email protected]
>> Subject: Re: st: Multiple imputation with panel data
>>
>> Hi Oliver,
>>
>> Thank you for your kind reply!
>>
>> I am not quite sure whether I got your hint or not...maybe my
>> explanation was just not clear enough, sorry about that!!!
>> I think my case is slightly different from what you were describing
>> because I am not interested in the missing data between 2003 and 2007.
>> In that case, as you said, I would just fit a line.
>> What I am trying to impute are the missing data within the years 2003 and
>> 2007 respectively.
>> And things are made even more complicated by the fact that for the main
>> explanatory variable of my model I have got only observations for the
>> year 2003 but not for 2007. That's why I was thinking about multiple
>> imputation!
>> But maybe you are right, I just have too many missing data.
>>
>> Best,
>>
>> Veronica
>>
>>
>>
>> On Thu, 05 Jul 2012 19:27:47 +0200, Oliver Jones
>> <[email protected]> wrote:
>>> Hi Veronica,
>>> I have just one hint: Maybe two observations are just not enough to do
>>> the imputation.
>>> Just think about it: I give a number, e.g. 3.145 percent, for 2003 and
>>> a number, e.g. 5.0 percent, for 2007 and ask you what the values are
>>> for the years in between.
>>> Can you imagine some fancy method to figure it out?
>>> I would suspect, under the assumption you don't have any other
>>> information, that there is no best solution. Maybe you could just draw
>>> a line between the years.
>>>
>>> Best
>>> Oliver
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/statalist/faq
>>> * http://www.ats.ucla.edu/stat/stata/
>>
>> --
>> VERONICA GALASSI
>> MSc Development Economics
>> University of Sussex
>> Mobile: +44 78 5563 0276
>>
>> 14 Auckland Drive,
>> BN2 4JS, Brighton, UK
>>
>> E-mail: [email protected]