
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Issues with missing values


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: Issues with missing values
Date   Mon, 10 Mar 2014 15:38:16 +0000

Thanks for this.

You clarified that -cal_in- is your response variable and that you
have missing values on the predictors too.

Whatever you do is wrong here from some point of view but I'd bet that

* using the data as they come, so leaving out missings

would get more votes than

+ using the data as they come and replacing missings by means

but regardless you do have scope for doing both and seeing how much
difference it makes.
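
One way to run both analyses and compare them is sketched below. It is only a sketch under stated assumptions: x1 and x2 are placeholder names for the predictors, and the overall mean is used for replacement (the original replacement was by year).

```stata
* Assumption: x1 and x2 stand in for the actual predictors.

* (1) Data as they come: -regress- omits observations with any missing value.
regress cal_in x1 x2
estimates store complete

* (2) Mean replacement, done inside -preserve-/-restore- so that the
*     data in memory are returned to their original state afterwards.
preserve
foreach v of varlist cal_in x1 x2 {
    summarize `v', meanonly
    replace `v' = r(mean) if missing(`v')
}
regress cal_in x1 x2
estimates store meanfill
restore

* Coefficients, standard errors and sample sizes side by side.
estimates table complete meanfill, se stats(N)
```

If the two columns tell much the same story, the choice matters little in practice; if they differ, that difference is itself worth reporting.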


Nick
[email protected]


On 10 March 2014 15:27, Halua Koko <[email protected]> wrote:
> Hi Nick,
> Thanks for the response. Sorry didn't mention it before, my y=calorie
> intake (cal_in). It's a continuous variable. I really didn't want to
> go into the messy multiple imputation techniques, so I tried the
> linear prediction technique, ie:
> regress y x1 x2 ..
> predict yhat
> But I guess due to missing values in x1, x2, this isn't working. I've
> been trying to figure out other work-arounds, but unsuccessfully. At
> the moment, I have about 20% of the 5000 obs missing, would you
> suggest going ahead without them? Would you have any other ways of
> solving this particularly perturbing issue? Indeed I'll refer to it as
> a wide "structure" from now on!
> Thanks again
> Halua
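
The behaviour described in this message can be checked directly: -predict- returns missing for any observation in which a predictor is missing, so the linear prediction can only fill in responses whose predictors are all observed. A minimal sketch, assuming placeholder predictors x1 and x2 and a hypothetical new variable cal_in_hat:

```stata
* Assumption: x1 and x2 are placeholders for the actual predictors.
regress cal_in x1 x2

* Linear prediction; missing wherever x1 or x2 is missing.
predict double cal_in_hat, xb

* Missing responses that could be predicted ...
count if missing(cal_in) & !missing(cal_in_hat)

* ... and missing responses that could not.
count if missing(cal_in) & missing(cal_in_hat)
```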
>
> On Mon, Mar 10, 2014 at 3:59 PM, Nick Cox <[email protected]> wrote:
>> The main issue here is what you are trying to do.
>>
>> 1. It might seem reasonable for your purposes to replace missings with
>> the mean. Even though you might be unable or unwilling to apply
>> imputation, some kind of interpolation (in time) is, however, a
>> possible alternative.
>>
>> 2. But the missings replaced with means don't carry new information
>> about the distribution. Classifying into quantile-based groups is
>> spurious unless you use only the non-missings to determine quantiles.
>> Unfortunately, it is also likely to be spurious applying that to the
>> extra means too. -xtile- does the best it can, but necessarily often
>> produces bizarre results because of its rule that identical values
>> must be placed in the same group.
>>
>> 3. I don't understand the fudge you are imagining, but it sounds quite
>> arbitrary and difficult to defend.
>>
>> 4. I didn't catch why you think you need to classify these values
>> anyway. I don't know what -cal_in- is, but using the panel means (or
>> medians) of what you have seems a more defensible way to make use of
>> what information there is. That, however, may miss the point if you
>> want to catch impacts during the time panels were observed.
>>
>> 5. Panel data are almost always better off in a long shape or
>> structure (my self-imposed Sisyphean task is to persuade people not to
>> say "format" given its existing use in Stata).
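
Several of the points above can be sketched in Stata. The sketch makes assumptions that are not in the thread: the wide variables are named cal_in2010, cal_in2011 and so on, the panel identifier is id, the baseline year is 2010, and cal_in still holds its original missing values rather than the substituted means.

```stata
* Assumptions: wide variables cal_in2010, cal_in2011, ...; identifier id.
* Other time-varying variables would need to be added to the stub list.
reshape long cal_in, i(id) j(year)

* Point 2: quintiles from observed baseline values only. -xtile- ignores
* missing values, so observations with missing cal_in get missing Q.
xtile Q = cal_in if year == 2010, nquantiles(5)
tabulate Q, missing

* Point 1: linear interpolation in time within each panel.
bysort id: ipolate cal_in year, generate(cal_in_ip)

* Point 4: panel means of whatever was observed.
egen double cal_in_pm = mean(cal_in), by(id)
```

Tabulating Q with the missing option shows how many observations are left unclassified, which is more transparent than forcing them into a group.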
>>
>>
>> Nick
>> [email protected]
>>
>>
>> On 10 March 2014 14:31, Halua Koko <[email protected]> wrote:
>>
>>> I've been working with a panel dataset and while putting it together
>>> have replaced a number of missing values in variable cal_in with the
>>> mean for each of the years. But when trying to create quintiles of the
>>> baseline values to assess heterogeneity of impact (using xtile
>>> Q=cal_in, nq(5)), I noticed that doing so had clumped together about
>>> 1000 obs around one value, i.e., the mean. So in essence my xtile groups
>>> are distributed unevenly and the 4th quantile seems to be entirely
>>> missing. FYI my panel is in the wide format.
>>> Can anyone suggest a solution to this problem? I was thinking of
>>> redistributing the clumped values by small increments so as to have
>>> the same mean, but differing values, but not sure how to do this.
>>> Can anyone help me figure this out?
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC