
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Issues with missing values


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: Issues with missing values
Date   Mon, 10 Mar 2014 14:59:46 +0000

The main issue here is what you are trying to do.

1. It might seem reasonable for your purposes to replace missings with
the mean. However, even if you are unable or unwilling to apply
imputation, some kind of interpolation (in time) is a possible
alternative.

2. But the missings replaced with means don't carry new information
about the distribution. Classifying into quantile-based groups is
spurious unless you use only the non-missings to determine quantiles.
Unfortunately, applying those quantiles to the extra means is likely
to be spurious too. -xtile- does the best it can, but necessarily often
produces bizarre results because of its rule that identical values
must be placed in the same group. With about 1000 observations tied at
the mean, the tie can span more than one quantile boundary, which is
presumably why one of your groups is empty.

3. I don't understand the fudge you are imagining, but it sounds quite
arbitrary and difficult to defend.

4. I didn't catch why you think you need to classify these values
anyway. I don't know what -cal_in- is, but using the panel means (or
medians) of what you have seems a more defensible way to make use of
what information there is. That, however, may miss the point if you
want to catch impacts during the time panels were observed.

5. Panel data are almost always better off in a long shape or
structure (my self-imposed Sisyphean task is to persuade people not to
say "format" given its existing use in Stata).
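
A minimal sketch of points 1, 2, 4 and 5, for what it is worth. It
runs on made-up data: the variable names id, year, cal_in2010 to
cal_in2012, cal_in_ip and cal_in_mean, the years, and the 20% missing
rate are all assumptions for illustration, not anything from your
dataset.

```stata
* Hypothetical toy panel in a wide layout: id plus cal_in2010-cal_in2012
clear
set seed 12345
set obs 200
generate id = _n
forvalues y = 2010/2012 {
    generate cal_in`y' = rnormal(2000, 300)
    * knock out roughly 20% of values to mimic the missings
    replace cal_in`y' = . if runiform() < 0.2
}

* Point 5: move to a long layout, one observation per id-year
reshape long cal_in, i(id) j(year)

* Point 1: linear interpolation in time, separately within each panel
* (without the epolate option, leading and trailing gaps stay missing)
bysort id: ipolate cal_in year, generate(cal_in_ip)

* Point 2: quintile groups determined from the observed values only
* (xtile leaves Q missing wherever cal_in is missing)
xtile Q = cal_in if !missing(cal_in), nq(5)

* Point 4: panel mean of what was actually observed
bysort id: egen cal_in_mean = mean(cal_in)

tabulate Q, missing
```

Here the missings are still missing when -xtile- is called. If they
have already been overwritten with means, you would need an indicator
for the originally missing values (or a copy of the original variable)
before anything like this could work.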


Nick
[email protected]


On 10 March 2014 14:31, Halua Koko <[email protected]> wrote:

> I've been working with a panel dataset and while putting it together
> have replaced a number of missing values in variable cal_in with the
> mean for each of the years. But when trying to create quintiles of the
> baseline values to assess heterogeneity of impact (using xtile
> Q=cal_in, nq(5)), I noticed that doing so had clumped together about
> 1000obs around one value, ie, the mean. So in essence my xtile groups
> are distributed unevenly and the 4th quantile seems to be entirely
> missing. FYI my panel is in the wide format.
> Can anyone suggest a solution to this problem? I was thinking of
> redistributing the clumped values by small increments so as to have
> the same mean, but differing values, but not sure how to do this.
> Can anyone help me figure this out?