
RE: st: gsort issue


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: gsort issue
Date   Thu, 5 Jul 2007 20:27:12 +0100

I don't use -gsort- much, as I usually prefer to work out 
my own -sort- order without wanting to re-discover
the precise idiosyncratic syntax of -gsort-. (I've 
got a blind spot on -recode- for the same kind of reason.) 

(That's not on a par with B*ll G???d, who 
can write the equivalent of an -egen- function 
several times faster than it takes to find out 
whether that function already exists.) 

But -- to the point -- while what Brian says is a fair 
answer, it seems to me to point to a missing option on -gsort-. 

-reallydowantmissingfirst- would not be very Stataish 
as a name, but Fred Wolfe's want and need seemed very reasonable 
to me. 

Nick 
[email protected] 

Brian P. Poi
 
> On Thu Jul  5 06:58:30 2007, Fred Wolfe wrote:
> 
> > Is there a problem with gsort (Stata 10 and below) or am I 
> > misunderstanding something?
> >
> > I have a variable called -phdif-. I want the greatest value of that 
> > variable to appear in the last observation. There are missing values, 
> > so I use -gsort- with the -mfirst- option.
> 
> ...
> 
> > . gsort phdif
> > . l phdif in 1,clean
> >
> >        phdif
> >   1.       1
> >
> > . l phdif in l,clean
> >
> >           phdif
> > 169914.       .
> >
> > The problem appears to be that missings are still last even though I 
> > used the -mfirst- option.
> >
> > Any suggestions? Is this a problem or am I thinking about this 
> > incorrectly?
> 
> 
> The "mfirst" option of -gsort- applies only to variables sorted in 
> descending order.
> 
> Stata stores missing values as extremely large numbers, so if a variable 
> is sorted in descending order, missing values should appear first in the 
> list since they are greater than all non-missing values.
> 
> -gsort-, however, tries to be helpful when sorting in descending order by 
> putting the missing values at the end of the list, assuming that the user 
> really cares about the large real values of the variable, not the missing 
> values.
> 
> The "mfirst" option tells -gsort- to put the missing values first in the 
> list instead of trying to be helpful by putting them at the end of the 
> list.
> 
> If you want to get the missing values to appear first when doing an 
> ascending sort, one way to proceed is to create a 0/1 variable equal to 0 
> if the variable of interest contains missing and 1 otherwise and then sort 
> by the indicator variable and the variable of interest:
> 
>     . sysuse auto
>     . generate missrep78 = cond(missing(rep78), 0, 1)
>     . gsort missrep78 rep78
>     . list rep78 in 1/7, sep(0)
>          +-------+
>          | rep78 |
>          |-------|
>       1. |     . |
>       2. |     . |
>       3. |     . |
>       4. |     . |
>       5. |     . |
>       6. |     1 |
>       7. |     1 |
>          +-------+
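
For comparison, here is a minimal sketch of the three cases discussed in this thread, run on the auto dataset that ships with Stata. It is an illustration rather than anything posted by the participants: the variable name nonmiss is a made-up helper, and the last step uses plain -sort- in place of -gsort- on the assumption that an ascending sort on both keys is all that is wanted.

```stata
* Sketch only: uses the auto dataset shipped with Stata
sysuse auto, clear

* Descending sort: by default -gsort- moves missing values to the end
gsort -rep78
list rep78 in 1/3, sep(0)

* Descending sort with -mfirst-: missing values are placed first
gsort -rep78, mfirst
list rep78 in 1/7, sep(0)

* Ascending sort with missing values first, using plain -sort-
* (nonmiss is a hypothetical helper: 0 if rep78 is missing, 1 otherwise)
generate byte nonmiss = !missing(rep78)
sort nonmiss rep78
list rep78 in 1/7, sep(0)
```

The indicator has to come first in the sort order so that all missing observations are grouped ahead of the non-missing ones; within the non-missing group the ordinary ascending order of rep78 then puts the greatest value in the last observation.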
