Nick,
you are right: -sort- sorts missing _numeric_ values to the end.
However, from what I observe, for a string variable -sort- puts
missing values, i.e. empty strings, at the top, which certainly makes sense.
So if "cost" were a string variable, the command you
presented would not work as intended.
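
To illustrate the string case, here is a minimal sketch. The toy data
below are made up for illustration (three IDs, three years each), and
the variable name cost_num is hypothetical. It assumes Cost is stored
as a string, with "x" standing for an erroneous entry as in the
original example.

```stata
* Hypothetical toy data: Cost is a string variable
clear
input ID Year str5 Cost
1 1 "100"
1 2 "200"
1 3 "x"
2 1 "100"
2 2 "200"
2 3 "500"
3 1 "100"
3 2 ""
3 3 "150"
end

* Option 1: empty strings sort first, so check the first observation
* in each panel. This catches ID 3 (empty string) but not ID 1,
* because "x" is not a missing value.
bysort ID (Cost): drop if missing(Cost[1])

* Option 2: convert to numeric. real() returns missing for "" and for
* non-numeric text such as "x". Numeric missings sort last, so the
* test on the last observation [_N] applies again.
generate double cost_num = real(Cost)
bysort ID (cost_num): drop if missing(cost_num[_N])
```

Option 2 is probably the safer route if the string variable can hold
anything other than digits, since it flags both empty and non-numeric
entries in one step.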
Kind regards,
sebastian
On 8/10/05, Nick Cox <[email protected]> wrote:
> The -sort- sorts missing values to the end
> of each panel. So afterwards if any values in the panel
> are missing, then the last one will be too. That
> is necessary and sufficient information for a -drop-.
>
> The -drop- then drops all observations in the panel
> if (iff) the last one is missing.
>
> Nick
> [email protected]
>
> Christian Holz
>
> > I think, however, that Nick's approach does not work, if a value for
> > year 5 is there and another year has a missing value, as
> > Nick's command
> > only checks the last observation of each ID group.
> > I might be wrong, but in case I am not, it's worth mentioning...
>
> Nick Cox wrote:
> > Another way of doing this, without any new
> > > variables:
> > >
> > > bysort ID (Cost) : drop if missing(Cost[_N])
> > >
> > > Nick
> > > [email protected]
> > >
> > > Antoine Terracol
> > >
> > >
> > >>I would try something like :
> > >>
> > >>generate tag=(cost==.)
> > >>egen toberemoved=sum(tag), by(ID)
> > >>drop if toberemoved>0
> > >>drop tag toberemoved
> > >>
> > >>
> > >>You will need to replace the "cost==." in the first line by a more
> > >>general way to tag your erroneous values (such as "cost==. |
> > >>cost>9999")
> > >
> > >
> > > Murray Lowe
> > >
> > >
> > >>>I am working with a large dataset and have discovered that some of
> > >>>the data are missing values or have erroneous values. The data is
> > >>>panel data with observations per individual over a 5 year period.
> > >>>For example:
> > >>>
> > >>>ID Year Cost
> > >>>
> > >>>1 1 100
> > >>>1 2 200
> > >>>1 3 500
> > >>>1 4 150
> > >>>1 5 x
> > >>>2 1 100
> > >>>2 2 200
> > >>>2 3 500
> > >>>2 4 600
> > >>>2 5 100
> > >>>
> > >>>The problem is this: If an individual has a missing / erroneous
> > >>>value for a particular year, I want to exclude ALL of their
> > >>>observations from the dataset. In the example patient 1 would be
> > >>>removed from the dataset entirely. How can this be done through an
> > >>>automated-type process? Essentially I need a code / method that
> > >>>looks for the anomalous data; identifies the patient and then
> > >>>removes all of their observations from the dataset.
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
--
- Seb F Buechte
-
- Stay tuned!