Re: st: using -drop if- with weights
From: Maarten buis <[email protected]>
To: [email protected]
Subject: Re: st: using -drop if- with weights
Date: Mon, 6 Sep 2010 09:15:55 +0000 (GMT)
--- Luis Armando Galvis writes:
> I have a question I am stuck with. I need to drop
> observations that are beyond 3 standard errors from the mean
> of one of the variables. The problem is that using -drop if-
> will eliminate observations without taking into account the
> weights and will eliminate more observations than needed. I
> cannot expand the dataset to 8 million records because of
> memory issues. My question is if there is a way to do this
> procedure in a more manageable way.
The command -drop- does not know about weights, nor does it allow
them. It does not know the mean or standard deviation either, so
the problem is not with -drop- but with what you typed before.
Since you did not tell us what you typed before, it is hard for
us to comment. Also you did not tell us why you think that your
command drops too many observations. This can be crucial
information, as the rules of thumb about how many observations
should be dropped with such a rule are often based on the normal
distribution, but if your variable is severely skewed or has a
spike then all bets are off when it comes to predicting how many
observations will be dropped with such a rule.
On a more fundamental note: such automatic deletion of observations
is almost always very, very, very wrong. Almost always it is the
exceptions that contain the most information, so we do not want
to throw them away. Think about it from a policy point of view: it
is usually the exceptions that we want to attain or prevent. We
want the population to live long and healthy and be rich, and want
to prevent early deaths, illness, and poverty. It is the extremes
that contain information on these events, not the "normal"
observations.
However, technically this is how you can do it:
    * !missing(var) protects missing values, which Stata treats as very large
    sum var [fw=w]
    drop if (var < r(mean) - 3*r(sd) | var > r(mean) + 3*r(sd)) & !missing(var)
(assuming that your variable is called var and your weight
is called w)
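
If you want to see what such a rule would do before deleting
anything, a minimal sketch along the following lines may help. It
flags the observations instead of dropping them and then tabulates
the flag using the same weights. The toy data are made up purely so
that the sketch runs on its own: the seed, the skewed variable, and
the 1 to 10 weights are all assumptions, as is the name of the flag
variable, outside. Note that frequency weights must be integers.

```stata
* toy data so the sketch runs on its own; replace with your own var and w
clear
set seed 2010
set obs 1000
generate var = exp(rnormal())            // deliberately skewed
generate w   = 1 + floor(10*runiform())  // integer frequency weights

* weighted mean and sd, as in the commands above
quietly summarize var [fw=w]
local lo = r(mean) - 3*r(sd)
local hi = r(mean) + 3*r(sd)

* flag instead of drop; a missing var gives a missing flag
generate byte outside = (var < `lo' | var > `hi') if !missing(var)

* weighted count of the observations the rule would remove
tabulate outside [fw=w]
```

For a normally distributed variable roughly 0.3% of the weighted
observations would end up flagged. For a skewed variable like the toy
one here the share can be noticeably different, so it is worth looking
at this table before running -drop-.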
Hope this helps,
Maarten
--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany
http://www.maartenbuis.nl
--------------------------