Re: st: St: Dropping variables with mostly missing values
From: Nick Cox <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: St: Dropping variables with mostly missing values
Date: Sat, 8 Feb 2014 01:08:01 +0000
However, -nmissing- and -npresent- (also from the Stata Journal) allow you to type something like
npresent, min(20)
or even something like
npresent, min(`=ceil(_N/5)')
after which
keep `r(varlist)'
would -keep- what you wanted. Alternatively, use -nmissing- to count
missings and -drop- unwanted variables afterwards. The help indicates
that
"A question by Eric Uslaner led to the addition of r(varlist) as a
saved result."
and a search flags http://www.stata.com/statalist/archive/2005-02/msg00297.html
so this question comes nicely full circle. Thanks again, Eric, for
provoking that addition.
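
For anyone without -nmissing- installed, the same selection can be sketched with official commands only. This is a minimal sketch, not the code inside -npresent-: the local macro name keepers is made up for illustration, and the cutoff of 20 is taken from Eric's question.

```stata
* Sketch only: keep variables with at least 20 nonmissing values.
* "keepers" is a hypothetical macro name; 20 is the cutoff from the question.
local keepers
foreach v of varlist _all {
    * missing() accepts both numeric and string variables
    quietly count if !missing(`v')
    if r(N) >= 20 local keepers `keepers' `v'
}
* Do nothing if no variable qualifies, since -keep- needs a varlist
if "`keepers'" != "" keep `keepers'
```

Collecting the names first and calling -keep- once at the end avoids changing the dataset while the loop is still running over it.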
Nick
[email protected]
On 8 February 2014 00:00, Nick Cox <[email protected]> wrote:
> Good solutions to this came from Jeph Herrin, Amirsa and Richard Goldstein.
>
> Meanwhile, anyone interested in -dropmiss-, which doesn't do this,
> should please note that it comes from the Stata Journal, not SSC.
On 7 February 2014 20:40, Jeph Herrin <[email protected]> wrote:
>> To drop all variables missing more than 80% of the time:
>>
>> foreach V of varlist _all {
>>     quietly count if !mi(`V')
>>     if r(N)/_N < 0.2 drop `V'
>> }
>>
>>
>> This works for string and numeric variables. Change 0.2 to whatever level
>> you want.
On 2/7/2014 3:11 PM, Eric M. Uslaner wrote:
>>> I know that this has been discussed before, but a long search doesn't find
>>> a solution for me (my own fault in searching, most likely).
>>>
>>> I have a data set (not my own) with 161 cases over a long time period.
>>> But most of the variables are largely made up of missing values
>>> (information wasn't available a long time ago). I have used Nick Cox's
>>> dropmiss (from SSC) to drop variables with all missing values. But a large
>>> number of variables remain with few observations. I would like to delete
>>> any variable with fewer than 20 cases. But I can't figure out how to do
>>> this (especially since I have a large number of variables, most of which
>>> have very few cases). Any help would be appreciated.