Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: detecting a complete data set
From: Donald Spady <[email protected]>
To: [email protected]
Subject: Re: st: detecting a complete data set
Date: Tue, 16 Nov 2010 06:31:46 -0700
I guess it really helps to define the question explicitly, and I was too vague.
Here is a simple example:
id level A B C D
1 1 1 1 1 1
1 2 1 0 . 0
1 3 . 1 0 .
1 4 1 0 1 1
2 1 1 0 1 1
2 2 0 0 0 1
2 3 0 1 1 1
2 4 1 1 1 1
and so on
I want to detect the IDs that have no missing data for ALL levels within that ID. In this case ID 2 fits the bill; ID 1 has data missing in levels 2 and 3.
There are NO missing values for id or level. They are basically placeholders.
I know how to find missing values for A B C D. What I don't know is how to detect the IDs that have some missing A B C D data in at least one level or, putting it the other way, the IDs that have values for A B C D at every level.
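A minimal sketch of that per-ID check, assuming the variables are named id, level, A, B, C and D as in the example above and that every id has one row for each level. It enters the toy data, counts the missing values among A to D in each row with -egen-'s -rowmiss()-, and then flags an id as complete only when every one of its rows has zero missings (the minimum of nmiss == 0 within the id is 1). The names nmiss, idcomplete and onetag are purely illustrative.

```stata
* Sketch: flag ids whose every level has A, B, C and D all present.
* Assumes variable names id, level, A, B, C, D as in the example above.
clear
input id level A B C D
1 1 1 1 1 1
1 2 1 0 . 0
1 3 . 1 0 .
1 4 1 0 1 1
2 1 1 0 1 1
2 2 0 0 0 1
2 3 0 1 1 1
2 4 1 1 1 1
end

* Number of missing values among A-D in each observation (0 = row complete)
egen nmiss = rowmiss(A B C D)

* 1 if every level within the id is complete, 0 otherwise
bysort id: egen byte idcomplete = min(nmiss == 0)

* Tag one observation per id, then list and count the complete ids
egen byte onetag = tag(id)
list id if idcomplete & onetag
count if idcomplete & onetag
```

With the toy data only id 2 is listed. If some id might lack a row for a level altogether, this check would not notice; comparing the number of observations per id (for example -by id: gen nobs = _N-) with the expected number of levels would be an extra safeguard.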
Thanks
Don
On 2010-11-16, at 4:41 AM, Nick Cox wrote:
> I just want to add a few notes comparing these solutions and mentioning some others.
>
> 1. Mitch's solution contains typos: Stata's not-equal-to operator is !=, not !==:
>
> count if A !=. & B !=. & C !=. & D !=.
>
> 2. Beyond that, Mitch's solution differs from Phil's in that it won't exclude extended numeric missing values .a to .z. It could be extended to exclude all numeric missings by changing != . to < . (every numeric missing value, whether . or .a to .z, is larger than any nonmissing number), and to exclude string missings by adding conditions of the form E != "" for a string variable E.
>
> 3. Phil's solution of working with -missing(A,B,C,D)- and its negation -!missing(A, B, C, D)- is a good solution for a small or moderate number of variables. Beyond that, writing out a long comma-separated varlist is a little tedious and error-prone.
>
> 4. Moreover, -missing()- happily takes a mixture of numeric and string arguments.
>
> 5. For many variables you could use -egen-'s -rowmiss()- function, which uses -missing()- internally, to create a new variable counting the missings in each observation. The advantage of that is that it takes varlists, including variable ranges and wildcards. A value of 0 for the resulting variable means all values are present.
>
> 6. Some people have learned the trick of throwing a set of variables at -regress- which naturally will only accept complete observations on the variables specified. After the regression e(sample) tags observations that are all present.
>
> . regress A B C D
> . gen byte allpresent = e(sample)
>
> 7. Programmers might want to use commands specifically provided for this purpose. After something like
>
> . gen byte allpresent = 1
>
> either
>
> . markout allpresent <varlist>
>
> or
>
> . markout allpresent <varlist>, strok
>
> lets you tune your tagging.
>
> 8. This is undoubtedly not a complete list.
>
> Nick
> [email protected]
>
> Mitch Abdon
> ===========
>
> If you just need the number of observations with nonmissing A, B, C and D, try:
>
> count if A !==. & B !==. & C !==. & D !==.
>
> You can also generate a variable that will indicate if the observation
> has no missing values (1 if no missing values and 0 otherwise),
> example:
>
> gen nomissing=(A !==. & B !==. & C !==. & D !==. )
> tab nomissing
>
> Phil Clayton
> =============
>
> If I'm understanding your question properly, you simply want to know the total number of observations (rows in your dataset) with complete data for the variables of interest?
>
> count if !missing(id, level, A, B, C, D)
>
> Donald Spady
> ============
>
> I have a dataset with 6 variables of interest: id level A B C D. There are 100 individual id values, 24 individual level values and values for A B C D for each level of each id. There are a lot of missing data. How can I determine how many complete data sets I have, i.e. sets of id, level, and A B C D values that are complete? I have looked at -misstable-. It is easy to determine the number of missing A B C D values, but when it comes to seeing how many complete sets of level and A B C D there are, I don't know what to do.
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
Don Spady
Nature bats last.