RE: st: St: collapse by _N
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: RE: st: St: collapse by _N
Date: Wed, 20 Oct 2010 11:31:31 +0100
All good advice, and here is some more:
1. I echo Michael in noting that -collapse- can produce a count variable, so that there is no need to set up your own. Of course, you would then need to drop data based on small samples after the -collapse-.
2. Be aware of -contract-. It has precisely the role of collapsing to frequencies, and so by default produces a count variable. By implication Ric here wants mostly to -collapse- to means, but I've often seen people use -collapse- when their objective was more directly matched by -contract-.
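To illustrate point 2, here is a minimal sketch of -contract- reducing a dataset to frequencies. It assumes Stata's shipped auto dataset as a stand-in for the survey data, with foreign playing the role of geocode; the variable name n and the threshold of 20 are illustrative choices only.

```stata
* Sketch only: auto.dta stands in for the survey data,
* and foreign stands in for geocode (both are assumptions).
sysuse auto, clear

* -contract- leaves one observation per value of foreign,
* with the frequency stored in n (the default name is _freq)
contract foreign, freq(n)

* groups below a chosen size can then be dropped
drop if n < 20
list
```

If means or other summaries are wanted as well as frequencies, -collapse- is the tool to use.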
Nick
[email protected]
Michael Mitchell
================
In addition to the great answers Chris and Ulrich sent, I might suggest that you
include a variable that counts the number of valid observations. After having the
collapsed file, you could then decide what you might want to use as a threshold for the
data being too unreliable. You can see more examples about collapsing, including examples
using count, at http://www.ats.ucla.edu/stat/stata/modules/collapse.htm .
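A minimal sketch of this suggestion follows, again assuming the auto dataset as a stand-in: rep78 plays the role of geocode and price the role of a survey variable. The names mean_price and n_price and the threshold of 10 are hypothetical. Note that (count) counts nonmissing values of the named variable, whereas _N counts all observations in the group.

```stata
* Sketch only: rep78 stands in for geocode, price for a survey variable.
sysuse auto, clear

* one pass produces both the mean and the number of valid observations
collapse (mean) mean_price = price (count) n_price = price, by(rep78)

* decide on a threshold after inspecting the collapsed file
list
drop if n_price < 10
```

Keeping the count in the collapsed file means the threshold can be revisited later without going back to the original data.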
Ulrich Kohler
=============
. bysort geocode: gen n = _N
. collapse (mean) varlist if n >= 20, by(geocode)
Chris Parker
============
You could count the observations in each geocode, drop the geocodes with too few observations, and then collapse.
bysort geocode: gen numobs=_N
drop if numobs < 20
collapse varlist, by(geocode)
Eric Uslaner
============
> I have a survey data set with respondents geocoded. I want to collapse the data set to the geocode level, so the simple command would be:
>
> collapse varlist,by(geocode)
>
> However, some geocodes have barely any respondents, and any collapsed data would be unreliable. Is there a straightforward way to collapse only if the number of respondents is > 20 (e.g.)?