Another way to do it:
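gen ZZ = 0
* loop over observations, collecting each row's values in a local macro
qui forval i = 1/`=_N' {
    * start each observation with an empty list
    local list
    foreach v of var B-Y {
        local list `"`list' `"`=`v'[`i']'"'"'
    }
    * keep distinct elements only, then store how many there are
    local uniq : list uniq list
    replace ZZ = `: list sizeof uniq' in `i'
}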
The single, double, and compound double quotes
require a little care here.
This is the sometimes deprecated loop over
observations, which nevertheless has a certain charm
in this case.
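To try the loop on something small first, here is a sketch on a made-up
dataset. The values and the shortened range B-D are assumptions for
illustration only, not the original poster's data.

```stata
* toy data: A is the ID, B-D stand in for B-Y
clear
input A str3 B str3 C str3 D
1 "x" "y" "x"
2 "." "z" "z"
3 "." "." "."
end

gen ZZ = 0
qui forval i = 1/`=_N' {
    * start each observation with an empty list
    local list
    foreach v of var B-D {
        * append this value, wrapped in compound double quotes
        local list `"`list' `"`=`v'[`i']'"'"'
    }
    * distinct elements only, then count them
    local uniq : list uniq list
    replace ZZ = `: list sizeof uniq' in `i'
}
list
```

As with the -reshape- approach, "." is counted as a value like any other.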
Nick
[email protected]
P.S. In the previous message, add a final -renpfix- (removing the
S_ prefix) to get your variable names back to the status quo ante.
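Assuming the S_ prefix used in that message, and that the second
-reshape- has already been run, the final step would be something like:

```stata
* strip the S_ prefix, so that S_B-S_Y become B-Y again
renpfix S_
```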
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]On Behalf Of Nick Cox
> Sent: 24 May 2004 17:16
> To: [email protected]
> Subject: st: RE: Counting unique values across a set of variables:
> Re-sent
>
>
> I think this is easiest through a
>
> -reshape-
> do stuff
> -reshape-
>
> sequence, otherwise known as the Stata twostep.
>
> First we -rename- variables, so that
> they have a common prefix, say
>
> foreach v of var B-Y {
>     rename `v' S_`v'
> }
>
> Then we -reshape- to long:
>
> reshape long S_ , i(A) string
>
> Now our count of distinct strings is
>
> bysort A S_ : gen Z = _n == 1
> by A : replace Z = sum(Z)
> by A : replace Z = Z[_N]
>
> Now we -reshape- back
>
> reshape wide S_ , i(A) string
>
> and then -Z- is an extra variable
> in the dataset.
>
> Note that this counts "."
> as a value like any other. (And
> indeed also "", " ", "  ", etc.)
>
> If you want to subtract 1 because "."
> is not of interest, then one
> way to do that is
>
> gen countperiod = 0
> foreach v of var B-Y {
>     replace countperiod = countperiod + (`v' == ".")
> }
>
> replace Z = Z - (countperiod > 0)
>
> Nick
> [email protected]
>
> > -----Original Message-----
> > From: [email protected]
> > [mailto:[email protected]]On Behalf Of CM
> > Sent: 24 May 2004 16:53
> > To: [email protected]
> > Subject: st: Counting unique values across a set of variables: Re-sent
> >
> >
> > Hi all,
> >
> > I checked findit but don't believe I found what I
> > need.
> >
> > Each row in my data represents a respondent. Besides
> > the first column "A" representing ID, the other
> > columns (call them B thru Y) contain strings or "." I
> > need to create a variable in column Z that counts the
> > number of unique strings found for any given
> > respondent in B thru Y. Advice?
> >
> > Thanks in advance,
> > CM
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>