Note for anyone interested:
-levelsof- as implemented in Stata 9 differs
subtly from -levels- as added to Stata 8
during its lifetime.
That aside, I am very surprised at Iwan's
report that -levelsof- reports categories
according to their order of occurrence in the data.
That contradicts not just the help file, but
also the code as I read it (and for that matter
as I wrote it, originally). StataCorp would like
to see evidence, I am sure. I suspect Iwan's
impression is mistaken, but I am not sure why
it arises.
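One quick way to check the ordering is to run -levelsof- on a small dataset whose values are deliberately out of order. This is only a sketch: the data values and the macro name -levels- are made up for illustration.

```stata
* Made-up data: values first occur in the order 30, 10, 20
clear
input id
30
10
30
20
10
end

* Put the distinct values of id into the local macro levels
levelsof id, local(levels)
di "`levels'"
```

If -levelsof- behaves as documented, the list comes back sorted (10 20 30), not in order of first occurrence (30 10 20).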
The general problem to which -levelsof- is
one solution is discussed in
http://www.stata.com/support/faqs/data/foreach.html
A fairly general strategy for going through all
possible levels *according to their order of first
occurrence in the data* is as follows.
(This also circumvents the problem that -levelsof-
cannot cope when the list of levels exceeds the
limit on the length of a macro.)
Suppose we have an identifier, say -id-.
First generate an observation number:
    gen long obs = _n
Now we sort by -id-, breaking ties by
-obs-. The first observation in each block
then carries information on first occurrence.
We copy the observation number of first
occurrence to each other occurrence of the same id.
    bysort id (obs) : replace obs = obs[1]
Now we tag ids from 1 to whatever, according
to first occurrence:
    bysort obs : gen group = _n == 1
    replace group = sum(group)
Those familiar with -egen, group()- may
recognise the basic idea here.
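To see what these steps produce, here is a minimal sketch on made-up data. The data values and the helper variable -order- are assumptions added for illustration only; -order- is needed just to list the result in the original order, because -obs- gets overwritten along the way.

```stata
* Made-up data: ids first occur in the order 30, 10, 20
clear
input id
30
10
30
20
10
end

* Helper (not part of the recipe): remember the original order
gen long order = _n

* The recipe: observation number, then first occurrence, then group
gen long obs = _n
bysort id (obs) : replace obs = obs[1]
bysort obs : gen group = _n == 1
replace group = sum(group)

* Inspect the result in the original order
sort order
list id obs group
```

Here id 30 falls in group 1, id 10 in group 2 and id 20 in group 3, which is their order of first occurrence, not their sorted order.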
Now the number of groups is identifiable from
    su group, meanonly
    local max = r(max)
Typically then you loop over groups:
    forval i = 1/`max' {
        ...
    }
Within that loop, a look-up technique to
get the identifier concerned is, for
a numeric identifier:
    su id if group == `i', meanonly
All identifiers in each group are the same,
so it matters little whether we pick up
the minimum, the mean or the maximum:
    local which = r(min)
will do, for example.
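Putting the pieces together for a numeric identifier gives something like the sketch below. The data are made up, and the -di- line merely stands in for whatever you really want to do inside the loop.

```stata
* Made-up data: ids first occur in the order 30, 10, 20
clear
input id
30
10
30
20
10
end

* Number the groups by order of first occurrence
gen long obs = _n
bysort id (obs) : replace obs = obs[1]
bysort obs : gen group = _n == 1
replace group = sum(group)

* Number of groups
su group, meanonly
local max = r(max)

* Loop over groups, looking up the identifier each time
forval i = 1/`max' {
    su id if group == `i', meanonly
    local which = r(min)
    di "group `i' is id `which'"
}
```

With these data the loop visits id 30, then 10, then 20.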
If the identifier -id- is a string variable, a little
more work is needed. Outside the loop,
    replace obs = _n
Inside the loop,
    su obs if group == `i', meanonly
    local which = id[`r(min)']
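The string case can be sketched in the same way. Again the data are made up, and the -di- line is only a placeholder for the real work inside the loop.

```stata
* Made-up data: string ids first occur in the order c, a, b
clear
input str5 id
"c"
"a"
"c"
"b"
"a"
end

* Number the groups by order of first occurrence
gen long obs = _n
bysort id (obs) : replace obs = obs[1]
bysort obs : gen group = _n == 1
replace group = sum(group)

su group, meanonly
local max = r(max)

* Renumber observations in the current sort order, so that
* obs can be used as a subscript into id
replace obs = _n

forval i = 1/`max' {
    su obs if group == `i', meanonly
    local which = id[`r(min)']
    di "group `i' is id `which'"
}
```

With these data the loop visits c, then a, then b. Because only one identifier is held in the macro at a time, the length of the full list of levels never becomes an issue.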
Nick
[email protected]
Barankay, Iwan
>
> I find the command "levelsof" very useful to cut down the
> time on loops when I run through the category of a variable
> (e.g. the location_ids of a large survey).
>
> What I also like is that the local macro generated by
> levelsof is - so it seems to me - still in the order in which
> it appears in the data and does not sort it which is needed
> at times (even though the hlp file of levelsof says
> otherwise). When usually a list is entered into a local it is
> then sorted.
>
> The problem of course is that there are constraints on
> levelsof when it hits the character limit.
>
> My question is:
>
> What can I use instead of levelsof for (i) a large number of
> categories to avoid the character constraint but which (ii)
> also keeps the categories in the order in which they appear
> in the data and does not sort them.
>
> (i) is much more important than (ii) but if someone did an
> elegant solution for (ii) I would love to hear of it.
>