Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Clustering by school year
From: David Kantor <[email protected]>
To: [email protected]
Date: Sat, 23 Oct 2010 22:39:16 -0400
At 10:03 PM 10/23/2010, Jose A wrote:
> Would clustering by school year be as simple as generating a
> variable school_year = school identifier * year, and then using this
> new variable as the cluster?
Just from a practical standpoint, this could work, provided that
school_identifier is numeric (preferably an integer).
But you also need to ensure that the values you get constitute a
one-to-one mapping from (school_identifier, year) pairs to the resulting number.
That is, no two distinct pairs of school_identifier and year should
map to the same value.
Say that you have school_identifiers 200 and 201, and your years are
2000 and 2010. You would have,
2000 * 201 = 402000
2010 * 200 = 402000
-- thus, a many-to-one mapping.
You need to inspect your set of years and school_identifiers to see
if something like this would happen.
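One way to do that inspection, as a minimal sketch: compute the product, number the distinct pairs separately, and flag any product value that is shared by more than one pair. The toy data and the names pair and collision are made up for illustration; -egen, group()- is used here only as a checking device.

```stata
* Toy data (hypothetical) that reproduces the collision above
clear
input school_identifier year
200 2010
201 2000
201 2010
end

* The proposed cluster variable
gen long school_year = school_identifier * year

* Number the distinct (school_identifier, year) pairs
egen long pair = group(school_identifier year)

* A product value is unsafe if it covers more than one pair
bysort school_year (pair): gen byte collision = pair[1] != pair[_N]

list if collision
count if collision
```

If -count- reports 0, the product happens to be safe for your data; anything else means distinct school-years would be merged into one cluster.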
If this situation arises, then you need some other scheme. You could
extract the unique pairs of school_identifier and year that occur in
the data. (Or, if you need to be more general, obtain the sets of
years and school_identifier separately; form the cross-product; see
-help cross-.) With this set, -gen long clusterid = _n-; save it, and
later merge your analysis file to this file.
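The unique-pairs scheme might look something like the sketch below. The toy data, the variable score, and the tempfile name pairs are assumptions for illustration; the -merge m:1- syntax assumes Stata 11 or later.

```stata
* Toy analysis file (hypothetical)
clear
input school_identifier year score
200 2010 71
200 2010 64
201 2000 80
201 2000 77
201 2010 69
end

* Build a lookup file holding each distinct pair once
preserve
keep school_identifier year
duplicates drop
sort school_identifier year
gen long clusterid = _n
tempfile pairs
save `pairs'
restore

* Attach clusterid to every observation of the analysis file
merge m:1 school_identifier year using `pairs', assert(match) nogenerate

list, sepby(clusterid)
```

A shortcut that, as far as I know, gives the same numbering without the save-and-merge step is -egen long clusterid = group(school_identifier year)-. Either way, clusterid is then what you would pass to the cluster option of your estimation command.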
HTH
--David
----
P.S., there is another numeric-based solution: either,
k1 * school_identifier + year
or
k2 * year + school_identifier
where k1 or k2 is a strategically chosen constant:
k1 > max(year)
k2 > max(school_identifier)
One possibility is
10000 * school_identifier + year
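A sketch of that last formula, with checks, assuming both variables hold non-negative integers (the toy data and the name pair are made up). The result is stored as a long because a float cannot represent integers above 16,777,216 exactly; if the identifiers are large enough to overflow a long, a double would be the safer choice.

```stata
* Toy data (hypothetical)
clear
input school_identifier year
200 2000
200 2010
201 2000
201 2010
end

* k1 = 10000 must exceed max(year)
summarize year, meanonly
assert r(max) < 10000

gen long school_year = 10000 * school_identifier + year

* Confirm that each value of school_year covers exactly one pair
egen long pair = group(school_identifier year)
bysort school_year (pair): assert pair[1] == pair[_N]

format school_year %12.0f
list
```

A side benefit of this encoding is that the value stays readable: the last four digits are the year and the leading digits are the school.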
----