Traci A Schlesinger
>
> I have census data that is organized so that there is an observation for each
> race-sex-year category in a state. In other words, there is one observation for
> white men in Alabama in 1981, one for white women in Alabama 1981, etc., etc.
> Further, there are separate variables for several age groups. The data looks like
> this.
>
> Fips  Year  Race  Sex   age04   age59   . . .  age8084  age8500
> 1     1     1     1     101151  102545         12032    12032
>
> where fips are state fips codes, year is the last digit in the year (the data only
> spans 1981 to 1989), race is a categorical variable, sex a dummy, and the number
> in age04 is the number of (in this case) white boys aged 0 - 4 in Alabama in 1981
> (the number in age8500 is the number of white men over 85 in Alabama in 1981).
>
> What I want is to reshape the age long, so that I have an observation for each
> individual in the sample. Thus, I would have 101151 observations of white men in
> Alabama in 1981.
>
> I tried:
>
> reshape long age, i( fips year race sex)
>
> but this does not work. It creates an age variable that has the values that were
> in each age variable, rather than an observation for each of the individuals
> counted in each age group. Of course, this means the race and sex counts are also
> not correct. How do I get what I am looking for? Do I need to generate a
> different age variable first? Any advice would be appreciated!
>
You're most of the way there.
First, when I tried this, I had to specify the -string- option:

. l

     Fips   Year   Race   Sex    age04    age59   age8084   age8500
  1.    1      1      1     1   101151   102545     12032     12032

. reshape long age, i(Fips Year Race Sex) string

because of a problem documented at
http://www.stata.com/support/faqs/data/reshape3.html
namely
"On occasion, people use numeric suffixes with leading zeros,
such as 01, 02, and so forth. -reshape- will understand these
properly only if they are declared as string."
Anyway, the result is
. l

     Fips   Year   Race   Sex     _j      age
  1.    1      1      1     1     04   101151
  2.    1      1      1     1     59   102545
  3.    1      1      1     1   8084    12032
  4.    1      1      1     1   8500    12032

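The naming confusion can be avoided at the -reshape- step itself. What follows is only a sketch on toy data: the counts are made up and kept small, and the names -agegroup- and -freq- are my own choices, not anything in the original data.

```stata
* Toy data: one observation per Fips-Year-Race-Sex cell (made-up counts)
clear
input Fips Year Race Sex age04 age59 age8084 age8500
1 1 1 1 3 2 1 1
1 1 1 2 2 4 1 1
end

* j() names the suffix variable directly, instead of the default _j;
* -string- is needed because suffixes such as 04 have leading zeros
reshape long age, i(Fips Year Race Sex) j(agegroup) string

* What -reshape- calls age is really a count, so name it accordingly
rename age freq

list, sepby(Sex)
```

After this there is one observation per cell and age group, with the group held in -agegroup- and the number of people in -freq-.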
The problem is just one of names: as you say, the new -age- is
really a frequency, and _j is really the age group.
To get to where you want to be, it is now an -expand- problem.
-reshape- worked as advertised, and had no way of knowing
that you also wanted to -expand-.
. expand age
. drop age
. rename _j age
< clean up age>
But wait a moment! Why do you need, e.g., 102,545
observations that are all the same? Only, I suggest, if you
need to run a command that does not accept weights.
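To make the comparison concrete, here is a rough sketch of both routes on toy data that are already in long form. The counts are invented, and -agegroup- and -freq- are hypothetical names for what -reshape- produced as _j and age. One caution that is easy to miss: -expand- treats a value below 1 as if it were 1, so a cell with a zero count would survive as one spurious observation unless it is dropped first.

```stata
* Toy long-form data: one observation per cell and age group (made-up counts)
clear
input Fips Year Race Sex str4 agegroup freq
1 1 1 1 "04"   3
1 1 1 1 "59"   2
1 1 1 1 "8084" 0
1 1 1 1 "8500" 1
end

* Empty cells contain no people; drop them before either route
drop if freq == 0

* Route 1: keep one observation per cell and use frequency weights
tabulate agegroup [fweight = freq]

* Route 2: one observation per person, for commands that do not accept weights
expand freq
drop freq

* The unweighted table now matches the weighted one above
tabulate agegroup
```

With the real counts, Route 2 multiplies the size of the dataset enormously, which is the reason to prefer Route 1 whenever the command you need accepts weights.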
Nick