Re: st: Re: foreach program
The code I posted should work for your problem. If you want to run it for
multiple age groups, I suggest you create a categorical variable for those
cohorts as well (using -recode-, perhaps). I have revised my code to solve the
full problem (assuming you have created agecat to represent the age categories):
gen byte groupqtrs=(gqtyped==200)
keep fip race sex agecat groupqtrs perwt
collapse (sum) perwt, by(fip race sex agecat groupqtrs)
reshape wide perwt, i(fip race sex agecat) j(groupqtrs)
gen totpersons=perwt0+perwt1
gen ir=perwt1/totpersons
This will give you one observation for each demographic cell: by fip, race,
sex and agecat. You may then wish to do more reshapes or select certain
cases for further analysis. I would guess that it will save you a lot of
time to do it this way. If you are running into memory constraints
holding the entire national 5% sample (or are using virtual memory), you could
loop through the states: read in each state's data one at a time from the
master file, run this code, and save the results in a file named
after the fip code. The -keep- command in the second line may help a lot in
terms of file size, so you may not need to worry about that.
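For illustration, the state-by-state loop described above might look roughly
like the sketch below. It is not tested against the actual Census extract. The
master file name census_master.dta, the output names ir_fip*.dta, the variable
name age, and a numeric fip are all assumptions; agecat is built with -recode-
from the nine cohorts listed in the original message. Setting empty cells to
zero after the -reshape- is an addition that is not in the code above.

```stata
* Sketch only: census_master.dta, age, and the ir_fip* output names are
* assumptions, and fip is assumed to be numeric.
use fip using census_master.dta, clear
levelsof fip, local(states)

foreach s of local states {
    * Read one state at a time, and only the variables that are needed
    use fip race sex age gqtyped perwt if fip == `s' using census_master.dta, clear

    * Build the nine age cohorts; persons under 15 are dropped
    drop if age < 15 | missing(age)
    recode age (15/19=1) (20/24=2) (25/29=3) (30/34=4) (35/39=5) ///
               (40/44=6) (45/54=7) (55/64=8) (65/max=9), generate(agecat)

    gen byte groupqtrs = (gqtyped == 200)
    collapse (sum) perwt, by(fip race sex agecat groupqtrs)
    reshape wide perwt, i(fip race sex agecat) j(groupqtrs)

    * A cell with nobody in one of the two groups comes out of -reshape-
    * as missing; treat it as zero (an addition, not in the posted code)
    foreach v in perwt0 perwt1 {
        replace `v' = 0 if missing(`v')
    }

    gen totpersons = perwt0 + perwt1
    gen ir = perwt1/totpersons
    save ir_fip`s', replace
}
```

The per-state files could then be combined with -append-. If the master file
stacks several census years, a year variable would also need to go in both the
by() and i() lists.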
Michael Blasnik
[email protected]
----- Original Message -----
From: "Scott Cunningham" <[email protected]>
To: <[email protected]>
Sent: Friday, September 08, 2006 12:12 PM
Subject: Re: st: Re: foreach program
Dear Michael,
If there is a faster way to do what I'm doing, then I'd love to know it,
as the code I use takes me a few days to execute because of the computer
I'm using and the size of the Census longform survey. Here's a
description of what I'm doing. I am calculating incarceration rates by
demographic cell, which is defined at the United States
state-age-race-sex-year level. I have data for 1980, 1990 and 2000. In
1980, the "group quarter" variable was defined differently than how it
was defined in 1990 and 2000, so I've been running two do files - one for
1980 and one for 1990/2000, but they are essentially identical.
I have 9 different age cohorts. I only reported the code for one of
them, since they are all identical calculations. The age cohorts are:
1. 15-19 year olds
2. 20-24 year olds
3. 25-29 year olds
4. 30-34 year olds
5. 35-39 year olds
6. 40-44 year olds
7. 45-54 year olds
8. 55-64 year olds
9. 65+ year olds
I have 51 states (50 US states plus District of Columbia).
I have two races (black and white), two sex values, and three census
years (1980, 1990 and 2000). My understanding was that to create so many
separate incarceration rates and levels, I would need to reproduce the
same code for each demographic cell. So I've been using -foreach- to do
it. Do you think, though, that this is not the most efficient method?
sc
On Sep 8, 2006, at 12:02 PM, Michael Blasnik wrote:
I've been reading this thread and don't understand why you need to loop
at all or generate the grouping variable. Wouldn't it make more sense
to use a collapse and a reshape?
keep if inrange(age,15,19)
gen byte groupqtrs=(gqtyped==200)
collapse (sum) perwt, by(fip race sex groupqtrs)
reshape wide perwt, i(fip race sex) j(groupqtrs)
gen totpersons=perwt0+perwt1
gen ir=perwt1/totpersons
This approach seems easier and faster and gives you a dataset of results
directly.
You could take the results and merge them back into the main dataset if
you want, but I don't even think that is necessary.
Michael Blasnik
[email protected]