Re: st: Re: foreach program
Dear Michael,
Thank you for taking the time to write this out for me. I don't
understand it yet, but I will look closely at it. I'm in desperate
need to find more efficient ways to work with this census data. As I
said, I'm spending a lot of time doing simple addition and division
simply by looping through so many cell groups with such a large
dataset. I will look into this now.
scott
On Sep 8, 2006, at 12:22 PM, Michael Blasnik wrote:
The code I posted should work for your problem. If you want to run
it for multiple age groups, I suggest you create a categorical
variable for those cohorts as well (using recode?). I have revised
my code to solve the full problem (assuming you have created agecat
to represent the age categories):
gen byte groupqtrs=(gqtyped==200)
keep fip race sex agecat groupqtrs perwt
collapse (sum) perwt, by(fip race sex agecat groupqtrs)
reshape wide perwt, i(fip race sex agecat) j(groupqtrs)
gen totpersons=perwt0+perwt1
gen ir=perwt1/totpersons
This will give you one observation for each demographic cell: by
fip, race, sex and agecat. You may then wish to do more reshapes
or select certain cases for further analysis. I would guess that
it will save you a lot of time to do it this way. If you are
running into memory constraints for holding the entire national 5%
sample (or using virtual memory), you could loop through the
states, reading in each state's data one at a time from the master
file and running this code and saving the results in a file named
after the fip code. The -keep- command in the second line may help
a lot in terms of file size, so you may not need to worry about that.
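The state-by-state loop described above might be sketched as follows for a do-file. This is only an illustration under stated assumptions: master.dta is a hypothetical name for the full file, fip is assumed to hold numeric state FIPS codes (1 to 56, with gaps), age is assumed to be in years, and the ir_fip file names are made up for the example.

```stata
* Sketch only; file names and the coding of fip are assumptions.
forvalues f = 1/56 {
    * Read one state's observations and only the variables needed
    use fip race sex age gqtyped perwt if fip == `f' using master, clear
    if _N == 0 {
        continue    // no state has this FIPS code
    }
    * Build the age categories (assumed cohort boundaries from the thread)
    recode age (15/19=1) (20/24=2) (25/29=3) (30/34=4) (35/39=5) ///
        (40/44=6) (45/54=7) (55/64=8) (65/max=9) (else=.), gen(agecat)
    drop if missing(agecat)
    gen byte groupqtrs = (gqtyped == 200)
    keep fip race sex agecat groupqtrs perwt
    collapse (sum) perwt, by(fip race sex agecat groupqtrs)
    reshape wide perwt, i(fip race sex agecat) j(groupqtrs)
    gen totpersons = perwt0 + perwt1
    gen ir = perwt1/totpersons
    save ir_fip`f', replace
}

* Stack the per-state results into one dataset
clear
forvalues f = 1/56 {
    capture append using ir_fip`f'
}
```

Each pass still scans the master file, but only one state's rows are held in memory at a time. A cell with no group-quarters observations gets a missing perwt1 after the reshape, so totpersons and ir are missing for that cell unless the missing value is first replaced with zero.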
Michael Blasnik
[email protected]
----- Original Message ----- From: "Scott Cunningham"
<[email protected]>
To: <[email protected]>
Sent: Friday, September 08, 2006 12:12 PM
Subject: Re: st: Re: foreach program
Dear Michael,
If there is a faster way to do what I'm doing, then I'd love to
know it, as the code I use takes me a few days to execute because
of the computer I'm using and the size of the Census longform
survey. Here's a description of what I'm doing. I am
calculating incarceration rates by demographic cell, which is
defined at the United States state-age-race-sex-year level. I
have data for 1980, 1990 and 2000. In 1980, the "group quarter"
variable was defined differently than how it was defined in 1990
and 2000, so I've been running two do files - one for 1980 and
one for 1990/2000, but they are essentially identical.
I have 9 different age cohorts. I only reported the code for one
of them, since they are all identical calculations. The age
cohorts are:
1. 15-19 year olds
2. 20-24 year olds
3. 25-29 year olds
4. 30-34 year olds
5. 35-39 year olds
6. 40-44 year olds
7. 45-54 year olds
8. 55-64 year olds
9. 65+ year olds
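As a sketch, the nine cohorts listed above could be coded into a single categorical variable with -recode-. The variable name agecat and the assumption that age is a numeric variable in years are illustrative, not taken from the original do files.

```stata
* Sketch only: age is assumed numeric, in years; agecat is an illustrative name.
recode age (15/19=1) (20/24=2) (25/29=3) (30/34=4) (35/39=5) ///
    (40/44=6) (45/54=7) (55/64=8) (65/max=9) (else=.), gen(agecat)

* Ages under 15 fall outside every cohort and are set to missing
drop if missing(agecat)

* Check the cohort boundaries against the original ages
tabstat age, by(agecat) statistics(min max n)
```

With one variable holding the cohort, a single by() option covers all nine groups, so there is no need to repeat the same calculation once per cohort.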
I have 51 states (50 US states plus District of Columbia).
I have two races (black and white), two sex values, and three
census years (1980, 1990 and 2000). My understanding was that to
create so many separate incarceration rates and levels, I would
need to reproduce the same code for each demographic cell. So
I've been using -foreach- to do it. Do you think, though,
that this is not the most efficient method?
sc
On Sep 8, 2006, at 12:02 PM, Michael Blasnik wrote:
I've been reading this thread and don't understand why you need
to loop at all or generate the grouping variable. Wouldn't it
make more sense to use a collapse and a reshape?
keep if inrange(age,15,19)
gen byte groupqtrs=(gqtyped==200)
collapse (sum) perwt, by(fip race sex groupqtrs)
reshape wide perwt, i(fip race sex) j(groupqtrs)
gen totpersons=perwt0+perwt1
gen ir=perwt1/totpersons
This approach seems easier and faster and gives you a dataset of
results directly.
You could take the results and merge them back into the main
dataset if you want, but I don't even think that is necessary.
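If the rates are wanted alongside the person-level records, the merge mentioned above might look like the sketch below. It assumes the collapsed results are currently in memory, that master.dta is a hypothetical name for the full file, and that the -merge m:1- syntax of Stata 11 or later is available (older versions use a sorted -merge- with a different syntax).

```stata
* Sketch only; master.dta is a hypothetical file name.
* Save the cell-level results to a temporary file
tempfile rates
keep fip race sex totpersons ir
save `rates'

* Reload the person-level data for the same age group and attach the rates
use master, clear
keep if inrange(age,15,19)
merge m:1 fip race sex using `rates'

* _merge shows which observations found a matching cell
tab _merge
```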
Michael Blasnik
[email protected]
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/