Not sure whether your data transferred well, but this is probably close to
what you want :-)
**************
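clear *
input id time_in_months county str10 cancer
1 13 2 breast
2 14 2 breast
3 1 2 breast
end
compress
list, noobs
* Per county and cancer type, count the people who survived more than 12 months.
* !missing() is needed because Stata treats missing values as larger than any number.
bysort county cancer: egen N_survivorsOneYear = ///
    total(time_in_months > 12 & !missing(time_in_months))
list, noobs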
**************
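For what it is worth, here is a minimal sketch of why the -if- version leaves gaps, assuming the toy data from the example above are still in memory (n_if and n_all are just illustrative names). With -if-, -egen- only fills the observations that satisfy the condition, so id 3 stays missing; with the condition inside total(), every observation in the county/cancer group gets the count.
**************
* Assumption: the toy data from the example above are in memory.
* Restricting with -if-: observations with time_in_months <= 12 are left missing.
bysort county cancer: egen n_if = count(time_in_months) if time_in_months > 12
* Condition inside total(): the logical expression is 1/0, summed over the whole group.
bysort county cancer: egen n_all = ///
    total(time_in_months > 12 & !missing(time_in_months))
list id time_in_months n_if n_all, noobs
**************
In the toy data n_if should be 2, 2 and missing, while n_all should be 2 for all three ids.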
HTH
Martin
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Johannes Schoder
Sent: Wednesday, 16 September 2009 00:39
To: [email protected]
Subject: Re: AW: st: Different Results for the same estimation
I found another bug in my calculations:
I have the diagnosed cancer cases by cancer type and county, together with
the survival time in months, and I wanted to calculate the number of people
surviving one year per county and cancer type. However, I did it wrong.
How can I generate a variable that gives me the number of people who
survived more than 12 months?
bysort CancerType COUNTY: egen N_SurvivorsOneYear = count(time_in_months) if time_in_months>12
When time_in_months<=12, N_SurvivorsOneYear gets zero or "." (missing value),
but I want it to take the number of survivors per disease and per county.
I know my description sounds confusing, so here is an example:
id  time_in_months  county  N_survivorsOneYear  cancer type
1   13              01      2                   breast
2   14              02      2                   breast
3   1               03      .                   breast
For id 3, I want N_survivorsOneYear to be "2" instead of the missing
value ".".
Thanks a lot for your help!
Johannes
Martin Weiss wrote:
You can always -collapse- or make up a fake identifier as
-bys County disease: gen personid=_n-
-la var personid "Fake Identifier"-
To appreciate the meaning of this command, check Nick's article at
http://www.stata-journal.com/sjpdf.html?articlenum=pr0004
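A rough sketch of the -collapse- route, in case it helps. It assumes the variable names used in this thread (County, disease, time_in_months); survived1yr, n_cases and n_survivors are made-up names for illustration. -preserve- and -restore- keep the individual-level data available afterwards.
**************
* Assumption: County, disease and time_in_months exist, as in this thread.
preserve
* 1 if the person survived more than 12 months, 0 otherwise (missing times count as 0)
generate byte survived1yr = time_in_months > 12 & !missing(time_in_months)
* One row per county and disease: number of cases and number of one-year survivors
collapse (count) n_cases = time_in_months ///
         (sum) n_survivors = survived1yr, by(County disease)
list, noobs
restore
**************
Note that (count) only counts observations with nonmissing time_in_months, so cases without a recorded survival time are not included in n_cases.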
HTH
Martin
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Johannes
Schoder
Sent: Tuesday, 15 September 2009 22:16
To: [email protected]
Subject: Re: st: Different Results for the same estimation
Hi Martin:
Thanks a lot for your help.
Yes, you are right: I have nesting levels. Within counties there are
diseases that afflict individuals.
Unfortunately, I (or the data provider) messed something up when
importing the data. I just realized that I have a lot of individuals
with the same identifier variable (although they are not the same person),
so I can't really use the id number.
Is there any alternative way of aggregating the individual-level data to the
county level?
Johannes
Martin Weiss wrote:
So there are three nesting levels? Within counties, there are diseases
afflicting individuals? If that is the case, you should amend your command as
- bysort County disease (individual): keep if _n==1-
to make it stable for the -glm- analysis. "individual" should be replaced by
some identifier variable, like an id number.
Also look at -egen, tag()- as -drop-ping is not generally the best approach
to conducting a restricted analysis ("How are you going to get the dropped
obs back when you need them quickly?").
Also look at -xtmixed- and its brothers, as your analysis sounds like a good
case for them...
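If no usable identifier is available, one possible workaround is sketched below. It swaps in -sort- with the stable option and a hand-made flag in place of -egen, tag()-, so it is an alternative rather than the exact command suggested above. It assumes the variables County and disease; first is a made-up name. Without stable, Stata orders ties within County and disease at random, which is why different observations can be kept on each run.
**************
* Assumption: County and disease exist; "first" is a hypothetical flag name.
* A stable sort keeps tied observations in their current order, so the same
* observation comes first in each group on every run.
sort County disease, stable
by County disease: generate byte first = _n == 1
* Nothing is dropped: restrict later commands with -if first- instead.
count if first
**************
The result is then reproducible as long as the data are loaded in the same order each time.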
HTH
Martin
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Johannes
Schoder
Sent: Tuesday, 15 September 2009 20:17
To: [email protected]
Subject: Re: st: Different Results for the same estimation
I found the bug:
I am using the following command before the estimation:
bysort County disease: keep if _n==1
Stata probably kicks out different observations each time.
Does anyone know how to avoid that? A similar question was posed a
couple of days ago:
How to delete duplicate observations. Martin recommended the following
command, which I used (see above):
bysort ID: keep if _n==1
However, my problem is not exactly the same:
Since I would like to aggregate my individual-level data to the county
level, I would like to keep one observation per county and per cancer type
(there are 98 different cancer types), i.e. 98 observations per county
rather than a single one.
Therefore, the observations I would like to drop are not the same
individuals; they just live in the same county and suffer from the same
disease.
Thanks for your help!!
Johannes
Johannes Schoder wrote:
Dear Statalist users:
When I estimate the same model several times in a row (on the
same computer):
xi: glm [dep. var.] [indep. var.] i.county i.year, family(binomial weight) link(logit)
I get different results for exactly the same specification.
Does anyone know what's going on here? Is it because of the different
number of iterations (sometimes 8, 9 or 10)?
Which results are right? What can I do to get identical results for
the same estimation?
Thanks a lot for any suggestion!
Johannes
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/