Raphael Fraser wrote:
> I have 50 data sets (d1, d2, ..., d50) like the one given below but
> much larger. I would like to calculate the prevalence of protein for
> each data set by genotype (AA and SS). Then put the results together
> in a single file. Can this be done?
>
>
> DATA SET 1(d1)
> id age genotype protein
> 31 11 AA 1
> 40 11 SS 0
> 71 11 AA 1
> 74 11 AA 0
> 88 11 AA 0
> 98 11 AA 0
> 110 11 SS 1
>
> The first two observations in the RESULTS file should look something like
> this:
>
> age genotype prevalence
> 11 AA 0.4
> 11 SS 0.5
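Looks like an easy job for -collapse-, thus:
. collapse (mean) age prevalence=protein, by(genotype)
You do not need to drop your -id- variable first: -collapse- keeps only the
-by()- variable and the variables it summarizes. Because -protein- is coded
0/1, its mean within each genotype is the prevalence, and the
-prevalence=protein- syntax names the result directly, so no -rename- is
needed. To get a single file, run this on each data set and -append- the
results. I hope this helps.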
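For the "all 50 data sets in one file" part, here is a minimal sketch of one way to do it in a do-file. It assumes the data sets are saved as d1.dta, d2.dta, ... in the working directory (those file names, the -dataset- identifier variable, and the output name results.dta are my assumptions, not something from your post). To keep it self-contained, it first rebuilds d1 from the example you posted and loops over that one file only.

```stata
* Sketch: per-genotype prevalence for several data sets, stacked in one file.
* Assumes files d1.dta, d2.dta, ... (hypothetical names) and protein coded 0/1.
clear all

* Rebuild d1 from the example in the question so the sketch runs on its own.
input id age str2 genotype protein
31  11 "AA" 1
40  11 "SS" 0
71  11 "AA" 1
74  11 "AA" 0
88  11 "AA" 0
98  11 "AA" 0
110 11 "SS" 1
end
save d1, replace

* Collapse each file to one row per genotype and stack the rows.
* Change 1/1 to 1/50 once all 50 files exist.
tempfile results
local first 1
forvalues i = 1/1 {
    use d`i', clear
    collapse (mean) age prevalence=protein, by(genotype)
    * Record which data set each row came from (assumed variable name).
    generate dataset = `i'
    * Nothing to append to on the first pass.
    if !`first' append using `results'
    save `results', replace
    local first 0
}

sort dataset genotype
order dataset age genotype prevalence
save results, replace
list, noobs
```

For d1 this should give a prevalence of 0.4 for AA (2 of 5) and 0.5 for SS (1 of 2), as in your example. Note that -age- here is the mean age within genotype, which only equals the actual age when age is constant within a data set; if it is not, by(age genotype) with -age- dropped from the variable list may be closer to what you want.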
CLIVE NICHOLAS |t: 0(044)7903 397793
Politics |e: [email protected]
Newcastle University |http://www.ncl.ac.uk/geps
Wherever you go and whatever you do, just remember this. No matter how
many like you, admire you, love you or adore you, the number of people
turning up to your funeral will be largely determined by local weather
conditions.