RE: st: creating a new variable
From: Amal Khanolkar <[email protected]>
To: "[email protected]" <[email protected]>
Subject: RE: st: creating a new variable
Date: Wed, 18 Jul 2012 12:02:51 +0000
Thank you Nick, Maarten & Steve for your suggestions.
The tabstat command is the perfect way to get a descriptive take on what I wanted.
I tried the following and found a discrepancy in the number of subjects:
. egen mean_bw = mean(bw), by(gestwk)
. tab mean_bw
    mean_bw |      Freq.     Percent        Cum.
------------+-----------------------------------
   559.5574 |        134        0.00        0.00
   616.5096 |        387        0.01        0.02
   699.3734 |        738        0.02        0.04
   790.9377 |      1,235        0.04        0.08
   902.7249 |      1,688        0.06        0.14
   1014.961 |      2,125        0.07        0.21
   1138.658 |      2,723        0.09        0.30
   1295.815 |      3,415        0.11        0.42
   1461.302 |      4,481        0.15        0.57
   1655.637 |      5,876        0.20        0.76
   1858.227 |      8,533        0.29        1.05
   2092.705 |     12,958        0.43        1.48
   2325.826 |     21,420        0.72        2.20
   2592.584 |     36,710        1.23        3.42
   2837.138 |     70,297        2.35        5.77
   3081.272 |    151,310        5.06       10.83
   3309.638 |      9,763        0.33       11.16
   3313.268 |    373,660       12.49       23.65
   3488.345 |    660,536       22.08       45.73
   3627.659 |      1,648        0.06       45.78
   3637.902 |    822,376       27.49       73.28
   3698.833 |      5,470        0.18       73.46
   3755.764 |    542,442       18.13       91.59
   3791.726 |     31,928        1.07       92.66
   3826.705 |    219,603        7.34      100.00
------------+-----------------------------------
      Total |  2,991,456      100.00
. tabstat bw, by(gestwk) stat (mean n sd)
Summary for variables: bw
     by categories of: gestwk
  gestwk |      mean         N        sd
---------+------------------------------
      22 |  559.5574       122  209.6139
      23 |  616.5096       365  134.5845
      24 |  699.3734       691  135.2207
      25 |  790.9377      1171   147.066
      26 |  902.7248      1610  189.5523
      27 |  1014.961      2024   201.809
      28 |  1138.658      2613   238.724
      29 |  1295.815      3316  278.1803
      30 |  1461.302      4367  299.6202
      31 |  1655.637      5732  345.8412
      32 |  1858.227      8369  359.1699
      33 |  2092.704     12771   402.861
      34 |  2325.826     21149  416.8742
      35 |  2592.584     36451  458.3818
      36 |  2837.138     69940  464.2042
      37 |  3081.272    150767  465.5551
      38 |  3313.268    372601   453.221
      39 |  3488.345    658969  445.2462
      40 |  3637.902    820460  453.1178
      41 |  3755.764    541160  467.3571
      42 |  3826.705    219074  485.0738
      43 |  3791.726     31859  507.7569
      44 |  3698.833      5454  512.7899
      45 |  3627.659      1631  531.2405
---------+------------------------------
   Total |  3502.912   2972666  575.2709
----------------------------------------
As one can see above, the N for each gestational week isn't the same in the two tables. I get the same problem when using:
bys gestwk : egen mean1 = mean(bw)
The Ns are almost the same for most values of gestwk, thus giving the same mean BW. But in some cases the Ns differ quite a bit, giving larger differences in mean BW.
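One possible explanation, offered as a sketch rather than a confirmed diagnosis: egen's mean() with by() writes the group mean into every observation of the group, including observations where bw itself is missing, and it treats missing gestwk as a group of its own. That would make tab mean_bw count all observations in a week, while tabstat's N counts only non-missing bw. It would also account for the extra 3309.638 row, whose 9,763 observations equal the difference between the two totals (2,991,456 and 2,981,693). The means themselves agree between the two tables up to rounding. The commands below use bw, gestwk and mean_bw from this thread; mean_bw2 is a hypothetical name introduced only for illustration.

```stata
* Sketch only: bw, gestwk and mean_bw are the variables from this thread;
* mean_bw2 is a hypothetical name for a restricted version of the mean.

* How many observations lack a birth weight or a gestational week?
count if missing(bw)
count if missing(gestwk)

* Tabulate the existing variable only over observations that entered a mean
tab mean_bw if !missing(bw, gestwk)

* Or fill in the mean only for observations with both values present
egen mean_bw2 = mean(bw) if !missing(bw, gestwk), by(gestwk)
tab mean_bw2
```

If the explanation holds, the frequencies from either tab should then line up with the N column from tabstat.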
Thanks,
/Amal
________________________________________
From: [email protected] [[email protected]] on behalf of Nick Cox [[email protected]]
Sent: 18 July 2012 13:40
To: [email protected]
Subject: Re: st: creating a new variable
Here are five solutions for a similar problem.
. sysuse auto
. tab rep78, su(mpg)
     Repair |   Summary of Mileage (mpg)
Record 1978 |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |          21   4.2426407           2
          2 |      19.125   3.7583241           8
          3 |   19.433333   4.1413252          30
          4 |   21.666667   4.9348699          18
          5 |   27.363636   8.7323849          11
------------+------------------------------------
      Total |   21.289855   5.8664085          69
. tabstat mpg , by(rep78)
Summary for variables: mpg
     by categories of: rep78 (Repair Record 1978)
   rep78 |      mean
---------+----------
       1 |        21
       2 |    19.125
       3 |  19.43333
       4 |  21.66667
       5 |  27.36364
---------+----------
   Total |  21.28986
--------------------
. graph dot (mean) mpg, over(rep78) vertical
. egen mean_mpg = mean(mpg), by(rep78)
. scatter mean_mpg rep78
. dotplot mpg, over(rep78) bar
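With a very large dataset, the egen-plus-scatter approach draws one marker per observation, all stacked on the same point within each group. A minimal variation, assuming that only one marker per group is wanted, is to tag a single observation in each group and plot only those. The variable name first is a hypothetical choice for this sketch.

```stata
* Sketch only: "first" is a hypothetical variable name.
sysuse auto, clear

* Group mean, repeated on every observation of the group
egen mean_mpg = mean(mpg), by(rep78)

* Tag exactly one observation per distinct value of rep78
* (observations with missing rep78 are not tagged by default)
egen first = tag(rep78)

* Plot one point per group instead of one per observation
scatter mean_mpg rep78 if first
```

The picture should match that from scatter mean_mpg rep78, but with far fewer markers to draw.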
On Wed, Jul 18, 2012 at 11:34 AM, Amal Khanolkar <[email protected]> wrote:
> I have a very simple problem that I'm unable to find a simple solution for:
>
> Below is the data concerned:
>
> Gestational age in weeks:
>
> tab gestwk
>
> gestwk | Freq. Percent Cum.
> ------------+-----------------------------------
> 22 | 134 0.00 0.00
> 23 | 387 0.01 0.02
> 24 | 738 0.02 0.04
> 25 | 1,235 0.04 0.08
> 26 | 1,688 0.06 0.14
> 27 | 2,125 0.07 0.21
> 28 | 2,723 0.09 0.30
> 29 | 3,415 0.11 0.42
> 30 | 4,481 0.15 0.57
> 31 | 5,876 0.20 0.76
> 32 | 8,533 0.29 1.05
> 33 | 12,958 0.43 1.49
> 34 | 21,420 0.72 2.20
> 35 | 36,710 1.23 3.44
> 36 | 70,297 2.36 5.79
> 37 | 151,310 5.07 10.87
> 38 | 373,660 12.53 23.40
> 39 | 660,536 22.15 45.55
> 40 | 822,376 27.58 73.13
> 41 | 542,442 18.19 91.33
> 42 | 219,603 7.37 98.69
> 43 | 31,928 1.07 99.76
> 44 | 5,470 0.18 99.94
> 45 | 1,648 0.06 100.00
> ------------+-----------------------------------
> Total | 2,981,693 100.00
>
>
> Mean birth weight of my study sample:
>
> . sum bw
>
> Variable | Obs Mean Std. Dev. Min Max
> -------------+--------------------------------------------------------
> bw | 2980093 3502.431 575.7603 300 6780
>
> . sum bw if gestwk==26
>
> Variable | Obs Mean Std. Dev. Min Max
> -------------+--------------------------------------------------------
> bw | 1610 902.7248 189.5523 350 1970
>
>
> Below is how I look at the mean birth weight for a particular gestational week:
>
> . sum bw if gestwk==27
>
> Variable | Obs Mean Std. Dev. Min Max
> -------------+--------------------------------------------------------
> bw | 2024 1014.961 201.809 380 1920
>
> . sum bw if gestwk==28
>
> Variable | Obs Mean Std. Dev. Min Max
> -------------+--------------------------------------------------------
> bw | 2613 1138.658 238.724 370 2000
>
> . sum bw if gestwk==29
>
> Variable | Obs Mean Std. Dev. Min Max
> -------------+--------------------------------------------------------
> bw | 3316 1295.815 278.1803 370 2480
>
>
> What I would like to do is create a single continuous variable that gives me the mean birth weight for each gestational week, so that I don't have to look at each week individually as above. Ideally, I would like to use this variable in scatter plots.
>
> If I plot as follows:
>
> twoway scatter bw gestwk
>
> I of course don't get a single estimate for each gestational week; instead the entire range of birth weights for each week is plotted.
>
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/