Dale Plummer <[email protected]> asks about using weights with
-statsby-:
> I may have overlooked something obvious, but I cannot see why the
> statsby command will not allow weights in the commands it is executing.
> Would someone please explain this?
There really isn't a good reason for this. From a development point of view,
-statsby- uses the same parsing engine as -bootstrap-, -jknife-, -simulate-,
and -permute-, some of which require careful consideration (and new code) to
handle weights.
There are ways around this. The long way is to set up -postfile- and use
-post- within a -forvalues- loop. This requires a decent amount of coding to
reproduce some of the features of -statsby-.
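A minimal sketch of the long way, assuming the goal is the fweighted mean of
mpg within each level of rep78 in auto.dta. The names memhold, results, and
mean are only illustrative, and the loop hard-codes the levels 1 through 5
rather than discovering them the way -statsby- would:

```stata
sysuse auto, clear

* Temporary handle and file for the posted results (illustrative names)
tempname memhold
tempfile results
postfile `memhold' rep78 mean using `results'

* rep78 takes the values 1 through 5 in auto.dta
forvalues i = 1/5 {
    quietly summarize mpg [fw=turn] if rep78 == `i'
    post `memhold' (`i') (r(mean))
}
postclose `memhold'

* Replace the data in memory with one observation per group
use `results', clear
list
```

The listed means should agree with what -summarize- reports for each group
when run with [fw=turn].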
The short way involves tricking -statsby-. I generally would warn users
against trying to "trick" a command to do something that a developer purposely
tried to prevent, but this is one of those special cases.
Suppose we want to use fweights with -summarize- for each category of a
variable. The unweighted version would be
. sysuse auto
(1978 Automobile Data)
. statsby "sum mpg" r(mean), by(rep)
command: sum mpg
statistic: _stat_1 = r(mean)
by: rep78
. list
     +------------------+
     | rep78    _stat_1 |
     |------------------|
  1. |     1         21 |
  2. |     2     19.125 |
  3. |     3   19.43333 |
  4. |     4   21.66667 |
  5. |     5   27.36364 |
     +------------------+
As already noted, -statsby- does not like weights to be specified:
. capture noisily statsby "sum mpg [fw=1]" r(mean), by(rep)
weights not allowed
We could write a wrapper command for -summarize- that takes weights in a
different way, as an option instead of in square brackets:
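program mysum
    * weight() carries the weight specification as an ordinary option
    syntax varlist [if] [in] [, weight(string) * ]
    * pass the parsed varlist through instead of hard-coding mpg
    summarize `varlist' `if' `in' [`weight'], `options'
end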
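As a quick check, the wrapper can be run on its own before it is handed to
-statsby-. The sketch below repeats the definition (passing `varlist' through
to -summarize-) so that it runs by itself. It relies on -mysum- not being
declared rclass, so the r() results left behind by -summarize- should still be
there afterward, which is what -statsby- picks up:

```stata
sysuse auto, clear

* Drop any earlier copy so the block can be rerun
capture program drop mysum
program mysum
    syntax varlist [if] [in] [, weight(string) * ]
    summarize `varlist' `if' `in' [`weight'], `options'
end

* One group, weighted by turn
mysum mpg if rep78 == 1, weight(fw=turn)

* r(mean) and r(N) come from the -summarize- call inside the wrapper
display r(mean)
return list
```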
Now we can pass weights to -summarize- using -mysum-'s -weight()- option.
Here we specify an -fweight- of one to check the result against the
unweighted version:
. sysuse auto
(1978 Automobile Data)
. statsby "mysum mpg, weight(fw=1)" r(mean), by(rep)
command: mysum mpg , weight(fw=1)
statistic: _stat_1 = r(mean)
by: rep78
. list
     +------------------+
     | rep78    _stat_1 |
     |------------------|
  1. |     1         21 |
  2. |     2     19.125 |
  3. |     3   19.43333 |
  4. |     4   21.66667 |
  5. |     5   27.36364 |
     +------------------+
Now let's really specify some weights:
. statsby "mysum mpg, weight(fw=turn)" r(mean), by(rep)
command: mysum mpg , weight(fw=turn)
statistic: _stat_1 = r(mean)
by: rep78
. list
     +------------------+
     | rep78    _stat_1 |
     |------------------|
  1. |     1   20.92683 |
  2. |     2   18.97983 |
  3. |     3   19.11445 |
  4. |     4    21.1342 |
  5. |     5   27.19898 |
     +------------------+
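For readers on a newer Stata, where -statsby- is written as a prefix command,
the same call might look like the sketch below. This assumes the Stata 9 or
later syntax; the name mean for the statistic is an arbitrary choice, and the
wrapper is repeated so the block runs by itself:

```stata
sysuse auto, clear

capture program drop mysum
program mysum
    syntax varlist [if] [in] [, weight(string) * ]
    summarize `varlist' `if' `in' [`weight'], `options'
end

* Prefix form: statistics and options first, then the command after the colon
statsby mean=r(mean), by(rep78) clear: mysum mpg, weight(fw=turn)
list
```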
We can verify that the weights were applied by running -summarize- on a
group-by-group basis:
. sysuse auto
(1978 Automobile Data)
. sum mpg [fw=turn] if rep==1
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         mpg |        82    20.92683     3.017564        18         24
. sum mpg [fw=turn] if rep==2
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         mpg |       347    18.97983     3.466128        14         24
. sum mpg [fw=turn] if rep==3
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         mpg |      1232    19.11445     4.018323        12         29
. sum mpg [fw=turn] if rep==4
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         mpg |       693     21.1342     4.836715        14         30
. sum mpg [fw=turn] if rep==5
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         mpg |       392    27.19898     8.349844        17         41
As a final note, let me just warn against using this trick with -bootstrap-,
-permute-, and -jknife-. The result will most definitely not be what you
would expect.
--Jeff
[email protected]