The help for -mstdize- contains a detailed worked example,
which shows that Donnell needs a different data structure
from what he has to make use of -mstdize-. Each row total
must be repeated for every cell in that row, and similarly
for columns.
What's more, -mstdize- expects frequencies, not proportions.
Here's the worked example modernised to avoid use of the old-fashioned
-for- (see concurrent thread).
. input freq age status
freq age status
1. 1306 1 1
2. 83 1 2
3. 0 1 3
4. 619 2 1
5. 765 2 2
6. 3 2 3
7. 263 3 1
8. 1194 3 2
9. 9 3 3
10. 173 4 1
11. 1372 4 2
12. 28 4 3
13. 171 5 1
14. 1393 5 2
15. 51 5 3
16. 159 6 1
17. 1372 6 2
18. 81 6 3
19. 208 7 1
20. 1350 7 2
21. 108 7 3
22. 1116 8 1
23. 4100 8 2
24. 2329 8 3
25. end
. gen rt = .
(24 missing values generated)
. tokenize 1412 1402 1450 1541 1681 1532 1662 7644
. qui forval i = 1/8 {
2. replace rt = ``i'' if age == `i'
3. }
. gen ct = .
(24 missing values generated)
. tokenize 3988 11702 2634
. qui forval i = 1/3 {
2. replace ct = ``i'' if status == `i'
3. }
. list
| freq age status rt ct |
1. | 1306 1 1 1412 3988 |
2. | 83 1 2 1412 11702 |
3. | 0 1 3 1412 2634 |
4. | 619 2 1 1402 3988 |
5. | 765 2 2 1402 11702 |
6. | 3 2 3 1402 2634 |
7. | 263 3 1 1450 3988 |
8. | 1194 3 2 1450 11702 |
9. | 9 3 3 1450 2634 |
10. | 173 4 1 1541 3988 |
11. | 1372 4 2 1541 11702 |
12. | 28 4 3 1541 2634 |
13. | 171 5 1 1681 3988 |
14. | 1393 5 2 1681 11702 |
15. | 51 5 3 1681 2634 |
16. | 159 6 1 1532 3988 |
17. | 1372 6 2 1532 11702 |
18. | 81 6 3 1532 2634 |
19. | 208 7 1 1662 3988 |
20. | 1350 7 2 1662 11702 |
21. | 108 7 3 1662 2634 |
22. | 1116 8 1 7644 3988 |
23. | 4100 8 2 7644 11702 |
24. | 2329 8 3 7644 2634 |
. mstdize freq rt ct , by(age status)
| status
age | 1 2 3
1 | 1325.27 86.73 0.00
2 | 615.56 783.39 3.05
3 | 253.94 1187.18 8.88
4 | 165.13 1348.55 27.32
5 | 173.41 1454.71 52.87
6 | 147.21 1308.12 76.67
7 | 202.33 1352.28 107.40
8 | 1105.16 4181.04 2357.81
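For Donnell's 2 x 2 example, the same layout might look like the sketch below. It takes the 2000 cells from his (1) and the 2004 margins from his (2). Multiplying everything by 100 to get onto a frequency scale is my assumption, as are the variable names inc and hsize, and it assumes -mstdize- is installed.

```stata
clear
* one observation per cell of the 2000 table (proportions x 100, an assumption)
input freq inc hsize rt ct
55 1 1 60 65
12 1 2 60 35
20 2 1 40 65
13 2 2 40 35
end
* rt = 2004 target total for the row (Inc), repeated for each cell in that row
* ct = 2004 target total for the column (HSize), repeated for each cell in that column
mstdize freq rt ct, by(inc hsize) generate(ibyh04)
list
```

Dividing ibyh04 by 100 afterwards would put the adjusted cells back on a proportion scale.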
There is a matrix version of -mstdize- called
-mstdizem- in the -matodd- package on SSC.
Alan Agresti explains how to use generalized
linear model software to get such estimates
in his "Categorical data analysis" text. As
I recall, the key is to use offsets.
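As a rough sketch of how the offset idea might go (my reconstruction, not a quotation from Agresti): fit a Poisson model with row and column main effects to any table that has the target margins, using the log of the original frequencies as the offset. The fitted values then reproduce the target margins while keeping the association of the original table, which is what IPF aims for. The numbers are again Donnell's proportions times 100, which is my assumption, and the factor-variable notation needs Stata 11 or later.

```stata
clear
* original cells (freq) with target row totals (rt) and column totals (ct)
input freq inc hsize rt ct
55 1 1 60 65
12 1 2 60 35
20 2 1 40 65
13 2 2 40 35
end
* any table with the target margins will serve as response;
* the independence table rt*ct/N is used here, with N = 100
gen y = rt * ct / 100
* offset = log of the original cell frequencies
* (cells with freq == 0 get a missing offset and drop out, so they need separate handling)
gen lnfreq = ln(freq)
poisson y i.inc i.hsize, offset(lnfreq)
* predicted counts, offset included
predict fitted, n
list inc hsize freq fitted
```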
Donnell Butler wrote:
> I am trying to update a 2000 two-way table using
> 2004 one-way
> information. I wanted to do it using IPF (iterative
> proportional fitting). I
> soon learned that Nick Cox created a program (MSTDIZE) that
> may be useful.
> However, I am obviously not framing the data correctly to
> obtain the desired goal.
> Here is a simplified version of the dilemma:
> (1) Imagine a two-way table of proportions:
> HSize00 1 2 Totals
> Inc00 1 .55 .12 .67
> Inc00 2 .20 .13 .33
> Totals .75 .25 1.00
> (2) Imagine two one-way tables to be used to update the two-way table:
> Inc04 HSize04
> 1 .60 1 .65
> 2 .40 2 .35
> (3) To attempt MSTDIZE, I have entered the data into Stata as follows:
> Inc00 Hsize00 IbyH00 Inc04 HSize04
> 0.67 0.75 0.55 0.60 0.65
> 0.33 0.25 0.12 0.40 0.35
> 0.20
> 0.13
> (4) So in Stata the data looks like this:
> . list
> +--------------------------------------------------+
> | igroup00 hsize00 ibyh00 igroup04 hsize04 |
> |--------------------------------------------------|
> 1. | .33 .22 .13 .4 .35 |
> 2. | .67 .78 .56 .6 .65 |
> 3. | . . .19 . . |
> 4. | . . .12 . . |
> +--------------------------------------------------+
> (5) When I run MSTDIZE, this is the output:
> . mstdize ibyh00 igroup04 hsize04, by(igroup00 hsize00)
> generate (ibyh04)
> ----------------------
> | Hsize00
> Igroup00 | .22 .78
> ----------+-----------
> .33 | 0.35
> .67 | 0.65
> ----------------------
> (2 missing values generated)
> (6) Well, that is not what I hoped for. I was hoping for a new table
> (ibyh04) with 4 observations (new row1/column1, new r1c2, new
> r2c1, and new
> r2c2). Instead, I ended up with 2 observations that were
> exactly the same as
> hsize04.
> (7) This is probably a case of not really understanding what
> MSTDIZE is designed to do. Does anyone have any suggestions on how I can
> get Stata (via
> MSTDIZE or another means) to obtain an IPF adjusted ibyh04
> (two-way table
> updated from original two-way and two one-ways)?
> (8) And, as a bonus. I gathered those proportions from tab
> hsize igroup,
> cell command. So, if anyone knows how I could easily turn
> those relative
> frequencies into a variable. I do know that the tab hsize igroup, cell
> matcell (matname) produces a 2x2 matrix of actual (not relative)
> frequencies. What I don't know is how to get relative
> frequencies in that
> matrix? Or, how to transfer or use matrices in a simpler way
> than what I did
> by hand above by transcribing the frequencies into excel for
> new data set
> variable generation?
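
On (8), a minimal sketch of two possible routes, using the auto data shipped with Stata as a stand-in for Donnell's data (foreign and rep78 are placeholders for hsize and igroup, and the names F, P, freq, pct and prop are mine). The first divides the -matcell()- matrix by the number of observations that -tabulate- leaves behind in r(N). The second uses -contract- to reduce the data to one observation per cell, which is also the layout -mstdize- wants.

```stata
sysuse auto, clear
* matrix route: cell counts in F, relative frequencies in P
tabulate foreign rep78, matcell(F)
matrix P = F / r(N)
matrix list P
* variable route: one observation per nonempty cell, with counts and percents
contract foreign rep78, nomiss freq(freq) percent(pct)
gen prop = pct / 100
list
```

Note that -contract- replaces the data in memory, so -preserve- first if the original data are still needed.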