Title: Predict and adjust
Author: Brian P. Poi, StataCorp
Many people have written to the technical staff asking about the differences between predict and adjust. In this FAQ, I present a simple example using the auto dataset. This is by no means a substitute for the Reference Manual entries for either adjust or predict. Presumably, you have already read those. If not, that would be a good idea.
To begin, let’s load the auto.dta dataset and regress mpg on weight, length, and foreign:
    . sysuse auto
    (1978 Automobile Data)

    . regress mpg weight length foreign

          Source |       SS       df       MS              Number of obs =      74
    -------------+------------------------------           F(  3,    70) =   48.10
           Model |   1645.2889     3  548.429632           Prob > F      =  0.0000
        Residual |  798.170563    70  11.4024366           R-squared     =  0.6733
    -------------+------------------------------           Adj R-squared =  0.6593
           Total |  2443.45946    73  33.4720474           Root MSE      =  3.3767

    ------------------------------------------------------------------------------
             mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          weight |  -.0043656   .0016014    -2.73   0.008    -.0075595   -.0011718
          length |  -.0827432   .0547942    -1.51   0.136    -.1920267    .0265403
         foreign |  -1.707904    1.06711    -1.60   0.114    -3.836188    .4203806
           _cons |   50.53701   6.245835     8.09   0.000     38.08009    62.99394
    ------------------------------------------------------------------------------
Next compute the linear prediction of the dependent variable and summarize it by rep78:
    . predict yhat, xb

    . tabstat yhat, statistics(mean) by(rep78)

    Summary for variables: yhat
         by categories of: rep78 (Repair Record 1978)

       rep78 |      mean
    ---------+----------
           1 |  21.36511
           2 |  19.39887
           3 |  19.91184
           4 |  21.86001
           5 |  24.91809
    ---------+----------
       Total |  21.20081
    --------------------
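Because predict with the xb option evaluates the fitted equation at each car's own covariate values, the table above is simply that equation averaged within each repair record. Written out with the coefficients from the regression output (rounded as displayed), and with a bar and subscript g denoting the mean among cars with rep78 = g, it reads as follows. The second line holds because the average of a linear function equals the same linear function of the averages.

```latex
\begin{align*}
\widehat{\text{mpg}}_i &= 50.53701 - 0.0043656\,\text{weight}_i - 0.0827432\,\text{length}_i - 1.707904\,\text{foreign}_i \\
\overline{\widehat{\text{mpg}}}_g &= 50.53701 - 0.0043656\,\overline{\text{weight}}_g - 0.0827432\,\overline{\text{length}}_g - 1.707904\,\overline{\text{foreign}}_g
\end{align*}
```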
Compare this with what we obtain if we use the adjust command:
    . adjust, by(rep78)

    ----------------------------------------------------------------------------
         Dependent variable: mpg     Command: regress
      Variables left as is: weight, length, foreign
    ----------------------------------------------------------------------------

    ----------------------
       Repair |
       Record |
         1978 |         xb
    ----------+-----------
            1 |    21.3651
            2 |    19.3989
            3 |    19.9118
            4 |      21.86
            5 |    24.9181
    ----------------------
         Key:  xb  =  Linear Prediction
The results are the same, apart from the number of digits displayed! When you call adjust without specifying any variables, it simply summarizes the linear predictions from the regression by rep78. Suppose that instead I typed:
    . adjust foreign, by(rep78)

    ----------------------------------------------------------------------------
         Dependent variable: mpg     Command: regress
      Variables left as is: weight, length
     Covariate set to mean: foreign = .30434781
    ----------------------------------------------------------------------------

    ----------------------
       Repair |
       Record |
         1978 |         xb
    ----------+-----------
            1 |    20.8453
            2 |    18.8791
            3 |    19.5628
            4 |    22.1942
            5 |    25.7957
    ----------------------
         Key:  xb  =  Linear Prediction

The key to understanding what happened here lies in two lines at the top of the output:
    Variables left as is: weight, length
    Covariate set to mean: foreign = .30434781
For two of the independent variables in our regression, weight and length, adjust did nothing; it left them as is. However, in computing the linear prediction of mpg, adjust did not use the actual values of foreign in the dataset. Instead, it computed the prediction as if foreign were equal to 0.30434781 for every observation. Some people would argue that evaluating the equation with foreign equal to 0.304 is nonsense because foreign is a dummy variable that takes only the values 0 and 1; either the car is foreign, or it is domestic. On the other hand, one could interpret the results with foreign equal to 0.304 as pertaining to a car built from roughly 70% domestic parts and 30% foreign parts. Whether to force a dummy variable to remain 0 or 1 when forming predictions depends entirely on the context of the model.
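Because the prediction is linear in foreign, holding foreign at a constant c for every car shifts each group's average prediction by the foreign coefficient times the gap between c and that group's actual share of foreign cars. Writing b_f = -1.707904 for the foreign coefficient, f-bar_g for the share of foreign cars among those with rep78 = g, and mpg-hat-bar_g for the unadjusted group mean from tabstat, the worked check below uses the rep78 = 1 group, which in the auto data contains only domestic cars (so f-bar_1 = 0). The result agrees with the 20.8453 reported by adjust foreign, by(rep78).

```latex
\begin{align*}
\text{adj}_g(c) &= \overline{\widehat{\text{mpg}}}_g + b_f\,(c - \bar{f}_g) \\
\text{adj}_1(0.30434781) &= 21.36511 + (-1.707904)(0.30434781 - 0) \\
  &\approx 21.36511 - 0.51980 = 20.84531
\end{align*}
```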
The real power of adjust lies in its ability to create predictions that assume specified values for some of the independent variables. Suppose I wanted to know the average predicted fuel economy of cars by rep78 under the assumption that all cars are domestic. With adjust, this is easy to do:
    . adjust foreign=0, by(rep78)

    ----------------------------------------------------------------------------
         Dependent variable: mpg     Command: regress
      Variables left as is: weight, length
    Covariate set to value: foreign = 0
    ----------------------------------------------------------------------------

    ----------------------
       Repair |
       Record |
         1978 |         xb
    ----------+-----------
            1 |    21.3651
            2 |    19.3989
            3 |    20.0826
            4 |     22.714
            5 |    26.3155
    ----------------------
         Key:  xb  =  Linear Prediction
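The same table can be reproduced from predict by subtracting foreign's contribution from each observation's prediction, which is equivalent to evaluating the equation at foreign = 0. This is a minimal sketch assuming Stata's bundled auto dataset; the variable name yhat_dom is an illustrative choice, not one used elsewhere in this FAQ.

```stata
* Sketch: reproduce "adjust foreign=0, by(rep78)" by hand
sysuse auto, clear
quietly regress mpg weight length foreign
predict double yhat, xb
* Remove foreign's contribution, i.e., evaluate the equation at foreign = 0
generate double yhat_dom = yhat - _b[foreign]*foreign
* Average the adjusted predictions within each repair record
tabstat yhat_dom, statistics(mean) by(rep78)
```

Groups 1 and 2 come out unchanged because they contain no foreign cars, so the subtraction has no effect there.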
Of course, you can specify more than one variable with adjust, and you can have some variables set to values you specify and other variables set to their means. For example, now I want to know the average fuel economy by rep78 under the assumptions that all cars are domestic and all cars are of the same (average) length. I have no idea what the average length of the cars is, so I will let adjust figure it out:
    . adjust foreign=0 length, by(rep78)

    ----------------------------------------------------------------------------
         Dependent variable: mpg     Command: regress
        Variable left as is: weight
      Covariate set to mean: length = 188.28986
     Covariate set to value: foreign = 0
    ----------------------------------------------------------------------------

    ----------------------
       Repair |
       Record |
         1978 |         xb
    ----------+-----------
            1 |    21.4239
            2 |    20.3161
            3 |    20.5551
            4 |     22.428
            5 |    24.8172
    ----------------------
         Key:  xb  =  Linear Prediction
As the top of the output shows, adjust set length equal to its mean value of 188.28986, and it set foreign equal to 0 as we requested. Because we asked for the results to be tabulated by rep78, the mean of length was computed using only the 69 observations for which rep78 is not missing. The 5 observations with a missing rep78 are completely ignored by adjust, even though they were used in the original regression.
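To see the effect of that sample restriction, compare the mean of length over all observations with its mean over only the observations with nonmissing rep78; only the latter should match the 188.28986 reported by adjust. A short sketch, assuming Stata's bundled auto dataset:

```stata
sysuse auto, clear
* Mean of length over every observation in the dataset (74 cars)
summarize length
* Mean over the observations with nonmissing rep78 (69 cars),
* the sample adjust uses when asked to tabulate by(rep78)
summarize length if !missing(rep78)
```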
In fact, adjust is really just a front end for predict, and it is helpful to work through the mechanics of an example to illustrate this. The previous table of results could have been obtained in the following manner:
    . preserve

    . summarize length if rep78<., meanonly

    . replace length=r(mean)
    length was int now float
    (74 real changes made)

    . replace foreign=0
    (22 real changes made)

    . predict yhat2, xb

    . tabstat yhat2, statistics(mean) by(rep78)

    Summary for variables: yhat2
         by categories of: rep78 (Repair Record 1978)

       rep78 |      mean
    ---------+----------
           1 |  21.42387
           2 |  20.31609
           3 |  20.55511
           4 |  22.42796
           5 |  24.81715
    ---------+----------
       Total |   21.7206
    --------------------

    . restore
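Because the prediction is just the fitted equation, the same numbers can also be obtained without modifying the data at all, by plugging the stored coefficients _b[] into the equation directly. This is a minimal sketch assuming Stata's bundled auto dataset; yhat3 is an illustrative name.

```stata
sysuse auto, clear
quietly regress mpg weight length foreign
* Mean of length over the observations with nonmissing rep78
summarize length if !missing(rep78), meanonly
* Evaluate the fitted equation with length at that mean and foreign = 0
* (so the foreign term drops out); the dataset itself is never modified,
* so no preserve/restore is needed
generate double yhat3 = _b[_cons] + _b[weight]*weight + _b[length]*r(mean)
tabstat yhat3, statistics(mean) by(rep78)
```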
The advantage of adjust is that we do not have to preserve our data, summarize and replace variables, call predict, and then call tabstat ourselves.
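To make concrete what adjust automates, the manual steps can be wrapped in a small program. The program xbfixed below is hypothetical (it is not a Stata command) and handles only one covariate held at one value; it is a sketch of the preserve, replace, predict, tabstat, restore sequence, assuming Stata's bundled auto dataset.

```stata
* Hypothetical helper, not a Stata command: average linear prediction from
* the most recent regression, by group, with one covariate held fixed
capture program drop xbfixed
program define xbfixed
    args covariate value group
    preserve                               // snapshot the data
    quietly replace `covariate' = `value'  // hold the covariate fixed
    tempvar xb
    quietly predict double `xb', xb        // uses the stored coefficients
    tabstat `xb', statistics(mean) by(`group')
    restore                                // bring back the original data
end

sysuse auto, clear
quietly regress mpg weight length foreign
xbfixed foreign 0 rep78
```

Because restore runs before the program exits, the original values of foreign are back in memory afterward.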