Call Stata using IPython magic commands¶
In an IPython environment, you can use the stata and mata magic commands to interact with Stata and Mata. Below, we include some examples to get you started using these magics. For more detailed information on the magic commands, see Magic commands.
The stata magic¶
The stata magic is used to execute Stata commands. It can be used as both a line magic, %stata, and a cell magic, %%stata. In other words, with %stata, you can run a single-line command (just as you would in the Stata Command window), and with %%stata, you can run multiple Stata commands from a cell at once (as you would in a do-file).
First, we load the auto dataset and describe its contents.
[2]:
%%stata
sysuse auto, clear
describe
. sysuse auto, clear
(1978 automobile data)
. describe
Contains data from C:\Program Files\Stata18/ado\base/a/auto.dta
Observations: 74 1978 automobile data
Variables: 12 13 Apr 2022 17:45
(_dta has notes)
-------------------------------------------------------------------------------
Variable Storage Display Value
name type format label Variable label
-------------------------------------------------------------------------------
make str18 %-18s Make and model
price int %8.0gc Price
mpg int %8.0g Mileage (mpg)
rep78 int %8.0g Repair record 1978
headroom float %6.1f Headroom (in.)
trunk int %8.0g Trunk space (cu. ft.)
weight int %8.0gc Weight (lbs.)
length int %8.0g Length (in.)
turn int %8.0g Turn circle (ft.)
displacement int %8.0g Displacement (cu. in.)
gear_ratio float %6.2f Gear ratio
foreign byte %8.0g origin Car origin
-------------------------------------------------------------------------------
Sorted by: foreign
.
Next, we fit a linear regression.
[3]:
%%stata
reg mpg price i.foreign
Source | SS df MS Number of obs = 74
-------------+---------------------------------- F(2, 71) = 23.01
Model | 960.866305 2 480.433152 Prob > F = 0.0000
Residual | 1482.59315 71 20.8815937 R-squared = 0.3932
-------------+---------------------------------- Adj R-squared = 0.3761
Total | 2443.45946 73 33.4720474 Root MSE = 4.5696
------------------------------------------------------------------------------
mpg | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
price | -.000959 .0001815 -5.28 0.000 -.001321 -.000597
|
foreign |
Foreign | 5.245271 1.163592 4.51 0.000 2.925135 7.565407
_cons | 25.65058 1.271581 20.17 0.000 23.11512 28.18605
------------------------------------------------------------------------------
The %%stata magic command offers several arguments that can be used to control the execution of Stata’s commands and to pass data between Stata and Python. For example, you can use the -eret, -ret, or -sret argument to push Stata’s e(), r(), or s() results to Python.
[4]:
%%stata -eret myeret
reg mpg price i.foreign
Source | SS df MS Number of obs = 74
-------------+---------------------------------- F(2, 71) = 23.01
Model | 960.866305 2 480.433152 Prob > F = 0.0000
Residual | 1482.59315 71 20.8815937 R-squared = 0.3932
-------------+---------------------------------- Adj R-squared = 0.3761
Total | 2443.45946 73 33.4720474 Root MSE = 4.5696
------------------------------------------------------------------------------
mpg | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
price | -.000959 .0001815 -5.28 0.000 -.001321 -.000597
|
foreign |
Foreign | 5.245271 1.163592 4.51 0.000 2.925135 7.565407
_cons | 25.65058 1.271581 20.17 0.000 23.11512 28.18605
------------------------------------------------------------------------------
The code above passes the e() results returned by Stata’s regress command to Python as a dictionary named myeret. The keys of the dictionary are Stata’s macro and scalar names, and the values are their corresponding values. Here are its contents.
[5]:
myeret
[5]:
{'e(N)': 74.0,
'e(df_m)': 2.0,
'e(df_r)': 71.0,
'e(F)': 23.007494485746342,
'e(r2)': 0.39324012569622946,
'e(rmse)': 4.569638248831391,
'e(mss)': 960.8663049714787,
'e(rss)': 1482.5931544879809,
'e(r2_a)': 0.3761482982510528,
'e(ll)': -215.90831771275379,
'e(ll_0)': -234.39433764823468,
'e(rank)': 3.0,
'e(cmdline)': 'regress mpg price i.foreign',
'e(title)': 'Linear regression',
'e(marginsprop)': 'minus',
'e(marginsok)': 'XB default',
'e(vce)': 'ols',
'e(_r_z_abs__CL)': '|t|',
'e(_r_z__CL)': 't',
'e(depvar)': 'mpg',
'e(cmd)': 'regress',
'e(properties)': 'b V',
'e(predict)': 'regres_p',
'e(model)': 'ols',
'e(estat_cmd)': 'regress_estat',
'e(b)': array([[-9.59034169e-04, 0.00000000e+00, 5.24527100e+00,
2.56505843e+01]]),
'e(V)': array([[ 3.29592449e-08, 0.00000000e+00, -1.02918123e-05,
-2.00142479e-04],
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
-0.00000000e+00],
[-1.02918123e-05, 0.00000000e+00, 1.35394617e+00,
-3.39072871e-01],
[-2.00142479e-04, -0.00000000e+00, -3.39072871e-01,
1.61691892e+00]]),
'e(beta)': array([[-0.4889233, 0. , 0.4172175]])}
Stata matrices (like e(b) and e(V)) are converted to NumPy arrays.
[6]:
e_v = myeret['e(V)']
e_v
[6]:
array([[ 3.29592449e-08, 0.00000000e+00, -1.02918123e-05,
-2.00142479e-04],
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
-0.00000000e+00],
[-1.02918123e-05, 0.00000000e+00, 1.35394617e+00,
-3.39072871e-01],
[-2.00142479e-04, -0.00000000e+00, -3.39072871e-01,
1.61691892e+00]])
Stata’s graphs can also be displayed in the IPython environment. Here we create a scatterplot of car mileage against price by using the %stata line magic.
[7]:
%stata scatter mpg price
You can load a NumPy array or pandas DataFrame into Stata, making it the current working dataset, by specifying the %%stata magic’s -d argument. To demonstrate, we first load a Boston housing price dataset from the scikit-learn package into a pandas DataFrame named boston.
[8]:
from sklearn import datasets
import pandas as pd
bos = datasets.load_boston()
boston = pd.DataFrame(bos.data)
boston.columns = bos.feature_names
boston['MEDV'] = bos.target
boston.head()
[8]:
CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1.0 | 296.0 | 15.3 | 396.90 | 4.98 | 24.0 |
1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.90 | 9.14 | 21.6 |
2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.90 | 5.33 | 36.2 |
Next, we load boston into Stata as the current dataset in memory and describe its contents. Here you can also use the API function pdataframe_to_data() of the stata module to load the pandas DataFrame into Stata. See Call Stata using API functions and Example 5 for more information.
[9]:
%%stata -d boston
describe
Contains data
Observations: 506
Variables: 14
-------------------------------------------------------------------------------
Variable Storage Display Value
name type format label Variable label
-------------------------------------------------------------------------------
CRIM double %10.0g
ZN double %10.0g
INDUS double %10.0g
CHAS double %10.0g
NOX double %10.0g
RM double %10.0g
AGE double %10.0g
DIS double %10.0g
RAD double %10.0g
TAX double %10.0g
PTRATIO double %10.0g
B double %10.0g
LSTAT double %10.0g
MEDV double %10.0g
-------------------------------------------------------------------------------
Sorted by:
Note: Dataset has changed since last saved.
Then, we fit a linear regression on the median value of owner-occupied homes, MEDV.
[10]:
%%stata
regress MEDV CRIM RM AGE
Source | SS df MS Number of obs = 506
-------------+---------------------------------- F(3, 502) = 216.13
Model | 24076.2612 3 8025.4204 Prob > F = 0.0000
Residual | 18640.0342 502 37.1315422 R-squared = 0.5636
-------------+---------------------------------- Adj R-squared = 0.5610
Total | 42716.2954 505 84.5867236 Root MSE = 6.0936
------------------------------------------------------------------------------
MEDV | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
CRIM | -.2110231 .0340656 -6.19 0.000 -.2779518 -.1440944
RM | 8.032838 .4020063 19.98 0.000 7.243016 8.82266
AGE | -.0522428 .0104628 -4.99 0.000 -.072799 -.0316866
_cons | -23.60556 2.76938 -8.52 0.000 -29.04656 -18.16456
------------------------------------------------------------------------------
You can also load multiple NumPy arrays or pandas DataFrames into Stata at once by specifying the -f argument. This will load each array or DataFrame into a separate frame in Stata. To demonstrate, we partition the boston DataFrame into two subsets, CHAS0 and CHAS1, based on the CHAS column.
[11]:
CHAS0 = boston[boston['CHAS']==0]
CHAS1 = boston[boston['CHAS']==1]
Then, we simultaneously load each DataFrame into a Stata frame with the same name.
[12]:
%%stata -f CHAS0,CHAS1
frames dir
* CHAS0 471 x 14
* CHAS1 35 x 14
* default 506 x 14
Note: Frames marked with * contain unsaved data.
Now we have the complete dataset and the two subsets all in memory.
The mata magic¶
The %%mata magic command is used to execute Mata code from within Python. For example, here we create the matrix X in Mata and then obtain its inverse, Xi. Then, we multiply Xi by the original matrix, X.
[13]:
%%mata
X = (76, 53, 48 \ 53, 88, 46 \ 48, 46, 63)
Xi = invsym(X)
Xi
Xi*X
. mata
------------------------------------------------- mata (type end to exit) -----
: X = (76, 53, 48 \ 53, 88, 46 \ 48, 46, 63)
: Xi = invsym(X)
: Xi
[symmetric]
1 2 3
+----------------------------------------------+
1 | .0298458083 |
2 | -.0098470272 .0216268926 |
3 | -.0155497706 -.0082885675 .0337724301 |
+----------------------------------------------+
: Xi*X
1 2 3
+----------------------------------------------+
1 | 1 -1.11022e-16 -1.11022e-16 |
2 | -1.11022e-16 1 0 |
3 | 0 0 1 |
+----------------------------------------------+
: end
-------------------------------------------------------------------------------
.
You can load one or more NumPy arrays into Mata as Mata matrices by specifying the -m argument; similarly, you can push one or more Mata matrices into Python as NumPy arrays by using the -outm argument.
Below, we generate a 3x3 NumPy array, npmat, and then calculate its inverse, storing it in npmat_inv.
[14]:
import numpy as np
np.random.seed(17)
npmat = np.random.random((3,3))
npmat_inv = np.linalg.inv(npmat)
[15]:
npmat, npmat_inv
[15]:
(array([[0.294665 , 0.53058676, 0.19152079],
[0.06790036, 0.78698546, 0.65633352],
[0.6375209 , 0.57560289, 0.03906292]]),
array([[-11.67027493, 3.01012075, 6.64203063],
[ 13.98144021, -3.71879942, -6.06620635],
[-15.55729514, 5.67128247, 6.58662066]]))
Next, we load the NumPy array npmat into Mata as a matrix and calculate its inverse using Mata. Then, we push the inverse matrix to Python as a NumPy array named mata_inv.
[16]:
%%mata -m npmat -outm mata_inv
npmat
mata_inv = luinv(npmat)
. mata
------------------------------------------------- mata (type end to exit) -----
: npmat
1 2 3
+-------------------------------------------+
1 | .2946650027 .5305867556 .1915207869 |
2 | .0679003582 .78698546 .6563335218 |
3 | .637520896 .5756028938 .0390629162 |
+-------------------------------------------+
: mata_inv = luinv(npmat)
: end
-------------------------------------------------------------------------------
.
Below, we display the mata_inv array. As you can see, this array is the same as the one we calculated in Python, npmat_inv.
[17]:
mata_inv
[17]:
array([[-11.67027493, 3.01012075, 6.64203063],
[ 13.98144021, -3.71879942, -6.06620635],
[-15.55729514, 5.67128247, 6.58662066]])