Call Stata using API functions

You can also interact with Stata by using the config and stata modules from the pystata Python package. The config module defines functions for initializing and configuring Stata. The stata module defines functions for interacting with Stata. For more information about these two modules, see API functions.

Previously, we initialized Stata’s environment within Python. Once we have done that, we can use the stata module to call Stata. Below, we import the module:

[18]:
from pystata import stata

The run() function executes Stata commands. You can pass it a single command or several commands in one multiline string.

[19]:
stata.run('sysuse auto, clear')
(1978 automobile data)
[20]:
stata.run('''
summarize
reg mpg price i.foreign
ereturn list
''')

.
. summarize

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        make |          0
       price |         74    6165.257    2949.496       3291      15906
         mpg |         74     21.2973    5.785503         12         41
       rep78 |         69    3.405797    .9899323          1          5
    headroom |         74    2.993243    .8459948        1.5          5
-------------+---------------------------------------------------------
       trunk |         74    13.75676    4.277404          5         23
      weight |         74    3019.459    777.1936       1760       4840
      length |         74    187.9324    22.26634        142        233
        turn |         74    39.64865    4.399354         31         51
displacement |         74    197.2973    91.83722         79        425
-------------+---------------------------------------------------------
  gear_ratio |         74    3.014865    .4562871       2.19       3.89
     foreign |         74    .2972973    .4601885          0          1

. reg mpg price i.foreign

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     23.01
       Model |  960.866305         2  480.433152   Prob > F        =    0.0000
    Residual |  1482.59315        71  20.8815937   R-squared       =    0.3932
-------------+----------------------------------   Adj R-squared   =    0.3761
       Total |  2443.45946        73  33.4720474   Root MSE        =    4.5696

------------------------------------------------------------------------------
         mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       price |   -.000959   .0001815    -5.28   0.000     -.001321    -.000597
             |
     foreign |
    Foreign  |   5.245271   1.163592     4.51   0.000     2.925135    7.565407
       _cons |   25.65058   1.271581    20.17   0.000     23.11512    28.18605
------------------------------------------------------------------------------

. ereturn list

scalars:
                  e(N) =  74
               e(df_m) =  2
               e(df_r) =  71
                  e(F) =  23.00749448574634
                 e(r2) =  .3932401256962295
               e(rmse) =  4.569638248831391
                e(mss) =  960.8663049714787
                e(rss) =  1482.593154487981
               e(r2_a) =  .3761482982510528
                 e(ll) =  -215.9083177127538
               e(ll_0) =  -234.3943376482347
               e(rank) =  3

macros:
            e(cmdline) : "regress mpg price i.foreign"
              e(title) : "Linear regression"
          e(marginsok) : "XB default"
                e(vce) : "ols"
             e(depvar) : "mpg"
                e(cmd) : "regress"
         e(properties) : "b V"
            e(predict) : "regres_p"
              e(model) : "ols"
          e(estat_cmd) : "regress_estat"

matrices:
                  e(b) :  1 x 4
                  e(V) :  4 x 4

functions:
             e(sample)

.
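
Because run() takes an ordinary Python string, a command block can also be assembled programmatically. The following is a minimal sketch; build_commands is a hypothetical helper for illustration and is not part of pystata. The call to stata.run() is left as a comment because it needs an initialized Stata session.

```python
def build_commands(depvar, indepvars):
    """Return a multiline Stata command block as a single string.

    Hypothetical helper for illustration; it is not part of pystata.
    """
    lines = [
        "summarize",
        f"regress {depvar} {' '.join(indepvars)}",
        "ereturn list",
    ]
    return "\n".join(lines)


commands = build_commands("mpg", ["price", "i.foreign"])
print(commands)

# With Stata initialized as earlier in this document, the block could be run with:
# from pystata import stata
# stata.run(commands)
```

Building the string separately makes it easy to inspect the exact commands before handing them to Stata.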

You can use the get_return(), get_ereturn(), and get_sreturn() functions to store Stata’s r(), e(), and s() results, respectively, in Python as dictionaries.

[21]:
stata.get_ereturn()
[21]:
{'e(N)': 74.0,
 'e(df_m)': 2.0,
 'e(df_r)': 71.0,
 'e(F)': 23.007494485746342,
 'e(r2)': 0.39324012569622946,
 'e(rmse)': 4.569638248831391,
 'e(mss)': 960.8663049714787,
 'e(rss)': 1482.5931544879809,
 'e(r2_a)': 0.3761482982510528,
 'e(ll)': -215.90831771275379,
 'e(ll_0)': -234.39433764823468,
 'e(rank)': 3.0,
 'e(cmdline)': 'regress mpg price i.foreign',
 'e(title)': 'Linear regression',
 'e(marginsprop)': 'minus',
 'e(marginsok)': 'XB default',
 'e(vce)': 'ols',
 'e(depvar)': 'mpg',
 'e(cmd)': 'regress',
 'e(properties)': 'b V',
 'e(predict)': 'regres_p',
 'e(model)': 'ols',
 'e(estat_cmd)': 'regress_estat',
 'e(b)': array([[-9.59034169e-04,  0.00000000e+00,  5.24527100e+00,
          2.56505843e+01]]),
 'e(V)': array([[ 3.29592449e-08,  0.00000000e+00, -1.02918123e-05,
         -2.00142479e-04],
        [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         -0.00000000e+00],
        [-1.02918123e-05,  0.00000000e+00,  1.35394617e+00,
         -3.39072871e-01],
        [-2.00142479e-04, -0.00000000e+00, -3.39072871e-01,
          1.61691892e+00]])}
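
Once the results are in a dictionary, they can be used like any other Python data. The sketch below recovers the standard errors and t statistics of the regression from e(b) and e(V). It makes two assumptions: the values are copied by hand from the output above into nested lists (which are indexed the same way as the returned arrays), and the column labels are supplied manually because the dictionary does not carry them.

```python
import math

# Values copied from the get_ereturn() output above. Plain nested lists stand in
# for the NumPy arrays that pystata returns; the [i][j] indexing works for both.
ereturn = {
    "e(df_r)": 71.0,
    "e(b)": [[-9.59034169e-04, 0.0, 5.24527100e+00, 2.56505843e+01]],
    "e(V)": [
        [3.29592449e-08, 0.0, -1.02918123e-05, -2.00142479e-04],
        [0.0, 0.0, 0.0, -0.0],
        [-1.02918123e-05, 0.0, 1.35394617e+00, -3.39072871e-01],
        [-2.00142479e-04, -0.0, -3.39072871e-01, 1.61691892e+00],
    ],
}

# Assumed column labels; the dictionary itself does not carry them.
labels = ["price", "0b.foreign", "1.foreign", "_cons"]


def coef_table(results, names):
    """Return {name: (coefficient, std. error, t statistic)} for estimated terms."""
    b = results["e(b)"][0]
    V = results["e(V)"]
    table = {}
    for i, name in enumerate(names):
        variance = V[i][i]
        if variance == 0:
            # Omitted base level: no standard error to report.
            continue
        se = math.sqrt(variance)
        table[name] = (b[i], se, b[i] / se)
    return table


for name, (coef, se, t) in coef_table(ereturn, labels).items():
    print(f"{name:>10}  {coef: .6f}  {se:.6f}  {t: .2f}")
```

The zero row and column in e(V) belong to the omitted base level of foreign, so the sketch skips any term whose variance is zero.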

You can also export Stata data to Python as NumPy arrays or pandas DataFrames. Below, we store Stata’s current dataset in a pandas DataFrame, myauto.

[22]:
myauto = stata.pdataframe_from_data()
myauto.head()
[22]:
            make  price  mpg          rep78  headroom  trunk  weight  length  turn  displacement  gear_ratio  foreign
0    AMC Concord   4099   22   3.000000e+00       2.5     11    2930     186    40           121        3.58        0
1      AMC Pacer   4749   17   3.000000e+00       3.0     11    3350     173    40           258        2.53        0
2     AMC Spirit   3799   22  8.988466e+307       3.0     12    2640     168    35           121        3.08        0
3  Buick Century   4816   20   3.000000e+00       4.5     16    3250     196    40           196        2.93        0
4  Buick Electra   7827   15   4.000000e+00       4.0     20    4080     222    43           350        2.41        0
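
The value 8.988466e+307 for rep78 in the third row is how Stata’s missing value appears once it reaches Python; the same observation is missing in the Stata data. The sketch below shows one way to convert such values to NaN in plain Python. It assumes that the system missing value for doubles is 2**1023 and that extended missing values are larger still; missing_to_nan is a hypothetical helper, not a pystata function.

```python
import math

# Assumption: Stata's system missing value "." is stored as 2**1023 for doubles,
# which displays as 8.988466e+307 in the rep78 column above.
STATA_SYSMISS = 2.0 ** 1023


def missing_to_nan(values):
    """Replace Stata missing-value codes in a numeric sequence with NaN."""
    return [math.nan if v >= STATA_SYSMISS else v for v in values]


# rep78 for the first five observations, as shown by myauto.head().
rep78 = [3.0, 3.0, STATA_SYSMISS, 3.0, 4.0]
cleaned = missing_to_nan(rep78)
print(cleaned)
```

The API reference for pdataframe_from_data() also describes a missingval argument that substitutes a chosen value at export time; check that reference for its exact behavior before relying on it.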

Alternatively, you can store just a subset of the data in Python. Below, we store the first 10 observations of the variables mpg and price in a pandas DataFrame.

[23]:
stata.pdataframe_from_data('mpg price', range(10))
[23]:
   mpg  price
0   22   4099
1   17   4749
2   22   3799
3   20   4816
4   15   7827
5   18   5788
6   26   4453
7   20   5189
8   16  10372
9   19   4082
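
In the call above, range(10) selects the rows indexed 0 through 9, so observation indices on the Python side start at zero, whereas Stata’s own "in" ranges start at 1. The following sketch converts between the two; stata_range_to_obs is a hypothetical helper for illustration and is not part of pystata.

```python
def stata_range_to_obs(first, last):
    """Convert a 1-based, inclusive Stata range (in first/last) to zero-based indices."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return range(first - 1, last)


obs = stata_range_to_obs(1, 10)
print(list(obs))

# With Stata initialized and the auto data loaded, the call above could be written as:
# stata.pdataframe_from_data('mpg price', obs)
```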

Conversely, you can load data from Python into Stata, either as the current dataset or into a specific frame. Below, we load the pandas DataFrame myauto into Stata, making it the current dataset, and then list the first three observations. Here, force=True clears the dataset in Stata’s memory before the DataFrame is loaded.

[24]:
stata.pdataframe_to_data(myauto, force=True)
stata.run('list in 1/3')

     +------------------------------------------------------------------------+
  1. |        make | price | mpg | rep78 | headroom | trunk | weight | length |
     | AMC Concord |  4099 |  22 |     3 |      2.5 |    11 |   2930 |    186 |
     |------------------------------------------------------------------------|
     |     turn     |     displa~t     |     gear_ra~o     |     foreign      |
     |       40     |          121     |     3.5799999     |           0      |
     +------------------------------------------------------------------------+

     +------------------------------------------------------------------------+
  2. |        make | price | mpg | rep78 | headroom | trunk | weight | length |
     |   AMC Pacer |  4749 |  17 |     3 |        3 |    11 |   3350 |    173 |
     |------------------------------------------------------------------------|
     |     turn     |     displa~t     |     gear_ra~o     |     foreign      |
     |       40     |          258     |          2.53     |           0      |
     +------------------------------------------------------------------------+

     +------------------------------------------------------------------------+
  3. |        make | price | mpg | rep78 | headroom | trunk | weight | length |
     |  AMC Spirit |  3799 |  22 |     . |        3 |    12 |   2640 |    168 |
     |------------------------------------------------------------------------|
     |     turn     |     displa~t     |     gear_ra~o     |     foreign      |
     |       35     |          121     |     3.0799999     |           0      |
     +------------------------------------------------------------------------+
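
Note that gear_ratio is listed as 3.5799999 and 3.0799999 after the round trip, although the DataFrame displayed 3.58 and 3.08. One plausible explanation, stated here as an assumption, is that the variable is stored in single precision in the original dataset and is widened to double precision on its way through Python, so the rounding error of the 4-byte value becomes visible. The sketch below reproduces that effect with the standard library only.

```python
import struct


def as_float32(value):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


for gear_ratio in (3.58, 2.53, 3.08):
    stored = as_float32(gear_ratio)
    print(f"{gear_ratio} -> {stored:.7f}")
```

If the shorter display is preferred, the variable’s display format can be changed in Stata after loading; the stored values themselves are unchanged by the round trip.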