Example 1: Basic usage

In this example, we use the %%stata magic command to call Stata. Before getting started, we need to configure and import the pystata package to initialize Stata. See Configuration for the available ways of configuring pystata; below, we use the first method listed there. That method relies on the configuration module stata_setup, available from the Python Package Index (PyPI), which locates the pystata package and initializes Stata.

[1]:
import stata_setup
stata_setup.config("C:/Program Files/Stata17/", "mp")

  ___  ____  ____  ____  ____ ®
 /__    /   ____/   /   ____/      17.0
___/   /   /___/   /   /___/       MP—Parallel Edition

 Statistics and Data Science       Copyright 1985-2021 StataCorp LLC
                                   StataCorp
                                   4905 Lakeway Drive
                                   College Station, Texas 77845 USA
                                   800-STATA-PC        https://www.stata.com
                                   979-696-4600        stata@stata.com

Stata license: 10-user 4-core network perpetual
Serial number: 1
  Licensed to: Stata Developer
               StataCorp LLC

Notes:
      1. Unicode is supported; see help unicode_advice.
      2. More than 2 billion observations are allowed; see help obs_advice.
      3. Maximum number of variables is set to 5,000; see help set_maxvar.

To illustrate calling Stata from Python, we use the German macroeconomic data discussed in Lütkepohl (2005). We are mainly interested in three variables: the first difference of the natural log of investment, dln_inv; the first difference of the natural log of income, dln_inc; and the first difference of the natural log of consumption, dln_consump. The values are recorded from the first quarter of 1960 through the fourth quarter of 1982.
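
As a minimal sketch of what "first difference of the natural log" means, the snippet below computes it in plain Python. The helper name dln and the input values are hypothetical and are not taken from the dataset; the dataset already ships with dln_inv, dln_inc, and dln_consump precomputed.

```python
import math

def dln(levels):
    """First difference of the natural log; the first entry has no predecessor."""
    logs = [math.log(x) for x in levels]
    return [None] + [b - a for a, b in zip(logs, logs[1:])]

# Hypothetical levels for illustration only (not values from lutkepohl2)
levels = [100.0, 110.0, 121.0]
growth = dln(levels)
print(growth)
```

Each non-missing entry approximates the period-over-period growth rate, which is why such series are commonly used in place of the raw levels.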

First, we load the dataset, describe its contents, and display its time-series settings in Stata.

[2]:
%%stata
use https://www.stata-press.com/data/r17/lutkepohl2
describe
tsset

. use https://www.stata-press.com/data/r17/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)

. describe

Contains data from https://www.stata-press.com/data/r17/lutkepohl2.dta
 Observations:            92                  Quarterly SA West German macro
                                                data, Bil DM, from Lutkepohl
                                                1993 Table E.1
    Variables:            10                  4 Dec 2020 14:31
-------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
inv             int     %8.0g                 Investment
inc             int     %8.0g                 Income
consump         int     %8.0g                 Consumption
qtr             byte    %tq                   Quarter
ln_inv          float   %9.0g                 Log investment
dln_inv         float   %9.0g                 First-difference of ln_inv
ln_inc          float   %9.0g                 Log income
dln_inc         float   %9.0g                 First-difference of ln_inc
ln_consump      float   %9.0g                 Log consumption
dln_consump     float   %9.0g                 First-difference of ln_consump
-------------------------------------------------------------------------------
Sorted by: qtr

. tsset

Time variable: qtr, 1960q1 to 1982q4
        Delta: 1 quarter

.
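
Stata stores quarterly dates (the %tq format shown for qtr) as the number of quarters elapsed since 1960q1. As a minimal sketch, the hypothetical Python helper tq below mimics that encoding; it is not part of pystata and only illustrates why a range of 1960q1 to 1982q4 corresponds to the 92 observations reported by describe.

```python
def tq(year, quarter):
    """Mimic Stata's quarterly date value: quarters elapsed since 1960q1."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3, or 4")
    return (year - 1960) * 4 + (quarter - 1)

first = tq(1960, 1)   # first quarter in the dataset
last = tq(1982, 4)    # last quarter in the dataset
n_quarters = last - first + 1
print(first, last, n_quarters)
```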

We then fit a vector autoregressive model with the var command.

[3]:
%%stata
var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2) dfk

Vector autoregression

Sample: 1960q4 thru 1978q4                      Number of obs     =         73
Log likelihood =    606.307                     AIC               =  -16.03581
FPE            =   2.18e-11                     HQIC              =  -15.77323
Det(Sigma_ml)  =   1.23e-11                     SBIC              =  -15.37691

Equation           Parms      RMSE     R-sq      chi2     P>chi2
----------------------------------------------------------------
dln_inv               7     .046148   0.1286   9.736909   0.1362
dln_inc               7     .011719   0.1142   8.508289   0.2032
dln_consump           7     .009445   0.2513   22.15096   0.0011
----------------------------------------------------------------

------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
dln_inv      |
     dln_inv |
         L1. |  -.3196318   .1254564    -2.55   0.011    -.5655218   -.0737419
         L2. |  -.1605508   .1249066    -1.29   0.199    -.4053633    .0842616
             |
     dln_inc |
         L1. |   .1459851   .5456664     0.27   0.789    -.9235013    1.215472
         L2. |   .1146009   .5345709     0.21   0.830    -.9331388    1.162341
             |
 dln_consump |
         L1. |   .9612288   .6643086     1.45   0.148    -.3407922     2.26325
         L2. |   .9344001   .6650949     1.40   0.160     -.369162    2.237962
             |
       _cons |  -.0167221   .0172264    -0.97   0.332    -.0504852    .0170409
-------------+----------------------------------------------------------------
dln_inc      |
     dln_inv |
         L1. |   .0439309   .0318592     1.38   0.168     -.018512    .1063739
         L2. |   .0500302   .0317196     1.58   0.115    -.0121391    .1121995
             |
     dln_inc |
         L1. |  -.1527311   .1385702    -1.10   0.270    -.4243237    .1188615
         L2. |   .0191634   .1357525     0.14   0.888    -.2469067    .2852334
             |
 dln_consump |
         L1. |   .2884992    .168699     1.71   0.087    -.0421448    .6191431
         L2. |     -.0102   .1688987    -0.06   0.952    -.3412354    .3208353
             |
       _cons |   .0157672   .0043746     3.60   0.000     .0071932    .0243412
-------------+----------------------------------------------------------------
dln_consump  |
     dln_inv |
         L1. |   -.002423   .0256763    -0.09   0.925    -.0527476    .0479016
         L2. |   .0338806   .0255638     1.33   0.185    -.0162235    .0839847
             |
     dln_inc |
         L1. |   .2248134   .1116778     2.01   0.044      .005929    .4436978
         L2. |   .3549135   .1094069     3.24   0.001     .1404798    .5693471
             |
 dln_consump |
         L1. |  -.2639695   .1359595    -1.94   0.052    -.5304451    .0025062
         L2. |  -.0222264   .1361204    -0.16   0.870    -.2890175    .2445646
             |
       _cons |   .0129258   .0035256     3.67   0.000     .0060157    .0198358
------------------------------------------------------------------------------
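
Two figures in the header can be cross-checked with simple arithmetic. This is our reading of the output rather than something the output states: the dln_ variables are first differences, so they are missing in 1960q1, and lags(1/2) needs two earlier values, which makes 1960q4 the first usable quarter. The helper name quarters_between is hypothetical.

```python
def quarters_between(start, end):
    """Count quarters from start to end inclusive; each is a (year, quarter) pair."""
    (y0, q0), (y1, q1) = start, end
    return (y1 - y0) * 4 + (q1 - q0) + 1

# Estimation sample reported in the header: 1960q4 thru 1978q4
n_obs = quarters_between((1960, 4), (1978, 4))

# Each equation has one coefficient per variable per lag, plus a constant
n_vars, n_lags = 3, 2
params_per_equation = n_vars * n_lags + 1

print(n_obs, params_per_equation)
```

The two results agree with "Number of obs = 73" and with the Parms column, which shows 7 for every equation.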

Next, we estimate impulse–response functions and forecast-error variance decompositions and save them under the name order1 in the IRF file myirf1.irf. Then, we graph the orthogonalized impulse–response function, using dln_inc as the impulse variable and dln_consump as the response variable.

[4]:
%%stata
irf create order1, step(10) set(myirf1, replace)
irf graph oirf, impulse(dln_inc) response(dln_consump)

. irf create order1, step(10) set(myirf1, replace)
(file myirf1.irf created)
(file myirf1.irf now active)
(file myirf1.irf updated)

. irf graph oirf, impulse(dln_inc) response(dln_consump)

.
[Figure: orthogonalized impulse–response function from order1, impulse dln_inc, response dln_consump]
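
The %%stata magic is convenient inside a notebook, but the same command text can also be assembled in Python and passed to the pystata API function stata.run. The sketch below assumes a session that has already been initialized as in cell [1]. The helpers var_command and run_in_stata are hypothetical names introduced here for illustration, and only the string-building part runs without Stata.

```python
def var_command(variables, last_quarter, max_lag):
    """Assemble the var command string used in cell [3]."""
    return (f"var {' '.join(variables)} if qtr<=tq({last_quarter}), "
            f"lags(1/{max_lag}) dfk")

def run_in_stata(command):
    """Send one command to Stata; requires an initialized pystata session."""
    # Imported lazily so this helper can be defined before Stata is configured
    from pystata import stata
    stata.run(command)

cmd = var_command(["dln_inv", "dln_inc", "dln_consump"], "1978q4", 2)
print(cmd)
```

In an initialized session, calling run_in_stata(cmd) should fit the same model as cell [3]. Building the command from Python values makes it easier to vary the sample cutoff or the lag length in a loop.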