PyStata—Python and Stata

Highlights

  • Use Stata from within Python

  • Stata API functions to run Stata commands and access Stata data and returned results from Python

  • IPython magic command to use Stata from Jupyter Notebook


PyStata allows you to invoke Stata directly from any standalone Python environment and to call Python directly from Stata, greatly expanding Stata's Python integration features.

Features in PyStata include:

  1. The ability to use Stata from an IPython kernel-based environment like Jupyter Notebook, Spyder IDE, or PyCharm IDE

  2. The ability to use Stata from the Python shell, started from the Windows Command Prompt, the macOS terminal, or a Unix terminal

  3. Four IPython (interactive Python) magic commands: stata, mata, pystata, and help

  4. A suite of API functions from within Python to run Stata commands and access Stata data and returned results

These tools, together with the Stata Function Interface (sfi) module, allow users to easily integrate Stata's vast statistical and data management methods into any data science project using Python.
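Before any of these tools can be used from a standalone Python session, Python has to be able to find and initialize PyStata. The following is a minimal sketch of that check, assuming the stata_setup module used in the example on this page is installed. The helper names pystata_available and init_stata are hypothetical and not part of PyStata, and the installation directory and edition are whatever applies to your machine.

```python
import importlib.util


def pystata_available():
    """Return True if the stata_setup module can be located by Python."""
    return importlib.util.find_spec("stata_setup") is not None


def init_stata(install_dir, edition):
    """Configure PyStata if stata_setup is importable; return True on success.

    install_dir and edition are passed straight to stata_setup.config,
    for example init_stata("C:/Program Files/Stata19", "se").
    (Hypothetical helper, shown only as a sketch.)
    """
    if not pystata_available():
        return False
    import stata_setup
    stata_setup.config(install_dir, edition)
    return True


# Report whether the module was found; nothing is configured here.
print("stata_setup found:", pystata_available())
```

Keeping the check separate from the configuration call lets a script fall back gracefully, or print a useful message, on machines where Stata is not installed.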

Let's see it work

Imagine that a health provider is interested in studying the effect of a new hospital admissions procedure on patient satisfaction. They have monthly data on patients before and after the new procedure was implemented in some of their hospitals. The data are in nested JSON format, and the health provider uses Python as the data analysis tool. But they would like to use Stata's new DID regression to analyze the effect of the new admissions procedure on the hospitals that participated in the program. The outcome of interest is patient satisfaction, satisfaction_score, and the treatment variable is procedure.

A portion of did.json is

{
    "hospital_id": "1",
    "month": "7",
    "records": [
        {
            "procedure": "New",
            "satisfaction_score": "4.1065269"
        }
    ]
}
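The script that follows flattens these nested records with pandas. To make that step concrete, here is a standard-library sketch of the same idea: each entry in records becomes one row, and the parent fields hospital_id and month are repeated on every row. It assumes the full file is a list of objects shaped like the excerpt above. The function flatten_records is a hypothetical helper for illustration and is not part of pandas or PyStata.

```python
import json


def flatten_records(entries, record_key, meta_keys):
    """Return one flat dict per nested record, copying the parent's meta fields."""
    rows = []
    for entry in entries:
        for record in entry[record_key]:
            row = dict(record)             # fields of the nested record
            for key in meta_keys:
                row[key] = entry[key]      # parent-level fields, repeated per row
            rows.append(row)
    return rows


# The excerpt from did.json, wrapped in a list (assumed shape of the full file).
sample = json.loads('''
[
  {"hospital_id": "1", "month": "7",
   "records": [{"procedure": "New", "satisfaction_score": "4.1065269"}]}
]
''')

rows = flatten_records(sample, "records", ["hospital_id", "month"])
print(rows)
```

All values are still strings after flattening, which is why the Stata code in the script applies destring to the numeric variables.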

We use PyStata's API functions in a Python script, did.py, to interact with Stata. Some highlights of the code are

# Set up Stata (installation directory and edition)
import stata_setup
stata_setup.config("C:/Program Files/Stata19", "se")

# Import JSON data and flatten the nested records into one row per record
import json
import pandas as pd
with open("did.json") as json_file:
    data = json.load(json_file)
data = pd.json_normalize(data, 'records', ['hospital_id', 'month'])

# Load data to Stata, replacing any dataset currently in memory
from pystata import stata
stata.pdataframe_to_data(data, force=True)

# Run block of Stata code
stata.run('''
destring satisfaction_score, replace
destring hospital_id, replace
destring month, replace

gen proc = 0
replace proc = 1 if procedure == "New"
label define procedure 0 "Old" 1 "New"
drop procedure
rename proc procedure
label value procedure procedure
''', quietly=True)

stata.run('''
didregress (satisfaction_score) (procedure), group(hospital_id) time(month)
''', echo=True)

# Load Stata results to Python
r = stata.get_return()['r(table)']

# Use Stata results in Python
print("The treatment hospitals had a %4.2f-point increase."
      % (r[0][0]), end=" ")
print("The result is with 95%% confidence interval [%4.2f, %4.2f]."
      % (r[4][0], r[5][0]))

# Generate Stata graph
stata.run("estat trendplots", quietly=True)
stata.run("graph export did.svg, replace", quietly=True)
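The script above picks the estimate and its confidence limits out of r(table) by row position (rows 0, 4, and 5). The sketch below gives those positions names so the indexing is easier to read. The row names follow the order Stata uses when it lists r(table); treat that order as an assumption and confirm it with matrix list r(table) in your own session. The helper rtable_column is hypothetical, and the matrix is filled with its own row indices as placeholders, not real estimates.

```python
# Assumed row order of r(table); the third row is labeled t or z
# depending on the estimation command.
RTABLE_ROWS = ("b", "se", "t", "pvalue", "ll", "ul", "df", "crit", "eform")


def rtable_column(table, col=0):
    """Map row names to the values in one column of an r(table)-shaped matrix."""
    return {name: table[i][col] for i, name in enumerate(RTABLE_ROWS)}


# Placeholder matrix: each row simply holds its own index (NOT real results).
placeholder = [[float(i)] for i in range(len(RTABLE_ROWS))]

est = rtable_column(placeholder)
print("estimate %4.2f, 95%% CI [%4.2f, %4.2f]" % (est["b"], est["ll"], est["ul"]))
```

Because the helper only uses table[i][col] indexing, it should accept the array stored under 'r(table)' in the dictionary returned by stata.get_return() as well as a plain list of lists.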

Run did.py in Spyder

Here we run did.py, created in the section above, in Spyder.

(Screenshot: did.py running in Spyder, did_spyder.png)

The entire analysis is performed without leaving the Python environment. And with Stata's API functions, data and results flow seamlessly between Python and Stata.

Run did.py in Python Shell

The script can also be run with Python from a command line, such as the Windows Command Prompt, the macOS terminal, or a Unix terminal. This method uses only the shell environment and does not invoke any GUI element of Stata.

python did.py > did.log

produces a log file, did.log, with output from didregress.

This method is useful for automating tasks. For example, the script above can be incorporated into a regularly scheduled task to handle new data.
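When the run is launched from another Python program rather than typed at a prompt, the same redirection can be done with the standard library. This is a minimal sketch, not a PyStata feature: run_to_log is a hypothetical helper that runs a script with the current Python interpreter and writes its standard output to a log file, much as python did.py > did.log does.

```python
import subprocess
import sys


def run_to_log(script_path, log_path):
    """Run a Python script and write its standard output to log_path.

    Similar in effect to:  python script_path > log_path
    Returns the script's exit code (0 means it finished without error).
    """
    with open(log_path, "w") as log_file:
        completed = subprocess.run([sys.executable, script_path], stdout=log_file)
    return completed.returncode


# Example call, assuming did.py is in the current directory:
#     exit_code = run_to_log("did.py", "did.log")
```

Checking the returned exit code gives a scheduled job a simple way to detect that the analysis failed before anyone reads the log.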

For a detailed example using Stata in Jupyter Notebook or any Python environment that supports IPython, see Jupyter Notebook with Stata.

Tell me more

Learn more about using Python and Stata together.