Creating graphs with Stata

-- Hua Peng @ StataCorp

  • Creating graphs in Stata is easy.
  • Stata supports a wide variety of plots.
  • Stata graphics commands are highly customizable and extensible.

This presentation uses PyStata; see https://www.stata.com/python/pystata18/ for details.

In [1]:
import stata_setup
stata_setup.config('C:/Program Files/Stata18', 'mp')
  ___  ____  ____  ____  ____ ®
 /__    /   ____/   /   ____/      StataNow 18.5
___/   /   /___/   /   /___/       MP—Parallel Edition

 Statistics and Data Science       Copyright 1985-2023 StataCorp LLC
                                   StataCorp
                                   4905 Lakeway Drive
                                   College Station, Texas 77845 USA
                                   800-782-8272        https://www.stata.com
                                   979-696-4600        service@stata.com

Stata license: 10-user 4-core network perpetual
Serial number: 1
  Licensed to: Stata Developer
               StataCorp LLC

Notes:
      1. Unicode is supported; see help unicode_advice.
      2. More than 2 billion observations are allowed; see help obs_advice.
      3. Maximum number of variables is set to 5,000 but can be increased;
          see help set_maxvar.

graph command overview:

  • https://www.stata.com/help18.cgi?graph
  • https://www.stata.com/support/faqs/graphics/gph/stata-graphs/

graph twoway: scatter, line, bar, area, function, and histogram plots

In [2]:
%%stata -qui
// twoway scatter 
sysuse auto, clear
twoway scatter price mpg

In [3]:
%%stata -qui
// twoway line
sysuse sp500, clear
twoway line low date in 1/15

In [4]:
%%stata -qui
// twoway bar
sysuse sp500, clear
twoway bar change date in 1/27

In [5]:
%%stata -qui
// twoway area
sysuse sp500, clear
twoway area high date in 1/15

In [6]:
%%stata -qui
// twoway function
twoway function y=exp(-x/6)*sin(x), range(0 12.57)  ///
                yline(0, lstyle(foreground))  ///
                xlabel( 0                ///
                        3.14 "{&pi}"     ///
                        6.28 "2{&pi}"    ///
                        9.42 "3{&pi}"    ///
                       12.57 "4{&pi}")   ///
                plotregion(style(none))  ///
                xsca(noline) 

In [7]:
%%stata -qui
// twoway histogram
sysuse auto, clear
twoway histogram rep78, discrete


graph pie, graph bar, and graph histogram

In [8]:
%%stata -qui
// graph pie
sysuse auto, clear
graph pie price, over(rep78)


graph bar vs. graph twoway bar

  • graph twoway bar displays numeric (y, x) data as bars.
  • graph bar [(stat) y] [, over(x)] displays the statistic stat of numeric variable y, grouped by categorical variable x.
In [9]:
%%stata -qui
// graph bar, default stat is percent
sysuse auto, clear
graph bar, over(rep78) blabel(bar, format(%4.0f))

In [10]:
%%stata -qui
// graph bar, stat mean, min, and max 
sysuse auto, clear
graph bar (mean) price (max) weight (min) length, /// 
    over(foreign) ///
    blabel(bar, format(%4.2f)) ///
    legend(order(1 "Average price" 2 "Max weight" 3 "Min length"))


graph histogram vs. graph twoway histogram

  • graph histogram can overlay a normal density or a kernel density estimate.
  • When a density is overlaid, graph histogram rescales it to match the scaling of the bars.
In [11]:
%%stata -qui
// histogram
sysuse sp500, clear
histogram volume, freq normal                ///
    xaxis(1 2) ylabel(0(10)60, grid)         ///
    xlabel(12321 "mean" 9735 "-1 s.d."       ///
        14907 "+1 s.d." 7149 "-2 s.d."       ///
        17493 "+2 s.d." 20078 "+3 s.d."      ///
        22664 "+4 s.d.", axis(2) grid gmax)  xtitle("", axis(2)) ///
        subtitle("S&P 500, January 2001 - December 2001")        ///
        note("Source: Yahoo!Finance and Commodity Systems, Inc.")


graph command options

In [12]:
%%stata -qui
sysuse auto, clear
twoway  (scatter price mpg if foreign, mcolor(%80))   /// 
    (scatter price mpg if !foreign, mcolor(%20*1.2))      /// 
    (lfit price mpg, lcolor(gs2)),                    ///
    legend(order(1 "Foreign" 2 "Domestic") size(2.5)) ///
    title("{bf}Price vs. MPG", size(3))               ///
    subtitle("{it}with linear prediction", size(2.75))

In [13]:
%%stata -qui
sysuse auto, clear
label define repair 1 "Excellent" 2 "Good" 3 "Average" 4 "Fair" 5 "Poor"
label values rep78 repair 
gen int wgt2 = (weight / 1000) ^ 2
twoway (scatter price mpg [aw = wgt2],              ///
    colorvar(rep78) colordiscrete coloruseplegend   ///
    colorlist(stc1%20 stc2%20 stc3%20 stc4%20 stc5%20) zlabel(, valuelabel))  /// 
    (lfit price mpg, lcolor(red)), legend(off) plegend(size(2.5))             ///
    title("{bf:Price vs. MPG weighted by vehicle weight}{superscript:2}", size(3))    ///
    subtitle("{it}with linear prediction", size(2.75)) 


The previous example uses the new colorvar() option; see https://www.stata.com/new-in-stata/graph-colors-by-variable/ for details.

Change color, size, pattern, and other graph styles: https://www.stata.com/manuals/g-4colorstyle.pdf

In [14]:
%%stata -qui
sysuse auto, clear
twoway  (scatter price mpg if foreign, mcolor(red%20) msize(large))  /// 
    (scatter price mpg if !foreign, mcolor(blue*0.5) msize(small))    /// 
    (lfit price mpg, lcolor("255 128 0%20*0.5") lpattern(dash))          ///
    (lowess price mpg, lcolor("255 128 0*0.5") lwidth(thick))

  • Use graph query, color to get a list of the named colors in Stata.
  • Use viewsource color-navy.style to get the RGB value of the named color navy.
  • See https://www.stata.com/manuals/g-2graphquery.pdf for details about the graph query command.
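
The lookups in the list above can be sketched as follows. This is a minimal example rather than the presenter's own code: it uses the documented graph query stylename form with colorstyle as the style name, the output depends on the installed styles and schemes, and in this notebook the lines would go in a %%stata cell.

```stata
// list the named colors (colorstyles) available in this installation
graph query colorstyle

// display the style file that defines the named color navy,
// which includes its RGB value
viewsource color-navy.style

// with no arguments, graph query lists the style types it can report on
graph query
```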

Change text size and text style: https://www.stata.com/bookstore/pdf/g_text.pdf

In [15]:
%%stata -qui
sysuse auto, clear
twoway  (scatter price mpg if foreign, mcolor(red) msize(large))  /// 
    (scatter price mpg if !foreign, mcolor(blue) msize(small))    /// 
    (lfit price mpg, lcolor("255 128 0") lpattern(dash))          ///
    (lowess price mpg, lcolor("255 128 0 * 0.5") lwidth(thick))   ///
    , title("{bf:Price vs. MPG}", size(medlarge))                 ///
    subtitle("{it:with linear prediction and lowess}", size(2.75)) ///
    note("1978 Automobile Data")


Change axis styles: https://www.stata.com/manuals/g-3axis_options.pdf

In [16]:
%%stata -qui
sysuse sp500, clear
tsset date
// note that the rightmost xlabel, 01jan2002,
// extends past the plot region boundary
tsline open

In [17]:
%%stata -qui
sysuse sp500, clear
tsset date
// change xlabel orientation
tsline open, xlabel(, angle(90))

In [18]:
%%stata -qui
sysuse sp500, clear
tsset date
// extend right margin
tsline open, plotr(margin(r+3))


Change legend styles: https://www.stata.com/manuals/g-3legend_options.pdf

In [19]:
%%stata -qui
sysuse auto, clear
twoway  (scatter price mpg if foreign, mcolor(red) msize(large))    /// 
    (scatter price mpg if !foreign, mcolor(blue) msize(small))      /// 
    (lfit price mpg, lcolor("255 128 0") lpattern(dash))            ///
    (lowess price mpg, lcolor("255 128 0 * 0.5") lwidth(thick))     ///
    , legend(label(1 "Foreign") label(2 "Domestic")                 /// 
        label(3 "{it:linear prediction}") label(4 "{bf:lowess}")    ///
        order(3 4 1 2))                                             /// 
    title("{bf:Price vs. MPG}", size(medlarge))                     ///
    subtitle("{it}with linear prediction and lowess", size(2.75))   ///
    note("1978 Automobile Data")


Combine graphs

  • Combine twoway plots using overlay.
  • Create multiple subplots using by.
  • Combine graphs using graph combine.
In [20]:
%%stata -qui
// overlay twoway plots
sysuse auto, clear
twoway (scatter price mpg) (scatter weight mpg) 

In [21]:
%%stata -qui
// overlay twoway plots with different types
sysuse auto, clear
twoway (scatter price mpg) (line weight mpg, sort)

In [22]:
%%stata -qui
// overlay variables with different scales
sysuse auto, clear
twoway (scatter price mpg, yaxis(1)) (scatter weight mpg, yaxis(2)), legend(pos(6) rows(1)) 

In [23]:
%%stata -qui
// subplots with by
sysuse auto, clear
twoway scatter mpg price, by(rep78, cols(3) total)

In [24]:
%%stata -qui
// graph combine
sysuse auto, clear
twoway scatter price mpg, nodraw name(t1, replace)
twoway scatter weight length, nodraw name(t2, replace)
graph combine t1 t2

In [25]:
%%stata -qui
// graph combine, alternate axes, and axis label length
sysuse lifeexp, clear
gen loggnp = log10(gnppc)
label var loggnp "Log{subscript:10} of GNP per capita"
scatter lexp loggnp, ysca(alt) xsca(alt) ylabel(, nogrid  labelminlen(3)) /// 
    xlabel(, grid gmax) name(yx, replace) nodraw 
twoway histogram lexp, fraction xsca(alt reverse) horiz  /// 
    fxsize(25) name(hy, replace) nodraw
twoway histogram loggnp, fraction ysca(alt reverse)   /// 
    ylabel(0(.1).2, nogrid labelminlen(3)) xlabel(,grid gmax)  ///
    fysize(25) name(hx, replace) nodraw
graph combine hy yx hx, hole(3)  /// 
    imargin(0 0 0 0) graphregion(margin(l=22 r=22)) /// 
    title("Life expectancy at birth vs. GNP per capita") /// 
    note("Source: 1998 data from The World Bank Group")


The previous example uses the new labelminlen() suboption, which sets a minimum width for the axis labels so that the y axes of the combined graphs line up. See https://www.stata.com/manuals/g-3axis_label_options.pdf for details.

Use scheme to define the overall look of a graph

In [26]:
%%stata -qui
// default scheme stcolor
sysuse sp500, clear
twoway scatter high low date      ||    ///
         line    high low date    ||    ///
         rarea   high low date, color(gray%20)

In [27]:
%%stata -qui
sysuse sp500, clear
twoway scatter high low date      ||    ///
        line    high low date    ||    ///
        rarea   high low date, color(gray%20) /// 
        scheme(stmono2)

In [28]:
%%stata -qui
// requires the community-contributed schemepack package
// https://github.com/asjadnaqvi/stata-schemepack
sysuse sp500, clear
twoway scatter high low date      ||    ///
         line    high low date    ||                ///
         rarea   high low date, color(gray%20)      ///
         scheme(gg_tableau)


See https://www.stata.com/stata-news/news33-4/spotlight/ for more details.
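
Besides passing scheme() to a single graph command, a scheme can be set for the whole session. The lines below are a minimal sketch under that assumption: the scheme names are the ones used in the cells above, and the list printed by graph query, schemes depends on what is installed.

```stata
// list the schemes installed on this system
graph query, schemes

// make stmono2 the default for every graph drawn from here on
set scheme stmono2
sysuse sp500, clear
twoway line close date

// switch back to stcolor, the default scheme used earlier
set scheme stcolor
```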

Save graphs: https://www.stata.com/help18.cgi?graph_save

  • gph live (the default) - the graph can be edited in future sessions, and its overall look continues to be controlled by the scheme.
  • gph asis - the graph is frozen and saved exactly as it was drawn.
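
A minimal sketch of saving in both formats follows. The graph name sc1 and the file names are hypothetical and chosen only for illustration; the files are written to the current working directory.

```stata
sysuse auto, clear
twoway scatter price mpg, name(sc1, replace)

// live format (the default): editable later, look still follows the scheme
graph save sc1 "price_mpg_live.gph", replace

// asis format: frozen exactly as drawn
graph save sc1 "price_mpg_asis.gph", asis replace

// redisplay a saved graph, for example in a later session
graph use "price_mpg_live.gph"
```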

Export graphs: https://www.stata.com/help18.cgi?graph_export

  • svg, pdf, and eps - vector image formats
  • png, jpg, etc. - raster image formats
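
A minimal sketch of exporting one graph to vector and raster formats follows. The graph name sc1 and the file names are hypothetical; graph export infers the format from the file suffix, and which formats are available can vary by platform.

```stata
sysuse auto, clear
twoway scatter price mpg, name(sc1, replace)

// vector formats: scale without loss of quality
graph export "price_mpg.svg", name(sc1) replace
graph export "price_mpg.pdf", name(sc1) replace

// raster format: width() sets the image width in pixels
graph export "price_mpg.png", name(sc1) width(1600) replace
```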

Graph Editor

  • https://www.stata.com/features/overview/graph-editor/
  • https://www.stata.com/support/faqs/graphics/graph-recorder/
  • https://www.stata.com/stata-news/news39-3/community-corner-graph-editor/

Beyond the graph command: marginsplot, survival, time series, etc.

In [29]:
%%stata -qui

webuse nhanes2l, clear

// fit a linear regression model using 
// the continuous outcome variable bpsystol, 
// the binary predictor variable diabetes, 
// and the categorical predictor variable hlthstat.

regress bpsystol i.hlthstat##i.diabetes

// estimate marginal predictions of SBP for each 
// combination of the categories of hlthstat and diabetes.

margins diabetes#hlthstat

// profile plot shows a separate line for each category of hlthstat
marginsplot

In [30]:
%%stata -qui

webuse nhanes2l, clear
regress bpsystol i.hlthstat##i.diabetes
margins diabetes#hlthstat

//  if we prefer a horizontal bar chart
marginsplot, recast(bar) xdimension(hlthstat diabetes) horizontal

In [31]:
%%stata -qui
// survival plot
webuse stan3, clear
sts graph, by(posttran)

In [32]:
%%stata -qui
// time-series plot
sysuse sp500, clear
tsset date
tsrline high low, plotregion(margin(r+3))

In [33]:
%%stata -qui
// power and sample-size plot
// For average score 520 with a standard deviation of 135, we want to see the 
// power obtained for sample sizes of 100 through 500  when scores increase 
// by 20, 40, 60, and 80 points or, equivalently, when average scores 
// increase to 540, 560, 580, and 600.
power twomeans 520 (540 560 580 600), n(100 200 300 400 500) sd(135) graph


Community-contributed commands

catplot

In [34]:
%%stata -qui
** ssc install catplot
sysuse auto, clear
catplot rep78 foreign, percent(foreign)                /// 
    bar(1, bcolor(%40)) blabel(bar, position(outside)  /// 
    format(%3.1f)) ylabel(none) yscale(r(0,60))


coefplot

In [35]:
%%stata -qui
** https://repec.sowi.unibe.ch/stata/coefplot
** ssc install coefplot
sysuse auto, clear
regress price mpg trunk if !foreign
estimates store domestic
regress price mpg trunk if foreign
estimates store foreign
coefplot domestic foreign, drop(_cons) xline(0)


grmap

In [36]:
%%stata -qui
// copy grmap sample data
capture grmap_copy
use italy-outlinedata.dta, clear
grmap, title("Provincial capitals" " ", size(*0.9))   ///
  point(data("italy-capitals.dta") xcoord(xcoord) ycoord(ycoord))


geoplot: https://github.com/benjann/geoplot

In [37]:
%%stata -qui
** Stata version 16.1 or newer
** ssc install: geoplot palettes colrspace moremata
clear all
// get data
local url http://fmwww.bc.edu/repec/bocode/i/
geoframe create regions  `url'Italy-RegionsData.dta, id(id) coord(xcoord ycoord) ///
    shp(Italy-RegionsCoordinates.dta)
geoframe create country  `url'Italy-OutlineCoordinates.dta
geoframe create capitals `url'Italy-Capitals.dta, coord(xcoord ycoord)
geoframe create lakes    `url'Italy-Lakes.dta, feature(water)
geoframe create rivers   `url'Italy-Rivers.dta, feature(water)
geoplot (area regions) ///
    (point capitals i.size [w=pop98], color(Set1%50) mlcolor(%0)) ///
    (label capitals city if pop98>250000, color(black)) ///
    , legend compass sbar(length(300) units(km))


sankey plot: https://github.com/asjadnaqvi/The-Stata-Guide?tab=readme-ov-file

In [38]:
%%stata -qui
** https://github.com/asjadnaqvi/stata-sankey
** ssc install sankey, replace
** ssc install palettes, replace
** ssc install colrspace, replace

import excel using "https://github.com/asjadnaqvi/stata-sankey/blob/main/data/sankey_example2.xlsx?raw=true", clear first
sankey value, from(source) to(destination) by(layer)


Thanks!