Creating graphs with Stata¶
-- Hua Peng @ StataCorp
- Creating graphs in Stata is easy.
- Stata supports a wide variety of plots.
- Stata graphic commands are highly customizable and extensible.
This presentation uses PyStata; see https://www.stata.com/python/pystata18/ for details.
In [1]:
import stata_setup
stata_setup.config('C:/Program Files/Stata18', 'mp')
  ___  ____  ____  ____  ____ ®
 /__    /   ____/   /   ____/      StataNow 18.5
___/   /   /___/   /   /___/       MP—Parallel Edition

 Statistics and Data Science       Copyright 1985-2023 StataCorp LLC
                                   StataCorp
                                   4905 Lakeway Drive
                                   College Station, Texas 77845 USA
                                   800-782-8272        https://www.stata.com
                                   979-696-4600        service@stata.com

Stata license: 10-user 4-core network perpetual
Serial number: 1
  Licensed to: Stata Developer
               StataCorp LLC

Notes:
      1. Unicode is supported; see help unicode_advice.
      2. More than 2 billion observations are allowed; see help obs_advice.
      3. Maximum number of variables is set to 5,000 but can be increased;
         see help set_maxvar.
graph command overview:¶
graph twoway: scatter, line, bar, area, function, and histogram plots¶
In [2]:
%%stata -qui
// twoway scatter
sysuse auto, clear
// sc is the abbreviation of scatter
scatter price mpg
In [3]:
%%stata -qui
// twoway line
sysuse sp500, clear
twoway line low date in 1/15
In [4]:
%%stata -qui
// twoway bar
sysuse sp500, clear
twoway bar change date in 1/27
In [5]:
%%stata -qui
// twoway area
sysuse sp500, clear
twoway area high date in 1/15
In [6]:
%%stata -qui
// twoway function
twoway function y=exp(-x/6)*sin(x), range(0 12.57) ///
yline(0, lstyle(foreground)) ///
xlabel( 0 ///
3.14 "{&pi}" ///
6.28 "2{&pi}" ///
9.42 "3{&pi}" ///
12.57 "4{&pi}") ///
plotregion(style(none)) ///
xsca(noline)
In [7]:
%%stata -qui
// twoway histogram
sysuse auto, clear
twoway histogram rep78, discrete
graph pie, graph bar, and graph histogram¶
In [8]:
%%stata -qui
// graph pie
sysuse auto, clear
graph pie price, over(rep78)
graph bar vs. graph twoway bar¶
- graph twoway bar displays numeric (y, x) data as bars.
- graph bar [(stat) y] [, over(x)] displays the statistic stat (for example, mean or max) of numeric variable y, grouped by categorical variable x.
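To make the distinction concrete, here is a minimal sketch that draws the same bars both ways: mean price by repair record. It assumes the auto dataset used throughout this presentation; the graph names gbar and tbar are illustrative and not part of the original notebook. In this notebook it would run inside a %%stata cell.

```stata
sysuse auto, clear

// graph bar computes the statistic itself, one bar per category of rep78
graph bar (mean) price, over(rep78) name(gbar, replace)

// twoway bar only plots the (y, x) pairs it is given,
// so the means have to be computed first
collapse (mean) price, by(rep78)
twoway bar price rep78, name(tbar, replace)
```

Note that collapse replaces the data in memory, so reload the dataset before running later examples.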
In [9]:
%%stata -qui
// graph bar with no y variable: the default stat is percent
sysuse auto, clear
graph bar, over(rep78) blabel(bar, format(%4.0f))
In [10]:
%%stata -qui
// graph bar, stat mean, min, and max
sysuse auto, clear
graph bar (mean) price (max) weight (min) length, ///
over(foreign) ///
blabel(bar, format(%4.2f)) ///
legend(order(1 "Average price" 2 "Max weight" 3 "Min length"))
graph histogram vs. graph twoway histogram¶
- graph histogram allows overlaying of a normal density or a kernel estimate of the density.
- If a density estimate is overlaid, graph histogram scales the density to match the scaling of the bars.
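As a short illustration of the kernel-estimate case, the sketch below overlays a kernel density estimate on a frequency histogram. It assumes the sp500 dataset used elsewhere in this presentation; because the bars are frequencies, the overlaid curve is rescaled to the frequency axis.

```stata
sysuse sp500, clear

// bars show frequencies; the kernel density estimate is rescaled to match
histogram volume, frequency kdensity
```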
In [11]:
%%stata -qui
// histogram
sysuse sp500, clear
histogram volume, freq normal ///
xaxis(1 2) ylabel(0(10)60, grid) ///
xlabel(12321 "mean" 9735 "-1 s.d." ///
14907 "+1 s.d." 7149 "-2 s.d." ///
17493 "+2 s.d." 20078 "+3 s.d." ///
22664 "+4 s.d.", axis(2) grid gmax) xtitle("", axis(2)) ///
subtitle("S&P 500, January 2001 - December 2001") ///
note("Source: Yahoo!Finance and Commodity Systems, Inc.")
graph command options¶
In [12]:
%%stata -qui
sysuse auto, clear
twoway (scatter price mpg if foreign, mcolor(%80)) ///
(scatter price mpg if !foreign, mcolor(%20*1.2)) ///
(lfit price mpg, lcolor(gs2)), ///
legend(order(1 "Foreign" 2 "Domestic") size(2.5)) ///
title("{bf}Price vs. MPG", size(3)) ///
subtitle("{it}with linear prediction", size(2.75))
In [13]:
%%stata -qui
sysuse auto, clear
label define repair 1 "Excellent" 2 "Good" 3 "Average" 4 "Fair" 5 "Poor"
label values rep78 repair
gen int wgt2 = (weight / 1000) ^ 2
twoway (scatter price mpg [aw = wgt2], ///
colorvar(rep78) colordiscrete coloruseplegend ///
colorlist(stc1%20 stc2%20 stc3%20 stc4%20 stc5%20) zlabel(, valuelabel)) ///
(lfit price mpg, lcolor(red)), legend(off) plegend(size(2.5)) ///
title("{bf:Price vs. MPG weighted by vehicle weight}{superscript:2}", size(3)) ///
subtitle("{it}with linear prediction", size(2.75))
The previous example uses the new colorvar() option; see https://www.stata.com/new-in-stata/graph-colors-by-variable/ for details.
Change color, size, pattern, and other graph styles: https://www.stata.com/manuals/g-4colorstyle.pdf¶
In [14]:
%%stata -qui
sysuse auto, clear
twoway (scatter price mpg if foreign, mcolor(red%20) msize(large)) ///
(scatter price mpg if !foreign, mcolor(blue*0.5) msize(small)) ///
(lfit price mpg, lcolor("255 128 0%20*0.5") lpattern(dash)) ///
(lowess price mpg, lcolor("255 128 0*0.5") lwidth(thick))
- Use graph query, color to get a list of named colors in Stata.
- Use viewsource color-navy.style to get the RGB value of the named color navy.
- See https://www.stata.com/manuals/g-2graphquery.pdf for details about the graph query command.
Change text size and text style: https://www.stata.com/bookstore/pdf/g_text.pdf¶
In [15]:
%%stata -qui
sysuse auto, clear
twoway (scatter price mpg if foreign, mcolor(red) msize(large)) ///
(scatter price mpg if !foreign, mcolor(blue) msize(small)) ///
(lfit price mpg, lcolor("255 128 0") lpattern(dash)) ///
(lowess price mpg, lcolor("255 128 0 * 0.5") lwidth(thick)) ///
, title("{bf:Price vs. MPG}", size(medlarge)) ///
subtitle("{it:with linear prediction and lowess}", size(2.75)) ///
note("1978 Automobile Data")
Change axis styles: https://www.stata.com/manuals/g-3axis_options.pdf¶
In [16]:
%%stata -qui
sysuse sp500, clear
tsset date
// note that the rightmost xlabel, 01jan2002,
// extends past the plot region boundary
tsline open
In [17]:
%%stata -qui
sysuse sp500, clear
tsset date
// change xlabel orientation
tsline open, xlabel(, angle(90))
In [18]:
%%stata -qui
sysuse sp500, clear
tsset date
// extend right margin
tsline open, plotr(margin(r+3))
Change legend styles: https://www.stata.com/manuals/g-3legend_options.pdf¶
In [19]:
%%stata -qui
sysuse auto, clear
twoway (scatter price mpg if foreign, mcolor(red) msize(large)) ///
(scatter price mpg if !foreign, mcolor(blue) msize(small)) ///
(lfit price mpg, lcolor("255 128 0") lpattern(dash)) ///
(lowess price mpg, lcolor("255 128 0 * 0.5") lwidth(thick)) ///
, legend(label(1 "Foreign") label(2 "Domestic") ///
label(3 "{it:linear prediction}") label(4 "{bf:lowess}") ///
order(3 4 1 2)) ///
title("{bf:Price vs. MPG}", size(medlarge)) ///
subtitle("{it}with linear prediction and lowess", size(2.75)) ///
note("1978 Automobile Data")
Combine graphs¶
- Combine twoway plots using overlay.
- Create multiple subplots using by.
- Combine graphs using graph combine.
In [20]:
%%stata -qui
// overlay twoway plots
sysuse auto, clear
twoway (scatter price mpg) (scatter weight mpg)
In [21]:
%%stata -qui
// overlay twoway plots with different types
sysuse auto, clear
twoway (scatter price mpg) (line weight mpg, sort)
In [22]:
%%stata -qui
// overlay variables with different scales
sysuse auto, clear
twoway (scatter price mpg, yaxis(1)) (scatter weight mpg, yaxis(2)), legend(pos(6) rows(1))
In [23]:
%%stata -qui
// subplots with by
sysuse auto, clear
twoway scatter mpg price, by(rep78, cols(3) total)
In [24]:
%%stata -qui
// graph combine
sysuse auto, clear
twoway scatter price mpg, nodraw name(t1, replace)
twoway scatter weight length, nodraw name(t2, replace)
graph combine t1 t2
In [25]:
%%stata -qui
// graph combine, alternate axes, and axis label length
sysuse lifeexp, clear
gen loggnp = log10(gnppc)
label var loggnp "Log{subscript:10} of GNP per capita"
scatter lexp loggnp, ysca(alt) xsca(alt) ylabel(, nogrid labelminlen(3)) ///
xlabel(, grid gmax) name(yx, replace) nodraw
twoway histogram lexp, fraction xsca(alt reverse) horiz ///
fxsize(25) name(hy, replace) nodraw
twoway histogram loggnp, fraction ysca(alt reverse) ///
ylabel(0(.1).2, nogrid labelminlen(3)) xlabel(,grid gmax) ///
fysize(25) name(hx, replace) nodraw
graph combine hy yx hx, hole(3) ///
imargin(0 0 0 0) graphregion(margin(l=22 r=22)) ///
title("Life expectancy at birth vs. GNP per capita") ///
note("Source: 1998 data from The World Bank Group")
The previous example uses the new labelminlen() suboption, which sets a minimum width for axis labels; giving both y axes the same minimum width aligns them. See https://www.stata.com/manuals/g-3axis_label_options.pdf for details.
Use scheme to define the overall look of a graph¶
In [26]:
%%stata -qui
// default scheme stcolor
sysuse sp500, clear
twoway scatter high low date || ///
line high low date, || ///
rarea high low date, color(gray%20)
In [27]:
%%stata -qui
sysuse sp500, clear
twoway scatter high low date || ///
line high low date, || ///
rarea high low date, color(gray%20) ///
scheme(stmono2)
In [28]:
%%stata -qui
// requires the community-contributed schemepack
// https://github.com/asjadnaqvi/stata-schemepack
sysuse sp500, clear
twoway scatter high low date || ///
line high low date, || ///
rarea high low date, color(gray%20) ///
scheme(gg_tableau)
See https://www.stata.com/stata-news/news33-4/spotlight/ for more details.
Save graphs: https://www.stata.com/help18.cgi?graph_save¶
- gph live (the default) - the graph can be edited in later sessions, and its look can still be changed by applying a different scheme.
- gph asis - the graph is frozen and saved exactly as it currently appears.
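The two formats can be sketched as follows. This is an illustration rather than part of the original notebook: the graph name sc1 and the file names are placeholders, and the files are written to the current working directory.

```stata
sysuse auto, clear
twoway scatter price mpg, name(sc1, replace)

// live format (the default): editable later, and a scheme can still be applied
graph save sc1 "price_mpg_live.gph", replace

// asis format: frozen exactly as currently rendered
graph save sc1 "price_mpg_asis.gph", asis replace

// redisplay the live-format file in a later session with a different scheme
graph use "price_mpg_live.gph", scheme(stmono2)
```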
Export graphs: https://www.stata.com/help18.cgi?graph_export¶
- svg, pdf, and eps - vector image formats
- png, jpg, etc. - raster image formats
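A minimal sketch of exporting one graph to both kinds of format is shown below. The graph name sc1 and the file names are placeholders chosen for illustration; graph export infers the format from the file suffix, and width() applies to raster output such as PNG.

```stata
sysuse auto, clear
twoway scatter price mpg, name(sc1, replace)

// vector formats: scale to any size without loss of quality
graph export "price_mpg.svg", name(sc1) replace
graph export "price_mpg.pdf", name(sc1) replace

// raster format: width() sets the image width in pixels
graph export "price_mpg.png", name(sc1) width(1600) replace
```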
Beyond the graph command: marginsplot, survival, time series, etc.¶
In [29]:
%%stata -qui
webuse nhanes2l, clear
// fit a linear regression model using
// the continuous outcome variable bpsystol,
// the binary predictor variable diabetes,
// and the categorical predictor variable hlthstat.
regress bpsystol i.hlthstat##i.diabetes
// estimate marginal predictions of SBP for each
// combination of the categories of hlthstat and diabetes.
margins diabetes#hlthstat
// profile plot shows a separate line for each category of hlthstat
marginsplot
In [30]:
%%stata -qui
webuse nhanes2l, clear
regress bpsystol i.hlthstat##i.diabetes
margins diabetes#hlthstat
// if we prefer a horizontal bar chart
marginsplot, recast(bar) xdimension(hlthstat diabetes) horizontal
In [31]:
%%stata -qui
// survival plot
webuse stan3, clear
sts graph, by(posttran)
In [32]:
%%stata -qui
// time-series plot
sysuse sp500, clear
tsset date
tsrline high low, plotregion(margin(r+3))
In [33]:
%%stata -qui
// power and sample-size plot
// For average score 520 with a standard deviation of 135, we want to see the
// power obtained for sample sizes of 100 through 500 when scores increase
// by 20, 40, 60, and 80 points or, equivalently, when average scores
// increase to 540, 560, 580, and 600.
power twomeans 520 (540 560 580 600), n(100 200 300 400 500) sd(135) graph
Community-contributed commands¶
catplot¶
In [34]:
%%stata -qui
** ssc install catplot
sysuse auto, clear
catplot rep78 foreign, percent(foreign) ///
bar(1, bcolor(%40)) blabel(bar, position(outside) ///
format(%3.1f)) ylabel(none) yscale(r(0,60))
coefplot¶
In [35]:
%%stata -qui
** https://repec.sowi.unibe.ch/stata/coefplot
** ssc install coefplot
sysuse auto, clear
regress price mpg trunk if !foreign
estimates store domestic
regress price mpg trunk if foreign
estimates store foreign
coefplot domestic foreign, drop(_cons) xline(0)
grmap¶
In [36]:
%%stata -qui
// copy grmap sample data
capture grmap_copy
use italy-outlinedata.dta, clear
grmap, title("Provincial capitals" " ", size(*0.9)) ///
point(data("italy-capitals.dta") xcoord(xcoord) ycoord(ycoord))
geoplot : https://github.com/benjann/geoplot¶
In [37]:
%%stata -qui
** Stata version 16.1 or newer
** ssc install: geoplot palettes colrspace moremata
clear all
// get data
local url http://fmwww.bc.edu/repec/bocode/i/
geoframe create regions `url'Italy-RegionsData.dta, id(id) coord(xcoord ycoord) ///
shp(Italy-RegionsCoordinates.dta)
geoframe create country `url'Italy-OutlineCoordinates.dta
geoframe create capitals `url'Italy-Capitals.dta, coord(xcoord ycoord)
geoframe create lakes `url'Italy-Lakes.dta, feature(water)
geoframe create rivers `url'Italy-Rivers.dta, feature(water)
geoplot (area regions) ///
(point capitals i.size [w=pop98], color(Set1%50) mlcolor(%0)) ///
(label capitals city if pop98>250000, color(black)) ///
, legend compass sbar(length(300) units(km))
sankey¶
In [38]:
%%stata -qui
** https://github.com/asjadnaqvi/stata-sankey
** ssc install sankey, replace
** ssc install palettes, replace
** ssc install colrspace, replace
import excel using "https://github.com/asjadnaqvi/stata-sankey/blob/main/data/sankey_example2.xlsx?raw=true", clear first
sankey value, from(source) to(destination) by(layer)