We are going to analyze the unemployment rate in Texas counties using texas_ue.dta. The data contain the unemployment rate and the college graduation rate for each Texas county, but they do not include the locations of the counties. We are going to obtain the county boundaries from a U.S. Census Bureau shapefile and merge them with the data.
Using a web browser, we searched for “shapefile U.S. counties census” and found the TIGER/Line page at https://www.census.gov/geo/maps-data/data/tiger-line.html. From there, we downloaded the file tl_2017_us_county.zip to the Downloads directory on our computer.
We now use copy, unzipfile, and spshape2dta to bring tl_2017_us_county.zip into the working directory, extract it, and translate the shapefile into Stata format.
. /*
> Step 1 : copy the downloaded file to the working directory
> */
. copy ~/Downloads/tl_2017_us_county.zip .
.
. /*
> Step 2 : unzip the files
> */
. unzipfile tl_2017_us_county.zip
inflating: tl_2017_us_county.cpg
inflating: tl_2017_us_county.dbf
inflating: tl_2017_us_county.prj
inflating: tl_2017_us_county.shp
inflating: tl_2017_us_county.shp.ea.iso.xml
inflating: tl_2017_us_county.shp.iso.xml
inflating: tl_2017_us_county.shp.xml
inflating: tl_2017_us_county.shx
successfully unzipped tl_2017_us_county.zip to current directory
total processed: 8
skipped: 0
extracted: 8
.
. /*
> Step 3 : translate shapefile to Stata
> */
. spshape2dta tl_2017_us_county
(importing .shp file)
(importing .dbf file)
(creating _ID spatial-unit id)
(creating _CX coordinate)
(creating _CY coordinate)
file tl_2017_us_county_shp.dta created
file tl_2017_us_county.dta created
.
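spshape2dta created two files. tl_2017_us_county.dta has one observation per county. tl_2017_us_county_shp.dta holds the boundary coordinates, which are linked to the counties through _ID. Before going on, we can optionally look at the boundary file to confirm that the translation worked. This is a sketch that assumes the vertices are stored in variables named _ID, _X, and _Y. Run describe first if your files name them differently.

```stata
* Optional check of the boundary file created by spshape2dta
* (assumption: polygon vertices are stored in _ID, _X, and _Y)
use tl_2017_us_county_shp, clear
describe

* each county is one _ID value spread over many vertex rows
confirm numeric variable _ID _X _Y

* the attribute file has 3,233 counties, so we expect 3,233 distinct _ID values
by _ID, sort: generate byte first_row = (_n == 1)
count if first_row
assert r(N) == 3233
drop first_row
```

The next command in the log reloads tl_2017_us_county.dta, so this check does not disturb the rest of the session.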
. use tl_2017_us_county, clear
. describe
Contains data from tl_2017_us_county.dta
  obs:         3,233
 vars:            20                          1 Mar 2018 11:02
 size:       491,416
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
_ID             int     %12.0g                Spatial-unit ID
_CX             double  %10.0g                x-coordinate of area centroid
_CY             double  %10.0g                y-coordinate of area centroid
STATEFP         str2    %9s                   STATEFP
COUNTYFP        str3    %9s                   COUNTYFP
COUNTYNS        str8    %9s                   COUNTYNS
GEOID           str5    %9s                   GEOID
NAME            str21   %21s                  NAME
NAMELSAD        str33   %33s                  NAMELSAD
LSAD            str2    %9s                   LSAD
CLASSFP         str2    %9s                   CLASSFP
MTFCC           str5    %9s                   MTFCC
CSAFP           str3    %9s                   CSAFP
CBSAFP          str5    %9s                   CBSAFP
METDIVFP        str5    %9s                   METDIVFP
FUNCSTAT        str1    %9s                   FUNCSTAT
ALAND           double  %14.0f                ALAND
AWATER          double  %14.0f                AWATER
INTPTLAT        str11   %11s                  INTPTLAT
INTPTLON        str12   %12s                  INTPTLON
-------------------------------------------------------------------------------
Sorted by: _ID
. list _ID _CX _CY STATEFP COUNTYFP in 1/5
     +--------------------------------------------------+
     | _ID         _CX         _CY   STATEFP   COUNTYFP |
     |--------------------------------------------------|
  1. |   1    -96.7874   41.916403        31        039 |
  2. |   2  -123.43347   46.291134        53        069 |
  3. |   3  -104.41196   34.342414        35        011 |
  4. |   4  -96.687756   40.784174        31        109 |
  5. |   5  -98.047185    40.17638        31        129 |
     +--------------------------------------------------+
.
. /*
> Step 4 : create standard ID variable
> */
. generate long fips = real(STATEFP + COUNTYFP)
. bysort fips : assert _N == 1
. assert fips != .
.
. /*
> Step 5 : tell Sp to use standard ID variable
> */
. spset fips, modify replace
(_shp.dta file saved)
(data in memory saved)
Sp dataset tl_2017_us_county.dta
data: cross sectional
spatial-unit id: _ID (equal to fips)
coordinates: _CX, _CY (planar)
linked shapefile: tl_2017_us_county_shp.dta
.
. /*
> Step 6 : Set coordinates units
> */
. spset, modify coordsys(latlong, miles)
Sp dataset tl_2017_us_county.dta
data: cross sectional
spatial-unit id: _ID (equal to fips)
coordinates: _CY, _CX (latitude-and-longitude, miles)
linked shapefile: tl_2017_us_county_shp.dta
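Before merging, it is worth checking that the translated data have what we need. The sketch below is meant to run right after Step 6.

Its first line saves the data. The log shows no save after the coordsys() change, unlike Step 5, so without it the latitude/longitude setting would not reach the file on disk that we merge from.

The checks rely on facts about FIPS codes:
- Texas has state code 48.
- The Census GEOID for a county is the 5-digit state-plus-county code, so it should equal the fips variable built in Step 4.
- Texas has 254 counties, the same number of observations as in texas_ue.

```stata
* Save the Step 6 coordinate setting with the data
* (assumption: nothing else in memory needs to be preserved)
save tl_2017_us_county, replace

* Step 5 reported that _ID is equal to fips
assert _ID == fips

* fips (Step 4) should agree with the Census GEOID string
assert fips == real(GEOID)

* Texas (state FIPS 48) should contribute 254 counties
count if STATEFP == "48"
assert r(N) == 254

* Texas county codes run from 48001 to 48507
assert inrange(fips, 48001, 48507) if STATEFP == "48"
```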
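Recall that we are going to use texas_ue, which contains the unemployment rate, the college graduation rate, and the median household income for Texas counties. We copy it from the Stata Press website, merge it with the translated shapefile, and keep only the counties that match.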
. copy http://www.stata-press.com/data/r15/texas_ue.dta .
. use texas_ue, clear
. describe
Contains data from texas_ue.dta
  obs:           254
 vars:             4                          10 Feb 2017 12:36
 size:         4,064                          (_dta has notes)
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
fips            long    %9.0g                 FIPS
college         float   %9.0g               * Percent college degree
income          long    %12.0g                Median household income
unemployment    float   %9.0g                 Unemployment rate
                                            * indicated variables have notes
-------------------------------------------------------------------------------
Sorted by: fips
Note: Dataset has changed since last saved.
.
. /*
> merge the translated shapefile
> */
. merge 1:1 fips using tl_2017_us_county
    Result                           # of obs.
    -----------------------------------------
    not matched                         2,979
        from master                         0  (_merge==1)
        from using                      2,979  (_merge==2)
    matched                               254  (_merge==3)
    -----------------------------------------
. keep if _merge == 3
(2,979 observations deleted)
. drop _merge
.
. save texas_ue, replace
file texas_ue.dta saved
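The merge, keep, and drop steps can be collapsed into one command:
- merge's assert() option stops with an error if any observation falls outside the listed results. Here it records our expectation that every county in texas_ue matches and that unmatched counties come only from the shapefile.
- keep(match) and nogenerate replace `keep if _merge == 3` and `drop _merge`.

The sketch starts again from the original Stata Press file, because texas_ue.dta on disk has already been overwritten with the merged data. It deliberately does not save anything.

```stata
* Start from the original, unmerged data (use can read a URL directly)
use http://www.stata-press.com/data/r15/texas_ue.dta, clear

* every county in texas_ue must match; unmatched rows may come only from the shapefile
merge 1:1 fips using tl_2017_us_county, assert(match using) keep(match) nogenerate

* one observation per Texas county should remain
assert _N == 254
```

The linked tl_2017_us_county_shp.dta still contains all 3,233 U.S. counties. Stata's spcompress command is designed to shrink a linked shapefile to the places in memory; see help spcompress for its requirements before using it.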
. use texas_ue, clear
. /*
> Step 1 : is there spatial spillover?
> */
. regress unemployment college
      Source |       SS           df       MS      Number of obs   =       254
-------------+----------------------------------   F(1, 252)       =     57.92
       Model |  139.314746         1  139.314746   Prob > F        =    0.0000
    Residual |  606.129539       252  2.40527595   R-squared       =    0.1869
-------------+----------------------------------   Adj R-squared   =    0.1837
       Total |  745.444285       253   2.9464201   Root MSE        =    1.5509
------------------------------------------------------------------------------
unemployment |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     college |  -.1008791   .0132552    -7.61   0.000    -.1269842   -.0747741
       _cons |   6.542796   .2571722    25.44   0.000     6.036316    7.049277
------------------------------------------------------------------------------
. spmatrix create contiguity W, replace
. estat moran, errorlag(W)
Moran test for spatial dependence
Ho: error is i.i.d.
Errorlags: W
chi2(1) = 94.06
Prob > chi2 = 0.0000
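The Moran test depends on the weighting matrix. spmatrix create contiguity gives weight to counties that share a border and, by default, applies spectral normalization.

The sketch below lists and summarizes W. It then builds a row-standardized alternative, named Wrow (a name we chose for illustration), and reruns the test. This shows whether the conclusion is sensitive to the normalization.

```stata
use texas_ue, clear

* contiguity matrix as in the text (default spectral normalization)
spmatrix create contiguity W, replace

* list the weighting matrices in memory and summarize W
spmatrix dir
spmatrix summarize W

* hypothetical alternative: row-standardized contiguity matrix
spmatrix create contiguity Wrow, normalize(row) replace

* estat moran is run after regress, as in Step 1
regress unemployment college
estat moran, errorlag(W)
estat moran, errorlag(Wrow)
```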
. /*
> Step 2 : estimation with spregress
> */
. spregress unemployment college, dvarlag(W) gs2sls
(254 observations)
(254 observations (places) used)
(weighting matrix defines 254 places)
Spatial autoregressive model                    Number of obs     =        254
GS2SLS estimates                                Wald chi2(2)      =      67.66
                                                Prob > chi2       =     0.0000
                                                Pseudo R2         =     0.1453
------------------------------------------------------------------------------
unemployment |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
unemployment |
     college |  -.0939834   .0131033    -7.17   0.000    -.1196653   -.0683015
       _cons |   5.607379   .5033813    11.14   0.000     4.620769    6.593988
-------------+----------------------------------------------------------------
W            |
unemployment |   .2007728   .0942205     2.13   0.033      .016104    .3854415
------------------------------------------------------------------------------
Wald test of spatial terms:          chi2(1) = 4.54       Prob > chi2 = 0.0331
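For reference, spregress with dvarlag(W) fits the spatial autoregressive (SAR) model below for the n = 254 counties. Here y is unemployment, x is college, and ρ is the coefficient Stata labels W:unemployment. The estimates are rounded from the output above.

Solving for y shows why the coefficient on college is not, by itself, the effect of college on unemployment. A change in one county's x feeds through (I − ρW)^{-1} to every county.

```latex
\begin{aligned}
\mathbf{y} &= \rho\,\mathbf{W}\mathbf{y} + \beta_0\mathbf{1} + \beta_1\mathbf{x} + \boldsymbol{\varepsilon},
  \qquad \hat\rho = 0.2008,\ \hat\beta_0 = 5.607,\ \hat\beta_1 = -0.0940 \\
\mathbf{y} &= (\mathbf{I}-\rho\mathbf{W})^{-1}\left(\beta_0\mathbf{1} + \beta_1\mathbf{x} + \boldsymbol{\varepsilon}\right) \\
\frac{\partial \mathbf{y}}{\partial \mathbf{x}^{\top}} &= \beta_1(\mathbf{I}-\rho\mathbf{W})^{-1}
\end{aligned}
```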
.
. /*
> Step 3 : interpretation of results
> */
. estat impact
progress :100%
Average impacts                                 Number of obs     =        254
------------------------------------------------------------------------------
             |            Delta-Method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
direct       |
     college |  -.0945245   .0130576    -7.24   0.000     -.120117   -.0689321
-------------+----------------------------------------------------------------
indirect     |
     college |  -.0195459    .010691    -1.83   0.068       -.0405    .0014081
-------------+----------------------------------------------------------------
total        |
     college |  -.1140705   .0171995    -6.63   0.000    -.1477808   -.0803602
------------------------------------------------------------------------------
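In the SAR model y = ρWy + β0 + β1·x + ε, the n × n matrix of effects of college on unemployment is S = β1(I − ρW)^{-1}. The three average impacts are:
- direct: the mean of S's diagonal, trace(S)/n;
- total: the mean row sum, sum(S)/n;
- indirect: total minus direct.

The log is consistent with this decomposition: −.0945245 + (−.0195459) = −.1140704, which matches the reported total of −.1140705 up to rounding.

The sketch below recomputes the point estimates by hand in Mata. It makes two assumptions: that spmatrix matafromsp copies W into Mata, and that the coefficient names are W:unemployment and unemployment:college, as shown in the spregress output. The scalar and Mata names are our own. If our reading of estat impact is right, the results should agree with its point estimates. Standard errors are not reproduced.

```stata
use texas_ue, clear
spmatrix create contiguity W, replace
spregress unemployment college, dvarlag(W) gs2sls

* copy the estimates into Stata scalars (names are our own)
scalar rho_hat  = _b[W:unemployment]
scalar beta_hat = _b[unemployment:college]

* copy the weighting matrix W into Mata as Wm, with the place ids in idv
spmatrix matafromsp Wm idv = W

mata:
    rho  = st_numscalar("rho_hat")
    beta = st_numscalar("beta_hat")
    n    = rows(Wm)
    // S[i,j] = effect of college in county j on unemployment in county i
    S    = beta * luinv(I(n) - rho*Wm)
    st_numscalar("direct_hat", trace(S)/n)   // mean of the diagonal
    st_numscalar("total_hat",  sum(S)/n)     // mean row sum
end

scalar indirect_hat = scalar(total_hat) - scalar(direct_hat)
display "direct = " scalar(direct_hat) "  indirect = " scalar(indirect_hat) "  total = " scalar(total_hat)
```

Trace and sum do not depend on the order of the places, so the order in which matafromsp returns W does not affect the result.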
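Spatial lags can be specified for the dependent variable (dvarlag()), the independent variables (ivarlag()), and the error term (errorlag()). Each has a different interpretation:
- A lag of the dependent variable means that a county's unemployment rate depends on its neighbors' unemployment rates.
- A lag of an independent variable means that it depends on its neighbors' college graduation rates.
- A lag of the error term means that unobserved factors affecting unemployment are spatially correlated.

The estimator can be gs2sls or ml. In the examples below, M, W1, and W2 denote additional weighting matrices that must first be created with spmatrix create. Here are some examples:
spregress unemployment college, ivarlag(W: college) gs2sls
spregress unemployment college, errorlag(W) gs2sls
spregress unemployment college, errorlag(M) dvarlag(W) gs2sls
spregress unemployment college, dvarlag(W1) dvarlag(W2) gs2sls
spregress unemployment college, dvarlag(W) ml
spregress unemployment college, errorlag(W) ml
spregress unemployment college, ivarlag(W1: college) ivarlag(W2: college) ml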
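The same lag options are available in spivregress (instrumental variables) and spxtregress (panel data). The following examples refer to other datasets, with variables such as dui, police, hrate, and gini that are not in texas_ue. spxtregress also requires the data to be xtset as a panel.
spivregress dui nodui vehicles i.dry (police = elect), dvarlag(W) errorlag(M)
spxtregress hrate ln_population gini, fe dvarlag(W) errorlag(M)
spxtregress hrate ln_population gini, re dvarlag(W) errorlag(M)
spxtregress hrate ln_population gini, re sarpanel dvarlag(W) errorlag(M)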
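The spregress examples refer to weighting matrices M, W1, and W2, which must exist before the commands are run. Here is a minimal sketch with the texas_ue data:
- W1 is first-order contiguity, like W.
- W2 and M are inverse-distance matrices, computed from the coordinates declared with spset. They are in miles only if the latlong setting from Step 6 was saved with the data before the merge.

Which matrix plays which role here is purely illustrative. W2 and M share a specification only to keep the sketch short; in practice, each matrix should reflect a specific hypothesis about how counties interact.

```stata
use texas_ue, clear

* W and W1: first-order contiguity (counties that share a border)
spmatrix create contiguity W, replace
spmatrix create contiguity W1, replace

* W2 and M: inverse-distance weights based on the spset coordinates
* (hypothetical choice for illustration)
spmatrix create idistance W2, replace
spmatrix create idistance M, replace

spmatrix dir

* two of the examples listed above
spregress unemployment college, errorlag(M) dvarlag(W) gs2sls
spregress unemployment college, dvarlag(W1) dvarlag(W2) gs2sls
```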