Title | Working with tmap and maps
Author | Kevin Crow and William Gould, StataCorp
With tmap, you can graph data onto maps and produce results such as these.
tmap is a community-contributed command by Maurizio Pisati. This FAQ explains how to use tmap. The process is as follows:
Type
. ssc install tmap
. ssc install shp2dta
. ssc install mif2dta
You need to perform this step only once.
A map records the geometry and attribute information of spatial features. Such maps are available from public and private sources. You can use maps recorded in either of two formats: ESRI shapefile or MapInfo Interchange Format.
It is usually easier to find ESRI shapefiles than MapInfo Interchange Format files, but you may use either.
Say you want to find a map of the United States. Using a search engine such as Google or Yahoo!, search for "United States shapefile". One result is described as "This dataset is a polygon shapefile containing the states and territories of the United States ...". We found http://www.nws.noaa.gov/geodata/catalog/national/html/us_state.htm and clicked "Download Compressed Shapefile". We unzipped s_14jl05.zip, which contained the following files:
s_14jl05.shp
s_14jl05.shx
s_14jl05.dbf
These are the filenames as of May 2007. They will most likely change over time.
We need only two of the files, s_14jl05.shp and s_14jl05.dbf.
Had we searched for a MapInfo map, there would have been only two files, and they probably would have been called s_14jl05.mif and s_14jl05.mid.
With the files we just extracted in the current directory, in Stata, we type,
. shp2dta using s_14jl05, database(usdb) coordinates(uscoord) genid(id)
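Pay attention to the three options we specified:
database(usdb) names the new dataset that will hold the map's attribute (database) information, usdb.dta.
coordinates(uscoord) names the new dataset that will hold the map's coordinates, uscoord.dta.
genid(id) names the identification variable that shp2dta creates in usdb.dta, here id.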
shp2dta can take several minutes to run, depending on the map's size and level of detail. The U.S. map, however, took only a few seconds.
We would have translated MapInfo files the same way, but we would have used the command mif2dta instead of shp2dta.
In any case, the translation has created two new .dta datasets: usdb.dta and uscoord.dta.
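shp2dta also has a replace option, which is useful if usdb.dta and uscoord.dta already exist from an earlier run. A sketch of the same translation with that option added:

. shp2dta using s_14jl05, database(usdb) coordinates(uscoord) genid(id) replace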
To determine the coding used by the map's authors, type
. use usdb, clear

. describe

Contains data from usdb.dta
  obs:            56
 vars:             6                          29 Mar 2006 11:52
 size:         2,744 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
STATE           str2   %9s
NAME            str24  %24s
FIPS            str2   %9s
LON             double %10.0g
LAT             double %10.0g
id              byte   %9.0g
-------------------------------------------------------------------------------
Sorted by:  id

. list id NAME in 1/5

     +-------------------------------+
     | id                       NAME |
     |-------------------------------|
  1. |  1       District of Columbia |
  2. |  2                    Arizona |
  3. |  3                       Ohio |
  4. |  4                 California |
  5. |  5                    Alabama |
     +-------------------------------+
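Only the first five codes are listed above. To look up the code for every state and territory, something like the following should work. The clean and noobs options of list merely make the output more compact, and isid confirms that id uniquely identifies the observations:

. use usdb, clear
. isid id
. list id STATE NAME, clean noobs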
Let’s shift away for a minute from the details of this map and talk about the graph we want to draw. We want to graph population by state, and we have a dataset named stats.dta containing population figures. In our dataset, we have states recorded using a different coding, and the identification variable is called scode.
We must modify our dataset to use the same coding as the map, and the variable containing the codes must be named id.
To achieve our goal, we made an intermediate dataset called trans.dta that contained two variables, scode and id. Each observation records equivalent codes. When we created trans.dta, we happened to look more carefully at usdb.dta. We discovered that the map dataset contained information about not only U.S. states, but also territories. We will just ignore that extra information. Our trans.dta dataset records only the 51 observations we care about, one for each state plus Washington, D.C.
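How you build trans.dta depends on what your own data contain. As a rough sketch only, suppose stats.dta also held a string variable with state names spelled exactly as in NAME. We call it sname here, a hypothetical name that is not part of our actual dataset. The crosswalk could then be built by matching on the names:

. use stats, clear
. keep scode sname
. rename sname NAME
. merge NAME using usdb, sort unique keep(id)
. keep if _merge==3
. keep scode id
. count
. save trans

Names that are spelled differently in the two files will fail to match, so confirm that count reports all 51 observations before relying on trans.dta.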
Then we merged our stats.dta with trans.dta based on scode:
. use stats
. merge scode using trans, sort unique
To ensure that there were no errors, we checked that all observations matched (_merge==3) and then dropped the _merge variable:
. tabulate _merge
  (output omitted)
. drop _merge
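The check can also be made self-enforcing. A minimal sketch of the same step, using assert in place of reading the tabulate output; assert stops with an error if any observation failed to match:

. use stats, clear
. merge scode using trans, sort unique
. assert _merge==3
. drop _merge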
We now must merge stats.dta with usdb.dta from the map, and this merge is based on the id variable:
. merge id using usdb, sort unique
Because our map includes locations not included in our original data, namely, territories as well as states, there will be observations in usdb.dta that are not also in stats.dta. We should check our merge:
. tabulate _merge
Here we expect all _merge values to be 2 and 3. If our map did not include territories, or if our original data did, we would expect all _merge values to be 3.
Finally, drop the unnecessary observations:
. drop if _merge!=3
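The merge syntax used above is the one that was current when this FAQ was written. Stata 11 and later also accept merge 1:1, which does not require sorted data. Assuming such a version, the two merges could be written as follows:

. use stats, clear
. merge 1:1 scode using trans
. tabulate _merge
. drop _merge
. merge 1:1 id using usdb
. tabulate _merge
. drop if _merge!=3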
To draw the graph, type
. tmap choropleth pop1990, id(id) map(uscoord.dta) palette(Blues)
We will soon deal with Alaska and Hawaii and the effect they have on our graph. Right now, focus on what we typed:
. tmap choropleth pop1990, id(id) map(uscoord.dta) palette(Blues)
Choropleth is not the name of a variable in our dataset; it is the kind of graph we want to draw. In a choropleth graph, different areas have different colors. tmap can draw other kinds of graphs, too.
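Let’s go over the options we specified:
id(id) names the variable in our data that identifies each area and links it to the map.
map(uscoord.dta) names the coordinates dataset that shp2dta created.
palette(Blues) chooses the color scheme, here shades of blue.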
In the command
. tmap choropleth pop1990, id(id) map(uscoord.dta) palette(Blues)
we specified variable pop1990, and in the dataset, that variable contains the population. The units do not matter; the data could just as well be coded in millions and we would have obtained the same graph, although the legend would change.
By default, tmap choropleth divides the specified variable into four groups that are based on quartiles. You can change the number of groups by using option clnumber(#), where # can be between 2 and 9.
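For example, a map with five groups rather than four would be requested like this (the choice of five is purely illustrative):

. tmap choropleth pop1990, id(id) map(uscoord.dta) palette(Blues) clnumber(5)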
We will stick with four groups. However, we want to exclude Alaska and Hawaii from our graph. To do that, type
. tmap choropleth pop1990 if id!=13 & id!=56, id(id) map(uscoord.dta) palette(Blues)
or
. tmap choropleth pop1990 if NAME!="Alaska" & NAME!="Hawaii", id(id) map(uscoord.dta) palette(Blues)
Both commands work because 56 and 13 are the id codes for Alaska and Hawaii, and our dataset happens to contain variable NAME, which records each name in string form. Either way, we obtain this graph:
Look closely at the legend and you will see that the population ranges are displayed in scientific notation. You can change the display format with option legformat(format). You might specify legformat(%20.0f). Or you can change the units of the variable. We will change population to be recorded in millions:
. replace pop1990 = pop1990/1e+6
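If you would rather keep the original units, the legformat() route would look like this instead, run on the unscaled variable:

. tmap choropleth pop1990 if id!=13 & id!=56, id(id) map(uscoord.dta) palette(Blues) legformat(%20.0f)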
The legend is also too small. You can make the legend bigger with option legsize(#), where # specifies a text-size multiplier, such as 2. Our improved graph is shown below:
. tmap choropleth pop1990 if id!=13 & id!=56, id(id) map(uscoord.dta) palette(Blues) legsize(2)
tmap has many other options. Read about them in the online help file (type help tmap) or in the original article by Maurizio Pisati (2004).
Friedrich Huebler's blog, at http://huebler.blogspot.com, occasionally discusses tmap.