This is about changing the way you work.
Datasets in memory are stored in frames, and frames are named. When Stata launches, it creates a frame named default, but there is nothing special about it: the name has no secret meaning, and you can rename the frame.
You can create frames, delete them, and rename them. The commands are
. frame create framename
. frame drop framename
. frame rename oldname newname
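To see the three commands together, here is a minimal sketch of a frame's life cycle, written as a do-file. It assumes a fresh Stata 16 (or later) session, and the frame names scratch, work, and main are purely illustrative. It also renames default and then restores its usual name, because the text above says the launch frame can be renamed like any other.

```stata
* Sketch of the frame life cycle; the names scratch, work, and main are
* illustrative. Assumes a fresh Stata 16 (or later) session.
frame create scratch          // a new, empty frame
frame rename scratch work     // any frame can be renamed ...
frame rename default main     // ... including the one Stata created at launch
frames dir                    // lists main and work
frame drop work               // work and any data in it are discarded
frame rename main default     // give the launch frame its usual name back
```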
Stata will list the names of all the existing frames if you type
. frames dir
One of the frame names that frames dir lists will be the current frame. It is the frame that Stata commands assume that you want them to use. To find out the name of the current frame, type
. frame
(current frame is default)
We are in the frame default. If we fit a regression, it would be fit on the data in default. Or we could change to another frame. We might type
. frame change myframe
Now if we fit a regression, it would be fit on the data in myframe.
So that is one way of working with frames. You can use frame change to switch frames, issue Stata commands, and then use frame change to switch back.
Another way of working with frames is
. frame framename {
      stata_command
      stata_command
      ...
  }
and
. frame framename: one_stata_command
These run the Stata commands on the specified frame and switch back to the original frame once the commands finish.
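The three styles just described (frame change, the one-line frame prefix, and the frame block) can be compared side by side. This sketch assumes a fresh Stata 16+ session; the frame name cars, the auto dataset, and the variable price_k are illustrative choices, not part of the original text.

```stata
* Sketch contrasting three ways of running commands on another frame.
* Assumes a fresh Stata 16+ session; the frame name cars is illustrative.
frame create cars

* 1. Switch frames, work, and switch back
frame change cars
sysuse auto
frame change default

* 2. Run one command on cars; Stata returns to default afterward
frame cars: summarize price

* 3. Run a block of commands on cars; Stata returns to default afterward
frame cars {
    generate double price_k = price/1000
    summarize price_k
}

* The current frame is still default, and it is still empty
count
```

In every case the data in default are never touched; only the frame that the commands act on changes.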
And the final way to work with frames is to link them. If a frame is linked to another, it can access the other frame's data without changing them. We will demonstrate that below.
Here are five ways frames will change the way you work.
The first way is handling interruptions. You are working to finish your project when the phone rings. Something has to be handled right now. Here is what you do:
. frame create interruption
. frame change interruption
. use another_dataset
. (do what needs doing)
. frame change default
. frame drop interruption
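A concrete version of the interruption workflow shows why it saves time: unsaved changes in default survive the detour, so there is no need to save and reload your project data. In this sketch the census and auto datasets are stand-ins (our assumption) for your project data and the urgent dataset, and urban_share is a hypothetical piece of unsaved work. It assumes a fresh Stata 16+ session.

```stata
* Sketch of the interruption workflow; census and auto stand in for your
* project data and the urgent dataset. Assumes a fresh Stata 16+ session.
sysuse census
generate double urban_share = popurban / pop   // unsaved work in progress

frame create interruption
frame change interruption
sysuse auto
tabulate foreign                               // the urgent task
frame change default
frame drop interruption

* The unsaved variable is still here; nothing had to be saved or reloaded
describe urban_share
```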
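The second way is counterfactual prediction. You want to predict the income of men as if they were women and of women as if they were men. Frames provides yet another way to do this. We will reverse the sexes and predict, and frames is how we will do so without changing the data: the changes are made to a copy of the data in another frame.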
. regress income i.sex##(i.ed c.age##c.age) i.occ
. frame copy default new
. frame new {
      replace sex = !sex      // reverse the sexes
      predict pincome
  }
. generate alt_income = _frval(new, pincome, _n)
. frame drop new
The generate command copied values from frame new by using the _frval() function. Its third argument, _n, specified that observation 1 in new be copied to observation 1 in default, 2 in new to 2 in default, and so on. (replace sex = !sex assumes that sex is coded 0 and 1.)
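The same pattern can be tried on data that ship with Stata. In this sketch, price, weight, and foreign from the auto dataset stand in for income, the covariates, and sex; the model is illustrative only. It relies on estimation results remaining available after switching frames, which is what lets predict run inside frame new. It assumes a fresh Stata 16+ session.

```stata
* Sketch of the counterfactual prediction using the auto data; price,
* weight, and foreign stand in for income, the covariates, and sex.
* Assumes a fresh Stata 16+ session.
sysuse auto
regress price i.foreign##c.weight
predict phat                          // fitted values as observed

frame copy default new                // work on a copy, not the original
frame new {
    replace foreign = !foreign        // domestic <-> foreign
    predict pcounter                  // predictions with origin reversed
}

* Copy observation i of pcounter in new to observation i in default
generate alt_price = _frval(new, pcounter, _n)
frame drop new
summarize phat alt_price
```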
The third way is linking related datasets. You have two files, persons.dta and counties.dta, that are related: the persons live in the counties. You can load the datasets into separate frames and link them.
. use persons
. frame create counties
. frame counties: use counties
. frlink m:1 countyid, frame(counties)
frlink links observations in the current frame to the corresponding observations in another frame. Variable countyid in persons.dta records the county in which each person lives. A variable of the same name in counties.dta identifies the county that each observation describes. The data were linked on countyid, and m:1 declares that many persons may match the same county.
Assume that counties contains a variable med_income recording each county's median income. Then you could type
. frget med_income, from(counties)
. regress income med_income educ age
The first command copies med_income from counties to the current frame. Doing this raises several issues, but they are handled automatically. Some individuals might live in counties not recorded in counties; they get missing values. Many individuals might live in the same county; each receives that county's value. And there may be counties in which no one in persons.dta lives; those are simply not used.
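Those cases can be checked directly on a tiny dataset. The frame name counties and the variables countyid and med_income mirror the text, but the values below are invented for illustration: one person lives in county 9, which is absent from counties, and nobody lives in county 3. It assumes a fresh Stata 16+ session.

```stata
* Sketch of frlink and frget on tiny, made-up data. The names mirror the
* text; the values are invented. Assumes a fresh Stata 16+ session.
clear
frame create counties
frame counties {
    set obs 3
    generate countyid = _n                   // counties 1, 2, 3
    generate med_income = 40000 + 10000*_n   // 50000, 60000, 70000
}

* Five persons: persons 1-4 alternate between counties 2 and 1;
* person 5 lives in county 9, which is not in counties
set obs 5
generate countyid = cond(_n <= 4, mod(_n, 2) + 1, 9)

frlink m:1 countyid, frame(counties)
frget med_income, from(counties)
list countyid med_income
```

Persons sharing a county receive the same med_income, the person in county 9 gets a missing value, and county 3 simply goes unused.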
The fourth way is recording results. You can use one frame to record results from another. The frame create command, which we used above, can also create a new frame containing new, empty variables. For instance,
. frame create newframename stat1 stat2
Another frame command,
. frame post framename (expression) (expression) ...
adds a new observation to framename, filling its variables, in order, with the values of the expressions.
Thus, we can use frame create to create a new frame ready to receive new observations, and we can use frame post to send the new observations we want to add. Here is an example of how we can put frame create and frame post to use.
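Before the simulations, a deterministic sketch shows the mechanics on their own: each frame post adds exactly one observation, and the expressions fill the variables in the order they were created. The frame name squares and its variables n and n2 are illustrative. It assumes a fresh Stata 16+ session.

```stata
* Minimal sketch of frame create and frame post; the frame name squares
* and its variables are illustrative. Assumes a fresh Stata 16+ session.
frame create squares n n2
forvalues n = 1/5 {
    * One observation per post: n gets `n', n2 gets its square
    frame post squares (`n') (`n'^2)
}
frame squares: list
```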
How often will a sample of 100 draws from N(0,1) have a mean significantly different from 0 at the 5% level? Let's do 1,000 simulations.
. frame create results t p
. forvalues i=1(1)1000 {
      quietly set obs 100
      quietly generate x = rnormal()
      quietly ttest x=0
      frame post results (r(t)) (r(p))
      drop _all
  }
. frame results: count if p<=0.05
  43
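The simulation above can be made reproducible and safe to rerun. In this sketch, two changes are our own choices rather than part of the original: set seed fixes the random draws, and tempname gives the results frame a temporary name so it will not clash with a frame you already have, such as results. The seed value is arbitrary. It assumes a fresh Stata 16+ session.

```stata
* Sketch of the t-test simulation with a fixed seed and a temporary frame
* name (both our own additions). Assumes a fresh Stata 16+ session.
clear
set seed 12345
tempname sim
frame create `sim' t p
forvalues i = 1/1000 {
    quietly set obs 100
    quietly generate x = rnormal()
    quietly ttest x = 0
    frame post `sim' (r(t)) (r(p))
    drop _all
}
frame `sim' {
    generate byte reject = p <= 0.05
    * The mean of reject is the rejection rate; expect roughly 0.05
    summarize reject
}
```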
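How often will a regressor made of random N(0,1) draws, added to a regression, have a coefficient with |t|>2? Let's do 1,000 simulations. (If frame results still exists from the previous example, type frame drop results first.)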
. sysuse auto
(1978 Automobile Data)
. frame create results b se
. forvalues i=1(1)1000 {
      quietly generate x = rnormal()
      quietly regress mpg x weight displ
      frame post results (_b[x]) (_se[x])
      drop x
  }
. frame results: count if abs(b/se) > 2
  54
Recording simulation results is one way you can use frame create and frame post. Here's another. We recently had a dataset with 2,000-plus variables in it, and we wanted to get its names organized and standardized. We started by creating a dataset of the variable names:
. frame create varnames str32 varname
. foreach name of varlist _all {
      frame post varnames ("`name'")
  }
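The same loop can record more than names. This sketch, using the auto data as an illustrative stand-in for the 2,000-variable dataset, also posts each variable's label, which is often useful when deciding on new names. The variable varlabel is our own addition; str80 is used because variable labels are at most 80 characters. It assumes a fresh Stata 16+ session.

```stata
* Sketch: record each variable's name and label in a separate frame,
* using the auto data. Assumes a fresh Stata 16+ session.
sysuse auto
frame create varnames str32 varname str80 varlabel
foreach name of varlist _all {
    local lbl : variable label `name'
    * Compound quotes protect labels that contain double quotes
    frame post varnames ("`name'") (`"`lbl'"')
}
frame varnames: list, noobs
```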
Now we had a dataset in frame varnames with 2,000-plus observations of variable varname. We looked at the dataset, sorted it, performed other shrewd transformations on it, and finally knew what we wanted to do. We started like this:
. frame change varnames
. rename varname oldname
. generate str32 newname = ""
Then, we copied some old names over to newname. We filled others in by hand. We even filled some of them in with programs we wrote. Finally, we reached the point where we had a new name for each original name.
Then, we used frames to change the names in the original data:
. frame change varnames
. local N = _N
. forvalues i=1(1)`N' {
      local old = oldname[`i']
      local new = newname[`i']
      frame default: rename `old' `new'
  }
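Here is the whole rename step end to end on a small dataset. It follows the loop above but adds a guard, our own addition, so that only variables whose names actually changed are renamed. The auto data and the new names price_usd and miles_per_gal are illustrative. It assumes a fresh Stata 16+ session.

```stata
* Sketch of the rename step on the auto data; the new names are
* illustrative. Assumes a fresh Stata 16+ session.
sysuse auto
frame create varnames str32 oldname str32 newname
foreach name of varlist _all {
    frame post varnames ("`name'") ("`name'")   // start with newname = oldname
}

frame change varnames
replace newname = "price_usd"     if oldname == "price"
replace newname = "miles_per_gal" if oldname == "mpg"
local N = _N
forvalues i = 1/`N' {
    local old = oldname[`i']
    local new = newname[`i']
    * Guard: skip names that were left unchanged
    if "`old'" != "`new'" {
        frame default: rename `old' `new'
    }
}
frame change default
```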
Then, we put the names in the order we had them in our dataset:
. local names = ""
. forvalues i=1(1)`N' {
      local names = "`names' " + newname[`i']
  }
. frame default: order `names'
The fifth way is working with subsets. Another frame feature is frame put, which copies a subset of the data from the current frame into a new frame:
. frame put varlist if expression, into(framename)
Here is how you might use it.
. frame put city population med_income, into(subset)
. frame change subset
. stata_command
. stata_command
. frame change default
. frame drop subset
To copy only some of the observations, add an if condition:
. frame put city population med_income if country=="Germany", into(subset)
. frame change subset
. stata_command
. stata_command
. frame change default
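Here is frame put on data that ship with Stata. This sketch copies three variables for the foreign cars in the auto data into a frame named imports (an illustrative name) and leaves the original data intact. It assumes a fresh Stata 16+ session; type frame drop imports when you are finished with the subset.

```stata
* Sketch: copy selected variables and observations into a new frame,
* using the auto data. Assumes a fresh Stata 16+ session.
sysuse auto
frame put make price mpg if foreign, into(imports)
frame imports {
    describe, short
    summarize price mpg
}
* default still holds all 74 cars and all 12 variables
count
```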
We once had country data and wanted to run country_analysis.do for each country separately, starting with Afghanistan and ending with Zimbabwe. We did the following and produced Afghanistan.log, Albania.log, Algeria.log, ... Zimbabwe.log.
. egen c = group(country)
. quietly summarize c
. local N_of_countries = r(max)
. forvalues i=1(1)`N_of_countries' {
      frame put if c==`i', into(subset)
      frame subset {
          local cntryname = country[1]
          log using "`cntryname'.log"
          do country_analysis
          log close
      }
      frame drop subset
  }
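A variation of the per-country loop can be run on the auto data, with foreign standing in for country and summarize standing in for country_analysis.do. This sketch is our own variant: it loops over levelsof instead of an egen group variable and names each log after the value label. It assumes a fresh Stata 16+ session with no log already open, and it writes (or replaces) Domestic.log and Foreign.log in the working directory.

```stata
* Sketch of a per-group loop on the auto data; foreign stands in for
* country and summarize for country_analysis.do. Assumes a fresh
* Stata 16+ session with no log open.
sysuse auto
levelsof foreign, local(groups)
foreach g of local groups {
    local grpname : label (foreign) `g'        // "Domestic" or "Foreign"
    frame put if foreign == `g', into(subset)
    log using "`grpname'.log", replace text
    frame subset: summarize price mpg
    log close
    frame drop subset
}
```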
We said there were five ways frames will change the way you work, and yet here we are on number 6. We do not count this one because you do not have to change the way you work to experience the benefit.
The do- and ado-files that you have previously written that use preserve and restore will run faster if you use Stata/MP because it secretly uses frames in place of temporary files to preserve data. The speed-up is sometimes remarkable. We have old do- and ado-files that run 20 percent faster.
Read more about frames in the Stata Data Management Reference Manual; see [D] frames intro.