st: RE: duplicating values within one variable

From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: duplicating values within one variable
Date   Mon, 21 Nov 2005 11:47:03 -0000

This is a common problem and the good news is that
you have yet to discover the power of -by:-, the 
most Stataish of all Stata's features. 

You want something like 

gen test = employ if industry == <whatever> 
sort state county year test 
by state county year: replace test = test[1]

or (more concisely) 

gen test = employ if industry == <whatever> 
bysort state county year (test): replace test = test[1]

On the -sort-

sort state county year test 

the observations with non-missing -test- are sorted 
to the top of each -state county year- combination, 
because Stata sorts numeric missing values after all 
non-missing values. Then the others can be replaced by 
the first value in each block. 
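
To see the sort-then-replace step in action, here is a minimal sketch 
on a toy dataset. The data values and the choice of industry 2 as a 
stand-in for <whatever> are invented for illustration; only the 
variable names follow the commands above. 

```stata
* Toy data: all values are made up for illustration only
clear
input byte state byte county int year byte industry int employ
1 1 2000 1 100
1 1 2000 2 250
1 1 2000 3  40
1 1 2001 1 110
1 1 2001 2 260
1 1 2001 3  45
1 2 2000 1  70
1 2 2000 2  90
1 2 2000 3  15
end

* Pick out employment in industry 2 (a stand-in for <whatever>);
* -test- is missing for every other industry
gen test = employ if industry == 2

* Within each state-county-year block, sort on -test- so that the
* single non-missing value comes first, then copy it to the whole block
bysort state county year (test): replace test = test[1]

* Check: -test- should now be constant within each block
by state county year: assert test == test[1]

list, sepby(state county year)
```

This relies on there being exactly one observation for the chosen 
industry in each block. If a block has none, -test- stays missing 
there; if it has several, the smallest value is the one copied. 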

A Mickey and Minnie tutorial on -by:- is at

SJ-2-1  pr0004  . . . . . . . . . . Speaking Stata:  How to move step by: step
        Q1/02   SJ 2(1):86-102                                   (no commands)
        explains the use of the by varlist : construct to tackle
        a variety of problems with group structure, ranging from
        simple calculations for each of several groups to more
        advanced manipulations that use the built-in _n and _N

and several FAQs, particularly on data management, give further examples. 

[email protected] 

Gregor Franz
> How can I make observations in a variable take on the value of a 
> specific observation within this variable? For example, I have 
> employees by industry in each county in each state for several years. 
> I want to create a variable that is equal to employees in industry = x 
> for each county and state by year for all observations. If I type 
> gen test = employees if industry == x, I only get one observation each 
> year, by county and state, but I want the rest of the (now missing) 
> observations in variable 'test' (which in the original variable 
> 'employees' take on the values by different industries) to take on the 
> value of industry x. So in the end all observations for the variable 
> 'test' would take on the value of industry county state and year.

