Dear list,
Quite often I want to compute a variable for each observation by referring to values from other observations. I have found a way of doing it, but there must be a faster way of programming this! For example, take a dataset with the variables household (HH), member, father and education:
HH  member  father  education
 1       1       .          5
 1       2       .          6
 1       3       2          1
 2       1       3          2
 2       2       .          4
 2       3       .          5
 2       4       3          1
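To reproduce the example, the data above can be typed in directly. This is only a small sketch that enters the eight-free listing exactly as shown, with no values beyond those listed:

```stata
* Enter the example data; "." is Stata's numeric missing value
clear
input HH member father education
1 1 . 5
1 2 . 6
1 3 2 1
2 1 3 2
2 2 . 4
2 3 . 5
2 4 3 1
end
list, sepby(HH)
```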
The variable father indicates that in HH 1 the observation with member==2 is the father of member==3. Similarly, in HH 2 members 1 and 4 have member 3 as their father. Suppose I want to create a variable containing the education of the father, i.e.
HH  member  father  education  edu_father
 1       1       .          5           .
 1       2       .          6           .
 1       3       2          1           6
 2       1       3          2           5
 2       2       .          4           .
 2       3       .          5           .
 2       4       3          1           5
What is the easiest way of doing so? I did this with a loop, which looks like this:
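* Look up each father's education one observation at a time
gen edu_father = .
gen mysample = father != .
* Sort observations that have a father to the top, then count them
gsort -mysample
count if mysample
local end = r(N)
forv i = 1/`end' {
    * Father's member number and household for observation i
    su father in `i'/`i', mean
    local father = r(mean)
    su HH in `i'/`i', mean
    local HH = r(mean)
    * Education of that member within the same household
    su education if HH == `HH' & member == `father', mean
    replace edu_father = r(mean) in `i'/`i'
}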
This works, but it is very time-consuming on big datasets (on a P4 I estimate 4 hours for the problem I have) and certainly not very elegant.
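A loop-free alternative might be to merge the data with a copy of itself, matching each observation's (HH, father) pair to the (HH, member) pair of the father. The following is a minimal sketch, not a tested solution for the full data. It assumes that HH and member together uniquely identify observations, that edu_father does not yet exist in the data, and that the `merge m:1` syntax of Stata 11 or later is available. The temporary file name `fathers` is made up for illustration.

```stata
* Sketch only: assumes (HH, member) is unique and edu_father does not exist yet
preserve
keep HH member education
* Rename so the lookup file is keyed on the father's member number
rename member father
rename education edu_father
tempfile fathers
save `fathers'
restore

* Each observation picks up the education of the member listed as its father;
* observations with father==. find no match and keep edu_father missing
merge m:1 HH father using `fathers', keep(master match) nogenerate
sort HH member
```

With the example data this should give edu_father = 6 for member 3 in HH 1 and edu_father = 5 for members 1 and 4 in HH 2, since the merge does a single pass over the data instead of one summarize call per observation.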
Does anyone know a shortcut for this? Any suggestions are greatly appreciated, many thanks,
Fabian