Bill gave a solution here based on -merge-.
Those interested in such problems may want to compare with my earlier
and totally different solution.
<http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.0903/date/article-698.html>
Nick
[email protected]
William Gould, StataCorp LP
Eunsu Ju <[email protected]> writes,
> I would like to generate a new variable which contains the information
> of parents, e.g. dad's education. My data looks like below.
>
> FID   PID  Var1  Edu  Var3  Dad's FID  Dad's PID  Dad's Edu
> 1001   10     1    3     5          .          .
> 1001   20     3    3     2          .          .
> 1001   30     4    2     3       1001         10          3
> 1001   31     8    5     5       1001         10          3
> 1002    1     2    4     3          .          .          .
> 1002   10     4    2     1       1002          1          4
> 1002   20     5    4     2          .          .          .
> 1002   30     9    3     2       1002         10          2
> 1002   31     6    1     4       1002         10          2
> 1002   32     4    2     5       1002         10          2
>
> Note: FID = Family ID; PID = Person ID; Edu = Educational attainment
> (Values are randomly assigned, but data structure is similar to the
> above.)
>
> What I want to do is to have the last (far left) column, which is not
> included in the dataset. (I want to do this kind of works for other
> variables, e.g. Var1 and Var3.)
> What is the best & simplist way to do this in stata?
>
> I think I can do this like the following.
> [...]
Eunsu Ju's plan is exactly right. He makes step 1,
> 1) Split the data set into two files so that one file contains Dad's
> FID and Dad's PID, and the other has all others.
more difficult than it needs to be and later doesn't worry about
something that may not happen, but it's right overall.
Here's the solution, calling Eunsu Ju's original data master.dta.
<<< snip >>>
We now have the desired result. Note that in master, the same dad might
appear more than once. The dads in step2.dta appear only once, however, so
that same dad will be spread across the observations in master.
Perfect.
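
For readers who want to try the -merge- idea, here is a minimal sketch. It
is not the snipped code. It assumes Stata 11 or later (for the -merge m:1-
syntax), and the variable names dadFID, dadPID and dadEdu are hypothetical
stand-ins for the "Dad's FID", "Dad's PID" and "Dad's Edu" columns. The
data are typed in from the example above so the sketch runs on its own.

```stata
* Recreate the example data (dadFID and dadPID are assumed names).
clear
input FID PID Var1 Edu Var3 dadFID dadPID
1001 10 1 3 5    .  .
1001 20 3 3 2    .  .
1001 30 4 2 3 1001 10
1001 31 8 5 5 1001 10
1002  1 2 4 3    .  .
1002 10 4 2 1 1002  1
1002 20 5 4 2    .  .
1002 30 9 3 2 1002 10
1002 31 6 1 4 1002 10
1002 32 4 2 5 1002 10
end
save master, replace

* Build a lookup file with one observation per person, renaming the
* person's own identifiers so they line up with the dad identifiers.
keep FID PID Edu
rename FID dadFID
rename PID dadPID
rename Edu dadEdu
save step2, replace

* Attach each dad's education to his children. Many children can point
* to one dad, hence m:1. Observations with no recorded dad stay in the
* data with dadEdu missing; people who are nobody's dad are not added.
use master, clear
merge m:1 dadFID dadPID using step2, keep(master match) nogenerate
sort FID PID
list, sepby(FID)
```

The same pattern should carry over to Var1 and Var3: keep them in the
lookup file as well and rename them to, say, dadVar1 and dadVar3 before
saving step2.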