st: tricky data merge/joinby problem
From: "Dimitriy V. Masterov" <[email protected]>
To: Statalist <[email protected]>
Subject: st: tricky data merge/joinby problem
Date: Fri, 4 Mar 2011 10:30:38 -0500
I have two files that I would like to merge. The first contains data
on city blocks and block groups (BGs), along with a variable giving
each block's fraction of its block group's population. A simplified
version of the data looks like this:
bid bgid fracpop
11 1 .5
12 1 .5
21 2 .3
22 2 .2
23 2 .5
For example, BG 1 contains 2 blocks, each of which has half of BG 1's
population (fracpop==.5). The unique identifier in this file is bid.
I would like to merge the data above with a panel data file, file2,
that contains block group populations over time. These data look like:
bgid dateyq bgpop
1 2010q1 100
1 2010q2 105
1 2010q3 106
1 2010q4 125
Here bgid and dateyq are the identifiers. The final goal of merging is
to come up with a population for each block by allocating bgpop using
the weights in fracpop. For example, for BG 1, this would yield:
bid bgid dateyq bpop
11 1 2010q1 50
12 1 2010q1 50
Does this require the dreaded m:m merge on bgid as the first step?
That appears to work (although I only checked a few cases). Or is it
better to expand the first file into a panel and then merge on bgid
and dateyq? Or should I use -joinby bgid using file2.dta-? I am not
sure which is the most efficient solution.
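
For concreteness, here is a minimal sketch of the -joinby- route. It
assumes the block file is saved as file1.dta (a name I made up for
this example) and the panel as file2.dta, with the variable names
shown above. It is untested beyond the toy data and is only meant to
show the shape of the approach, not a definitive solution.

```stata
* Sketch only. file1.dta is an assumed file name for the block-level
* data (bid bgid fracpop); file2.dta holds the panel (bgid dateyq bgpop).
use file1, clear

* Form every block-by-quarter pairing within each block group.
* By default -joinby- drops observations whose bgid appears in only
* one of the two files; unmatched(master) would keep such blocks.
joinby bgid using file2

* Allocate the block group population using the block's weight.
generate bpop = fracpop * bgpop

sort bid dateyq
list bid bgid dateyq bpop, sepby(bid)
```

With the toy data above, BG 1 should end up with 8 observations (2
blocks times 4 quarters), with bpop equal to 50, 52.5, 53, and 62.5
for each block. BG 2 has no rows in the file2 excerpt shown, so its
blocks would be dropped unless unmatched(master) is specified.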
DVM