Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: tricky data merge/joinby problem
From: David Kantor <[email protected]>
To: [email protected]
Subject: Re: st: tricky data merge/joinby problem
Date: Fri, 04 Mar 2011 10:53:18 -0500
At 10:30 AM 3/4/2011, Dimitry wrote:
I have two files that I would like to merge. The first contains data
on city blocks and block groups (BGs), along with a
fraction-of-population variable. A simplified version of the data
looks like this:
bid  bgid  fracpop
11   1     .5
12   1     .5
21   2     .3
22   2     .2
23   2     .5
For example, BG 1 contains 2 blocks, each of which has half of BG 1's
population (fracpop==.5). The unique identifier in this file is
bid.
I would like to merge the data above with panel data file2 that
contains block group populations over time. This data looks like:
bgid  dateyq  bgpop
1     2010q1  100
1     2010q2  105
1     2010q3  106
1     2010q4  125
Here bgid and dateyq are the identifiers. The final goal of merging is
to come up with a population for each block by allocating bgpop using
the weights in fracpop. For example, for BG 1, this would yield:
bid  bgid  dateyq  bpop
11   1     2010q1  50
12   1     2010q1  50
Does this require the dreaded m:m merge on bgid as the first step?
That appears to work (although I only checked a few cases). Or is it
better to expand the first file into a panel and then merge on bgid
and dateyq? Or should I use -joinby bgid using file2.dta-? I am not
sure which is the most efficient solution.
I'm not sure what you mean by m:m merge, as an actual -merge- with a
many-to-many correspondence usually leads to meaningless pairings.
And I'm not sure what you meant by "expand ... into a panel".
But the situation does look like -joinby-.
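A minimal sketch of the -joinby- approach, using only the sample rows
quoted above. The tempfile name bgpanel and the helper variable
dateyq_s are made up for illustration; with real data you would -use-
the first file and -joinby bgid using file2.dta- instead of typing the
rows in.

```stata
* Block-group panel (the "file2" of the question), typed in from the sample rows
clear
input bgid str6 dateyq_s bgpop
1 "2010q1" 100
1 "2010q2" 105
1 "2010q3" 106
1 "2010q4" 125
end
* Convert the quarter string to a Stata quarterly date
generate dateyq = quarterly(dateyq_s, "YQ")
format dateyq %tq
drop dateyq_s
tempfile bgpanel            // hypothetical name, stands in for file2.dta
save `bgpanel'

* Block-level file: one row per block (bid) with its share of the BG population
clear
input bid bgid fracpop
11 1 .5
12 1 .5
21 2 .3
22 2 .2
23 2 .5
end

* Form every block x quarter pairing within each bgid
joinby bgid using `bgpanel'

* Allocate the block-group population to blocks using the weights
generate bpop = fracpop * bgpop

sort bid dateyq
list bid bgid dateyq bpop, sepby(bid)
```

By default -joinby- drops observations whose bgid has no match in the
other file, so the BG 2 blocks drop out here because the sample panel
covers only BG 1. The unmatched() option keeps them if that is not
what you want.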
HTH
--David