Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: Re: Combining multiple observations into one observation with multiple variables
From: Conor Hughes <[email protected]>
To: [email protected]
Subject: st: Re: Combining multiple observations into one observation with multiple variables
Date: Wed, 30 Jun 2010 14:06:32 +0700
Sorry, my tables got smushed:
Dataset 1
----------------------------------------
household id | individual id
----------------------------------------
1 | 1
1 | 2
1 | 3
2 | 1
2 | 2
3 | 1
3 | 2
Dataset 2
-----------------------------------------------------------
household id | household characteristic id
------------------------------------------------------------
1 | 1
1 | 3
1 | 7
1 | 11
2 | 1
2 | 8
3 | 2
3 | 7
3 | 13
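A minimal sketch of the reshape being asked for, written in plain Python (standard library only) rather than Stata purely so the logic can be run and checked. The data are copied from the two tables above; the function names and the `char_<k>` column names are hypothetical. Each household gets one row with a 0/1 dummy per characteristic id, and that row is then attached many-to-one to every individual in the household.

```python
# Sketch only: reshape long household characteristics (Dataset 2) into one
# row per household with 0/1 dummies, then merge many-to-one onto the
# individual-level rows (Dataset 1). Names here are hypothetical.

# Dataset 1: (household id, individual id)
dataset1 = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1), (3, 2)]

# Dataset 2: (household id, household characteristic id)
dataset2 = [(1, 1), (1, 3), (1, 7), (1, 11), (2, 1), (2, 8),
            (3, 2), (3, 7), (3, 13)]


def to_household_dummies(long_rows):
    """Return {household_id: {"char_<k>": 0 or 1}}, one entry per household.

    Every characteristic id seen anywhere gets a column; a household gets 1
    if it has that characteristic and 0 otherwise.
    """
    char_ids = sorted({c for _, c in long_rows})
    wide = {}
    for hh, c in long_rows:
        if hh not in wide:
            # Start each household with every dummy set to 0.
            wide[hh] = {f"char_{k}": 0 for k in char_ids}
        wide[hh][f"char_{c}"] = 1
    return wide


def merge_many_to_one(individuals, household_wide):
    """Attach each household's dummy row to every individual in it.

    Each individual row appears exactly once in the output, so no
    individual is duplicated. Raises KeyError if a household in
    `individuals` is missing from `household_wide`.
    """
    merged = []
    for hh, ind in individuals:
        row = {"household_id": hh, "individual_id": ind}
        row.update(household_wide[hh])
        merged.append(row)
    return merged


wide = to_household_dummies(dataset2)
merged = merge_many_to_one(dataset1, wide)
for row in merged:
    print(row)
```

Because each individual row is emitted exactly once, the merged data keep the seven individuals of Dataset 1 without duplicates. In Stata itself, the corresponding steps are usually `tabulate ..., generate()` to build the dummies, `collapse (max)` by household id, and then `merge m:1` on household id (Stata 11 and later); treat that as a pointer to the relevant commands rather than tested syntax.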
On Wed, Jun 30, 2010 at 1:40 PM, Conor Hughes <[email protected]> wrote:
> Hi All,
> I have a couple of survey datasets that I need to merge, but they're
> organized in an inconvenient way. The first is organized by
> household and by individuals within each household. The second is
> organized only by household. I'd like to do a many-to-one merge on
> household so as to preserve the individual ids. However, rather than
> storing household characteristics as variables, the second dataset
> stores them as separate observations, e.g.:
>
> Dataset 1
> --------------------------------
> household id | individual id
> --------------------------------
> 1 | 1
> 1 | 2
> 1 | 3
> 2 | 1
> 2 | 2
> 3 | 1
> 3 | 2
>
> Dataset 2
> ----------------------------------------------
> household id | household characteristic id
> ----------------------------------------------
> 1 | 1
> 1 | 3
> 1 | 7
> 1 | 11
> 2 | 1
> 2 | 8
> 3 | 2
> 3 | 7
> 3 | 13
>
> I'd prefer, in the second dataset, to have one observation per
> household, with the household characteristics included as dummy
> variables. As it is, the only way to combine them is a many-to-many
> merge, which is foolish and doesn't work well, giving output like:
> -------------------------------------------------------------------------------
> household id | individual id | household characteristic id
> -------------------------------------------------------------------------------
> 1 | 1 | 1
> 1 | 2 | 3
> 1 | 3 | 7
> 1 | 3 | 11
> 2 | 1 | 1
> 2 | 2 | 8
> 3 | 1 | 2
> 3 | 2 | 7
> 3 | 2 | 13
> This messes up the first dataset, since it creates repeated
> observations of individuals. Is there a graceful way of changing
> the multiple observations per household in the second dataset into
> one observation per household, with characteristics represented as
> dummy variables? Any help would be greatly appreciated. And please
> let me know if I've described the situation poorly and you'd like
> clarification.
>
> Cheers,
> Conor
>
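The many-to-many output quoted above follows from how Stata's manual describes `merge m:m`: within each value of the key, observations are matched by position (first with first, second with second), and when one side of a group is shorter, its last observation is reused for the rest of the longer side. The plain-Python sketch below applies that rule to the example data; the function names are hypothetical, and households present on only one side are simply skipped for brevity. It reproduces the quoted table, including individual 3 of household 1 and individual 2 of household 3 each appearing twice.

```python
# Sketch only: mimic the documented positional matching of a many-to-many
# merge to show why individuals get duplicated. Names are hypothetical.

dataset1 = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1), (3, 2)]
dataset2 = [(1, 1), (1, 3), (1, 7), (1, 11), (2, 1), (2, 8),
            (3, 2), (3, 7), (3, 13)]


def group_by_household(rows):
    """Map household id -> list of second-column values, in file order."""
    groups = {}
    for hh, value in rows:
        groups.setdefault(hh, []).append(value)
    return groups


def sequential_mm_pairs(master, using):
    """Pair rows within each household by position.

    When one side runs out, its last value is reused for the remaining
    rows of the longer side. Households on only one side are skipped.
    """
    left, right = group_by_household(master), group_by_household(using)
    out = []
    for hh in sorted(left.keys() & right.keys()):
        a, b = left[hh], right[hh]
        for i in range(max(len(a), len(b))):
            out.append((hh, a[min(i, len(a) - 1)], b[min(i, len(b) - 1)]))
    return out


pairs = sequential_mm_pairs(dataset1, dataset2)
for row in pairs:
    print(row)  # (household id, individual id, characteristic id)
```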
--
Conor Hughes
Mathematics and Economics
University of Chicago 2011