Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: RE: st: Merging 2 Tricky Panel Datasets
From: Joerg Luedicke <[email protected]>
To: [email protected]
Subject: Re: st: RE: st: Merging 2 Tricky Panel Datasets
Date: Mon, 14 Mar 2011 21:13:58 -0400
On Mon, Mar 14, 2011 at 5:29 PM, Clifton Chow
<[email protected]> wrote:
>
> A. Interview date - This is matched identically on both datasets, but the format for dataset 1 = mo/day/year and for dataset 2 = month, day and year are broken out into separate variables.
>
> dataset 1           dataset 2
>
> obs 1   04 12 09    obs 1   04/12/2009
> obs 2   12 14 10    obs 2   12/14/2010
>
> B. Interview sequence: This is the tricky part. Dataset 1 has a variable denoting interview sequence from 1 to 9, but dataset 2 has an interview sequence variable from 1 to 10, with 10 being the final interview conducted before discharge, which can map onto the final interview recorded in dataset 1.
>
> Dataset 1           Dataset 2
>
> ID      Seq         ID      Seq
> obs 1   1           obs 1   1
> obs 1   2           obs 1   2
> obs 2   1           obs 2   1
> obs 2   2           obs 2   2
> obs 2   3           obs 2   10
>
> This means for individuals from dataset 2 without a sequence number 10, everything lines up perfectly between the two datasets (1-9). But for those with a sequence number 10, it can map on to any possible datapoint in dataset 1, depending on which is the individual's final interview as recorded in dataset 1.
>
> Does anyone have a program (either a for loop or an if statement) that can handle datapoint 10 from dataset 2, so I can still successfully merge both datasets without losing significant data from individuals who were discharged (those with datapoint 10)?
Re A: type -help date- to see how Stata deals with dates and times and
how you can convert numeric values into dates and vice versa. For
instance, you could split the date in dataset 2 into three variables
(month, day, year), as in dataset 1, and then merge accordingly.
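As a rough sketch of that conversion, assuming the date in dataset 2 is stored as a string such as "04/12/2009" (the variable names intdate_str, intdate, mm, dd and yy are made up for illustration, and the two-digit year is only a guess at how dataset 1 stores it):

```stata
* Toy data standing in for the date variable in dataset 2 (hypothetical name)
clear
input str10 intdate_str
"04/12/2009"
"12/14/2010"
end

* Convert the string to a Stata daily date (days since 01jan1960)
gen intdate = date(intdate_str, "MDY")
format intdate %td

* Break the date into separate numeric components
gen mm = month(intdate)
gen dd = day(intdate)
* Two-digit year, assuming dataset 1 stores 2009 as 09
gen yy = mod(year(intdate), 100)

list
```

Going the other way is also possible: in dataset 1, mdy(mm, dd, 2000 + yy) would build a single Stata date from the three components, assuming all years fall in the 2000s.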
Re B: this is probably the easier part, if I understand your problem
correctly. In dataset 1, you can replace the sequence number of each
individual's last observation with 10. Alternatively, in dataset 2, you
can replace the 10 with the previous number in the sequence plus 1.
For the first option you could write something like:
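gen seq2 = Seq
* sort within ID by Seq, then recode each ID's last interview as 10
* (note: this recodes the last interview of every ID, not only discharged ones)
bysort ID (Seq): replace seq2 = 10 if _n == _N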
For the second option it could be:
gen seq3 = Seq
* within ID, replace the final 10 with the previous sequence number plus 1
bysort ID (Seq): replace seq3 = Seq[_n-1] + 1 if Seq == 10 & _n > 1
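To show how the second option might feed into the merge itself, here is a minimal sketch on toy data copied from the example in the question. The tempfile name d1 is made up, and it assumes ID and Seq together uniquely identify observations in both datasets:

```stata
* Toy stand-in for dataset 1 (Seq runs 1-9)
clear
input ID Seq
1 1
1 2
2 1
2 2
2 3
end
tempfile d1
save `d1'

* Toy stand-in for dataset 2 (Seq == 10 marks the final interview)
clear
input ID Seq
1 1
1 2
2 1
2 2
2 10
end

* Recode the 10 to the previous sequence number plus 1
gen seq3 = Seq
bysort ID (Seq): replace seq3 = Seq[_n-1] + 1 if Seq == 10 & _n > 1
drop Seq
rename seq3 Seq

* One-to-one merge on ID and the recoded sequence number
merge 1:1 ID Seq using `d1'
tab _merge
```

With real data, any observations that show up as unmatched in _merge would be worth inspecting, for example individuals whose dataset 2 sequence has gaps before the 10.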
hth,
J.