Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: RE: RE: RE: RE: Combining multiple observations by an ID variable
From: Claude Beaty <[email protected]>
To: "<[email protected]>" <[email protected]>
Subject: Re: st: RE: RE: RE: RE: Combining multiple observations by an ID variable
Date: Wed, 13 Jun 2012 01:10:50 +0000
Thanks. That sounds like good advice.
Claude Beaty
On Jun 12, 2012, at 9:03 PM, "Sarah Edgington" <[email protected]> wrote:
> Claude,
> One thing you haven't mentioned, I don't think, is whether you have any
> duplicate observations per person in the set that you are trying to merge on
> to the visit data. If you have multiple visits for each ID in your master
> data set but the using dataset has only one record per ID you can simply do
> an m:1 merge and you shouldn't have any problems. If your other file has
> multiple records per ID, then your problem is more complicated and merging
> the files as-is probably is not a very good idea at all.
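One way to check the using file for repeated IDs before merging is sketched below. It is only an illustration: the toy data stand in for the real file, IDcode is the ID name used elsewhere in this thread, and age is a made-up variable.

```stata
* Hypothetical toy "using" data: one row per person (age is an assumed variable)
clear
input IDcode age
1 34
2 51
3 47
end

* isid exits with an error if IDcode does not uniquely identify the observations
isid IDcode

* duplicates report shows how many IDs occur once, twice, and so on
duplicates report IDcode
```

If -isid- passes without complaint, the file has one record per ID and an m:1 merge on IDcode is appropriate.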
>
> Nick is right that the correct merge should not create duplicates. There
> are a number of ways to confirm this for yourself without having to
> -reshape- the data to wide form.
> For me the best place to start is by looking carefully at the created _merge
> variable. Are there cases that didn't match? Did you expect that? If not,
> that bears investigating.
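A minimal sketch of inspecting _merge, assuming it is run as one do-file (so the temporary file name survives) and using invented toy data. IDcode and visit follow the names used in this thread; score and age are hypothetical.

```stata
* Hypothetical using data: one row per person
clear
input IDcode age
1 34
2 51
4 29
end
tempfile person
save `person'

* Hypothetical master data: one row per visit
clear
input IDcode visit score
1 1 10
1 2 12
2 1 9
3 1 7
end

merge m:1 IDcode using `person'

* _merge: 1 = master only, 2 = using only, 3 = matched
tabulate _merge

* List the visits that found no person record, and the people with no visits
list IDcode visit if _merge == 1
list IDcode if _merge == 2
```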
>
> Next, look at the overall number of observations. First, count how many
> observations are in the master dataset in long form (that is, the data with
> ID codes and multiple visits per ID). Then, if you do a many to one merge
> using your second data set you should find that [original observations] =
> [number matched] + [number in master only]. If that isn't the case,
> something is likely wrong.
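The count comparison described above could be automated roughly as follows. This is a sketch under the same assumptions as before: toy data, hypothetical variables score and age, and a do-file run as a whole so that the local macros and the tempfile persist.

```stata
* Hypothetical using data: one row per person
clear
input IDcode age
1 34
2 51
4 29
end
tempfile person
save `person'

* Hypothetical master data: one row per visit
clear
input IDcode visit score
1 1 10
1 2 12
2 1 9
3 1 7
end

* Number of observations in the master data before the merge
count
local n_master = r(N)

merge m:1 IDcode using `person'

count if _merge == 3
local n_matched = r(N)
count if _merge == 1
local n_master_only = r(N)

* Stops with an error if the merge added or dropped master observations
assert `n_master' == `n_matched' + `n_master_only'
```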
>
> Finally, if you're still worried and want to be sure that you have the exact
> same records in your merged data as you did before the merge, try looking at
> the means of some important variables from the master file before and after
> the merge. If your ID field is a numeric variable (though it's often best
> if it isn't) then you can look at the N and mean of that variable before and
> after the merge too. If the distribution of variables from the master file
> remains the same before and after the merge then you have some pretty good
> evidence that you have not somehow introduced extra records. (This assumes
> that all the data in your master file matches a record in the using file; if
> this isn't the case go back to the first step and make sure you understand
> why).
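The before-and-after comparison might look like the sketch below. It again uses toy data and the hypothetical variables score and age, and it excludes using-only records explicitly so that only observations that came from the master file are compared.

```stata
* Hypothetical using data: one row per person
clear
input IDcode age
1 34
2 51
4 29
end
tempfile person
save `person'

* Hypothetical master data: one row per visit
clear
input IDcode visit score
1 1 10
1 2 12
2 1 9
3 1 7
end

* N and mean of a master variable before the merge
summarize score
scalar n_before = r(N)
scalar mean_before = r(mean)

merge m:1 IDcode using `person'

* Same summary afterwards, leaving out records that exist only in the using file
summarize score if _merge != 2
assert r(N) == n_before
assert reldif(r(mean), mean_before) < 1e-12
```

Scalars are used rather than local macros here so the stored mean keeps its full precision.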
>
> I know merging sometimes seems complicated, but as long as you pay very
> close attention to the details of the output and make sure you understand
> why some IDs matched and some didn't, it's generally going to be ok. Unless
> you're doing a many to many merge. Then it's complicated and, in nearly all
> cases, the wrong approach entirely.
>
> Hope that helps.
>
> -Sarah
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Nick Cox
> Sent: Tuesday, June 12, 2012 5:29 PM
> To: [email protected]
> Subject: Re: st: RE: RE: RE: RE: Combining multiple observations by an ID
> variable
>
> Your original data structure strikes me as far better for the majority of
> purposes for which it might be used within Stata. Whether -reshape
> wide- is possible is thus secondary. It is almost certainly not a good idea.
>
> Incidentally, -reshape- is a command, not a function. Also, I see no reason
> why the correct -merge- command should create extra observations as you
> imply here.
>
> Nick
>
> On Tue, Jun 12, 2012 at 11:31 PM, Claude Beaty <[email protected]> wrote:
>
>> Reshape was something I considered as well. Unfortunately, every time I
>> attempt to run this code I get the error "too many macros". I have Stata 12,
>> which I believe is the most updated version. If anyone knows of a way around
>> this, please let me know.
>
> Swanquist, Quinn Thomas
>
>> Fair enough,
>>
>> If you need the observations to equal the number of visits and you need to
>> keep the data from each visit, you are going to need to use the reshape wide
>> function on the master dataset before the merge. Since you said that you
>> have 70 variables for each visit, you will now have 70 * the max number of
>> visits variables. Depending on your version of Stata you may or may not be
>> able to work with that many variables.
>>
>> You can get help with this function using:
>>
>> help reshape
>
> Claude Beaty
>
>> It looks like the merger attempt was likely successful, though I'm sure
>> there are some duplicates. However, your suggested code did not help to
>> shift the data so that the total observations equal the number of ID codes
>> instead of the number of visits. I have tried reshaping etc, but there are
>> too many macros to reshape all of the variables. Is there another way? If I
>> can arrange the data in this way, it is easier to compare with my previous
>> file and find duplicate ID codes. As it stands now, it is difficult to tell
>> if duplicate ID codes are due to successive visits or duplications created
>> by the file merger.
>
> Swanquist, Quinn Thomas
>
>> Do you have an identifier for visit number? (If not, you could use date.)
>>
>> Sort as follows:
>>
>> sort IDcode visit
>>
>> then merge many to one as follows:
>>
>> merge m:1 IDcode using "usingfile"
>
> Claude Beaty
>
>> I have a large dataset of observations in which individuals (~40,000
>> ID codes) were evaluated multiple times (5-10 visit numbers per
>> individual) on over 70 variables. However, the data has been arranged
>> so that each visit number is an observation, instead of each
>> individual ID code as an observation. I need to merge this file with
>> another file sorted by individual ID codes. How do I rearrange this
>> data so that it is arranged by ID codes with consecutive follow up
>> visits? Thanks
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/