Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Difficult merging process
From
Nathan Hutto <[email protected]>
To
[email protected]
Subject
Re: st: Difficult merging process
Date
Tue, 21 Dec 2010 10:31:13 -0500
Thanks to both of you. Both of these seem like reasonable ways to
solve this issue, so I'll certainly try them out. I took a quick stab
at -cross- and it froze up my computer, so I'll have to do it in
chunks as you suggest.
Best,
Nathan
On Mon, Dec 20, 2010 at 11:00 PM, Keith Dear <[email protected]> wrote:
>
> Nathan,
> You don't say how large your two datasets are, but is -cross- out of the
> question?
>
> If you CAN build a dataset consisting of all combinations of pregnancies
> and weather events, then obviously it's simple from there: you can use
> whatever arbitrarily complex criteria you please to discard unwanted
> pairs, probably finishing with -duplicates drop- to retain only one
> record per exposed birth.
>
> If the mega-cross is too mega, perhaps you can still use this approach
> in chunks, e.g. one year (or decade) of birth at a time.
>
> (not subtle, but actually sledgehammers do a pretty good job on
> walnuts!)
> kd
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Michael N.
> Mitchell
> Sent: Tuesday, 21 December 2010 2:25 PM
> To: [email protected]
> Subject: Re: st: Difficult merging process
>
> Dear Nathan
>
> This is a very thorny problem!
>
> My first thought is to try and focus on the dates as a way of
> matching, matching on the
> "week" of the whether event to "week" of pregnancy...
>
> 1) For the weather dataset, convert the "date" of the event into the
> "week" of the
> event (using the -wofd()- function). Call this variable "dateweek".
>
> 2) For the pregnancy dataset, make one record for every week of
> pregnancy per person
> (for a 9 month gestation, each person would have 36 records, one for
> each week of
> pregnancy... each record would be identified by the person id and the
> "dateweek" for each
> week of pregnancy.) This step would involve computing the number of
> weeks of gestation,
> using the "expand" command to make the multiple records per person, and
> then a -by id:
> generate- to compute the week number of pregnancy for the multiple weeks
> of pregnancy.
>
> 3) match merge the "weather" dataset to the "pregnancy" dataset on
> "dateweek", keeping
> just the matches. The resulting dataset contains the weeks of pregnancy
> with a weather event.
>
> 4) Compute the distance from the mother to the weather event.
> Eliminate events that are
> too many miles away.
>
> 5) There may be multiple records per mother per weather event. Use
> collapse to make one
> record per mother.
>
> Others might have better thoughts. I hope this helps.
>
> Michael N. Mitchell
> Data Management Using Stata -
> http://www.stata.com/bookstore/dmus.html
> A Visual Guide to Stata Graphics -
> http://www.stata.com/bookstore/vgsg.html
> Stata tidbit of the week - http://www.MichaelNormanMitchell.com
>
>
>
> On 2010-12-20 5.46 PM, Nathan Hutto wrote:
> > Hi all,
> >
> > I am attempting to merge two data sets in a way that is new to am and
> > am having trouble figuring out how to do so. One data set contains
> > geo-coded birth certificates and the other contains weather events. I
> > want to determine whether a mother was exposed to a weather event
> > during the course of her pregnancy. I have the birth date and
> > gestational length, so I can determine the dates of gestation by
> > subtracting one from the other. I also have the latitude and
> > longitude, address, city, and state of each pregnancy and weather
> > event. For this case, exposure to a weather event would be defined as
> > being pregnant when a weather event occurred in close proximity.
> >
> > I'm ok with over-merging a little bit; I can determine the exact
> > exposure by using one of Stata's length commands that can calculate
> > distance with latitude and longitude. But given that many people in my
> > data are exposed to a number of weather events, I'd like to whittle
> > down the amount of false positives.
> >
> > Any thoughts on this?
> >
> > Thank you,
> > Nathan
> > *
> > * For searches and help try:
> > * http://www.stata.com/help.cgi?search
> > * http://www.stata.com/support/statalist/faq
> > * http://www.ats.ucla.edu/stat/stata/
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/