Re: st: Difficult merging process

From	"Michael N. Mitchell" <[email protected]>
To	[email protected]
Subject	Re: st: Difficult merging process
Date	Mon, 20 Dec 2010 19:25:05 -0800

Dear Nathan

  This is a very thorny problem!

  My first thought is to focus on the dates as a way of matching: match the "week" of the weather event to the "week" of pregnancy...

1) For the weather dataset, convert the "date" of the event into the "week" of the event (using the -wofd()- function). Call this variable "dateweek".

2) For the pregnancy dataset, make one record for every week of pregnancy per person (for a 9 month gestation, each person would have about 36 to 40 records, one for each week of pregnancy; each record would be identified by the person id and the "dateweek" for that week of pregnancy). This step would involve computing the number of weeks of gestation, using the -expand- command to make the multiple records per person, and then a -by id: generate- to compute the week number for each of the multiple weeks of pregnancy.

3) Match merge the "weather" dataset to the "pregnancy" dataset on "dateweek", keeping just the matches. The resulting dataset contains the weeks of pregnancy with a weather event.

4) Compute the distance from the mother to the weather event. Eliminate events that are too many miles away.

5) There may be multiple records per mother per weather event. Use -collapse- to make one record per mother.
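
  As a rough sketch only, steps 1 to 5 might look something like the do-file below. All file and variable names are assumptions made for illustration: weather.dta is taken to hold a numeric event_id, a Stata daily date event_date, and ev_lat/ev_lon in decimal degrees; births.dta is taken to hold a unique id, a daily date birth_date, gestational length gest_weeks in whole weeks, and lat/lon. The 50-mile cutoff is likewise arbitrary. Because many events and many mothers can share the same week, the sketch uses -joinby- for step 3, which forms all pairings within a week and by default keeps only the matches. The distance is computed with the haversine formula written out by hand rather than with any particular user-written command.

```stata
* Step 1: weather events keyed by week (names are hypothetical)
use weather, clear
generate dateweek = wofd(event_date)
format dateweek %tw
tempfile events
save `events'

* Step 2: one record per mother per week of pregnancy
use births, clear
generate startweek = wofd(birth_date - 7*gest_weeks)
generate endweek   = wofd(birth_date)
generate nweeks    = endweek - startweek + 1
expand nweeks
bysort id: generate dateweek = startweek + _n - 1
format dateweek %tw

* Step 3: pair every pregnancy-week with every event in that week
* (unmatched records are dropped by default)
joinby dateweek using `events'

* Step 4: haversine distance in miles, then drop distant events
* 3958.8 is the Earth's mean radius in miles; 50 is an assumed cutoff
generate double dlat = (ev_lat - lat) * _pi/180
generate double dlon = (ev_lon - lon) * _pi/180
generate double a = sin(dlat/2)^2 + ///
    cos(lat*_pi/180) * cos(ev_lat*_pi/180) * sin(dlon/2)^2
generate double miles = 2 * 3958.8 * asin(min(1, sqrt(a)))
keep if miles <= 50

* Step 5: one record per exposed mother
collapse (count) n_events = event_id (min) nearest_miles = miles, by(id)

* Bring back the mothers with no nearby event and flag exposure
merge 1:1 id using births, nogenerate
replace n_events = 0 if missing(n_events)
generate byte exposed = n_events > 0
```

  Matching on week first keeps the joined dataset far smaller than pairing every mother with every event, and the distance filter in step 4 then removes most of the remaining false positives. If the week-level match is too coarse at the start or end of a pregnancy, the event date could also be compared directly with the daily gestation window after the join.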


  Others might have better thoughts. I hope this helps.

Michael N. Mitchell
Data Management Using Stata      - http://www.stata.com/bookstore/dmus.html
A Visual Guide to Stata Graphics - http://www.stata.com/bookstore/vgsg.html
Stata tidbit of the week         - http://www.MichaelNormanMitchell.com



On 2010-12-20 5.46 PM, Nathan Hutto wrote:

Hi all,

I am attempting to merge two data sets in a way that is new to me and
am having trouble figuring out how to do so. One data set contains
geo-coded birth certificates and the other contains weather events. I
want to determine whether a mother was exposed to a weather event
during the course of her pregnancy. I have the birth date and
gestational length, so I can determine the dates of gestation by
subtracting one from the other. I also have the latitude and
longitude, address, city, and state of each pregnancy and weather
event. For this case, exposure to a weather event would be defined as
being pregnant when a weather event occurred in close proximity.

I'm ok with over-merging a little bit; I can determine the exact
exposure by using one of Stata's length commands that can calculate
distance with latitude and longitude. But given that many people in my
data are exposed to a number of weather events, I'd like to whittle
down the number of false positives.

Any thoughts on this?

Thank you,
Nathan
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
