Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
st: looping command for duplicates and merging datasets by matching dates
From
Anees Abdul Pari <[email protected]>
To
"[email protected]" <[email protected]>
Subject
st: looping command for duplicates and merging datasets by matching dates
Date
Fri, 12 Jul 2013 12:38:10 +0000
Hi Everybody,
I am struggling over the last week with data cleaning for a dataset that has got multiple observations per ID. Specifically, I have two queries and it would be wonderful if you could advise me accordingly.
a) In this study (Table 1 and Table 2), 200 participants were sent three different questionnaires sometimes on the same day, sometimes different days and sometimes multiple questionnaires on the same day. In total, I have roughly 25000 observations. I have created three separate STATA files for each questionnaire that includes participants’ ID, questionnaire date and response.
For each questionnaire, I would like to drop duplicates that were sent on the same date. For example, Individual 1 was sent Questionnaire 1 three times on 5 September 2011. I would like to keep only that observation which has got the total score. However, in some cases, such as for ID 1 on 28th November 2011- two questionnaires (same Questionnaire1) were sent and both responses were missing. In this case, I would like to keep only one of the missing response, and drop the other duplicate observation. I would ideally like to create a looping command which goes through the responses of questionnaire on a single day and drops duplicate observations while preserving unique observations that are either missing or have valid scores.
Table 1: Questionnaire 1
id Q1date prompt response Scheduledresponse Noresponse Q1totalscore Q1.1 Q1.2 Q1.3 Q1.4 Q1.5
1 05-Sep-11 1 0 0
1 05-Sep-11 0 0 0 0
1 05-Sep-11 0 1 1 0 16 3 3 4 3 3
1 25-Oct-11 1 0 0 0
1 26-Oct-11 0 0 0 0
1 26-Oct-11 0 1 1 0 6 1 1 0 2 2
2 28-Nov-11 1 0 0 0
2 28-Nov-11 0 1 1 1
2) The second table describes dataset for the second questionnaire. The same patients were sent other questionnaire on either the same day or different days. In addition to dropping of duplicate observations that I described earlier, I would like to match/ merge the outcome response (scores) for these questionnaires (questionnaire 1, questionnaire 2, and questionnaire 3) if they fall within a week before or after the Questionnaire 2 response dates. For example, Patient 1’s questionnaire 2 total outcome on 29th August 2011 was 0.691. However, the nearest Outcome from Questionnaire 1 for the same patient1 is on 5th September 2011 (Table 1) which falls within one week of 29th August 2011. I would like to merge both the questionnaire (and Questionnaire 3 which I suppose will be similar to deal with) based on the dates of outcomes in Questionnaire 2 (within one week before or after) and drop rest of the observations.
Similarly, patient 2 has a valid response for questionnaire 2 on 22nd November 2011. For this individual, I would like to drop the duplicate Questionnaire 2 observation and then merge with corresponding outcome in questionnaire 1 (in this case it would be a missing value on 28th November 2011).
Table 2. Questionnaire 2
id Q2date prompt response Scheduledresponse Noresponse Q2totalscore Q2.1 Q2.2 Q2.3 Q2.4 Q2.5
1 29-Aug-11 1 0 0 0
1 29-Aug-11 0 1 1 0 0.691 2 1 2 2 1
1 29-Aug-11 0 0 0 0
1 26-Sep-11 1 0 0 0
1 26-Sep-11 0 1 1 0
1 26-Sep-11 0 0 0 0
1 24-Oct-11 1 0 0 0
1 26-Oct-11 1 0 0 0
1 26-Oct-11 0 1 1 0 0.689 1 1 2 2 2
1 26-Oct-11 0 0 0 0
2 21-Nov-11 1 0 0 0
2 22-Nov-11 0 1 1 0 0.088 2 1 2 3 2
2 22-Nov-11 0 0 0 0
I hope this make sense. I will be extremely grateful for your timely help.
Many thanks for your patience and time.
Best wishes
Anees Pari
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/