Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: differentiating between groups of records with same date
From
Tim Evans <[email protected]>
To
"'[email protected]'" <[email protected]>
Subject
RE: st: differentiating between groups of records with same date
Date
Wed, 1 Aug 2012 09:52:55 +0100
Nick,
Thanks for this. I don't think I want to go with -distinct- or -unique-, as I want a flag variable kept permanently in the dataset. I've worked through the examples in http://www.stata.com/support/faqs/data-management/number-of-distinct-observations/ but I'm not sure I'm translating them very well to my scenario; that's probably down to how I'm thinking about it. So I'll give it some space before looking at it again.
Best wishes
Tim
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: 31 July 2012 15:30
To: [email protected]
Subject: Re: st: differentiating between groups of records with same date
See
FAQ     . . . . . . . . . . . . . . . . . . . Number of distinct observations
        . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox and G. Longton
        10/08   How do I compute the number of distinct observations?
                http://www.stata.com/support/faqs/data-management/number-of-distinct-observations/

SJ-12-2 dm0042_1  . . . . . . . . . . . . . . . . Software update for distinct
        (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
        Q2/12   SJ 12(2):352
        options added to restrict output to variables with a minimum
        or maximum of distinct values

SJ-8-4  dm0042  . . . . . . . . . . . .  Speaking Stata: Distinct observations
        (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
        Q4/08   SJ 8(4):557--568
        shows how to answer questions about distinct observations
        from first principles; provides a convenience command
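
A minimal sketch of the first-principles approach the FAQ describes, applied to the example data quoted further down this thread. It assumes the two date variables are held as Stata daily dates and proc_type as a string; the names tag and n_proc are illustrative and do not come from the original posts. The idea is to tag one observation per distinct combination of patient, surgery date and procedure type, then total the tags within each patient and surgery date, which leaves a permanent count variable in the dataset.

```stata
* Rebuild the example data from the thread (string dates converted below)
clear
input str7 patient_no str7 cancer_no str9 diag str9 surg str2 proc_type
"9512834" "0484360" "21may1994" "21may1994" "H1"
"9512834" "0484358" "21may1994" "21may1994" "H2"
"9512834" "0483234" "26apr2000" "21may2000" "H1"
"9512834" "0483233" "26apr2000" ""          ""
"0000012" "0000012" "21jan1999" "21jan1999" "H3"
"0000012" "0000013" "21jan1999" "21jan1999" "H3"
"0000012" "0000014" "21jan1999" "21jan1999" "H3"
end

* Convert the string dates to Stata daily dates
gen diag_date    = date(diag, "DMY")
gen surgery_date = date(surg, "DMY")
format diag_date surgery_date %td
drop diag surg

* Tag exactly one observation per distinct patient/date/procedure combination.
* By default tag() gives 0 to observations with any missing value in the list,
* so the record with no surgery is never tagged.
egen tag = tag(patient_no surgery_date proc_type)

* Total the tags within patient and surgery date: the number of distinct
* proc_type values, repeated on every record in that group
egen n_proc = total(tag), by(patient_no surgery_date)
drop tag

list patient_no cancer_no surgery_date proc_type n_proc, sepby(patient_no)
```

With these data n_proc should be 2 on both 21may1994 records for patient 9512834, 1 on all three records for patient 0000012, and 0 on the record with no surgery date; a line such as `replace n_proc = . if missing(surgery_date)` would turn that 0 into missing if preferred.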
On Tue, Jul 31, 2012 at 8:27 AM, Tim Evans <[email protected]> wrote:
> Nick,
>
> Apologies for the lack of clarity.
>
> For the following dataset (below) I wish to count the number of distinct proc_type values for each patient on a given surgery_date.
>
> patient_no  cancer_no  diag_date  surgery_date  proc_type
> 9512834     0484360    21may1994  21may1994     H1
> 9512834     0484358    21may1994  21may1994     H2
> 9512834     0483234    26apr2000  21may2000     H1
> 9512834     0483233    26apr2000
> 0000012     0000012    21Jan1999  21Jan1999     H3
> 0000012     0000013    21Jan1999  21Jan1999     H3
> 0000012     0000014    21Jan1999  21Jan1999     H3
>
>
> In my snapshot above, patient_no 0000012 has 3 cancers with a surgery_date of 21Jan1999, but only one proc_type, so my count should be 1. In contrast, patient_no 9512834 has 2 cancers with a surgery_date of 21may1994, and has 2 proc_types on 21may1994, so my count should be 2.
>
> Or, put another way: for each surgery date, how many unique proc_types did each patient have?
>
> Hope this is clearer.
>
> Best wishes
>
> Tim
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
> Sent: 31 July 2012 14:02
> To: [email protected]
> Subject: Re: st: differentiating between groups of records with same date
>
> Sorry, but what's your question?
>
> On 31 Jul 2012, at 13:32, Tim Evans <[email protected]> wrote:
>
>> Hi Nick,
>>
>> I've been taking a look at the reference you pointed me to, and been
>> experimenting, to see how I would count for each patient, the number
>> of different procedures that took place on the same date.
>>
>> Again I have
>>
>> patient_no  cancer_no  diag_date  surgery_date  proc_type
>> 9512834     0484360    21may1994  21may1994     H1
>> 9512834     0484358    21may1994  21may1994     H2
>> 9512834     0483234    26apr2000  21may2000     H1
>> 9512834     0483233    26apr2000
>> 0000012     0000012    21Jan1999  21Jan1999     H3
>> 0000012     0000013    21Jan1999  21Jan1999     H3
>> 0000012     0000014    21Jan1999  21Jan1999     H3
>>
>> So I want to say that patient 9512834 had 2 different proc_types on
>> 21may1994 and that patient 0000012 had one operation.
>>
>> Best wishes
>>
>> Tim
>>
>>
>> -----Original Message-----
>> From: [email protected] [mailto:owner-
>> [email protected]] On Behalf Of Tim Evans
>> Sent: 31 July 2012 10:53
>> To: '[email protected]'
>> Subject: RE: st: differentiating between groups of records with same
>> date
>>
>> Nick,
>>
>> Thanks for this, a handy piece of code/functionality.
>>
>> Best wishes
>> Tim
>>
>>
>> -----Original Message-----
>> From: [email protected] [mailto:owner-
>> [email protected]] On Behalf Of Nick Cox
>> Sent: 30 July 2012 17:50
>> To: [email protected]
>> Subject: Re: st: differentiating between groups of records with same
>> date
>>
>> bysort patient_no diag_date: gen freq = _N
>>
>> See also
>>
>> SJ-2-1  pr0004  . . . . . . Speaking Stata: How to move step by: step
>>         . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
>>         Q1/02   SJ 2(1):86--102                        (no commands)
>>         explains the use of the by varlist: construct to tackle
>>         a variety of problems with group structure, ranging from
>>         simple calculations for each of several groups to more
>>         advanced manipulations that use the built-in _n and _N
>>
>>
>> Nick
>>
>> On Mon, Jul 30, 2012 at 10:20 AM, Tim Evans <[email protected]>
>> wrote:
>>> Hi all,
>>>
>>> I have a group of patients who are in a dataset of cancers. Each
>>> patient may have more than one cancer diagnosed, and so may be
>>> present in my dataset a number of times. Each patient has a unique
>>> patient identifier, and each cancer has a unique cancer identifier.
>>> Each row of data is cancer specific, but does contain the patient
>>> identifier. It is possible that a patient has 2 cancers diagnosed
>>> on the same day in my dataset. What I would like to do is generate
>>> a flag next to each record to show against each cancer the number
>>> of cancers diagnosed on the same day.
>>>
>>> My data are like this:
>>>
>>> patient_no cancer_no diag_date surgery_date
>>> 9512834 0484360 21may1994 21may1994
>>> 9512834 0484358 21may1994 21may1994
>>> 9512834 0483234 26apr2000 21may2000
>>> 9512834 0483233 26apr2000
>>> 0000057 0000057 19jul2009 19jul2009
>>> 0000060 0000060 02nov2009 24nov2009
>>> 0000074 0000074 21sep2009 22nov2009
>>>
>>>
>>> For example, patient 9512834 had 2 cancers diagnosed on 21may1994
>>> and so for cancer_no 0484360 and 0484358, I would like to generate
>>> a new variable with the value 2 against each record. Similarly
>>> patient 0000057 has only one cancer diagnosed, and so the new
>>> variable would contain 1.
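
For reference, a small sketch of how the one-line answer given in reply to this question (bysort patient_no diag_date: gen freq = _N) behaves on the data shown above. The data are re-entered here only for illustration, and diag_date is assumed to be stored as a Stata daily date; surgery_date is left out because the count does not use it.

```stata
* Rebuild the original example data (diagnosis date entered as a string)
clear
input str7 patient_no str7 cancer_no str9 diag
"9512834" "0484360" "21may1994"
"9512834" "0484358" "21may1994"
"9512834" "0483234" "26apr2000"
"9512834" "0483233" "26apr2000"
"0000057" "0000057" "19jul2009"
"0000060" "0000060" "02nov2009"
"0000074" "0000074" "21sep2009"
end

* Convert the string date to a Stata daily date
gen diag_date = date(diag, "DMY")
format diag_date %td
drop diag

* Within each patient and diagnosis date, _N is the number of records in the
* group, so freq holds the number of cancers diagnosed on that day
bysort patient_no diag_date: gen freq = _N

list, sepby(patient_no)
```

Here freq is 2 on all four records for patient 9512834 (two cancers on 21may1994 and two on 26apr2000) and 1 for each of the other patients. Because each row is one cancer, counting rows is enough; counting distinct values of another variable within the group needs the tagging approach from the FAQ instead.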
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/