Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Need help with combing observations by using by: command along with subscripting
From
Nick Cox <[email protected]>
To
[email protected]
Subject
Re: st: Need help with combing observations by using by: command along with subscripting
Date
Wed, 6 Feb 2013 18:57:52 +0000
No; as said, my code is a special trick for paired observations only.
For numeric variables, this works if there is at most one distinct
non-missing value within groups of -id-
bysort id (whatever) : replace whatever = whatever[_n-1] if missing(whatever)
but you need to check that assumption. This hinges on numeric
variables being sorted so that missing values come last.
For string variables, this works if there is at most one distinct
non-missing value within groups of -id-
bysort id (whatever) : replace whatever = whatever[_N] if missing(whatever)
but you need to check that assumption. This hinges on string variables
being sorted so that missing values come first.
A check of the assumption afterwards is
bysort id : assert whatever[1] == whatever[_N]
If that's wrong you messed up somewhere in your assumptions, so it's
best to check for one distinct value _before_ you change anything. A
general check is
bysort id (whatever) : assert whatever == whatever[1] | missing(whatever)
for numeric variables
and
bysort id (whatever) : assert whatever == whatever[_N] | missing(whatever)
for string variables.
See also
FAQ . . . . . . . . . . . . . . . . . . . . . . . Replacing missing values
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
1/12 How can I replace missing values with previous or
following nonmissing values or within sequences?
http://www.stata.com/support/faqs/data-management/
replacing-missing-values/
On Wed, Feb 6, 2013 at 6:43 PM, Michael Stewart
<[email protected]> wrote:
> Dear Nick, Rebecca and Steve,
> SInce I emailed you , I have been working on the collapse command
> my code is as follows
>
> collapse (min)dtbirth (lastnm)accountnumber (firstnm)
> medicalrecordnumber,by(PATIENTno) and it seem to work
> Please let me know if you see any errors in the code
>
> My apologies, I should have been more explicitand given a better
> example, the observations per pt not only comes in pairs but also
> more than pairs and different set of values across the variables are
> missing .Can I still use nicks code in this senario??
>
> Rebecca: first code works well.
> Regarding second code, I am not well versed with locals etc, hence I
> saved the mrn file and used to update and it works well
>
> Steve: in am in the process of working with your code.I will let you
> know if it doesnt work
> Thanks a lot everyone
>
> --
> Thank you ,
> Yours Sincerely,
> Mike
>
>
>
>
>
>
>
> On Wed, Feb 6, 2013 at 1:26 PM, Nick Cox <[email protected]> wrote:
>> The context is apparently that observations for each distinct person
>> occur in pairs.
>>
>> If so, this is a general trick:
>>
>> bysort PATIENTno : replace whatever = whatever[3 - _n] if missing(whatever)
>>
>> If that looks a bit odd, let us just run through the possibilities.
>>
>> If whatever[1] is missing, you want to copy whatever[2]. (If
>> whatever[2] is missing too, no harm done.)
>>
>> If whatever[2] is missing, you want to copy whatever[1]. (If
>> whatever[1] is missing too, no harm done.)
>>
>> -whatever- can be numeric or string: same code applies.
>>
>> The rule [3 - _n] ties these cases together. If _n is 1 then 3 - _n is
>> 2 and vice versa.
>>
>> If it looked odd a few lines ago, it now seems all too obvious.
>>
>> Note also that -- as in Rebecca's careful code -- you should only
>> overwrite _missing_ values, otherwise you can propagate missings and
>> lose what you have.
>>
>> Now let's go back to the assumption that observations occur in pairs.
>> If that is not true, this trick may mess things up. But it is painless
>> to add at the
>> end
>>
>> & _N == 2
>>
>> bysort PATIENTno : replace whatever = whatever[3 - _n] if
>> missing(whatever) & _N == 2
>>
>> That way we insist on this being done only for pairs of observations.
>>
>> I find this trick to be slightly but pleasantly amusing when it is the
>> solution, perhaps because it hinges on Stata's treatment of _n and _N
>> and so is a very small insider's joke.
>>
>> Nick
>>
>> On Wed, Feb 6, 2013 at 5:44 PM, Rebecca Pope <[email protected]> wrote:
>>> Mike,
>>> The way you have specified the -bysort- currently, Stata is looking
>>> within the patient and date of birth for multiple records. You just
>>> want it to look within patient only.
>>>
>>> bys PATIENTno (dtbirth): replace medicalrecordnumber =
>>> medicalrecordnumber[1] if missing(medicalrecordnumber)
>>>
>>> The underlying assumption is that an MRN will always be present when
>>> the DOB is present. If not, you might be better off creating a look-up
>>> table of MRNs and using -merge- to update your missing values.
>>>
>>> preserve
>>> keep PATIENTno medicalrecordnumber
>>> keep if !missing(medicalrecordnumber)
>>> duplicates drop
>>> tempfile mrn
>>> save `"`mrn'"'
>>> restore
>>> merge m:1 PATIENTno using `"`mrn'"', update
>>>
>>> * warning: code not tested
>>>
>>> Cheers,
>>> Rebecca
>>>
>>> On Wed, Feb 6, 2013 at 11:12 AM, Michael Stewart
>>> <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> I need help with following data where I am trying to combine
>>>> observations for each patient
>>>>
>>>> PATIENTno dtbirth medicalrecordnumber accountnumber
>>>> 1 8/10/1907 1111111
>>>> 1 . dddddddd
>>>> 2 12/14/1910 222222222
>>>> 2 . eeeeee
>>>> 3 4/16/1914 33333333 rrrrrrr
>>>>
>>>>
>>>> PATIENTno,medicalrecordnumber & accountnumber -->STRING format
>>>>
>>>> dtbirth-->%td format
>>>>
>>>>
>>>> I need to combine row one and row two for patient 1 so that resultatnt
>>>> row will have information from both rows one and row two
>>>>
>>>> I tried to use subscripting as follows
>>>>
>>>> bysort PATIENTno dtbirth: repalce medicalrecordnumber= medicalrecordnumber[_n-1]
>>>>
>>>> then all values for the medicalrecordnumber are turning to blank spaces
>>>>
>>>> I also tried bysort PATIENTno dtbirth: repalce medicalrecordnumber=
>>>> medicalrecordnumber[1] but it didnt work either.
>>>>
>>>> Can you please help.
>>>>
>>>> --
>>>> Thank you ,
>>>> Yours Sincerely,
>>>> Mike.
>>>> *
>>>> * For searches and help try:
>>>> * http://www.stata.com/help.cgi?search
>>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>>> * http://www.ats.ucla.edu/stat/stata/
>>>
>>>
>>>
>>> On Wed, Feb 6, 2013 at 11:12 AM, Michael Stewart
>>> <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> I need help with following data where I am trying to combine
>>>> observations for each patient
>>>>
>>>> PATIENTno dtbirth medicalrecordnumber accountnumber
>>>> 1 8/10/1907 1111111
>>>> 1 . dddddddd
>>>> 2 12/14/1910 222222222
>>>> 2 . eeeeee
>>>> 3 4/16/1914 33333333 rrrrrrr
>>>>
>>>>
>>>> PATIENTno,medicalrecordnumber & accountnumber -->STRING format
>>>>
>>>> dtbirth-->%td format
>>>>
>>>>
>>>> I need to combine row one and row two for patient 1 so that resultatnt
>>>> row will have information from both rows one and row two
>>>>
>>>> I tried to use subscripting as follows
>>>>
>>>> bysort PATIENTno dtbirth: repalce medicalrecordnumber= medicalrecordnumber[_n-1]
>>>>
>>>> then all values for the medicalrecordnumber are turning to blank spaces
>>>>
>>>> I also tried bysort PATIENTno dtbirth: repalce medicalrecordnumber=
>>>> medicalrecordnumber[1] but it didnt work either.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/