Re: st: Problems with expand og reverting to original dataset
From
"Ada Ma" <[email protected]>
To
[email protected]
Subject
Re: st: Problems with expand og reverting to original dataset
Date
Mon, 24 Jan 2011 09:34:46 +0000
Oops, I made another mistake. Don't mind me. Please ignore my last email.
Ada
-----Original Message-----
From: "Ada Ma" <[email protected]>
Date: Mon, 24 Jan 2011 09:30:37
To: <[email protected]>
Reply-To: [email protected]
Subject: Re: st: Problems with expand og reverting to original dataset
Hi Nick,
Thanks for pointing out my mistake. I think the OP's dataset might contain some half-siblings, which is why putting the mother's ID in front of the father's ID solved the problem.
Ada
-----Original Message-----
From: Nick Cox <[email protected]>
Sender: [email protected]
Date: Mon, 24 Jan 2011 09:23:46
To: <[email protected]>
Reply-To: [email protected]
Subject: Re: st: Problems with expand og reverting to original dataset
You wrote that the error message disappeared on using
by mother_id father_id, sort
instead of
bysort mother_id father_id
These two are equivalent. Whatever removed your error was some other
change, I believe.
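
The equivalence can be checked with a minimal sketch on made-up data (two families with two siblings each; the values and the names diff1 and diff2 are assumptions for illustration, not taken from the original dataset):

```stata
* Illustrative data only: the values below are assumed
clear
input mother_id father_id birth_date
1 1 100
1 1 400
2 2 250
2 2 200
end

* Form 1: by with the sort option
by mother_id father_id (birth_date), sort: gen diff1 = birth_date[2] - birth_date[1]

* Form 2: bysort
bysort mother_id father_id (birth_date): gen diff2 = birth_date[2] - birth_date[1]

* The two new variables should agree in every observation
assert diff1 == diff2
list, sepby(mother_id father_id)
```

If the two forms are indeed equivalent, the assert passes silently and both variables hold the same within-family difference.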
Nick
On Mon, Jan 24, 2011 at 9:11 AM, Grethe Søndergaard
<[email protected]> wrote:
> Thanks a lot to both of you for your explanations of how to handle my data.
> I am using Cox regression, and the bysort command is so much easier
> than using expand as I intended to do. The error message disappeared
> when I wrote by mother_id father_id, sort (instead of bysort mother_id
> father_id).
> I am aware that choosing only two siblings from each family might be
> problematic and I will consider using reshape to include more
> siblings.
>
> 2011/1/20 Nick Cox <[email protected]>:
>> Let me explain why this suggestion is wrong and neither equivalent to, nor an improvement on, what I wrote.
>>
>> My code was
>>
>> bysort mother_id father_id (birth_date) : gen diff = birth_date[2] - birth_date[1]
>>
>> Within groups defined by the same mother and father, two siblings define two observations. Given sorting within same parents by -birth_date-, the first observation within each group is that with the lower birth_date and the second is that with the higher birth_date. With twins, defined precisely here as those born on the same day, the ordering is arbitrary but that is immaterial as the difference is 0 either way.
>>
>> Ada wants to correct this to
>>
>> bysort mother_id father_id (birth_date) : gen diff = birth_date[_n+1] -birth_date
>>
>> For observation 1, Ada's code reduces to mine. But for observation 2, it reduces to
>>
>> ... birth_date[3] - birth_date[2]
>>
>> As birth_date[3] refers to an observation outside each group, it will be evaluated as missing, and the value for the new variable will also be missing.
>>
>> Hence this correction is incorrect. The literal subscripts [2] and [1] were precisely what was intended and what are needed to make this work.
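>>
>> A minimal sketch on made-up data (one family with two siblings; the values and the names diff_fixed and diff_next are assumptions for illustration) puts the two versions side by side:
>>
>> ```stata
>> * Illustrative data only: the values below are assumed
>> clear
>> input mother_id father_id birth_date
>> 1 1 100
>> 1 1 400
>> end
>>
>> * Literal subscripts: the same difference in every observation of the group
>> bysort mother_id father_id (birth_date) : gen diff_fixed = birth_date[2] - birth_date[1]
>>
>> * Relative subscript: looks one observation ahead within the group
>> bysort mother_id father_id (birth_date) : gen diff_next = birth_date[_n+1] - birth_date
>>
>> list
>> ```
>>
>> Here diff_fixed is 300 in both observations, while diff_next is 300 in the first observation and missing in the second.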
>>
>> Nick
>> [email protected]
>>
>> Ada Ma
>>
>> With regard to your question to Nick, the command you should write is:
>>
>> bysort mother_id father_id (birth_date) : gen diff = birth_date[_n+1] - birth_date
>>
>> [...]
>>
>> On Thu, Jan 20, 2011 at 2:12 PM, Grethe Søndergaard
>> <[email protected]> wrote:
>>> Thank you for your answers
>>>
>>> @ Nick Cox: I have tried to run bysort mother_id father_id
>>> (birth_date) : gen diff = birth_date[2] -birth_date[1]. However, an
>>> error message appears: "factor variables and time-series operators not
>>> allowed". Can I solve this problem by somehow changing the type of
>>> variable that birth_date is?
>>>
>>> @ Ada Ma: My dataset contains more than two siblings per family
>>> (one line for each person). I am not sure how to decide which
>>> siblings should be included in the dataset if more than two siblings
>>> are being compared. For example, if a family consists of children
>>> aged 1, 4, and 8, who should stay in the dataset? That is why I chose
>>> to include only persons with one sibling.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*