Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Problems with expand og reverting to original dataset
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Problems with expand og reverting to original dataset
Date: Mon, 24 Jan 2011 09:23:46 +0000
You wrote that the error message disappeared on using
by mother_id father_id, sort
instead of
bysort mother_id father_id
These two are equivalent. Whatever removed your error was some other
change, I believe.
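
The equivalence can be checked directly. What follows is a minimal sketch on made-up data: the values, and the helper variables obs, n1 and n2, are invented for illustration and are not from the original dataset.

```stata
* Toy data: two families, deliberately entered unsorted.
clear
input mother_id father_id birth_date
2 20 300
1 10 200
2 20 100
1 10  50
end

* Remember the original order so it can be restored between the two runs.
gen long obs = _n

* Form 1: by with the sort option
by mother_id father_id, sort: gen n1 = _N

* Restore the original (unsorted) order
sort obs

* Form 2: bysort
bysort mother_id father_id: gen n2 = _N

* Both forms sort by the by-variables first and then run the command
* group by group, so the results agree.
assert n1 == n2
```

If one form fails where the other succeeds, the difference lies elsewhere in the command line, for example in a typo or a missing colon.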
Nick
On Mon, Jan 24, 2011 at 9:11 AM, Grethe Søndergaard
<[email protected]> wrote:
> Thanks a lot to both of you for your explanations of how to handle my data.
> I am using Cox regression, and the bysort command is so much easier
> than using expand as I intended to do. The error message disappeared
> when I wrote by mother_id father_id, sort (instead of bysort mother_id
> father_id).
> I am aware that choosing only two siblings from each family might be
> problematic, and I will consider using reshape to include more
> siblings.
>
> 2011/1/20 Nick Cox <[email protected]>:
>> Let me explain why this suggestion is wrong and neither equivalent to, nor an improvement on, what I wrote.
>>
>> My code was
>>
>> bysort mother_id father_id (birth_date) : gen diff = birth_date[2] - birth_date[1]
>>
>> Within groups defined by the same mother and father, two siblings define two observations. Given sorting within same parents by -birth_date-, the first observation within each group is that with the lower birth_date and the second is that with the higher birth_date. With twins, defined precisely here as those born on the same day, the ordering is arbitrary but that is immaterial as the difference is 0 either way.
>>
>> Ada wants to correct this to
>>
>> bysort mother_id father_id (birth_date) : gen diff = birth_date[_n+1] -birth_date
>>
>> For observation 1, Ada's code reduces to mine. But for observation 2, it reduces to
>>
>> ... birth_date[3] - birth_date[2]
>>
>> As birth_date[3] refers to an observation outside each group, it will be evaluated as missing, and the value for the new variable will also be missing.
>>
>> Hence this correction is incorrect. The literal subscripts [2] and [1] were precisely what was intended and what are needed to make this work.
>>
>> Nick
>> [email protected]
>>
>> Ada Ma
>>
>> WRT your Q to Nick the command you should write is:
>>
>> bysort mother_id father_id (birth_date) : gen diff = birth_date[_n+1] - birth_date
>>
>> [...]
>>
>> On Thu, Jan 20, 2011 at 2:12 PM, Grethe Søndergaard
>> <[email protected]> wrote:
>>> Thank you for your answers.
>>>
>>> @ Nick Cox: I have tried to run bysort mother_id father_id
>>> (birth_date) : gen diff = birth_date[2] - birth_date[1]. However, an
>>> error message appears: "factor variables and time-series operators not
>>> allowed". Can I solve this problem by somehow changing the type of
>>> variable that birth_date is?
>>>
>>> @ Ada Ma: My dataset contains more than two siblings per family
>>> (one line for each person). I am not sure how to decide which
>>> siblings should be included in the dataset if more than two siblings are
>>> being compared. E.g., a family consists of children aged 1, 4, and 8 (so
>>> who should stay in the dataset?). That is why I choose to
>>> include only persons with one sibling.
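
The point made in the thread about literal subscripts [2] and [1] versus [_n+1] can be seen on a small example. This is only a sketch under stated assumptions: the data values are made up, birth_date is assumed to be a numeric date variable, and the names diff_literal and diff_lead are hypothetical, chosen here to keep the two versions side by side.

```stata
* Toy data: two families with exactly two siblings each.
* Family 2 contains twins (same birth_date).
clear
input mother_id father_id birth_date
1 10 18000
1 10 18500
2 20 17000
2 20 17000
end

* Literal subscripts: every observation in a group gets the same difference.
bysort mother_id father_id (birth_date): gen diff_literal = birth_date[2] - birth_date[1]

* Relative subscript: for the second sibling, birth_date[_n+1] points past
* the end of the group, so the result is missing for that observation.
bysort mother_id father_id (birth_date): gen diff_lead = birth_date[_n+1] - birth_date

list, sepby(mother_id father_id)

* The literal version is filled in for both siblings of family 1 ...
assert diff_literal == 500 if mother_id == 1
* ... whereas the relative version is missing for the second sibling in each group.
by mother_id father_id: assert missing(diff_lead) if _n == 2
```

With the literal subscripts the difference is available on every observation of a pair, which is convenient if either sibling might later be selected for analysis.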