Notice: On April 23, 2014, Statalist moved from an email list to a forum.
Re: st: Problems with expand og reverting to original dataset
Nick Cox <[email protected]>
[email protected]
Mon, 24 Jan 2011 09:23:46 +0000
You wrote that the error message disappeared on using
by mother_id father_id, sort
instead of
bysort mother_id father_id
These two are equivalent. Whatever removed your error was some other
change, I believe.
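
The equivalence can be checked directly. What follows is only a minimal sketch on made-up data: the values and the names diff1 and diff2 are hypothetical and are not from the original dataset. It runs the same -generate- statement under both prefix forms and asserts that the results agree.

```stata
* Hypothetical toy data: two families, two siblings each
clear
input mother_id father_id birth_date
1 10 15000
1 10 14000
2 20 16000
2 20 16500
end

* Form 1: the bysort prefix
bysort mother_id father_id (birth_date) : gen diff1 = birth_date[2] - birth_date[1]

* Form 2: by with the sort option
by mother_id father_id (birth_date), sort : gen diff2 = birth_date[2] - birth_date[1]

* Both forms produce identical values in every observation
assert diff1 == diff2
list, sepby(mother_id father_id)
```

If the -assert- passes silently, the two forms behaved identically on these data, which is consistent with the error having come from some other change.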
On Mon, Jan 24, 2011 at 9:11 AM, Grethe Søndergaard
<[email protected]> wrote:
> Thanks a lot to both of you for your explanations of how to handle my data.
> I am using Cox regression, and the bysort command is so much easier
> than using expand, as I had intended to do. The error message disappeared
> when I wrote by mother_id father_id, sort (instead of bysort mother_id
> father_id).
> I am aware that choosing only two siblings from each family might be
> problematic and I will consider using reshape to include more
> siblings.
> 2011/1/20 Nick Cox <[email protected]>:
>> Let me explain why this suggestion is wrong and neither equivalent to, nor an improvement on, what I wrote.
>> My code was
>> bysort mother_id father_id (birth_date) : gen diff = birth_date[2] - birth_date[1]
>> Within groups defined by the same mother and father, two siblings define two observations. Given sorting within same parents by -birth_date-, the first observation within each group is that with the lower birth_date and the second is that with the higher birth_date. With twins, defined precisely here as those born on the same day, the ordering is arbitrary but that is immaterial as the difference is 0 either way.
>> Ada wants to correct this to
>> bysort mother_id father_id (birth_date) : gen diff = birth_date[_n+1] - birth_date
>> For observation 1, Ada's code reduces to mine. But for observation 2, it reduces to
>> ... birth_date[3] - birth_date[2]
>> As birth_date[3] refers to an observation outside each group, it will be evaluated as missing, and the value for the new variable will also be missing.
>> Hence this correction is incorrect. The literal subscripts [2] and [1] were precisely what was intended and what are needed to make this work.
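
The difference between the two forms can be seen on a two-sibling family. This is only a sketch under assumed data: the values and the names diff_literal and diff_relative are hypothetical. Under by:, subscripts are interpreted within each group, so birth_date[3] does not exist in a group of two.

```stata
* Hypothetical toy data: one family with two siblings
clear
input mother_id father_id birth_date
1 10 14000
1 10 15000
end

* Literal subscripts: every observation in the group gets the same difference
bysort mother_id father_id (birth_date) : gen diff_literal = birth_date[2] - birth_date[1]

* Relative subscript: for observation 2 this evaluates birth_date[3] - birth_date[2],
* and birth_date[3] lies outside the group, so the result is missing
bysort mother_id father_id (birth_date) : gen diff_relative = birth_date[_n+1] - birth_date

list
```

With these values diff_literal is 1000 in both observations, while diff_relative is 1000 in the first observation and missing in the second.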
>> Nick
>> [email protected]
>> Ada Ma
>> WRT your Q to Nick the command you should write is:
>> bysort mother_id father_id (birth_date) : gen diff = birth_date[_n+1] - birth_date
>> [...]
>> On Thu, Jan 20, 2011 at 2:12 PM, Grethe Søndergaard
>> <[email protected]> wrote:
>>> Thank you for your answers
>>> @ Nick Cox: I have tried to run
>>> bysort mother_id father_id (birth_date) : gen diff = birth_date[2] - birth_date[1]
>>> However, an error message appears: "factor variables and time-series
>>> operators not allowed". Can I solve this problem by somehow changing
>>> the type of variable that birth_date is?
>>> @ Ada Ma: My dataset contains more than two siblings per family
>>> (one line for each person). I am not sure how to decide which
>>> siblings to include in the dataset if more than two siblings are
>>> being compared. E.g. a family consists of children aged 1, 4, and 8;
>>> who should stay in the dataset? That is why I chose to include
>>> only persons with one sibling.