Re: st: Labeling different kinds of missing observations
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Labeling different kinds of missing observations
Date: Fri, 20 Apr 2012 18:16:31 +0100
Yes, yes, yes. One of the nicest things about this list is being able
to suggest something really simple that should solve your problem.
First, if you -reshape long- you can make your missings explicit by
. fillin ID year
See -help fillin-. See also:
SJ-5-1 dm0011 . . . . . . . . . . . . . . Stata tip 17: Filling in the gaps
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
Q1/05 SJ 5(1):135--136 (no commands)
tips for using fillin to fill in gaps in a rectangular
data structure
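As a minimal sketch of what -fillin- does, here is the long-form listing from the earlier message recreated as a small dataset. Coding Presence as 1 is an assumption made for illustration (the original shows only x). Stata variable names are case-sensitive, so the variable is Year here to match that listing.

```stata
clear
input ID Year
1 6
1 7
1 8
2 6
2 8
3 6
3 7
3 8
3 9
4 9
4 10
4 11
end

* assumption: Presence is coded 1 wherever an observation exists
gen Presence = 1

* add an observation for every ID-Year combination not already present;
* -fillin- sets _fillin to 1 for the added observations and 0 otherwise
fillin ID Year

list, sepby(ID)
```

The added observations have Presence missing, so the gaps that were implicit in the unbalanced panel become explicit missings.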
Second, in long form indicators are also easy, e.g. for a generic
-panelid-, time variable -time- and response variable -y- you can get
the time of the first non-missing value and then associated indicators
bysort panelid : egen first = min(time / !missing(y))
gen missbefore = time < first
gen missafter = missing(y) & (time > first)
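To see those three lines at work, here is a sketch using the example data from earlier in the thread, reshaped to long form. ID plays the role of -panelid- and Y that of -y-; the name time for the new index variable is my choice. The division trick relies on !missing(Y) being 0 for missing values, so that time / 0 is missing and is ignored by the -egen- function min().

```stata
clear
input ID Y1 Y2 Y3 Y4 Y5 Y6 Y7
1  .  .  2  3  4  5  6
2  .  7  .  8  9 10 11
3  .  . 12 13  . 14 15
4  . 16  . 17 18  . 19
5 20 21  . 22 23 24  .
end

* one observation per ID and time; Y1-Y7 become a single variable Y
reshape long Y, i(ID) j(time)

* time of the first non-missing value within each panel
bysort ID : egen first = min(time / !missing(Y))

* indicators for missing before and missing after the first non-missing value
gen missbefore = time < first
gen missafter = missing(Y) & (time > first)

list, sepby(ID)
```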
What's typical is that loops to do something across observations (rows)
become one- or two-liners to do something in panels.
As before, I am not happy about replacing missings with indicators.
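If the aim is to tell the two kinds of missing apart, extended missing values can carry that information without putting 0s and 1s among the real data. A sketch, again with the example data; the choice of .a for "before first observation" and .b for "gap after first observation", and the label name misskind, are assumptions for illustration.

```stata
clear
input ID Y1 Y2 Y3 Y4 Y5 Y6 Y7
1  .  .  2  3  4  5  6
2  .  7  .  8  9 10 11
3  .  . 12 13  . 14 15
4  . 16  . 17 18  . 19
5 20 21  . 22 23 24  .
end

reshape long Y, i(ID) j(time)
bysort ID : egen first = min(time / !missing(Y))

* recode system missing (.) to an extended missing value by position
replace Y = .a if missing(Y) & time < first
replace Y = .b if missing(Y) & time > first

* hypothetical label name; value labels may be attached to .a, .b, etc.
label define misskind .a "before first observation" ///
                      .b "gap after first observation"
label values Y misskind

list, sepby(ID)

* extended missing values are still missing, so -summarize- ignores them
summarize Y
count if Y == .a
count if Y == .b
```

Because .a and .b still count as missing, there is nothing to remember to exclude from later calculations.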
On Fri, Apr 20, 2012 at 5:28 PM, Rituparna Basu <[email protected]> wrote:
> Thank you Nick once again, and I will try out the code and inform you.
>
> The reason I am not using long form is the following:
>
> My original data is in long form and looks like:
>
> ID Year Presence
> 1 06 x
> 1 07 x
> 1 08 x
> 2 06 x
> 2 08 x
> 3 06 x
> 3 07 x
> 3 08 x
> 3 09 x
> 4 09 x
> 4 10 x
> 4 11 x
>
> That is, it is an unbalanced panel dataset covering the years 2006 to 2011.
>
> And if I reshape it this is how it will look (sort of):
>
> ID Y1 Y2 Y3 Y4 Y5 Y6 Y7
> 1 . . x x x x x
> 2 . x . x x x x
> 3 . . x x . x x
> 4 . x . x x . x
> 5 x x . x x x .
>
> The IMPORTANT thing here is that the missing values mean something: either the subjects had not yet begun the study or they came back after a gap of 1-2 or more years. I hope this makes sense and answers your concern. Having said this, do you think it is possible to do a similar iteration using LONG form?
>
> Thanks again!
>
> Regards,
>
> RB
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
> Sent: Friday, April 20, 2012 1:09 AM
> To: [email protected]
> Subject: Re: st: Labeling different kinds of missing observations
>
> You are correct. When you asked for code to generate a variable, I did not understand that you want to replace a variable.
>
> Also, there is an error in the code I posted (_not_ a macro in Stata terms). References to -`first'- were spurious: should have been just -first-. Sorry about that.
>
> But the -first- variable it creates is still relevant.
>
> clear
>
> input ID Y1 Y2 Y3 Y4 Y5 Y6 Y7
> 1 . . 2 3 4 5 6
> 2 . 7 . 8 9 10 11
> 3 . . 12 13 . 14 15
> 4 . 16 . 17 18 . 19
> 5 20 21 . 22 23 24 .
> end
>
> gen missbefore = 0
> gen missafter = 0
> gen first = .
>
> qui forval J = 1/7 {
> replace missbefore = 1 if missing(Y`J') & `J' < first
> replace first = `J' if missing(first) & !missing(Y`J')
> replace missafter = 1 if missing(Y`J') & `J' > first
> }
>
> drop miss*
>
> qui forval J = 1/7 {
> replace Y`J' = cond(`J' < first, 0, 1) if missing(Y`J')
> }
>
> list
>
> That said, this sounds like a bad idea.
>
> 1. If 1 and 0 are in principle possible non-missing values it is a very bad idea.
>
> 2. Even if not, you need to remember to exclude the 0s and 1s from many, if not most, calculations with these variables.
>
> 3. Extended missing values (.a, .b, etc.) sound like what you really need here.
>
> My question "Why not -reshape long-?" still stands.
>
> Nick
>
> On Fri, Apr 20, 2012 at 7:52 AM, Rituparna Basu <[email protected]> wrote:
>> Hi Nick,
>>
>> Thank you so much for the resources and the code.
>> I did run the macro but it said 'invalid syntax'.
>>
>> I think I did not mention my question properly. I would like to transform the following data :
>> ID Y1 Y2 Y3 Y4 Y5 Y6 Y7
>> 1 . . x x x x x
>> 2 . x . x x x x
>> 3 . . x x . x x
>> 4 . x . x x . x
>> 5 x x . x x x .
>>
>> Transform to:
>>
>> ID Y1 Y2 Y3 Y4 Y5 Y6 Y7
>> 1 0 0 x x x x x
>> 2 0 x 1 x x x x
>> 3 0 0 x x 1 x x
>> 4 0 x 1 x x 1 x
>> 5 x x 1 x x x 1
>>
>> Basically, I want to replace the missing values of the Y* variables (those before and after the first observation, as you can see) rather than create a new variable.
>> I apologize for the confusion but any help is greatly appreciated!
>>
>> Thank you!
>>
>> Regards,
>> RB
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Nick Cox
>> Sent: Thursday, April 19, 2012 12:15 PM
>> To: [email protected]
>> Subject: Re: st: Labeling different kinds of missing observations
>>
>> This sounds as if you want indicators
>>
>> missbefore 1 if any missing before first non-missing and 0 otherwise
>>
>> missafter 1 if any missing after etc.
>>
>> Here's a sketch. Code not tested.
>>
>> gen missbefore = 0
>> gen missafter = 0
>> gen first = .
>>
>> qui forval J = 1/7 {
>> replace missbefore = 1 if missing(Y`J') & `J' < `first'
>> replace first = `J' if missing(first) & !missing(Y`J')
>> replace missafter = 1 if missing(Y`J') & `J' > `first'
>> }
>>
>> I think of these problems in this way.
>>
>> 1. I need to initialise an indicator. Sometimes the initial value does not matter; sometimes it does. You have to think it through for each problem.
>>
>> 2. I need to loop over the variables.
>>
>> 3. The first key then is "when do I change my mind?"
>>
>> 4. The second key is "if I change my mind, is the indicator then fixed, or may I need to update it?"
>>
>> But why not -reshape long-?
>>
>> See also
>>
>> SJ-9-1 pr0046 . . . . . . . . . . . . . . . . Speaking Stata: Rowwise
>> (help rowsort, rowranks if installed) . . . . . . . . . . N. J. Cox
>> Q1/09 SJ 9(1):137--157
>> shows how to exploit functions, egen functions, and Mata
>> for working rowwise; rowsort and rowranks are introduced
>>
>> Nick
>>
>> On Thu, Apr 19, 2012 at 7:10 PM, Rituparna Basu <[email protected]> wrote:
>>
>>> I am trying to generate a variable that will indicate missing BEFORE FIRST YEAR of OBSERVATION and missing AFTER FIRST YEAR of OBSERVATION.
>>> Here is a sample of the data:
>>>
>>> ID Y1 Y2 Y3 Y4 Y5 Y6 Y7
>>> 1 . . x x x x x
>>> 2 . x . x x x x
>>> 3 . . x x . x x
>>> 4 . x . x x . x
>>> 5 x x . x x x .
>