Re: st: How to fill in the missing data
From: Nick Cox <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: How to fill in the missing data
Date: Mon, 10 Jun 2013 08:55:39 +0100
This approach is documented at
http://www.stata.com/support/faqs/data-management/replacing-missing-values/
but I agree with Sergiy: your problem is an interpolation problem.
Possible commands include -ipolate- (official), -cipolate- (SSC),
-csipolate- (SSC), -pchipolate- (SSC).
(I mention also -nnipolate- (SSC) for completeness, but it would not
be a good fit for your particular problem.)
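
As a minimal sketch only, assuming the variable names in the original
post (id, age, weight) and that id and age jointly identify
observations, the carry-forward approach from the FAQ and linear
interpolation with -ipolate- might look like this. The new variable
names weight2 and weight3 are purely illustrative.

```stata
* example data as posted by Ching Wong
clear
input id age weight
1 21 50.2
1 22 .
1 23 52.9
1 24 51.0
1 25 .
2 22 .
2 23 .
2 25 60.2
3 21 .
end

* stop with an error if id and age do not jointly identify observations
isid id age

* carry the previous nonmissing value forward, within id only
bysort id (age): gen weight2 = weight
by id: replace weight2 = weight2[_n-1] if missing(weight2)

* linear interpolation on age, within id
* (no extrapolation unless the epolate option is added)
by id: ipolate weight age, gen(weight3)

list, sepby(id)
```

Because both steps run under by id:, nothing is carried from one
patient to the next. With the data above, weight2 should match the
output requested in the original post, while weight3 for id 1 at age
22 is the midpoint of 50.2 and 52.9, namely 51.55. Values before the
first or after the last nonmissing weight within an id stay missing in
weight3.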
Nick
[email protected]
On 10 June 2013 06:27, Sergiy Radyakin <[email protected]> wrote:
> Alexis, your approach risks carrying the weight of one patient over to
> the next whenever the second patient's first measurement is missing
> (your last line disregards ID). So unless the first measurement of
> weight is known to be always present (and the example provided shows
> that it is not), this method would give incorrect results.
>
> Wong, is each patientid-age combination in your data unique? Or do
> you sometimes see the same patient twice within a year? (If so, be
> careful even with the -sort- statement.)
>
> It sounds like interpolation is likely needed here since the intervals
> of missing observations are of different size and weight probably
> changes smoothly with age. But it shouldn't be difficult.
>
> Best, Sergiy
>
> On Mon, Jun 10, 2013 at 1:01 AM, Alexis Penot <[email protected]> wrote:
>> You can try this
>> sort id age
>> gen weight2 = weight
>> replace weight2 = weight2[_n-1] if missing(weight2)
>>
>> Alexis
>>
>> On 10 June 2013 at 06:45, Ching Wong <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I have a dataset as follows:
>>>
>>> id age weight
>>> 1 21 50.2
>>> 1 22
>>> 1 23 52.9
>>> 1 24 51.0
>>> 1 25
>>> 2 22
>>> 2 23
>>> 2 25 60.2
>>> 3 21
>>>
>>> I would like to create a new variable "weight2" that fills in each
>>> missing value with the previous value for the same id.
>>>
>>> My expected output value should be as follows:
>>>
>>> id age weight weight2
>>> 1 21 50.2 50.2
>>> 1 22 . 50.2
>>> 1 23 52.9 52.9
>>> 1 24 51.0 51.0
>>> 1 25 . 51.0
>>> 2 22 . .
>>> 2 23 . .
>>> 2 25 60.2 60.2
>>> 3 21 . .
>>>
>>> I tried the command below, but it does not produce what I expected.
>>>
>>> - bysort id (age): gen weight_hat = weight[_n-1]
>>>
>>> It is obvious that the command is missing something. So what would
>>> be the correct command in this case?