Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: RE: RE: carryforward
From: David Kantor <[email protected]>
To: [email protected]
Subject: Re: st: RE: RE: carryforward
Date: Mon, 23 Jan 2012 14:01:28 -0500
I never expected this to provoke such a reaction.
But then, I should have explained (and maybe I should explain in the
.hlp file and the SSC description) that -carryforward- was not intended
as a method of imputing missing values.
The intent is to fill in values in "holes", where a value is
understood to prevail until explicitly changed. This assumes that the
data are sorted such that the concept of "prevail until" makes sense;
typically this is time-based.
The scenario where I typically use this is where you have two or more
datasets that represent changes in different attributes over time --
say a person's salary and marital status. (Note that not all such
attributes are numeric.) Each dataset should be uniquely sorted on person_id and
date. But the changes may occur on different dates in the different
datasets. Also, these datasets should have non-missing values for the
pertinent variables.
The datasets are merged. This leaves holes where there was a change
in one attribute but not in the other on a given date --
corresponding to unmatched records in the merge. E.g., if a salary
change occurred on a particular date, but not a change of marital
status, then the merged record would have a missing value for marital
status. And vice-versa. Then what you want is to carry the prevailing
value from one record to the next, until a nonmissing value is encountered.
You also want to interrupt the process when a new person_id is
encountered. Then you would use -by-:
    by person_id (date): carryforward salary marital_status, replace
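To make the scenario above concrete, here is a minimal sketch. The two small datasets, their values, and the tempfile name are invented for illustration; only the -carryforward- call itself comes from this message. It assumes -carryforward- has been installed from SSC.

```stata
* Hypothetical salary changes: one record per person and change date
clear
input person_id str9 datestr salary
1 "01jan2010" 50000
1 "01jul2011" 55000
2 "15mar2010" 40000
end
generate date = date(datestr, "DMY")
format date %td
drop datestr
tempfile salary
save `salary'

* Hypothetical marital-status changes (a string variable)
clear
input person_id str9 datestr str8 marital_status
1 "01jun2010" "married"
2 "15mar2010" "single"
2 "01feb2012" "married"
end
generate date = date(datestr, "DMY")
format date %td
drop datestr

* Unmatched records leave holes in whichever attribute did not change
merge 1:1 person_id date using `salary', nogenerate
sort person_id date

* Fill each hole with the prevailing value, starting afresh for each person
by person_id (date): carryforward salary marital_status, replace
list, sepby(person_id)
```

In this sketch, person 1's first record keeps an empty marital_status, because no earlier value exists to carry into it.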
Finally, note that there may be instances where there are missing
values in the original data, and you would not want to carry values
into and through the corresponding merged records. (E.g., a missing
value in the salary dataset; there was a salary change on Jan 23,
2012, but you don't know what it was.) There are ways to handle that as well.
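One possible way to handle that case, offered only as a sketch and not necessarily the approach meant here, is to keep the _merge variable and fill only the records that did not come from the salary dataset. The data, the tempfile name, and the flag variable salary_hole are hypothetical. The sketch uses the built-in -replace- with a subscript rather than -carryforward-, so that the condition is fully explicit.

```stata
* Hypothetical salary file; the amount of the 23 Jan 2012 change is unknown
clear
input person_id str9 datestr salary
1 "01jan2010" 50000
1 "23jan2012" .
end
generate date = date(datestr, "DMY")
format date %td
drop datestr
tempfile salary
save `salary'

* Hypothetical marital-status file
clear
input person_id str9 datestr str8 marital_status
1 "01jun2010" "married"
1 "01mar2012" "divorced"
end
generate date = date(datestr, "DMY")
format date %td
drop datestr

* Keep _merge this time: master is the marital file, using is the salary file
merge 1:1 person_id date using `salary'

* A salary hole is a record that exists only in the marital file
generate byte salary_hole = (_merge == 1)

* Fill holes only. -replace- works through the observations in order, so
* a value filled in one hole is available to the next hole.
bysort person_id (date): replace salary = salary[_n-1] if salary_hole
list, sepby(person_id)
```

Here the 01jun2010 record receives 50000, the 23jan2012 record stays missing because it is not a hole, and the 01mar2012 record also stays missing because the prevailing salary at that point is unknown.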
I hope this is helpful.
--David
At 12:58 PM 1/23/2012, Nick Cox wrote:
If this method is used to impute missing values that in practice
will be varying, replacing them by a constant equal to the last
observed value, then as Tony implies it clearly can be problematic.
But the method of replacing missing values by previous non-missing
values is one I often use with small datasets entered by hand. When
the observations come in blocks, I only need to type in values for
the first identifier in each block, and then -replace- appropriately.
Sometimes datasets arrive like that too. Only the first value in a
block of some blocked variable is explicit, so you have to fill in
(or fill out) implied similar values.
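A minimal sketch of that blocked-data case, using invented variable names and values, where only the first observation in each block carries the identifier:

```stata
* Hypothetical hand-entered data: site is typed only at the start of each block
clear
input str6 site reading
"north" 1.2
""      1.5
""      1.1
"south" 2.0
""      2.3
end

* -replace- works through the observations in order, so each filled value
* is available to the observation after it
replace site = site[_n-1] if missing(site)
list, sepby(site)
```

This relies on the observations still being in their original entry order; sorting before the -replace- would break the blocks.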
Nick
[email protected]
Lachenbruch, Peter wrote:
This method has been seriously questioned and gives very poor
answers generally. A true p-value may be reported anywhere from
0.01 to 0.15 when it should be 0.05. I strongly urge it not be used.
________________________________________
From: [email protected]
[[email protected]] On Behalf Of David Kantor
[[email protected]]
Once again, thanks to Kit Baum, a new version of -carryforward- is
available on SSC.
This upgrade adds the -if- and -in- qualifiers.
Actually, the upgrade was written a long time ago, but never got
uploaded until now. Sorry if that was my fault.
-carryforward- carries values from one observation to the next,
filling in missing values.
-ssc install carryforward-