Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Carry over information on time-invariant covariate to all observations of a household?
From: David Kantor <[email protected]>
To: [email protected]
Subject: Re: st: Carry over information on time-invariant covariate to all observations of a household?
Date: Tue, 03 Aug 2010 10:11:34 -0400
At 04:28 AM 8/3/2010, Jen Zhen wrote:
> Dear Listers,
> suppose I have a panel with dimensions Household and Year. I have the
> time-invariant household characteristic X. Information on it is
> currently given for each household in only one of the years, but I
> would like to carry over this information to all observations of each
> household.
> To illustrate, the dataset looks like this:
> HH  Year  X
> 1   1990  5
> 1   1991  .
> 1   1992  .
> 2   1990  .
> 2   1991  3
> 2   1992  .
> 3   1990  .
> 3   1991  .
> 3   1992  2
> and I would like to fill in the missing values.
> Currently my way of doing it is this:
> - bysort HH: egen X2 = max(X) -
> - replace X = X2 -
> However, in a large dataset running this command takes forever, so I
> am wondering whether there is a faster way to do this?
In addition to other advice given, you may want to look into
-carryforward- on SSC:
    ssc desc carryforward
    ssc inst carryforward
It will carry forward the latest value in the sort order (presumably
by HH and some other variable), whereas the advice given by Martin
Weiss will carry the lowest value in each group.
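Martin Weiss's code is not reproduced in this message, so the following is only a sketch of one built-in idiom that behaves as described (it carries the lowest nonmissing value to every observation in the group). It uses the example data from the question and relies on the fact that Stata sorts missing values last.

```stata
clear
input HH Year X
1 1990 5
1 1991 .
1 1992 .
2 1990 .
2 1991 3
2 1992 .
3 1990 .
3 1991 .
3 1992 2
end

* Within each HH, sort by X. Missing values sort last, so X[1] is the
* lowest nonmissing value in the household (or missing if there is none).
bysort HH (X): replace X = X[1] if missing(X)

* Restore the panel order and inspect the result.
sort HH Year
list, sepby(HH)
```

Because this needs only one sort and one -replace-, it avoids the extra variable that the -egen- approach creates.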
I noticed that the one observation (per HH group) that has a value is
not always the first. Your method needs to address whether you want
that value spread in both directions or just forward.
If you do use -carryforward-, you will need to decide on the sort
order, which might be HH X or HH Year. If it is HH X, missing values
sort last, so it will carry the greatest nonmissing value forward; that
is the same as the lowest if only one observation per HH group has a
value in X. If you use HH Year, it will carry values only into later
years, not into earlier years. If you want the value spread backwards
as well, you can follow it with a backward -carryforward-:
    bysort HH (Year): carryforward X ...
    gen int negyear = -Year
    bysort HH (negyear): carryforward X ...
This may be a more generally correct method, if there are any HHs
with more than one value for X.
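As a worked sketch of the forward-then-backward approach on the example data from the question: this assumes -carryforward- has been installed from SSC, and it assumes the -replace- option is what you want in place of the "..." above (the alternative is gen() with a new variable name).

```stata
* Assumes the user-written package is installed: ssc install carryforward
clear
input HH Year X
1 1990 5
1 1991 .
1 1992 .
2 1990 .
2 1991 3
2 1992 .
3 1990 .
3 1991 .
3 1992 2
end

* Forward pass: fill later years from the most recent nonmissing value.
bysort HH (Year): carryforward X, replace

* Backward pass: reverse the time order, then carry forward again,
* which fills the earlier years.
gen int negyear = -Year
bysort HH (negyear): carryforward X, replace

* Clean up and restore the panel order.
drop negyear
sort HH Year
list, sepby(HH)
```

If a household had different values in different years, the forward pass would keep each value from its own year onward, and the backward pass would fill only the years before the first recorded value.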
But if you are certain that all HHs have only one value for X, then
the advice in Martin Weiss's first reply is correct and simplest.
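Before relying on that assumption, it can be checked directly. The sketch below flags households with more than one distinct nonmissing value of X. The household with HH == 4 is a hypothetical row added here for illustration only, and the variable name conflict is likewise made up.

```stata
clear
input HH Year X
1 1990 5
1 1991 .
1 1992 .
2 1990 .
2 1991 3
2 1992 .
3 1990 .
3 1991 .
3 1992 2
4 1990 7
4 1991 .
4 1992 8
end

* Within each HH, sort by X so that X[1] is the lowest nonmissing value.
* Any other nonmissing value that differs from X[1] signals a conflict.
bysort HH (X): gen byte conflict = !missing(X) & X != X[1]

* Count and show the conflicting observations (here, only HH 4).
count if conflict
list HH Year X if conflict
sort HH Year
```

If the count is zero, every household has at most one distinct value of X, and the simplest method is safe.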
HTH
--David