st: Re: Repeating values
...
I think this will do what you want (assuming that you also have a variable
named company in your dataset):
gsort company -year
by company: gen todrop=sum(Var1==Var1[1] & Var1==Var1[_n+1])
by company: replace Var1=. if todrop==_n
drop todrop
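The data are sorted in reverse time order (as Nick mentioned), and then todrop
is created as a running count of the observations that equal both the first
(latest in time) observation and the next (earlier in time) observation.
Within the trailing run of repeated values, todrop keeps pace with the
observation number, so those values are set to missing. The earliest
observation of the run differs from the one before it in time, so todrop
falls behind _n there and that value is kept.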
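
To check the logic on a small case, here is a sketch on made-up data. The
company codes A and B and all of the values are hypothetical, not taken from
the original dataset. Company A repeats its last value after delisting, while
company B has a legitimate repeat while it is still listed.

```stata
* Hypothetical toy data (illustration only)
clear
input str1 company int year double Var1
"A" 1991 0.90
"A" 1992 0.95
"A" 1993 0.93
"A" 1994 0.93
"A" 1995 0.93
"B" 1991 0.50
"B" 1992 0.50
"B" 1993 0.70
end

* Latest year first within each company
gsort company -year
* Running count of observations equal to the latest value and to the next (earlier) one
by company: gen todrop = sum(Var1==Var1[1] & Var1==Var1[_n+1])
* Blank out only the trailing run of repeats
by company: replace Var1 = . if todrop==_n
drop todrop

* Back to chronological order for inspection
sort company year
list, sepby(company)
```

On this toy data the 1994 and 1995 values for company A should come out
missing, while the 1993 value and all of company B are left alone.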
Michael Blasnik
----- Original Message -----
From: "Thomas Erdmann" <[email protected]>
To: <[email protected]>
Sent: Thursday, December 07, 2006 11:56 AM
Subject: st: Repeating values
Hi,
I am working with some variables that are "wrong" in the sense that if one
share was taken off the market (i.e. the company was dissolved), the last
value of the variable is repeated instead of containing missing values.
e.g.
Status    Year  Var1
Listed    1991  0.9
Listed    1992  0.95
Listed    1993  0.93
Delisted  1994  0.93
Delisted  1995  0.93
..
Delisted  2006  0.93  (value is always repeated up to present time)
Whereas years 1994-2006 should contain missing values. I came up with this
cleaning process:
foreach X of varlist var1 var2 var3 {
generate `X'new=`X'
replace `X'new=. if `X'==L.`X'
replace `X'=`X'new
drop `X'new
}
Which is okay, but also sets the value to missing if one observation for a
listed company repeats, so it also deletes observations that would be
fine.
Any suggestions on how I can only replace the "wrong" values?
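
The over-deletion described above can be reproduced with a minimal sketch on
made-up data. The id variable, the values, and the single company are
hypothetical, and the data are assumed to be declared as a panel with tsset so
that the L. operator works.

```stata
* Hypothetical toy panel: one listed company whose value
* legitimately repeats in 1991 and 1992
clear
input id int year double var1
1 1991 0.50
1 1992 0.50
1 1993 0.70
end

* L. requires the panel and time variables to be declared
tsset id year

* Same rule as in the loop, applied to a copy of one variable
generate double var1new = var1
replace var1new = . if var1 == L.var1
list
```

Here the 1992 value is set to missing even though the company is still
listed, which is the unwanted side effect.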