The issue Devra raises can be answered by looking at the code. Here I
focus on the -egen- add-on -first()-. The same issue arises with
-lastnm()-. You can see the code below from within Stata by typing
. ssc type _gfirst.ado
What is wired into the code through the -marksample- statement is that
observations with missing values on the variable supplied are excluded
throughout: `touse' is 0 for them, and the result is generated only
-if `touse'-. Thus, as Devra reports, missings are mapped to missings.
*! 1.0.0 NJC 31 May 2000
program define _gfirst
    version 6.0
    gettoken type 0 : 0
    gettoken g 0 : 0
    gettoken eqs 0 : 0
    syntax varname [if] [in] [, BY(varlist) ]
    marksample touse, strok
    tempvar order
    gen long `order' = _n
    sort `touse' `by' `order'
    * ignore user-supplied `type'
    local type : type `varlist'
    qui by `touse' `by' : gen `type' `g' = `varlist'[1] if `touse'
end
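To see what -marksample- does on its own, here is a minimal sketch. The program name -showtouse- is made up for illustration and is not part of egenmore; I am assuming only the documented behaviour of -marksample- with and without -novarlist-.

```stata
* showtouse is a hypothetical name, for illustration only
program define showtouse
    version 6.0
    syntax varname
    * default: observations missing on the variable supplied are excluded
    marksample touse, strok
    * novarlist: missings on the variable supplied do not exclude anything
    marksample touse2, strok novarlist
    count if `touse'
    count if `touse2'
end
```

Typing -showtouse x- with Devra's data should give a smaller first count than second, as the observations with missing x are dropped from the first sample only.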
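Below is a hack that populates every observation in a group with the
first non-missing value whenever one exists. The -novarlist- option
stops -marksample- from excluding missings, and sorting on an indicator
for missingness puts non-missings first within each group.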
*! 1.1.0 NJC Statalist 24 Feb 2008
* _gfirst 1.0.0 NJC 31 May 2000
program define _gfirst2
    version 6.0
    gettoken type 0 : 0
    gettoken g 0 : 0
    gettoken eqs 0 : 0
    syntax varname [if] [in] [, BY(varlist) ]
    marksample touse, strok novarlist
    tempvar order missing
    gen long `order' = _n
    gen byte `missing' = missing(`varlist')
    sort `touse' `by' `missing' `order'
    * ignore user-supplied `type'
    local type : type `varlist'
    qui by `touse' `by' : gen `type' `g' = `varlist'[1] if `touse'
end
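As a sketch of how this might be used: assuming the program above is saved as _gfirst2.ado somewhere along your adopath, -egen- should pick it up under the function name -first2()-. The variable name y2 is just for illustration.

```stata
* assumes _gfirst2.ado has been saved on the adopath
egen y2 = first2(x), by(id)
list id x y2
```

With Devra's data, y2 should then be 10, 12 and 10 for id 1, 2 and 3 respectively, with no missings generated.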
Alternatively, surgery after Devra's example would be
bysort id (y) : replace y = y[1]
bysort id (z) : replace z = z[1]
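The surgery works because numeric missings sort after all non-missing values in Stata, so after sorting the first observation within each id is non-missing whenever any non-missing value exists. The same device can be applied to x directly, without -egen-. A minimal sketch, assuming Devra's variables n, id and x; the names miss and firstx are made up for illustration.

```stata
* indicator: 0 if x is non-missing, 1 if x is missing
gen byte miss = missing(x)
* within id: non-missings first, then original order as recorded in n
bysort id (miss n) : gen firstx = x[1]
drop miss
```

Sorting on n within miss keeps the original order, so firstx is the first non-missing value rather than the smallest.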
Nick
Devra Golbe
-egen newvar = first(varname)-
(from the egenmore functions) produces missing values when I did not
expect that behavior. newvar is missing for observations in which
varname is missing. The same is true for -egen newvar =
lastnm(varname)-. Is that the behavior I should have expected? In
contrast, -egen newvar = mean(varname)- populates newvar even if
varname is missing. See the example below my signature.
. input n id x
n id x
1. 1 1 10
2. 2 1 9
3. 3 1 11
4. 4 2 12
5. 5 2 .
6. 6 2 11
7. 7 3 .
8. 8 3 .
9. 9 3 10
10. end
. egen y = first(x), by(id)
(3 missing values generated)
. egen z = lastnm(x), by(id)
(3 missing values generated)
. egen m = mean(x), by(id)
. list
     +------------------------------+
     | n   id    x    y    z      m |
     |------------------------------|
  1. | 1    1   10   10   11     10 |
  2. | 2    1    9   10   11     10 |
  3. | 3    1   11   10   11     10 |
  4. | 4    2   12   12   11   11.5 |
  5. | 5    2    .    .    .   11.5 |
     |------------------------------|
  6. | 6    2   11   12   11   11.5 |
  7. | 7    3    .    .    .     10 |
  8. | 8    3    .    .    .     10 |
  9. | 9    3   10   10   10     10 |
     +------------------------------+