You have a precision problem. By default -egen- generates -float-
variables with the functions you are using, and a -float- holds integers
exactly only up to 16,777,216 (2^24). Your 8-digit identifiers are
bigger than that, so they get rounded to the nearest value a -float- can
hold. The same goes for -generate-, whose default type is also -float-.
To keep every digit you need to spell out that you want a -long- or a
-double-.
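A quick way to see the rounding for yourself, as a sketch: the function
float() returns its argument rounded to -float- precision. The three
numbers are the identifiers from your dummy dataset. Between 2^25 and
2^26 a -float- can only hold multiples of 4, which is why two of them
move and one does not.

```stata
* float() rounds its argument to float precision
display %12.0f float(40002015)   // 40002016: nearest multiple of 4
display %12.0f float(40010010)   // 40010008: a tie, resolved to 40010008
display %12.0f float(41011144)   // 41011144: already a multiple of 4
```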
I can't follow your code, which seems to go back and forth between string
and numeric results, nor do I know what MEPS means. I guess there's a
much simpler way to do what you want without using -egen- at all, but
the issue that is biting you is illustrated thus:
. set obs 1
obs was 0, now 1

. gen long myin = 40002015

. egen myout = max(myin)

. egen long myout2 = max(myin)

. format myout* %12.0f

. l

     +--------------------------------+
     |     myin      myout     myout2 |
     |--------------------------------|
  1. | 40002015   40002016   40002015 |
     +--------------------------------+

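As for a simpler way, here is one guess at what you want, offered as a
sketch only and not tested on your data. It assumes dupersid is left
numeric, as -insheet- reads it in, and that the dataset is already in
memory. The variable name allid is made up for illustration. -levelsof-
puts the distinct identifiers in a local macro, and the loop assigns
each one in turn to every observation as a -long-, so no digits are
lost and nothing needs -tostring- or -destring-.

```stata
* Sketch only: assumes dupersid is numeric and the data are in memory
levelsof dupersid, local(ids)
foreach id of local ids {
    preserve
    * allid is a hypothetical name; -long- keeps all 8 digits
    gen long allid = `id'
    list dupersid allid
    restore
}
```
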
Nick

Matt Rutledge wrote:

Using Stata 10, I'm attempting to assign one person's identifier
(DUPERSID from the MEPS dataset) to every person in the sample,
repeating for each of the N people in my sample. The code I'm using
seems to work, except that it spontaneously changes one digit of the
identifier.
To illustrate, I've created this dummy dataset:
dupersid   date       x
40002015   19990101   1
40002015   19990201   0
40002015   19990301   0
40010010   19990101   0
40010010   19990201   1
40010010   19990301   1
41011144   19990101   1
41011144   19990201   0
41011144   19990301   1
and called it test.txt.
I then read in this dataset, and attempt to assign each observation
the dupersid 40002015. In turn, I'll also want to assign all of them
the identifier 40010010, and finally 41011144. So I do a forvalues
loop:
set more off
insheet using test.txt, names
tostring dupersid, replace
rename dupersid dupersidsave
bysort dupersidsave: gen first = 1 if _n==1
replace first = 0 if first==.
summ first
local N = r(N)*r(mean)
forvalues j = 1/`N' {
    preserve
    gsort -first dupersidsave
    gen dupers = dupersidsave if _n==`j' & first==1
    destring dupers, replace
    egen dupersid = max(dupers)
    tostring dupersid, replace
    gsort dupersidsave -first
    list dupers*
    des
    restore
}
****
Here's the output. Please note that on the first pass through the
loop, the identifier changes from 40002015 to 40002016. On the second
pass, the identifier changes from 40010010 to 40010008. The third
pass is fine. Any ideas why this might be? Using "egen, total" or
"egen, mean" doesn't seem to help, nor does destringing the identifier
at different points along the way. Also, I get the same error running
it without a loop (replace `j' with 1, for instance, and the
identifier still spontaneously changes).
(8 real changes made, 8 to missing)
dupers already numeric; no replace
dupersid was float now str8

     +--------------------------------+
     | dupers~e     dupers   dupersid |
     |--------------------------------|
  1. | 40002015   4.00e+07   40002016 |
  2. | 40002015          .   40002016 |
  3. | 40002015          .   40002016 |
  4. | 40010010          .   40002016 |
  5. | 40010010          .   40002016 |
     |--------------------------------|
  6. | 40010010          .   40002016 |
  7. | 41011144          .   40002016 |
  8. | 41011144          .   40002016 |
  9. | 41011144          .   40002016 |
     +--------------------------------+

Contains data
  obs:             9
 vars:             6
 size:           261 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
dupersidsave    long   %12.0g
date            long   %12.0g
x               byte   %8.0g
first           float  %9.0g
dupers          float  %9.0g
dupersid        str8   %9s
-------------------------------------------------------------------------------
Sorted by:  dupersidsave
     Note:  dataset has changed since last saved

(8 real changes made, 8 to missing)
dupers already numeric; no replace
dupersid was float now str8

     +--------------------------------+
     | dupers~e     dupers   dupersid |
     |--------------------------------|
  1. | 40002015          .   40010008 |
  2. | 40002015          .   40010008 |
  3. | 40002015          .   40010008 |
  4. | 40010010   4.00e+07   40010008 |
  5. | 40010010          .   40010008 |
     |--------------------------------|
  6. | 40010010          .   40010008 |
  7. | 41011144          .   40010008 |
  8. | 41011144          .   40010008 |
  9. | 41011144          .   40010008 |
     +--------------------------------+

Contains data
  obs:             9
 vars:             6
 size:           261 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
dupersidsave    long   %12.0g
date            long   %12.0g
x               byte   %8.0g
first           float  %9.0g
dupers          float  %9.0g
dupersid        str8   %9s
-------------------------------------------------------------------------------
Sorted by:  dupersidsave
     Note:  dataset has changed since last saved

(8 real changes made, 8 to missing)
dupers already numeric; no replace
dupersid was float now str8

     +--------------------------------+
     | dupers~e     dupers   dupersid |
     |--------------------------------|
  1. | 40002015          .   41011144 |
  2. | 40002015          .   41011144 |
  3. | 40002015          .   41011144 |
  4. | 40010010          .   41011144 |
  5. | 40010010          .   41011144 |
     |--------------------------------|
  6. | 40010010          .   41011144 |
  7. | 41011144   4.10e+07   41011144 |
  8. | 41011144          .   41011144 |
  9. | 41011144          .   41011144 |
     +--------------------------------+

Contains data
  obs:             9
 vars:             6
 size:           261 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
dupersidsave    long   %12.0g
date            long   %12.0g
x               byte   %8.0g
first           float  %9.0g
dupers          float  %9.0g
dupersid        str8   %9s
-------------------------------------------------------------------------------
Sorted by:  dupersidsave
     Note:  dataset has changed since last saved