Hello all,
My problem should be simple. I have a file A, and two temporary files "`B'"
and "`C'".
I merge file A first with "`B'", then with "`C'", on two key variables in
A, call them a and b. "`B'" and "`C'" each consist of four variables: a, b,
c and d. I want to get c and d into A. Files "`B'" and "`C'" match separate
subsets of the (a, b) values in A, but together the matches cover the full
set A exactly, so at the end of these two simple merge operations, A should
be wider by two variables, c and d, with no missing values.
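As a toy illustration of the two-step merge just described, here is a minimal sketch with made-up data. The variable names id, mo and cls and all values are hypothetical stand-ins, not my real files; id and mo play the role of a and b, and cls plays the role of c. It uses the same old-style merge syntax as the log further down.

```stata
* Toy sketch only: hypothetical data standing in for files A, B and C.
* id and mo are the merge keys; cls is the variable to be brought into A.

* Build tempfile B: covers one subset of the keys
clear
input id mo str4 cls
1 1 "X"
2 1 "Y"
end
sort id mo
tempfile B
save "`B'"

* Build tempfile C: covers the remaining keys
clear
input id mo str4 cls
3 1 "Z"
end
sort id mo
tempfile C
save "`C'"

* Master data A: keys only
clear
input id mo
1 1
2 1
3 1
end
sort id mo

* Step 1: merge A with B
merge id mo using "`B'"
tab _merge
drop _merge

* Step 2: merge the result with C
sort id mo
merge id mo using "`C'"
tab _merge

* Inspect which rows ended up with a value of cls
list id mo cls _merge
```

Run as a single do-file, so that the tempfile macros stay defined throughout.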
It doesn't work, and I don't see why. My code goes like this:
use A
preserve
(do stuff)
The real screen shot follows, but first, some definitions:
Variables a and b, by which I am merging, are emp_nr and mo_dt.
Variables c and d, which I want, are job_cls_cd and job_cls_stt_dt.
Tempfile "`B'" is "`hurts'".
Tempfile "`C'" is "`classes_x'".
. restore
.
. drop job_cls_cd job_cls_stt_dt
. merge emp_nr mo_dt using "`hurts'"
. tab _merge
     _merge |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |     746397       97.64       97.64
          3 |      18060        2.36      100.00
------------+-----------------------------------
      Total |     764457      100.00
.
. preserve
. keep if _merge==1
(18060 observations deleted)
. drop _merge
. keep emp_nr mo_dt
. egen u=tag(emp_nr mo_dt)
. keep if u==1
(97582 observations deleted)
. drop u
. sort emp_nr mo_dt
. tempfile classes_x
. merge emp_nr mo_dt using "`classes'"
mo_dt was byte now float
. tab _merge
     _merge |      Freq.     Percent        Cum.
------------+-----------------------------------
          2 |    2184339       77.10       77.10
          3 |     648815       22.90      100.00
------------+-----------------------------------
      Total |    2833154      100.00
. keep if _merge==3
(2184339 observations deleted)
. drop _merge
. sort emp_nr mo_dt
. save "`classes_x'", replace
(note: file D:\TEMP\ST_0p003x.tmp not found)
file D:\TEMP\ST_0p003x.tmp saved
. describe
Contains data from D:\TEMP\ST_0p003x.tmp
  obs:       648,815
 vars:             4                          27 Feb 2003 11:44
 size:    11,678,670 (92.0% of memory free)
-------------------------------------------------------------------------------
                storage  display     value
variable name   type     format      label      variable label
-------------------------------------------------------------------------------
emp_nr          long     %12.0g                 EMP_NR
mo_dt           float    %9.0g
job_cls_cd      str4     %9s                    JOB_CLS_CD
job_cls_stt_dt  int      %9.0g
-------------------------------------------------------------------------------
Sorted by:  emp_nr  mo_dt
. count if job_cls_stt_dt==.
0
. count if job_cls_cd==""
0
*** Notice that this tempfile has everything I want: both variables are
*** present, with no missing values. Back to the screen shot:
. restore
.
. drop _merge
. sort emp_nr mo_dt
. merge emp_nr mo_dt using "`classes_x'"
mo_dt was byte now float
. tab _merge
     _merge |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      18060        2.36        2.36
          3 |     746397       97.64      100.00
------------+-----------------------------------
      Total |     764457      100.00
. count if job_cls_stt_dt==.
746397
. count if job_cls_cd==""
746397
Does anybody have an idea why my _merge==3 matches don't pick up the
job_cls_stt_dt and job_cls_cd values that are clearly present in the using
dataset?
Thanks,
Gabi