To Uli and Kit:
Thank you for your advice.
Sorry for the late reply; it took some time to verify
the code you posted.
------------------------------------------------
(Uli wrote)
> forvalues i = 1997/2001 {
> use "personnel_`i'"
> sort id
> gen year = `i'
> save "using", replace
> use personnel
> sort id
> capture drop _merge
> merge id using "using"
> save personnel, replace
> }
> erase "using.dta"
>
> * Prepare using data
> forvalues i = 1997/2001 {
> use "personnel_`i'"
> sort id
> gen year = `i'
> save "using_`i'", replace
> }
>
> * Load Master Data
> use personnel
> sort id
>
> * Merge Data
> forvalues i = 1997/2001 {
> merge id using "using_`i'"
> drop _merge
> erase using_`i'.dta /* Clean up things */
> }
>
> save personnel_new, replace
I didn't know -forvalues- could loop over file names as well
as variables. So -foreach- is not needed at all in this
case.
Of the two routines above, the latter may be
better (with some modifications), as you suggested.
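If I understand the loop correctly, -forvalues- itself knows nothing about files: the local macro `i' is simply substituted into whatever text follows, so the same trick builds file names. A minimal sketch (the local macro names fname and files are mine, only for illustration):

```stata
* Build the five yearly file names by macro substitution
local files
forvalues i = 1997/2001 {
    local fname "personnel_`i'.dta"
    display "`fname'"
    local files `files' `fname'
}
* `files' now holds the five names, separated by spaces
display "`files'"
```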
> Finally you might want to check out -mmerge- on SSC
> which makes merging much easier.
Thank you for your recommendation. I didn't know of
this module.
------------------------------------------------
(Kit wrote)
> Why do you want the time variables to be 'year1997', 'year1998' and so on?
> It would seem obvious that 'year1997' == 1997, or perhaps 1 if it is a
> dummy for year, but it would seem sufficient to have a single variable
> 'year' which contains the year of each chunk of data.
Maybe you are right.
I tried to create 'year1997', 'year1998' and so on simply
because I didn't know a better way.
> Further if these files are unit records -- e.g. one record per employee
> per year -- don't you want to append them, rather than merging them?
> Unless you have different variable names in the various files, merging
> them will not likely be what you want (and if you do, it is even less likely
> to be what you want -- it will create a block-diagonal structure with
> mostly missing data). -append- would create a panel in 'personnel' with a
> t-indicator of year. Something like
>
> forv i=1997/2001 {
> use personnel_`i',clear
> gen year = `i'
> save personnel_`i',replace
> }
> use personnel,clear
> forv i=1997/2001 {
> append using personnel_`i'
> }
> save personnel,replace
The structure of the data files at hand is as follows:
personnel_1997.dta   personnel_1998.dta   ...   personnel_2001.dta
------------------------------------------------------------------
id       year        id       year              id       year
101818   1997        101818   1998              101818   2001
102270   1997        102881   1998              103454   2001
102881   1997        103454   1998              104043   2001
103241   1997        103586   1998              104124   2001
103373   1997        103799   1998              104132   2001
103438   1997        104043   1998              104710   2001
103454   1997        104124   1998              104868   2001
103586   1997        104132   1998              105210   2001
103799   1997        104281   1998              105635   2001
103888   1997        104523   1998              105708   2001
...
That is, there is attrition and/or entry of employees
during the panel period, though most of them remain
from start to end.
So I thought using -merge-, with each file matched on the id
variable, would be the better choice.
I'd appreciate additional advice if I am wrong.
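To make the two layouts concrete, here is a minimal sketch on made-up data, not my real files: two yearly files with one leaver (102270) and one entrant (102881), and a hypothetical variable wage. -append- stacks the records into one row per id and year, and -reshape wide- then gives the one-row-per-id layout I had in mind with -merge-, with missing values where an employee is absent.

```stata
* Toy 1997 file (ids borrowed from the listing above; wage is hypothetical)
clear
input long id double wage
101818 10
102270 11
end
gen year = 1997
tempfile p1997
save `p1997'

* Toy 1998 file: 102270 has left, 102881 has entered
clear
input long id double wage
101818 12
102881 13
end
gen year = 1998
tempfile p1998
save `p1998'

* Long form: one row per id-year; entry and exit just mean fewer rows
use `p1997', clear
append using `p1998'
sort id year
list, sepby(id)

* Wide form: one row per id, wage1997 and wage1998 as separate variables
reshape wide wage, i(id) j(year)
list
```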
> Note, btw, that
>
> capture confirm variable _merge
> if _rc {
> continue
> }
> else {
> drop _merge
> }
>
> can be replaced by
>
> capture drop _merge
Oh, I didn't know such simple code could handle errors
in Stata.
Error handling of this kind would be more complicated in
other scripting languages.
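As far as I can tell, -capture- runs the command, suppresses any error, and leaves the return code in _rc. A minimal sketch on a toy dataset (the local macros rc1 and rc2 are mine, only for illustration):

```stata
* Toy dataset without a _merge variable
clear
set obs 3
gen id = _n

* No _merge yet: the error is swallowed and _rc is nonzero
capture drop _merge
local rc1 = _rc
display "return code without _merge: `rc1'"

* With _merge present the drop succeeds and _rc is 0
gen byte _merge = 3
capture drop _merge
local rc2 = _rc
display "return code with _merge: `rc2'"
```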
K.I.