Karyen Chu
>
> I have 50 individual-specific datasets (i.e. only one subject ID per
> dataset), all of which contain data on the same 20 variables. The original
> datasets were ASCII text; I read them into Stata, creating 50 datasets,
> and ran basic descriptive stats for each of the 50 subjects.
>
> I then appended all 50 datasets into one very large dataset and discovered
> that some of the subjects now have the wrong number of observations! Some
> have too many observations and some have too few, although the total number
> of observations for all 50 subjects is correct.
>
> Specifically:
>
> . table subjid
>
> ----------------------
> subjid | Freq.
> ----------+-----------
> 28722 | 19,296 (should have 12,299 obs)
> 50910 | 23,971 (should have 23,972 obs)
> 54476 | 69,222
> 87734 | 21,213
> 119669 | 59,245 (should have 59,244 obs)
> 123614 | 6,634
> 127871 | 419
> 130008 | 51,515
> 130722 | 2,155
> 162245 | 59,448
> 194574 | 7,872
> 209171 | 7,991 (should have 7,989 obs)
> 226711 | 1,417
> 228761 | 2,652
> 284310 | 1,657
> 323652 | 2,267 (should have 2,269 obs)
> 326175 | 21,870
> 328958 | 30,081
> 360402 | 7,260
> 370576 | 15,429
> 371133 | 913
> 407487 | 5,820
> 413293 | 1,645
> 415301 | 39,116
> 417756 | 9,418
> 459852 | 14,024 (should have 14,023 obs)
> 462509 | 2,544
> 475134 | 567
> 476368 | 35,533
> 484595 | 4,792
> 487508 | 61,457
> 507428 | 18,677
> 564155 | 13,084
> 577895 | 31,010
> 580566 | 1,745
> 598037 | 9,369
> 666481 | 16,679 (should have 16,678 obs)
> 677056 | 22,647
> 717037 | 19,085
> 751384 | 27,639
> 763586 | 9,999
> 788300 | 32,728
> 828191 | 13,339
> 836796 | 11,495 (should have 11,494 obs)
> 876142 | 2,921
> 917942 | 9,432
> 929316 | 10,659
> 943493 | 14,256
> 955867 | 31,104
> 968002 | 909
> ----------------------
>
>
>
> My code:
>
> use user_usage.X1.28722.dta, clear
> describe
>
> foreach subj of numlist 50910 54476 87734 119669 123614 /*
> */ 127871 130008 130722 162245 194574 209171 226711 228761 284310 323652 /*
> */ 326175 328958 360402 370576 371133 407487 413293 415301 417756 459852 /*
> */ 462509 475134 476368 484595 487508 507428 564155 577895 580566 598037 /*
> */ 666481 677056 717037 /*
> */ 751384 763586 788300 828191 836796 876142 /*
> */ 917942 929316 943493 955867 968002 {
>
> append using user_usage.X1.`subj'.dta
>
> capture noisily save user_usage.X1.merge.dta, replace
>
> }
>
> compress
>
> sort subjid startedate start_hr start_min start_sec /*
> */ endedate end_hr end_min end_sec
>
> save user_usage.X1.merge.dta, replace
>
> describe
>
> table subjid
>
> .........
>
>
> Because the number of observations is so large for many of these subjects,
> I'm not sure how to go about finding out which observations got dropped or
> added.
>
> I am using Stata/SE 7.0 on a Windows 2000 machine with 384 MB of RAM;
> the Stata/SE 7.0 executable is dated 11 Jun 2002
> and the ado-files are dated 9 Aug 2002.
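
A first check worth making is whether each source file really contains the
number of observations you expect. -describe using- reads a dataset's header
without loading the data into memory, so it is cheap even for the big files.
A minimal sketch only, assuming the user_usage.X1.<subjid>.dta naming in your
code (put your full set of IDs in the numlist):

local total 0
foreach subj of numlist 28722 50910 54476 /* ...remaining subject IDs... */ {
        quietly describe using user_usage.X1.`subj'.dta
        display "subject `subj': " r(N) " observations in its own file"
        local total = `total' + r(N)
}
display "total across the listed files: `total'"

If these per-file counts already disagree with your "should have" figures,
the appended file is faithfully reflecting its inputs and the question
becomes what is actually in the individual files.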
I don't have a clue what's happening, but
in broad terms, either you are misunderstanding
your data files, or there is a bug somewhere,
or both. To be frank, I'd guess the first.
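
One way to test that first possibility is to tag every incoming file while
appending, so the combined data record which file each observation came from.
If some of the individual files contain rows for a different subjid, the
per-subject counts would shift while the overall total stayed correct, which
matches the pattern above. Again a sketch only, not your code; -source- is a
made-up variable name:

use user_usage.X1.28722.dta, clear
generate long source = 28722

foreach subj of numlist 50910 54476 87734 /* ...remaining subject IDs... */ {
        append using user_usage.X1.`subj'.dta
        /* rows just appended have -source- missing */
        replace source = `subj' if source >= .
}

count if subjid != source
list subjid source if subjid != source

A count of zero points away from the files themselves and toward something
else. (Incidentally, the -save- inside your loop rewrites
user_usage.X1.merge.dta on every pass; that is slow, but it cannot by itself
change any observation counts.)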