Karyen Chu
>
> I have 50 individual-specific datasets (i.e. only one subject ID per
> dataset), all of which contain data on the same 20 variables. The original
> datasets were ASCII text; I read them into Stata, creating 50 datasets,
> and ran basic descriptive stats for each of the 50 subjects.
>
> I then appended all 50 datasets into one very large dataset and discovered
> that some of the subjects now have the wrong number of observations! Some
> have too many observations and some have too few, although the total number
> of observations for all 50 subjects is correct.
>
> Specifically:
>
> . table subjid
>
> ----------------------
> subjid | Freq.
> ----------+-----------
> 28722 | 19,296 (should have 12,299 obs)
> 50910 | 23,971 (should have 23,972 obs)
> 54476 | 69,222
> 87734 | 21,213
> 119669 | 59,245 (should have 59,244 obs)
> 123614 | 6,634
> 127871 | 419
> 130008 | 51,515
> 130722 | 2,155
> 162245 | 59,448
> 194574 | 7,872
> 209171 | 7,991 (should have 7,989 obs)
> 226711 | 1,417
> 228761 | 2,652
> 284310 | 1,657
> 323652 | 2,267 (should have 2,269 obs)
> 326175 | 21,870
> 328958 | 30,081
> 360402 | 7,260
> 370576 | 15,429
> 371133 | 913
> 407487 | 5,820
> 413293 | 1,645
> 415301 | 39,116
> 417756 | 9,418
> 459852 | 14,024 (should have 14,023 obs)
> 462509 | 2,544
> 475134 | 567
> 476368 | 35,533
> 484595 | 4,792
> 487508 | 61,457
> 507428 | 18,677
> 564155 | 13,084
> 577895 | 31,010
> 580566 | 1,745
> 598037 | 9,369
> 666481 | 16,679 (should have 16,678 obs)
> 677056 | 22,647
> 717037 | 19,085
> 751384 | 27,639
> 763586 | 9,999
> 788300 | 32,728
> 828191 | 13,339
> 836796 | 11,495 (should have 11,494 obs)
> 876142 | 2,921
> 917942 | 9,432
> 929316 | 10,659
> 943493 | 14,256
> 955867 | 31,104
> 968002 | 909
> ----------------------
>
>
>
> My code:
>
> use user_usage.X1.28722.dta, clear
> describe
>
> foreach subj of numlist 50910 54476 87734 119669 123614 /*
> */ 127871 130008 130722 162245 194574 209171 226711 228761 284310 323652 /*
> */ 326175 328958 360402 370576 371133 407487 413293 415301 417756 459852 /*
> */ 462509 475134 476368 484595 487508 507428 564155 577895 580566 598037 /*
> */ 666481 677056 717037 /*
> */ 751384 763586 788300 828191 836796 876142 /*
> */ 917942 929316 943493 955867 968002 {
>
> append using user_usage.X1.`subj'.dta
>
> capture noisily save user_usage.X1.merge.dta, replace
>
> }
>
> compress
>
> sort subjid startedate start_hr start_min start_sec /*
> */ endedate end_hr end_min end_sec
>
> save user_usage.X1.merge.dta, replace
>
> describe
>
> table subjid
>
> .........
>
>
> Because the number of observations is so large for many of these subjects,
> I'm not sure how to go about finding out which observations got dropped or
> added.
>
> I am using Stata/SE 7.0 on a Windows 2000 machine with 384 MB of RAM;
> the Stata/SE 7.0 executable is dated 11 Jun 2002
> and the ado-files are dated 9 Aug 2002.
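
A first check worth making is whether each source file really contains the
number of observations you expect. -describe using- reads a dataset's header
without loading the data into memory, so it is cheap even for the big files.
A minimal sketch only, assuming the user_usage.X1.<subjid>.dta naming in your
code (put your full set of IDs in the numlist):

local total 0
foreach subj of numlist 28722 50910 54476 /* ...remaining subject IDs... */ {
        quietly describe using user_usage.X1.`subj'.dta
        display "subject `subj': " r(N) " observations in its own file"
        local total = `total' + r(N)
}
display "total across the listed files: `total'"

If these per-file counts already disagree with your "should have" figures,
the appended file is faithfully reflecting its inputs and the question
becomes what is actually in the individual files.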
I don't have a clue what's happening, but
in broad terms, either you are misunderstanding
your data files, or there is a bug somewhere,
or both. To be frank, I'd guess the first.
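
One way to test that first possibility is to tag every incoming file while
appending, so the combined data record which file each observation came from.
If some of the individual files contain rows for a different subjid, the
per-subject counts would shift while the overall total stayed correct, which
matches the pattern above. Again a sketch only, not your code; -source- is a
made-up variable name:

use user_usage.X1.28722.dta, clear
generate long source = 28722

foreach subj of numlist 50910 54476 87734 /* ...remaining subject IDs... */ {
        append using user_usage.X1.`subj'.dta
        /* rows just appended have -source- missing */
        replace source = `subj' if source >= .
}

count if subjid != source
list subjid source if subjid != source

A count of zero points away from the files themselves and toward something
else. (Incidentally, the -save- inside your loop rewrites
user_usage.X1.merge.dta on every pass; that is slow, but it cannot by itself
change any observation counts.)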