Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
AW: st: weird behavior of append
From
"Klaus Pforr" <[email protected]>
To
<[email protected]>
Subject
AW: st: weird behavior of append
Date
Wed, 12 Sep 2012 17:35:01 +0200
I also don't see the problem. Figuratively speaking, in the first append step
you stack the first chunk of data, A, on top of the second chunk, B:
id   other
A    X
B    X
So you have N_A+N_B observations and |Union(A,B)| variables.
In the second (or rather third) append step you put the contents of id into
two different columns:
id   id2   other
A    .     X
.    B     X
So you still have N_A+N_B observations. The number of variables should now
be |Union(A,B)|+1.
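The two cases can be reproduced on a toy scale. The following is only a sketch: the temporary files A and B, the three and two observations, and the id values are made up for illustration and are not Al's temp1 and temp2.

```stata
* Illustration only: A, B, the sizes and the id values are hypothetical.
clear
set obs 3
generate int id = 5000 + _n
tempfile A
save `A'

clear
set obs 2
generate int id = 6000 + _n
tempfile B
save `B'

* Case 1: same variable name in both files.
* Rows are stacked into the single column id.
use `B', clear
append using `A'
assert _N == 5
assert c(k) == 1
quietly count if !missing(id)
assert r(N) == 5

* Case 2: the variable is renamed in one file before appending.
* Rows are still stacked, but each file fills its own column.
use `B', clear
rename id id2
append using `A'
assert _N == 5
assert c(k) == 2
quietly count if missing(id) & !missing(id2)
assert r(N) == 2
quietly count if missing(id2) & !missing(id)
assert r(N) == 3
```

In both cases _N is the sum of the two file sizes. Only the number of variables, and where the missing values end up, differs.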
Sorry, no miracles here, I think...
Best
Klaus
__________________________________
Klaus Pforr
GESIS -- Leibniz Institut für Sozialwissenschaft
B2,1
Postfach 122155
D - 68072 Mannheim
Tel: +49 621 1246 298
Fax: +49 621 1246 100
E-Mail: [email protected]
__________________________________
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On behalf of Joerg Luedicke
Sent: Wednesday, 12 September 2012 17:21
To: [email protected]
Subject: Re: st: weird behavior of append
I cannot spot a problem here. You have 11165 observations in one file and
259 observations in the other, and 11165 + 259 = 11424 observations is what
you end up with after appending.
Joerg
On Wed, Sep 12, 2012 at 9:50 AM, Feiveson, Alan H. (JSC-SK311)
<[email protected]> wrote:
> Hello - In Stata 12 IC, I am trying to append a file of 259 observations
> to one of 11165 observations. Both files contain only one variable named
> "id" (see below). After appending, rather than having 259 new observations,
> it appears that 259 observations have been lost, yet if I reduce the size of
> the first file to 10000, the append seems to work. Also if the variables
> have different names, I get even more weird results (see below). Anyone have
> an explanation?
>
> Thanks,
>
> Al Feiveson
>
> ======================================================================
> . use temp1,clear
> . summ
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>           id |     11165    5205.994    98.91063       5000       5389
>
> . use temp2,clear
> . summ
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>           id |       259    5206.846    101.6719       5000       5388
>
> . append using temp1
> . summ
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>           id |     11424    5206.013     98.9696       5000       5389
>
>
> ======================================================================
> Now cut out some observations
>
> . use temp1,clear
> . keep in 1/10000
> (1165 observations deleted)
>
> . summ
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>           id |     10000    5189.761    91.46947       5000       5331
>
> . append using temp2
> . summ
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>           id |     10259    5190.192    91.77468       5000       5388
>
> This appears to be correct.
>
> ======================================================================
> Now rename the variable in one of the files
>
> . use temp2,clear
> . des
>
> Contains data from temp2.dta
>   obs:           259
>  vars:             1                          12 Sep 2012 09:32
>  size:           518
> ----------------------------------------------------------------------
>               storage  display     value
> variable name   type   format      label      variable label
> ----------------------------------------------------------------------
> id              int    %10.0g                 ID
> ----------------------------------------------------------------------
> Sorted by:  id
>
> . rename id id2
> . append using temp1
> . summ
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>          id2 |       259    5206.846    101.6719       5000       5388
>           id |     11165    5205.994    98.91063       5000       5389
>
> . count if id==. & id2<.
> 259
>
> . count if id2==. & id<.
> 11165
>
> So it appears that there should be 259 + 11165 observations, since
> both conditions are exclusive. Yet
>
>
> . des
>
> Contains data from temp2.dta
>   obs:        11,424
>  vars:             2                          12 Sep 2012 09:32
>  size:        45,696
> ----------------------------------------------------------------------
>               storage  display     value
> variable name   type   format      label      variable label
> ----------------------------------------------------------------------
> id2             int    %10.0g                 ID
> id              int    %10.0g                 ID
> ----------------------------------------------------------------------
> Sorted by:
>      Note:  dataset has changed since last saved
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/