Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


AW: st: AW: combining datasets

From	"Martin Weiss" <[email protected]>
To	<[email protected]>
Subject	AW: st: AW: combining datasets
Date	Thu, 19 Aug 2010 18:30:35 +0200


"The choice between append and merge is more important for large datasets
because you need the right variable naming scheme."



I do not really understand the meaning of this sentence. Why would the
situation change given the size of the dataset at hand?

-append- and -merge- are not slight variations of each other, IMHO. The
manual entry for -merge- does make clear the many variations _within_
-merge- itself, but the choice between -append- and -merge- is more
fundamental still...

Also note [D], p. 397:

"merge is for adding new variables from a second dataset to existing
observations. You use merge, for instance, when combining hospital patient
and discharge datasets. If you wish to add new observations to existing
variables, then see [D] append. You use append, for instance, when adding
current discharges to past discharges."
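
To make the distinction in the manual passage concrete, here is a minimal sketch. The data and the names patients, discharge, age and los are made up for illustration and appear nowhere in the manual or in this thread. Run it as one do-file so the tempfile macros stay defined.

```stata
// hypothetical patient file: one observation per patient
clear
input id age
1 54
2 61
end
tempfile patients
save `patients'

// hypothetical discharge file: same patients, one new variable
clear
input id los
1 3
2 7
end
tempfile discharge
save `discharge'

// -merge- adds the new variable los to the existing observations
use `patients', clear
merge 1:1 id using `discharge', nogenerate
list    // 2 observations; variables id, age, los

// -append- adds new observations to the existing variables
clear
input id age
3 47
4 70
end
append using `patients'
list    // 4 observations; variables id, age
```

After -merge- the dataset is wider (same rows, one more variable); after -append- it is longer (more rows, same variables).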


HTH
Martin


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Anders
Alexandersson
Sent: Thursday, 19 August 2010 17:56
To: [email protected]
Subject: Re: st: AW: combining datasets

Martine,

Also see [U] 22 Combining datasets. Maarten provided an excellent
append solution with this being the main line:
. append using `a'

Here is the equivalent merge solution:
. merge 1:1 source id using `a', nogen

The choice between append and merge is more important for large
datasets because you need the right variable naming scheme.
Michael Mitchell gave a good tip in his data management book described
at http://www.stata.com/bookstore/dmus.html :
If you will append datasets, you want the variable names to be the same,
but if you will merge datasets, you want the variable names to be different.
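
A minimal sketch of why the names matter for -merge-, assuming (unlike in Martine's data) that id identifies the same unit in both files. The names x_a and x_b are hypothetical. Without the renaming, a variable called x in both files would keep the master's values for matched observations, because -merge- by default leaves the master data untouched.

```stata
// dataset a, with its measurement renamed before saving
tempfile a
clear
input id x
1 3
2 4
end
rename x x_a
save `a'

// dataset b, renamed likewise, then merged with a on id
clear
input id x
1 5
2 6
end
rename x x_b
merge 1:1 id using `a', nogenerate
list    // 2 observations; variables id, x_b, x_a
```

For -append- the opposite holds: leaving x under the same name in both files stacks the values in one column, as in Maarten's example.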

Anders Alexandersson
[email protected]

On Thu, Aug 19, 2010 at 4:34 AM, Maarten buis <[email protected]>
wrote:
> --- On Wed, 18/8/10, martine etienne wrote:
>> firstly, person 1 in dataset A is NOT same person as person
>> 1 in dataset B, measurements are also taken at different times
>> secondly, I would like the final dataset to look like Final 1
>
> Here is an example of how to do that:
>
> *------------ begin example ------------
> // create the two datasets
> tempfile a b
>
> drop _all
> input id x
> 1  3
> 2  4
> end
> save `a'
>
> drop _all
> input id x
> 1  5
> 2  6
> end
> save `b'
>
> // create a new variable in each dataset
> // that identifies the source of those
> // observations
> use `a'
> gen source = "a"
>
> save `a', replace
>
> use `b'
> gen source = "b"
> save `b', replace
>
> // use -append- to stack the datasets
> append using `a'
>
> // create an extra id variable, which contains
> // a unique integer for each source-id combination
> // and attaches the values of the source and id
> // variables to the value label
> egen long new_id = group(source id), label
>
> // for display purposes I put the three id variables
> // to the left of the dataset
> order id source new_id
>
> // display the result
> list
> *--------------- end example ----------------
> (For more on examples I sent to the Statalist see:
> http://www.maartenbuis.nl/example_faq )
>
> Hope this helps,
> Maarten
>
> --------------------------
> Maarten L. Buis
> Institut fuer Soziologie
> Universitaet Tuebingen
> Wilhelmstrasse 36
> 72074 Tuebingen
> Germany
>
> http://www.maartenbuis.nl
> --------------------------

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/