[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: merge...

From	[email protected] (William Gould)
To	[email protected]
Subject	Re: st: merge...
Date	Tue, 02 Jul 2002 08:07:17 -0500

Alejandro Riano <[email protected]> writes, 

> Dear Statalisters, i have two databases, the first one has monthly data by
> country like this:
>
>       country     year    month     var1
>       Argentina   1989      1          x
>       Argentina   1989      2          x
>               .      .      .          .
>               .      .      .          .
>       Argentina   1999      12        x
>
>  for some set of countries (103 in total) and i have a second database which
>  has all the bilateral combinations among the countries that are included in
>  the first database (on a YEARLY basis), like this:
>
>       country1       country2          year      var2
>       Argentina      Brazil            1989       x
>       Argentina      Brazil            1990       x
>               .           .               .       .
>               .           .               .       .
>       Zambia         Zimbabwe          1999       x
>
> My problem is that i want to merge the var1 of the first database into the
> second one and in that way trasform my second bilateral base from a yearly
> frecuency to a monthly one. [...]

Let's call the two datasets monthly.dta and combo.dta.


Step 0:  Verify what Alejandro thinks is true is true
-----------------------------------------------------

        . use monthly, clear 
        . sort country year month
        . by country year month:  assert _N==1

        . use combo, clear 
        . sort country1 country2 year 
        . by country1 country2 year:  assert _N==1


Step 1:  Convert combo.dta to monthly format
--------------------------------------------

        . use combo, clear 
        . expand 12                        /* 12 months per year */
        . sort country1 country2 year
        . by country1 country2 year: gen month = _n
        . save combom


At this step we have not really "converted" the data to monthly format, who
knows what the value of var2 was for in January for (Argentiny, Brazil, 1989)?
Maybe is was var2 (maybe var2 is "countries in South America"), maybe as 
a first approximation is was var2/12 (maybe var2 is "total exports"), maybe 
something else.  I will leave that for Alejandro to figure out.


Step 2:  Merge on country1
--------------------------

        . use monthly
        . rename country country1
        . sort country1 year month
        . save, replace 

        . use combom, clear 
        . sort country1 year month
        . merge country1 year month using monthly, nokeep
        . save inprocess

        . drop _merge               /* but only after checking it */

(Technical note:  The -nokeep- option on merge produces the same result as if
we typed -merge- without the -nokeep- followed by -keep if _merge==1 |
_merge==3-.)

At this point, we just did a spread merge.  In the master data set (the data
set in memory), we had multiple observations for each (country1, year, month).
We had one (country1, year, month) observation in the using data.  The way
-merge- works, Stata spread the single observation (duplicated it) across all
the relevant (country1, year, month) observations.


Step 3:  Merge on country2
--------------------------

Our datasets has pairs of country, so now we must do the merge on the other 
country, too.  We must remember to rename the var1 variables in monthly.dta so
that they have a names in our final dataset:

        . use monthly, clear 
        . rename country1 country2
        . rename var1 var1_2
        . sort country2 year month
        . save monthly2, replace

        . use inprocess, clear 
        . sort country2 year month
        . merge country2 year month using monthly, nokeep

        . drop _merge               /* but only after checking it */



Step 4:  clean up 
-----------------

        . erase combom.dta
        . erase monthly2.dta
        . erase inprocess.dta


-- Bill
[email protected]
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Prev by Date: st: RE: time-series operator in -foreach- loop
Next by Date: Re: st: dealing with multiple alphanumeric responses
Previous by thread: st: RE: merge...
Next by thread: st: RE: dealing with multiple alphanumeric responses
Index(es):
- Date
- Thread