Re: st: RE: Re: Reshaping dataset

Yes, you're totally right! Joinby was a much better option! 
Many thanks for your help! 
Read the manual entry for merge carefully.
You most probably do NOT want to do a many to many merge (i.e. -merge m:m-).
Really unpredictable things can happen with that kind of merge and that's
probably what's causing your results to differ.
If cuci5d is not unique in one of your two datasets, you'll need to think
very carefully about what the merge should actually look like.  Depending on
what your data looks like and what your desired outcome is, -joinby- might
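
The uniqueness check suggested above could look roughly like the sketch below. It is a minimal sketch, not a definitive fix: the file and variable names are taken from the do-file quoted further down, and the -joinby- step assumes that every pairwise combination of matching observations is really what is wanted.

```stata
* Sketch only: file and variable names come from the do-file in this thread.
use "cadenas.dta", clear
duplicates report cuci5d     // how many observations share a cuci5d value?
capture isid cuci5d          // isid fails if cuci5d is not unique
display _rc                  // 0 means cuci5d uniquely identifies observations

use "datos pry.dta", clear
capture isid cuci5d
display _rc

* If cuci5d turns out to be unique in cadenas.dta, a many-to-one merge
* is well defined:
*     merge m:1 cuci5d using "cadenas.dta"

* If cuci5d repeats in both files and all pairwise combinations are wanted
* (an assumption about the desired outcome):
joinby cuci5d using "cadenas.dta", unmatched(master)
tabulate _merge
```

With unmatched(master), -joinby- keeps master observations that have no match and records the source of each row in _merge, so the later -drop if _merge==2- step is no longer needed.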

Yep, that was the problem with the command. Now that I was able to run the
whole set of commands, I get something quite weird, and I'm not really sure
about what step is causing it...

I get different results for "valrm" as I run and re-run the do-file...

I copy the syntax below. Does anyone know what might be happening?



set mem 1g

use "cadenas.dta"

sort cuci5d

save "cadenas.dta", replace


use "datos pry.dta"

sort cuci5d

merge m:m cuci5d using "cadenas.dta"

assert value==. if _merge==2

drop if _merge==2

drop _merge

save "prycadenas.dta", replace


use "usoecon.dta"

sort usoecon


use "prycadenas.dta"

merge m:1 usoecon using "usoecon.dta"

drop if _merge==1
drop if _merge==2

drop _merge

drop hs cuci5d flow usoecon

save "prycadenas.dta", replace

bysort year partner cadena subcadena flores: egen double svalue=sum(value)

bysort year partner cadena subcadena flores: keep if _n==1

drop value

reshape wide svalue, i(year cadena subcadena flores) j(partner)

gen double valmcs=svalue32+svalue76+svalue858

rename svalue0 valwld

gen valrm=valwld-valmcs
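
One plausible reason the do-file above gives a different valrm on each run: -merge m:m- pairs observations by their position within each key group, and -sort- orders tied observations randomly unless the stable option is given. The toy example below is a sketch of that mechanism only; the variables id, a and b and the tempfile name are made up for illustration and are not part of the real data.

```stata
* Toy data, not from the thread: shows how -merge m:m- pairs rows by position.
clear
input id a
1 10
1 20
end
tempfile chains
save `chains'

clear
input id b
1 100
1 200
1 300
end

* -sort- breaks ties in random order unless -stable- is specified,
* so which b meets which a can change from one run to the next.
sort id
merge m:m id using `chains'
list id b a _merge
```

Within id==1 the first master row is matched to the first using row, the second to the second, and the last row of the shorter group is reused for the remainder. Because the order of tied rows is not fixed, re-running can pair different values of a and b, which would then feed through to the sums and to valrm.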


> On 26 April 2013 13:24, Andrea Molinari wrote:
>> Dear statalisters,
>> I'm working with a dataset which groups many dimensions and I'm 
>> having a little trouble reshaping the data for the (rather basic) 
>> calculations I need to do.
>> The dataset has the following columns:
>> year flow partner value cadena usoecon subcadena cadenacompartida1
>> subcadenacompartida1 cadenacompartida2 subcadenacompartida2
>> In order to regroup the data summing "value" by year, flow, cadena 
>> subcadena and usoecon, I need that:
>> - the values in cadenacompartida1 and cadenacompartida2 go under 
>> those in the column "cadena"
>> - the values in subcadenacompartida1 and subcadenacompartida2 go
>> under those in the column "subcadena"
>> To do so, I tried several options with -reshape long-, but I don't 
>> seem to get the right reshaping to get the data in the way I need to 
>> then calculate:
>> bysort year flow cadena subcadena usoecon: egen double
>> svalue=sum(value)
>> Any ideas of those handling large datasets would be more than welcomed!
>> Cheers,
>> Andrea
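
For the stacking described in the original question, one possible route with -reshape long- is sketched below. It assumes the dataset is already in memory, that a row whose shared-chain columns are empty should not produce extra rows, and that "value" should count once under each chain an observation belongs to. The names obsid and slot are hypothetical helpers introduced here.

```stata
* Sketch only: obsid and slot are helper names, not variables in the data.
gen long obsid = _n

* Give the three chain/subchain pairs a common stub with a numeric suffix.
rename cadena               cadena0
rename subcadena            subcadena0
rename cadenacompartida1    cadena1
rename subcadenacompartida1 subcadena1
rename cadenacompartida2    cadena2
rename subcadenacompartida2 subcadena2

* Stack the pairs: each original row becomes up to three rows.
reshape long cadena subcadena, i(obsid) j(slot)

* Drop rows that came from empty shared-chain slots.
drop if missing(cadena) & slot > 0

bysort year flow cadena subcadena usoecon: egen double svalue = total(value)
```

The -reshape- step requires cadena0, cadena1 and cadena2 to have the same storage type (all numeric or all string), and likewise for the subcadena variables.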
