Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Kevin McConeghy <kevinmcconeghy@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: joining strings |
Date | Wed, 28 Nov 2012 16:50:04 -0600 |
Thanks, both ways work fine :)

Kevin

On Wed, Nov 28, 2012 at 4:34 PM, William Gould, StataCorp LP <wgould@stata.com> wrote:
> Kevin McConeghy <kevinmcconeghy@gmail.com> asked,
>
>> I have a dataset with ~2.2 mill obs like so:
>>
>> id  stringvar + other variables
>> 1   x
>> 1   y
>> 1   z
>> 2   a
>> 3   d
>> 4   g
>> 4   h
>>
>> [...]
>> I was trying to combine the stringvar to collapse and make id a unique
>> key, like so:
>>
>> id  stringvar
>> 1   xyz
>> 2   a
>> 3   d
>> 4   gh
>>
>> [...] [-reshape- ran out of memory] [...]
>>
>>
>> Is there some way to skip the reshape step [...]?
>
> Here is my solution.  First, let me set up the toy problem,
>
>     . clear all
>
>     . input id str1 stringvar
>
>                 id  stringvar
>       1. 1 x
>       2. 1 y
>       3. 1 z
>       4. 2 a
>       5. 3 d
>       6. 4 g
>       7. 4 h
>       8. end
>
> My solution is,
>
>     . sort id
>     . gen str result = ""
>     . by id: replace result = result[_n-1] + stringvar
>     . by id: keep if _n==_N
>
> Below I run that, with a few -list-s added:
>
>     . sort id
>
>     . gen str result = ""
>     (7 missing values generated)
>
>     . by id: replace result = result[_n-1] + stringvar
>     (7 real changes made)
>
>     . list
>
>          +------------------------+
>          | id   string~r   result |
>          |------------------------|
>       1. |  1          x        x |
>       2. |  1          y       xy |
>       3. |  1          z      xyz |
>       4. |  2          a        a |
>       5. |  3          d        d |
>          |------------------------|
>       6. |  4          g        g |
>       7. |  4          h       gh |
>          +------------------------+
>
>     . by id: keep if _n==_N
>     (3 observations deleted)
>
>     . list
>
>          +------------------------+
>          | id   string~r   result |
>          |------------------------|
>       1. |  1          z      xyz |
>       2. |  2          a        a |
>       3. |  3          d        d |
>       4. |  4          h       gh |
>          +------------------------+
>
>
> In my solution,
>
>     . sort id
>     . gen str result = ""
>     . by id: replace result = result[_n-1] + stringvar
>     . by id: keep if _n==_N
>
> watch out for the first line, -sort id-.  It should really read,
>
>     . sort id some_other_variable
>
> We need to specify the order within equal values of id to make
> the order of the letters deterministic.
> Perhaps Kevin wants the letters in alphabetical order, in which case
> -sort id- should change to -sort id stringvar-.
>
> -- Bill
> wgould@stata.com
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

--
Kevin McConeghy, PharmD
Infectious Diseases Fellow
University of Illinois College of Pharmacy
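
The caveat about -sort id- can be illustrated with a short do-file sketch. It assumes the letters should be joined in the order the rows arrive in, and it introduces a helper variable named seq to record that order; seq is a hypothetical name and is not part of the posted solution. The rest follows the four posted commands on the toy data from the thread.

```stata
* Sketch only: seq is a hypothetical helper variable added for illustration.
clear all
input id str1 stringvar
1 x
1 y
1 z
2 a
3 d
4 g
4 h
end

* Record the incoming row order so that ties within id are broken explicitly.
gen long seq = _n
sort id seq

* Build the running concatenation within each id (same steps as posted).
gen str result = ""
by id: replace result = result[_n-1] + stringvar

* The last row of each id holds the complete string.
by id: keep if _n == _N
drop seq stringvar
list
```

On the toy data this should leave one row per id with result equal to xyz, a, d, and gh, matching the listing in the thread. If alphabetical order within id is wanted instead, replacing -sort id seq- with -sort id stringvar- would be the corresponding change.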