Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: joining strings
From: Kevin McConeghy <[email protected]>
To: [email protected]
Subject: Re: st: joining strings
Date: Wed, 28 Nov 2012 16:50:04 -0600
Thanks, both ways work fine :)
Kevin
On Wed, Nov 28, 2012 at 4:34 PM, William Gould, StataCorp LP
<[email protected]> wrote:
> Kevin McConeghy <[email protected]> asked,
>
>> I have a dataset with ~2.2 mill obs like so:
>>
>> id stringvar + other variables
>> 1 x
>> 1 y
>> 1 z
>> 2 a
>> 3 d
>> 4 g
>> 4 h
>>
>> [...]
>> I was trying to combine the stringvar to collapse and make id a unique
>> key, like so:
>>
>> id stringvar
>> 1 xyz
>> 2 a
>> 3 d
>> 4 gh
>>
>> [...] [-reshape- ran out of memory] [...]
>>
>>
>> Is there some way to skip the reshape step [...]?
>
> Here is my solution. First, let me set up the toy problem,
>
> . clear all
>
> . input id str1 stringvar
>
> id stringvar
> 1. 1 x
> 2. 1 y
> 3. 1 z
> 4. 2 a
> 5. 3 d
> 6. 4 g
> 7. 4 h
> 8. end
>
> My solution is,
>
> . sort id
> . gen str result = ""
> . by id: replace result = result[_n-1] + stringvar
> . by id: keep if _n==_N
>
> Below I run that, with a few -list-s added:
>
> . sort id
>
> . gen str result = ""
> (7 missing values generated)
>
> . by id: replace result = result[_n-1] + stringvar
> (7 real changes made)
>
> . list
>
>      +------------------------+
>      | id   string~r   result |
>      |------------------------|
>   1. |  1          x        x |
>   2. |  1          y       xy |
>   3. |  1          z      xyz |
>   4. |  2          a        a |
>   5. |  3          d        d |
>      |------------------------|
>   6. |  4          g        g |
>   7. |  4          h       gh |
>      +------------------------+
>
> . by id: keep if _n==_N
> (3 observations deleted)
>
> . list
>
>      +------------------------+
>      | id   string~r   result |
>      |------------------------|
>   1. |  1          z      xyz |
>   2. |  2          a        a |
>   3. |  3          d        d |
>   4. |  4          h       gh |
>      +------------------------+
>
>
> In my solution,
>
> . sort id
> . gen str result = ""
> . by id: replace result = result[_n-1] + stringvar
> . by id: keep if _n==_N
>
> watch out for the first line, -sort id-. It should really read,
>
> . sort id some_other_variable
>
> We need to specify the order within equal values of id to make
> the order of the letters deterministic. Perhaps Kevin wants
> the letters in alphabetical order, in which case -sort id- should
> change to -sort id stringvar-.
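>
> As a rough sketch of that change, assuming alphabetical order
> within id is what is wanted, the sort can be folded into the
> -bysort- prefix so that the within-id order is stated explicitly:
>
> . gen str result = ""
> . bysort id (stringvar): replace result = result[_n-1] + stringvar
> . by id: keep if _n==_N
>
> If the letters should instead stay in the order in which the
> observations originally appeared, one possibility is to record
> that order first. Here -seq- is a made-up variable name used
> only for illustration:
>
> . gen long seq = _n
> . gen str result = ""
> . bysort id (seq): replace result = result[_n-1] + stringvar
> . by id: keep if _n==_N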
>
> -- Bill
> [email protected]
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
--
Kevin McConeghy, PharmD
Infectious Diseases Fellow
University of Illinois College of Pharmacy