



Re: st: joining strings


From   "William Gould, StataCorp LP" <[email protected]>
To   [email protected]
Subject   Re: st: joining strings
Date   Wed, 28 Nov 2012 16:34:12 -0600

Kevin McConeghy <[email protected]> asked, 

> I have a dataset with ~2.2 mill obs like so:
> 
>       id        stringvar    + other variables
>        1          x
>        1          y
>        1          z
>        2          a
>        3          d
>        4          g
>        4          h
> 
> [...] 
> I was trying to combine the stringvar to collapse and make id a unique
> key, like so:
> 
>       id        stringvar
>        1          xyz
>        2          a
>        3          d
>        4          gh
>
> [...] [-reshape- ran out of memory] [...]
>
> 
> Is there some way to skip the reshape step [...]?

Here is my solution.  First, let me set up the toy problem, 

        . clear all

        . input id str1 stringvar

                    id  stringvar
          1. 1 x
          2. 1 y
          3. 1 z
          4. 2 a
          5. 3 d
          6. 4 g
          7. 4 h
          8. end

My solution is, 

        . sort id
        . gen str result = ""
        . by id: replace result = result[_n-1] + stringvar
        . by id: keep if _n==_N

Below I run that, with a few -list-s added:

        . sort id

        . gen str result = ""
        (7 missing values generated)

        . by id: replace result = result[_n-1] + stringvar
        (7 real changes made)

        . list

             +------------------------+
             | id   string~r   result |
             |------------------------|
          1. |  1          x        x |
          2. |  1          y       xy |
          3. |  1          z      xyz |
          4. |  2          a        a |
          5. |  3          d        d |
             |------------------------|
          6. |  4          g        g |
          7. |  4          h       gh |
             +------------------------+

        . by id: keep if _n==_N
        (3 observations deleted)

        . list

             +------------------------+
             | id   string~r   result |
             |------------------------|
          1. |  1          z      xyz |
          2. |  2          a        a |
          3. |  3          d        d |
          4. |  4          h       gh |
             +------------------------+
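
If the goal is Kevin's layout exactly, with the combined string stored
under the original name, one possible tidy-up (a sketch, not something
run above) is to drop the leftover stringvar, rename result, and let
-isid- confirm that id is now a unique key:

        . drop stringvar
        . rename result stringvar
        . isid id

-isid- is silent when id uniquely identifies the observations and exits
with an error otherwise.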


In my solution, 

        . sort id
        . gen str result = ""
        . by id: replace result = result[_n-1] + stringvar
        . by id: keep if _n==_N

watch out for the first line, -sort id-.  It should really read, 

        . sort id some_other_variable

We need to specify the order within equal values of id to make 
the order of the letters deterministic.  Perhaps Kevin wants 
the letters in alphabetical order, in which case -sort id- should 
change to -sort id stringvar-.
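
As a sketch of that variant, assuming alphabetical order within id is
what is wanted, -bysort- with stringvar in parentheses sorts on id and
stringvar but forms the groups on id alone, so the separate -sort- is
no longer needed:

        . gen str result = ""
        . bysort id (stringvar): replace result = result[_n-1] + stringvar
        . by id: keep if _n==_N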

-- Bill
[email protected]

