Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: SV: RE: Splitting a textvariable
From: Sergiy Radyakin <[email protected]>
To: [email protected]
Subject: Re: st: SV: RE: Splitting a textvariable
Date: Wed, 12 May 2010 12:18:13 -0400

Dear Tomas,
This is less a programming problem than a conceptual one.
Even though every address ends with a number, the converse need not hold:
not every number in your string signals a new address. Consider

  Helsinki rroad 1 5 avenue 1600

(two addresses, the second being "fifth avenue 1600"). With numbered streets
all over North America, this will definitely be a problem. Numbers can also be
part of more creative street names: "Square of the 1 of May" is
common in the former USSR, and my hometown in Ukraine has a
"23rd of August" street as well as "Lisoparkovy 1-y in" and "Lisoparkovy 2-y in".
http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&sll=44.197959,30.432129&sspn=11.561505,15.512695&ie=UTF8&ll=50.03433,36.223378&spn=0.002781,0.006958&z=18
While you may hope for the best and assume that this does not happen in your
addresses, it is always safer to go back to the data provider and ask for the
addresses to be stored in separate variables or separated by proper
terminators.
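
To make the ambiguity concrete, here is a minimal sketch in Python (not Stata). The function name `naive_split`, the rule "a run of digits followed by whitespace ends an address", and the trailing house number 7 in the last example are assumptions made for illustration only.

```python
import re


def naive_split(text):
    """Split after every digit that is followed by whitespace.

    This encodes the assumption "every number ends an address",
    which is exactly the assumption questioned above.
    """
    parts = re.split(r"(?<=\d)\s+", text.strip())
    return [p for p in parts if p]


# Fine when every number really is a house number.
print(naive_split("Helsinki rroad 1 Oslo sroad 123 Berlin troad 13"))

# A numbered street: "5 avenue 1600" is cut in two.
print(naive_split("Helsinki rroad 1 5 avenue 1600"))

# A number inside a street name (the house number 7 is hypothetical).
print(naive_split("Square of the 1 of May 7"))
```

The second and third calls return pieces that are not addresses, and nothing in the string itself tells the rule where it went wrong.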
Best, Sergiy Radyakin
On Wed, May 12, 2010 at 11:24 AM, Tomas Lind <[email protected]> wrote:
> Thanks Nick and Bjarte for kind help and advice.
>
> I'm going to learn more about regular expressions and -strpos()- and
> hopefully get it to work.
>
> /Tomas
>
>
>
> -----Original message-----
> From: [email protected]
> [mailto:[email protected]] On behalf of Nick Cox
> Sent: 12 May 2010 14:14
> To: [email protected]
> Subject: st: RE: Splitting a textvariable
>
> -inrange()- won't work as you want because the address is a string variable.
> I can think of three strategies here:
>
> 1. Use -split- to split into string and numeric components; then put them
> together again with a comma inserted.
>
> 2. Use regular expression functions.
>
> 3. Find the spaces with -strpos()- and stop when you reach the first space
> preceded by a number.
>
> There is some programming needed in each case.
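
The third strategy can be sketched as follows. This is an illustration in Python rather than Stata, with `str.find` standing in for the role of -strpos()-. The function names `first_address` and `split_addresses` are hypothetical, and the sketch assumes that every number really does end an address.

```python
def first_address(text):
    """Return (first_address, remainder) by scanning the spaces in order.

    Look at each space in turn and stop at the first one whose
    preceding character is a digit.
    """
    pos = text.find(" ")
    while pos != -1:
        if pos > 0 and text[pos - 1].isdigit():
            return text[:pos], text[pos + 1:].lstrip()
        pos = text.find(" ", pos + 1)
    # No space is preceded by a digit: the whole string is one address.
    return text, ""


def split_addresses(text):
    """Peel addresses off the front until nothing is left."""
    addresses = []
    rest = text.strip()
    while rest:
        head, rest = first_address(rest)
        addresses.append(head)
    return addresses


print(split_addresses("Stockholm xroad 12 London yroad10"))
print(split_addresses("Helsinki rroad 1 Oslo sroad 123 Berlin troad 13"))
```

Peeling one address at a time keeps each step simple, and a last address such as "London yroad10", which has no space before its number, is still returned whole.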
>
> Nick
> [email protected]
>
> Tomas Lind
>
> I have (in principle) a dataset that looks like this with an id-variable and
> a text-variable with addresses. Unfortunately there are often several
> addresses in the same variable. I have to split these into one variable for
> the first address, another for the second and so on.
>
> id address
> ---------------------------------------------------------------
> 1 Stockholm xroad 12 London yroad10
> 2 London zroad 31
> 3 Helsinki rroad 1 Oslo sroad 123 Berlin troad 13
>
>
> Each address ends with a number. My idea is to put a parsing character ","
> in the space after the number (so that -split- with parse() can be used), like this:
>
>
> id address
> ----------------------------------------------------------------
> 1 Stockholm xroad 12,London yroad10
> 2 London zroad 31
> 3 Helsinki rroad 1,Oslo sroad 123,Berlin troad 13
>
>
>
> One idea to do this is to use something like (if we rename the
> address-variable to a shorter name, x).
>
> replace x[i]="," if x[`i' - 1]==inrange(0, x[`i'], 9) & x[`i']==" "
>
> I'm not sure how to work out the details to get this to work. Any
> help putting the commas into place is welcome. Other ideas are also welcome.
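
The character-by-character idea in the question can be sketched like this, in Python rather than Stata. The function name `insert_commas` is hypothetical, and the rule is the one stated above: a space becomes a comma when the character before it is a digit.

```python
def insert_commas(text):
    """Replace a space with a comma when the previous character is a digit."""
    chars = list(text.rstrip())
    for i in range(1, len(chars)):
        if chars[i] == " " and chars[i - 1].isdigit():
            chars[i] = ","
    return "".join(chars)


# The three example rows from the question, keyed by id.
rows = {
    1: "Stockholm xroad 12 London yroad10",
    2: "London zroad 31",
    3: "Helsinki rroad 1 Oslo sroad 123 Berlin troad 13",
}

for row_id, address in rows.items():
    with_commas = insert_commas(address)
    # Splitting on the comma then gives one piece per address.
    print(row_id, with_commas, with_commas.split(","))
```

Note that the digit test is applied to a single character of the string, not to the string as a number, which is why a numeric range check on the whole variable does not fit here.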
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/