st: extracting substrings from string, with irregular patterns

From	Fernando Luco <flucoestatalist@gmail.com>
To	statalist@hsphsun2.harvard.edu
Subject	st: extracting substrings from string, with irregular patterns
Date	Thu, 16 Aug 2012 13:27:53 -0500

Hi,

I have a dataset with one string variable that contains the name of a
gas station, its address and the city in which the station is located.
I would like to separate these into three different variables: name,
address and city. I have tried to use the regex functions but I
haven't been successful. The data look as follows:

COPEC AV. 11 DE SEPTIEMBRE 000,Tocopilla
PETROBRAS Av. Antonio Rendic 6850,Antofagasta
TERPEL Basilio Urrutia esq. Janequeo 312,Lautaro
Sin Bandera carrera 348,Lautaro
Sin Bandera Isabel Riquielme 403,Villarrica

In this example the names are COPEC, PETROBRAS, TERPEL and Sin Bandera,
so some names are all uppercase and others are mixed case, and a name
can be more than one word. The addresses are: AV. 11 DE SEPTIEMBRE 000,
Av. Antonio Rendic 6850, Basilio Urrutia esq. Janequeo 312, carrera 348
and Isabel Riquielme 403. Finally, the city is whatever follows the
comma: Tocopilla, Antofagasta, Lautaro and Villarrica.
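
To make the target structure concrete, here is a minimal sketch of one
possible split. It is only an illustration under stated assumptions:
the variable name raw is hypothetical, the list of station names is
taken from the five example rows and is surely incomplete for the full
data, and it assumes the city always follows the last comma.

```stata
* Hypothetical setup: the raw text is assumed to be in a string variable named raw.
clear
input str60 raw
"COPEC AV. 11 DE SEPTIEMBRE 000,Tocopilla"
"PETROBRAS Av. Antonio Rendic 6850,Antofagasta"
"TERPEL Basilio Urrutia esq. Janequeo 312,Lautaro"
"Sin Bandera carrera 348,Lautaro"
"Sin Bandera Isabel Riquielme 403,Villarrica"
end

* City: the text after the last comma. rest: the text before it.
generate city = ""
generate rest = ""
replace city = trim(regexs(2)) if regexm(raw, "^(.*),([^,]*)$")
replace rest = trim(regexs(1)) if regexm(raw, "^(.*),([^,]*)$")

* Name: match against an assumed list of known names at the start of rest.
* The list below comes from the example rows only and must be extended.
generate name = ""
foreach b in "COPEC" "PETROBRAS" "TERPEL" "Sin Bandera" {
    replace name = "`b'" if name == "" & strpos(rest, "`b' ") == 1
}

* Address: whatever remains after the name.
generate address = trim(substr(rest, length(name) + 1, .))

list name address city
```

Rows whose name is not in the list would keep an empty name and carry
the full remaining text in address, so they are easy to find and add to
the list.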

What I would like to do, even if it requires several steps, is to have
the name, address and city each as a separate variable. I tried
splitting the string into substrings at spaces, but that didn't work.
I also tried extracting the names written in uppercase letters first,
but that didn't work either.

Finally, I have 1,600 stations, so I would like to avoid doing this one
by one. Any suggestions?

Thanks,

Fernando
