From: Fernando Luco <flucoestatalist@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: st: extracting substrings from string, with irregular patterns
Date: Thu, 16 Aug 2012 13:27:53 -0500

Hi,

I have a dataset with one variable that contains the name of a gas station, its address, and the city in which the station is located. I would like to split these into three separate variables: name, address, and city. I have tried the regexm()/regexs() machinery, but I haven't been successful.

The data look as follows:

COPEC AV. 11 DE SEPTIEMBRE 000,Tocopilla
PETROBRAS Av. Antonio Rendic 6850,Antofagasta
TERPEL Basilio Urrutia esq. Janequeo 312,Lautaro
Sin Bandera carrera 348,Lautaro
Sin Bandera Isabel Riquielme 403,Villarrica

In this example the names are COPEC, PETROBRAS, TERPEL, and Sin Bandera, so some names are all uppercase and others are mixed case.

The addresses are:

AV. 11 DE SEPTIEMBRE 000
Av. Antonio Rendic 6850
Basilio Urrutia esq. Janequeo 312
carrera 348
Isabel Riquielme 403

Finally, the city is whatever follows the comma: Tocopilla, Antofagasta, Lautaro, and Villarrica.

What I would like, even if it takes several steps, is to have the name, address, and city each in its own variable. I tried splitting the string on spaces, but that didn't work because names and addresses contain varying numbers of words. I also tried extracting the names as the leading uppercase words first, but that didn't work either. I have 1,600 stations, so I would like to avoid doing this one by one.

Any suggestions?

Thanks,
Fernando
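
For readers with the same problem, here is a minimal sketch of one possible approach in Stata. It has only been reasoned through against the five example rows, not the full 1,600 stations. It assumes the combined string is stored in a variable called raw (a hypothetical name), that the city is whatever follows the last comma, and that the station name is either the literal "Sin Bandera" or a single all-uppercase word. Any other mixed-case or multi-word brand would have to be added to the alternation by hand.

```stata
* Toy data copied from the post; the variable name raw is hypothetical
clear
input str60 raw
"COPEC AV. 11 DE SEPTIEMBRE 000,Tocopilla"
"PETROBRAS Av. Antonio Rendic 6850,Antofagasta"
"TERPEL Basilio Urrutia esq. Janequeo 312,Lautaro"
"Sin Bandera carrera 348,Lautaro"
"Sin Bandera Isabel Riquielme 403,Villarrica"
end

* Pattern: (name) space (address) comma (city)
*   name    = "Sin Bandera" or one all-uppercase word (assumption)
*   address = everything up to the last comma
*   city    = everything after the last comma
* regexs() returns pieces of the most recent regexm() match, so the
* regexm() call is repeated in each if qualifier.
gen name    = regexs(1)       if regexm(raw, "^(Sin Bandera|[A-Z]+) (.*),([^,]*)$")
gen address = trim(regexs(2)) if regexm(raw, "^(Sin Bandera|[A-Z]+) (.*),([^,]*)$")
gen city    = trim(regexs(3)) if regexm(raw, "^(Sin Bandera|[A-Z]+) (.*),([^,]*)$")

list name address city

* Rows that did not fit the pattern are left missing
count if missing(name)
```

Rows that do not fit the pattern get missing values in all three variables, so the final count shows how many stations still need attention. Names containing accented letters or Ñ would not match [A-Z]+ and would show up there as well, as would any brand that is not yet listed in the alternation.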