Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Jorge Eduardo Pérez Pérez <perez.jorge@ur.edu.co> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: RE: Convert string with special characters to lower case |
Date | Wed, 25 Apr 2012 14:52:41 -0400 |
Great idea, it almost works, but -proper- is confused after the
translation because it capitalizes letters that follow special
characters. In this example, the character "i" after "ñ" is
capitalized:

clear all
input str15 nombre
"ZÚÑIGA"
end
* translate uppercase to lowercase: Ñ (209) to ñ (241), Ú (218) to ú (250)
gen nombre2=subinstr(nombre,char(209),char(241),.)
replace nombre2=subinstr(nombre2,char(218),char(250),.)
gen nombre3=lower(nombre2)
* -proper- still capitalizes the "i" after "ñ"
gen nombre4=proper(nombre2)
li

So I did this as a workaround:

clear all
input str15 nombre
"ZÚÑIGA"
end
gen nombre2=subinstr(nombre,char(209),char(241),.)
replace nombre2=subinstr(nombre2,char(218),char(250),.)
gen nombre3=lower(nombre2)
* capitalize only the first character and leave the rest in lower case
gen first=substr(nombre3,1,1)
gen rest=substr(nombre3,2,.)
replace first=upper(first)
gen nombre_ok=first+rest
li

Thanks a lot!
_______________________
Jorge Eduardo Pérez Pérez

On Wed, Apr 25, 2012 at 2:04 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> I think you need to set up a translation e.g. from char(209) to char(241) and _then_ apply -proper()-.
>
> How many problem characters are there? About 10? Sounds like a small program or -egen- function but at least the problem needs to be solved once only.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Jorge Eduardo Pérez Pérez
>
> I have some text in Spanish that includes accents and special
> characters in uppercase, e.g.
> "ZUÑIGA", "RODRÍGUEZ"
> and I would like to convert it to proper case. Stata 12.1 on a Windows
> machine doesn't do it correctly: if I apply -proper- to the previous
> text, I get:
> "ZuÑiga", "RodrÍguez"
>
> The correct conversion could be achieved by getting the positions of
> the accented characters, replacing the accented versions of the
> characters with their unaccented versions using -subinstr- or -regex-,
> converting it to proper case, and then putting the accents back using
> the previous positions and -substr-. However, this would require some
> loops over observations and over characters. It seems terribly
> cumbersome and inefficient for large datasets.
>
> Does anyone know a better way to achieve this?
>
> Thank you,
> ______________________
> Jorge Eduardo Pérez Pérez
>
> PS: I am hoping that the Spanish characters make it through the plain
> text encoding and are displayed correctly when you get this e-mail. If
> not, then probably this e-mail will not make much sense, especially to
> speakers of languages without special characters. Sorry about that.
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
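
The translation-plus-first-letter workaround in this thread handles only Ñ and Ú. As a minimal sketch of how it might extend to the other common Spanish characters, the loop below is one possibility and not something posted by either author. It assumes a pre-Unicode Stata such as 12.1, strings stored in Latin-1 / Windows-1252 (where each of these lowercase codes is the uppercase code plus 32), input that is entirely uppercase, and one word per value. The variable names rest and nombre_ok and the third test name are illustrative only.

```stata
* Sketch only: assumes pre-Unicode Stata (e.g. 12.1), Latin-1 / Windows-1252
* strings, all-uppercase input, and one word per value.
clear all
input str15 nombre
"ZÚÑIGA"
"RODRÍGUEZ"
"ÁLVAREZ"
end

* Lowercase everything after the first character; -lower()- changes A-Z only
gen rest = lower(substr(nombre, 2, .))

* Codes for Á É Í Ñ Ó Ú Ü; in Latin-1 each lowercase form is 32 higher
foreach u of numlist 193 201 205 209 211 218 220 {
    replace rest = subinstr(rest, char(`u'), char(`u' + 32), .)
}

* Keep the original first character, which is already uppercase
gen nombre_ok = substr(nombre, 1, 1) + rest
list nombre nombre_ok
```

Taking the first character from the original variable, instead of applying -upper()- to a lowercased copy, sidesteps the case where a name starts with an accented letter that -upper()- would leave unchanged. Values with several words would still need to be split first, since only the first character of the whole string is kept in upper case.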