Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: getting part of strings
From
Nick Cox
Subject
Re: st: getting part of strings
Date
Sun, 27 Mar 2011 17:44:21 +0100
This is a tedious but not difficult conversion job so far as I can
see. For example, -asciiplot- on my machine shows char(192) through
char(196) as various accented upper case A. So, I would map those all
to A -- char(65). You don't need to store anything.
qui forval j = 192/196 {
    replace myvar = subinstr(myvar, "`=char(`j')'", "A", .)
}
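Nick's loop handles the accented capital A only. The sketch below extends the same idea to the other accented Latin letters. It assumes a single-byte Latin-1 (ISO-8859-1 / Windows-1252) encoding, as in Stata 13 and earlier, so check the codes against -asciiplot- on your own machine first; code 197 (Å) is included in the A range on that assumption. The names -myvar-, -ranges-, -letters- and -codes- are illustrative only.

```stata
* Sketch: fold accented Latin-1 letters to their plain letters.
* ASSUMPTION: single-byte Latin-1 / Windows-1252 strings (Stata 13 and
* earlier); verify the codes with -asciiplot- before relying on them.
clear
set obs 2
* build example data with char() so the do-file itself stays pure ASCII
generate str40 myvar = "ANT" + char(212) + "NIO HERV" + char(193) + "ZIO" in 1
replace myvar = "DAMI" + char(195) + "O" in 2

* the k-th code range in -ranges- maps to the k-th letter in -letters-
local ranges  "192/197 199 200/203 204/207 209 210/214 217/220 224/229 231 232/235 236/239 241 242/246 249/252"
local letters "A C E I N O U a c e i n o u"

local n : word count `ranges'
forvalues k = 1/`n' {
    local codes  : word `k' of `ranges'
    local letter : word `k' of `letters'
    * expand e.g. "192/197" into "192 193 194 195 196 197"
    numlist "`codes'"
    foreach j in `r(numlist)' {
        quietly replace myvar = subinstr(myvar, char(`j'), "`letter'", .)
    }
}
list myvar
```

Keeping the mapping as two parallel word lists means nothing needs to be stored in the data, in line with Nick's point.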
As Eric pointed out, -charlist- from SSC shows which characters there
are in your variable and -asciiplot- from SSC gives you a visual
table.
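The char() codes above assume a single-byte encoding. From Stata 14 onwards strings are stored as UTF-8, so a letter such as "Ô" occupies two bytes and char(212) will not match it. A minimal sketch for Stata 14 or later, assuming the Unicode functions ustrnormalize(), ustrregexra() and uchar() available in those versions: decompose each letter into a base letter plus a combining accent (Unicode normalization form NFD), then delete every combining mark. The variable name -myvar- is illustrative.

```stata
* Sketch for Stata 14 or later, where strings are UTF-8.
clear
set obs 2
* uchar(n) gives the Unicode character with code point n
generate myvar = "ANT" + uchar(212) + "NIO HERV" + uchar(193) + "ZIO" in 1
replace myvar = "DAMI" + uchar(195) + "O" in 2

* NFD turns e.g. U+00D4 (Ô) into "O" followed by U+0302 (combining
* circumflex); the regex class \p{M} matches combining marks, which are removed
replace myvar = ustrregexra(ustrnormalize(myvar, "nfd"), "\p{M}", "")
list myvar
```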
Nick
On Sun, Mar 27, 2011 at 4:57 PM, Daniel Marcelino wrote:
> I get it. However, this thread led me to an old issue on my mind: how to
> remove accent marks from strings, replacing each accented letter with
> its plain letter, e.g. "Ô" with "O" or "È" with "E".
> So maybe I could store a local table of corresponding letters and
> run it in a loop over each line of the string variable. What do you think?
>
> /****/
> clear
> input str200 var1
> "45123 - ANTÔNIO HERVÁZIO BEZERRA CAVALCANTI - PB - Deputado Estadual"
> "1212 - DAMIÃO FELICIANO DA SILVA - PB - Deputado Federal"
> end
>
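> // table of accented letters and their plain-letter replacements
> local accented "á à ã Á Ã é è É Ó Ô í Í ü Ü"
> local plain    "a a a A A e e E O O i I u U"
>
> // replace each accented letter in every observation at once
> local n : word count `accented'
> forvalues k = 1/`n' {
>     local from : word `k' of `accented'
>     local to   : word `k' of `plain'
>     qui replace var1 = subinstr(var1, "`from'", "`to'", .)
> }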
>
>
>
> On Sun, Mar 27, 2011 at 1:17 AM, Eric Booth wrote:
>> <>
>>
>> On Mar 26, 2011, at 10:10 PM, Rebecca Pope wrote:
>>
>>> Daniel,
>>> You could try using char(). The ASCII code for "A" is 65; for "Z"
>>> it is 90. Maybe something like this would work for you (piggy-backing
>>> on Nick's earlier suggestion):
>>>
>>> clonevar copy = var1
>>> replace copy = upper(copy)
>>> qui forval i = 65/90 {
>>>     local letter = char(`i')
>>>     replace copy = subinstr(copy, "`letter'", "", .)
>>> }
>>
>> Another option is to use c(alpha) and c(ALPHA), which hold the standard
>> lowercase and uppercase letters as space-separated lists.
>> ********modifying NJC's example:
>> clonevar copy = var1
>> qui foreach i in `c(alpha)' `c(ALPHA)' {
>>     replace copy = subinstr(copy, "`i'", "", .)
>> }
>> *******
>>
>>>
>>> This won't work for all of your text (e.g. Ã). I don't know of any way
>>> to look the numeric codes up in Stata, so I'll plug a previous post by
>>> Nick (http://www.stata.com/statalist/archive/2006-12/msg00446.html) and
>>> advise you to look up the codes for any accented letters by searching
>>> the internet for "ANSI character code chart" (accented letters lie
>>> outside ASCII proper, above code 127). You'll need to modify the code
>>> above to add any additional numbers you need and switch to -foreach-
>>> with -numlist-, since -forvalues- accepts only a single range.
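Rebecca's suggested switch to -foreach- with a numlist might look like the sketch below, which lists several non-contiguous code ranges in one loop. It assumes a single-byte Latin-1 encoding (Stata 13 and earlier), in which 192-214 and 216-222 are accented or special capital letters (215 is the multiplication sign). On her point about looking codes up inside Stata, Mata's ascii() function returns the byte code(s) of a string. The example string is illustrative.

```stata
* Sketch: remove plain and accented capitals using non-contiguous ranges.
* ASSUMPTION: single-byte Latin-1 strings (Stata 13 and earlier).
clear
set obs 1
generate str40 var1 = "45123 - ANT" + char(212) + "NIO - PB"

clonevar copy = var1
replace copy = upper(copy)
quietly foreach i of numlist 65/90 192/214 216/222 {
    replace copy = subinstr(copy, char(`i'), "", .)
}
list var1 copy

* look up a byte code without leaving Stata: Mata's ascii()
mata: ascii("A")
```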
>>
>> Take a look at -ascii- and -asciiplot- from SSC.
>> Also, you can get a list of all the chars used in var1 with -charlist- from SSC.
>>