My data contain string variables whose values mix uppercase and
lowercase letters. I would like to replace values that are identical
once capitalization is ignored (e.g., "TEXT" and "text") with the most
common spelling among them. In some cases two spellings are tied for
most common. So far I have only managed to replace all such values
with their lowercase variant, as in the example below. I am stumped
and would appreciate any advice on how to proceed. I use Stata 8.2.
Friedrich Huebler
clear
gen str15 text = ""
input
"some text"
"Some Text"
"SOME TEXT"
"some other text"
"some other text"
"Some other text"
"Some other text"
"SoMe TeXt"
"SoMe TeXt"
"Some Other Text"
end
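* Loop over all observations and overwrite every case-insensitive
* match with the lowercase spelling
count
local n = r(N)
forvalues i = 1/`n' {
    local t = lower(text[`i'])
    replace text = "`t'" if lower(text) == "`t'"
}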
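
One possible direction, offered only as an untested sketch and not as part of the code above: count how often each exact spelling occurs within its case-insensitive group, sort each group by that count, and copy the last (most frequent) spelling to every observation in the group. The helper variables key and freq are hypothetical names introduced for illustration, and str15 assumes the same width as text. Ties are broken here by the sort order of the spellings themselves, which is an arbitrary rule and may not be the one wanted.

```stata
* Sketch only: run in place of the lowercase loop, on the original spellings.
* key and freq are hypothetical helper variables.
gen str15 key = lower(text)

* Number of observations sharing each exact spelling
bysort key text: gen long freq = _N

* Within each case-insensitive group, sort by frequency (ties by spelling)
* and copy the spelling in the last observation to the whole group
bysort key (freq text): replace text = text[_N]

drop key freq
```

If a different tie rule is needed (for example, preferring the spelling that appears first in the data), the variables listed in parentheses after key would have to be changed accordingly.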