On 2/22/07, Nick Cox <[email protected]> wrote:
Sebastian's approach is mine too, but it can be
done a little more directly.
Nick,
thank you for the hint. Looking at my proposed solution, I realize
that I introduced a temporary variable, textgrp, for no reason at all,
since I never use it.
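
For reference, a trimmed version without the unused textgrp variable
might look like the sketch below. It assumes the same string variable
text as in the quoted example. It sorts on the frequency count within
each lowercase group rather than grouping on it. The extra sort key text
is an addition made here so that ties are broken reproducibly; it is not
part of the original proposal.

```stata
* Sketch only: assumes a string variable named text, as in the quoted example.
tempvar lotext comspelling

* Lowercase copy used to group spellings that differ only in capitalization.
gen `lotext' = lower(text)

* Number of observations sharing each exact spelling.
bysort `lotext' text: gen `comspelling' = _N

* Within each lowercase group, sort by frequency (then by spelling, so that
* ties are broken reproducibly) and copy the last, most frequent, spelling.
bysort `lotext' (`comspelling' text): gen newtext = text[_N]
```

With this ordering a tie goes to whichever of the tied spellings sorts
last, which may or may not be the preferred choice for a given dataset.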
Regards
sebastian
> clear
> gen str15 text = ""
> input
> "some text"
> "Some Text"
> "SOME TEXT"
> "some other text"
> "some other text"
> "Some other text"
> "Some other text"
> "SoMe TeXt"
> "SoMe TeXt"
> "Some Other Text"
> end
> tempvar lotext
> tempvar textgrp
> tempvar comspelling
>
> gen `lotext'=lower(text)
> bys `lotext': gen `textgrp'=1 if _n==1
> replace `textgrp'=sum(`textgrp')
>
> bys `lotext' text: gen `comspelling'=_N
> * sort on the count within each lowercase group; take the most common spelling
> bys `lotext' (`comspelling'): gen newtext=text[_N]
>
> I bet there are more elegant ways out in the wild, and I am
> looking forward to learning about them.
>
> Regards
> Sebastian
>
>
> On 2/22/07, Friedrich Huebler <[email protected]> wrote:
> > My data has string variables with text in uppercase or lowercase
> > letters. I would like to replace observations that are identical once
> > capitalization is ignored (e.g., "TEXT" and "text") by the most
> > common spelling. In some cases there are ties. So far I have only
> > managed to replace all such observations by their lowercase variant,
> > as in the example below. I am stumped and would appreciate any advice
> > on how I should proceed. I use Stata 8.2.
> >
> > Friedrich Huebler
> >
> > clear
> > gen str15 text = ""
> > input
> > "some text"
> > "Some Text"
> > "SOME TEXT"
> > "some other text"
> > "some other text"
> > "Some other text"
> > "Some other text"
> > "SoMe TeXt"
> > "SoMe TeXt"
> > "Some Other Text"
> > end
> > count
> > local n = r(N)
> > forvalues i = 1/`n' {
> >     local t = lower(text[`i'])
> >     replace text = "`t'" if lower(text) == "`t'"
> > }
> >
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/