Rufus,
say your spelling list is called list.dta, and your data are called
football.dta. list.dta has variables school (holding each wrong
spelling) and correctspelling. football.dta has variable school, which
has been cleaned up but contains a mix of wrong and correct spellings,
and others.
If you -merge- on school, correctspelling will be missing (empty) for
every observation whose school does not appear in the list, i.e. where
school has the correct spelling already. Then you just -replace
correctspelling = school if correctspelling == ""-, and you end up
with the desired result, I think (at least if the list is complete).
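
A minimal sketch of those two steps, assuming the file and variable
names above and the -merge- syntax of Stata 11 or later (older
versions use -merge school using list- after sorting both files on
school). It also assumes each wrong spelling appears only once in
list.dta:

```stata
* football.dta: school is a cleaned mix of correct and wrong spellings
* list.dta:     school is a wrong spelling, correctspelling its fix
use football, clear

* many football observations may share one spelling; keep every
* football observation and drop list entries that never occur in it
merge m:1 school using list, keep(master match)

* schools not found in the list are taken to be spelled correctly
replace correctspelling = school if correctspelling == ""
drop _merge
```

Before dropping _merge, -tabulate _merge- shows how many observations
were matched against the list.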
Eva
2008/7/2 Rufus Peabody <[email protected]>:
> Eva,
>
> Much thanks for the advice. I am still wondering how I can merge with a
> variable that has a mixture of CorrectSpelling and WrongSpelling. Cleaning
> it up manually is extremely time-consuming since there are thousands of
> observations.
>
> Thanks,
> Rufus
>
> On Jul 2, 2008, at 8:42 AM, Eva Poen wrote:
>
>> Rufus,
>>
>> are there too many schools/spellings to do it manually (i.e. -replace
>> school = "USC" if inlist(school, "Southern Cal","SouthCal")- )?
>>
>> In any case, I would recommend that you clean up your school variable
>> to make your task as easy as possible. That includes stripping
>> leading/trailing blanks using -trim()-, and converting everything to
>> lower case (-lower()-). -itrim()- will reduce multiple, consecutive
>> internal blanks to one for you. All of this will help in reducing the
>> number of replacements you have to do.
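>>
>> As a rough sketch, assuming the variable is called school, the three
>> functions can be nested in one -replace-. Newer versions of Stata
>> document them as -strtrim()-, -stritrim()- and -strlower()-; as far
>> as I know the older names still work.
>>
>> ```stata
>> * trim() drops leading/trailing blanks, itrim() collapses runs of
>> * internal blanks to a single blank, lower() converts to lower case
>> replace school = lower(itrim(trim(school)))
>> ```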
>>
>> As a general strategy, you could compile a list (or data set) of all
>> the spellings you have, after cleaning up. If you go for a data set,
>> it could have two variables, CorrectSpelling and WrongSpelling. It
>> should then be possible to use -merge- to add the correct spelling to
>> data sets where the wrong spelling is present. For this to work you
>> need to make sure that there are no ambiguous wrong spellings, i.e.
>> abbreviations that may relate to more than one school.
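>>
>> One way to check for ambiguous spellings, as a sketch only: the file
>> name spellings.dta is hypothetical, and the variable names are the
>> ones suggested above.
>>
>> ```stata
>> use spellings, clear
>> * drop rows that are exact copies of another row; these are harmless
>> duplicates drop
>> * list any wrong spelling that still appears more than once, i.e.
>> * one that maps to more than one correct spelling
>> duplicates list WrongSpelling
>> * -isid- stops with an error unless WrongSpelling is unique
>> isid WrongSpelling
>> ```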
>>
>> Hope this helps,
>> Eva
>>
>>
>>
>>
>> 2008/7/2 Rufus Peabody <[email protected]>:
>>>
>>> Hey all,
>>>
>>> I'm working with a dataset that contains a few variables containing the
>>> names of different college football teams. The problem is, they are not
>>> spelled consistently (e.g. Miami(FL) and Miami Florida; USC and Southern
>>> Cal). In many cases the spelling differs only in that there is an extra
>>> space after the school name for some. What I'd like to do (and I'm pretty
>>> sure is possible) is create a master file with all the school names and
>>> possible spellings, which I can then somehow merge with my original
>>> dataset (and any future datasets with these teams) to create a consistent
>>> spelling. How do I go about doing this? Specifically, if I have, say,
>>> three variables containing spelling 1, spelling 2, and spelling 3 of a
>>> school, and I want to use spelling 1 in another dataset, how can I merge
>>> with a variable that has some schools with spelling 1 and others with
>>> spelling 2 or 3?
>>>
>>> Thanks a lot,
>>> Rufus
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/support/faqs/res/findit.html
>>> * http://www.stata.com/support/statalist/faq
>>> * http://www.ats.ucla.edu/stat/stata/
>>>