
Re: st: data management issue (names listed differently)

From	"Eva Poen" <[email protected]>
To	[email protected]
Subject	Re: st: data management issue (names listed differently)
Date	Wed, 2 Jul 2008 17:33:25 +0100

Rufus,

say your spelling list is called list.dta, and your data are called
football.dta. list.dta has variables school (the wrong spellings) and
correctspelling. football.dta has variable school (among others), which
has been cleaned up but contains a mix of wrong and correct spellings.
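
A minimal sketch of that clean-up step, using the -trim()-, -itrim()- and
-lower()- functions mentioned further down the thread. The three toy
observations are made up for illustration; only the variable name school
comes from the description above.

```stata
* Hypothetical toy data standing in for football.dta
clear
input str20 school
"  USC "
"Southern   Cal"
"MIAMI (FL)"
end

* Strip leading/trailing blanks, collapse runs of internal blanks
* to a single blank, and convert everything to lower case
replace school = lower(itrim(trim(school)))
list
```

Newer Stata releases document these functions under the names strtrim(),
stritrim() and strlower(); the older names are used here to match the thread.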

If you do a -merge- on school, it will yield missings in
correctspelling for all cases where there is no wrong spelling (i.e.
school has the correct spelling already). Then you just -replace
correctspelling = school if correctspelling == ""-, and you end up
with the desired result, I think (at least if the list is complete).
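
The steps above could be sketched roughly as follows. This assumes the
-merge m:1- syntax of Stata 11 or later (older releases use a different
-merge- syntax and need both files sorted on school first). The toy
observations are hypothetical, and the -isid- check is an added safeguard
against a wrong spelling that appears twice in the list, not part of the
recipe itself.

```stata
* Hypothetical toy spelling list: school holds a wrong spelling,
* correctspelling holds the spelling it should be replaced by
clear
input str20 school str20 correctspelling
"southern cal"  "usc"
"southcal"      "usc"
"miami florida" "miami (fl)"
end
* Stops with an error if any wrong spelling is listed more than once
isid school
* Note: this writes list.dta to the current working directory
save list, replace

* Hypothetical toy data, assumed already cleaned up
clear
input str20 school
"usc"
"southern cal"
"miami florida"
"miami (fl)"
end

* Many observations in the data may match one row of the list;
* keep every observation of the data, matched or not
merge m:1 school using list, keep(master match) nogenerate

* Unmatched observations were spelled correctly already
replace correctspelling = school if correctspelling == ""
list
```

If the list is incomplete, any wrong spelling missing from it passes through
unchanged, so tabulating correctspelling afterwards is a cheap way to spot
leftovers.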

Eva



2008/7/2 Rufus Peabody <[email protected]>:
> Eva,
>
> Much thanks for the advice.  I am still wondering how I can merge with a
> variable that has a mixture of CorrectSpelling and WrongSpelling.  Cleaning
> it up manually is extremely time-consuming since there are thousands of
> observations.
>
> Thanks,
> Rufus
>
> On Jul 2, 2008, at 8:42 AM, Eva Poen wrote:
>
>> Rufus,
>>
>> are there too many schools/spellings to do it manually (i.e. -replace
>> school = "USC" if inlist(school, "Southern Cal","SouthCal")- )?
>>
>> In any case, I would recommend that you clean up your school variable
>> to make your task as easy as possible. That includes stripping of
>> leading/trailing blanks using -trim()-, and converting everything to
>> lower case (-lower()-). -itrim()- will reduce multiple, consecutive
>> internal blanks to one for you. All of this will help in reducing the
>> number of replacements you have to do.
>>
>> As a general strategy, you could compile a list (or data set) of all
>> the spellings you have, after cleaning up. If you go for a data set,
>> it could have two variables, CorrectSpelling and WrongSpelling. It
>> should then be possible to use -merge- to add the correct spelling to
>> data sets where the wrong spelling is present. For this to work you
>> need to make sure that there are no ambiguous wrong spellings, i.e.
>> abbreviations that may relate to more than one school.
>>
>> Hope this helps,
>> Eva
>>
>>
>>
>>
>> 2008/7/2 Rufus Peabody <[email protected]>:
>>>
>>> Hey all,
>>>
>>> I'm working with a dataset that contains a few variables containing the
>>> names of different college football teams.  The problem is, they are not
>>> spelled consistently (e.g. Miami(FL) and Miami Florida; USC and Southern
>>> Cal).  In many cases the spelling differs only in that there is an extra
>>> space after the school name for some.  What I'd like to do (and I'm pretty
>>> sure is possible) is create a master file with all the school names and
>>> possible spellings, which I can then somehow merge with my original
>>> dataset (and any future datasets with these teams) to create a consistent
>>> spelling.  How do I go about doing this? Specifically, if I have, say,
>>> three variables containing spelling 1, spelling 2, and spelling 3 of a
>>> school, and I want to use spelling 1 in another dataset, how can I merge
>>> with a variable that has some schools with spelling 1 and others with
>>> spelling 2 or 3?
>>>
>>> Thanks a lot,
>>> Rufus
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/support/faqs/res/findit.html
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/
>>>
