Re: st: data management issue (names listed differently)
Eva,
Much thanks for the advice. I am still wondering how I can merge with
a variable that has a mixture of CorrectSpelling and WrongSpelling.
Cleaning it up manually is extremely time-consuming since there are
thousands of observations.
Thanks,
Rufus
On Jul 2, 2008, at 8:42 AM, Eva Poen wrote:
Rufus,
are there too many schools/spellings to do it manually (i.e. -replace
school = "USC" if inlist(school, "Southern Cal","SouthCal")- )?
In any case, I would recommend that you clean up your school variable
to make your task as easy as possible. That includes stripping
leading/trailing blanks using -trim()-, and converting everything to
lower case (-lower()-). -itrim()- will reduce multiple, consecutive
internal blanks to one for you. All of this will help in reducing the
number of replacements you have to do.
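A minimal sketch of the clean-up just described, assuming the names
are held in a string variable called school (as in the -replace-
example above). The name school_clean is only illustrative.

```stata
* Normalise the school names before any matching.
*   trim()  strips leading and trailing blanks
*   itrim() collapses runs of internal blanks to a single blank
*   lower() converts everything to lower case
generate school_clean = lower(itrim(trim(school)))

* Count how many observations the clean-up actually changed
count if school_clean != school
```

Keeping the original school variable untouched makes it easy to check
what the clean-up did before relying on it.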
As a general strategy, you could compile a list (or data set) of all
the spellings you have, after cleaning up. If you go for a data set,
it could have two variables, CorrectSpelling and WrongSpelling. It
should then be possible to use -merge- to add the correct spelling to
data sets where the wrong spelling is present. For this to work you
need to make sure that there are no ambiguous wrong spellings, i.e.
abbreviations that may relate to more than one school.
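The strategy above might look roughly like the sketch below. The file
names mydata and school_crosswalk, the variable school, and the rows
typed into the crosswalk are all hypothetical, and the -merge m:1-
syntax assumes Stata 11 or later (earlier versions use the older
-merge varlist using filename- form on sorted data). Each correct
spelling is also listed as a "wrong" spelling of itself, so that a
variable containing a mixture of both still finds a match.

```stata
* Build a small crosswalk data set (illustrative rows only).
clear
input str30 WrongSpelling str30 CorrectSpelling
"usc"            "usc"
"southern cal"   "usc"
"southcal"       "usc"
"miami (fl)"     "miami (fl)"
"miami(fl)"      "miami (fl)"
"miami florida"  "miami (fl)"
end

* Stop with an error if any wrong spelling appears more than once,
* i.e. if a spelling is ambiguous.
isid WrongSpelling
save school_crosswalk, replace

* Attach the correct spelling to the working data set.
use mydata, clear
generate WrongSpelling = lower(itrim(trim(school)))
merge m:1 WrongSpelling using school_crosswalk, keep(master match)

* Spellings not yet covered by the crosswalk
tabulate WrongSpelling if _merge == 1
```

Any spellings reported in the last step can be added to the crosswalk
and the merge rerun, so the list only has to be extended once for each
new variant.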
Hope this helps,
Eva
2008/7/2 Rufus Peabody <[email protected]>:
Hey all,
I'm working with a dataset that contains a few variables containing
the names of different college football teams. The problem is, they
are not spelled consistently (e.g. Miami(FL) and Miami Florida; USC
and Southern Cal). In many cases the spelling differs only in that
there is an extra space after the school name for some. What I'd like
to do (and I'm pretty sure is possible) is create a master file with
all the school names and possible spellings, which I can then somehow
merge with my original dataset (and any future datasets with these
teams) to create a consistent spelling. How do I go about doing this?
Specifically, if I have, say, three variables containing spelling 1,
spelling 2, and spelling 3 of a school, and I want to use spelling 1
in another dataset, how can I merge with a variable that has some
schools with spelling 1 and others with spelling 2 or 3?
Thanks a lot,
Rufus
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/