|
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]
Re: st: data management issue (names listed differently)
Rufus,
I faced the problem recently with hospital data; we only had hospital
names, and needed to identify the hospitals uniquely for the analysis.
I also had the same problem once before with drug info. Here's what I
did both times, translated to your situation (if your variable with
teamnames is -teamname-); it may work for you.
1. create a list of all the text names
. u mydatafile, clear
. bys teamname: keep if _n==1
. outsheet teamname using teamnames.csv, replace
2. read this file teamnames.csv into Excel and add a
second column -code-. in the second column, give the
same number to every row that represents the same team.
this is slightly time consuming, but if they are sorted
on -teamname- it will just take a few minutes. one
advantage here is you can always recheck your codings.
3. export this from excel as codes.csv; read it into Stata
and sort on teamname
. insheet codes.csv, clear
. compress
. sort teamname
. save codes, replace
4. Now you have a file (codes.dta) which has a list of the team names
in all their variety, and you can merge this into your original
file to assign a common code to every team that is the same
. u mydatafile, clear
. sort teamname
. merge teamname using codes
5. Now you can identify the teams by their codes; if you want you can
assign a common name
. bys code: replace teamname=teamname[_n-1] if _n>1
hope this helps,
Jeph
Rufus Peabody wrote:
Hey all,
I'm working with a dataset that contains a few variable containing the
name of different college football teams. The problem is, they are not
spelled consistently (i.e. Miami(FL) and Miami Florida; USC and Southern
Cal). In many cases the spelling differs only in that there is an extra
space after the school name for some. What I'd like to do (and I'm
pretty sure is possible) is create a master file with all the school
names and possible spellings, which I can then somehow merge with my
original dataset (and any future datasets with these teams) to create a
consistent spelling. How do I go about doing this? Specifically, if I
have, say three variables containing spelling 1, spelling 2, and
spelling 3 of a school, and I want to use spelling 1 in another dataset,
how can I merge with a variable that has some schools with spellling 1
and others with spelling 2 or 3?
Thanks a lot,
Rufus
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/