Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: RE: cleaning data efficiently
From: Maria Ana Vitorino <[email protected]>
To: <[email protected]>
Subject: Re: st: RE: cleaning data efficiently
Date: Fri, 28 Oct 2011 12:02:58 -0400
Thanks for the quick response.
This is very helpful.
Ana
On Oct 28, 2011, at 11:24 AM, Nick Cox wrote:
1. -egen- has a -mode()- function.
egen mode = mode(regionname), by(regioncode)
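A minimal, self-contained sketch of step 1 on the example data. It assumes regionname is a string variable and that missing names are empty strings (""), which is what Stata treats as missing for strings. If they were typed as a literal ".", that dot is an ordinary string value and would be counted by mode(), so the sketch converts it first. By default -egen, mode()- ignores missing values and returns missing for any group with tied modes; the minmode, maxmode and nummode() options control that.

```stata
* Rebuild the example data; missing names entered as "" (empty string)
clear
input str1 regioncode str3 regionname
X AAA
Y BBB
Z CCC
X ""
X AAA
Y BBB
Z ""
Z AAA
Z CCC
Z CCC
end

* Safeguard (assumption): treat a literal "." as a missing name
replace regionname = "" if regionname == "."

* Most frequent non-missing name within each code
egen mode = mode(regionname), by(regioncode)

* Overwrite each name with the modal name wherever a mode was found
replace regionname = mode if !missing(mode)
drop mode
```

After this, every X is AAA, every Y is BBB and every Z is CCC (the single Z/AAA pairing is outvoted by three Z/CCC pairings).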
2. To check whether the correspondence is unique, you need something like
egen tag = tag(mode regioncode)
egen ndistinctvalues = total(tag), by(mode)
See also, for a review:

SJ-8-4  dm0042  Speaking Stata: Distinct observations
        (help distinct if installed)  N. J. Cox and G. M. Longton
        Q4/08   SJ 8(4):557-568
        shows how to answer questions about distinct observations
        from first principles; provides a convenience command
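A self-contained sketch of step 2, assuming the data look as they would after step 1. Because step 1 gives each regioncode exactly one name, only the reverse direction (one name shared by several codes) can still fail. The sketch counts distinct partners in both directions anyway, using -egen, tag()- to mark one observation per distinct pair and -egen, total()- to add up those tags within each name and within each code.

```stata
* Example data as assumed to look after step 1
clear
input str1 regioncode str3 regionname
X AAA
Y BBB
Z CCC
X AAA
X AAA
Y BBB
Z CCC
Z CCC
Z CCC
Z CCC
end

* tag = 1 for exactly one observation of each distinct (name, code) pair
egen tag = tag(regionname regioncode)

* Number of distinct codes paired with each name, and names with each code
egen ncodes = total(tag), by(regionname)
egen nnames = total(tag), by(regioncode)

* List any pair that breaks the one-to-one correspondence
list regionname regioncode ncodes nnames if tag & (ncodes > 1 | nnames > 1), noobs
```

The correspondence is one-to-one exactly when ncodes and nnames are 1 in every observation; here nothing is listed.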
3.
ssc inst groups
help groups
groups regionname regioncode
(there are other ways, but I like this one)
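One of the other ways, sketched with official commands only and assuming the corrected data from step 1: sort on the pair, count observations within each pair with _N, and list the first observation of each pair. This is an alternative to -groups-, not a reproduction of its output.

```stata
* Example data as assumed to look after step 1
clear
input str1 regioncode str3 regionname
X AAA
Y BBB
Z CCC
X AAA
X AAA
Y BBB
Z CCC
Z CCC
Z CCC
Z CCC
end

* Frequency of each distinct (name, code) pair
bysort regionname regioncode: generate n = _N

* Flag one observation per pair so each pair is listed once
by regionname regioncode: generate byte first = _n == 1
list regionname regioncode n if first, noobs sep(0)
```

-contract regionname regioncode, freq(n)- gives the same table in one line, but it replaces the data in memory, so wrap it in -preserve- and -restore- if the observations are still needed.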
Nick
[email protected]
Vitorino, Maria Ana wrote:
Suppose I have the following data:
regioncode   regionname
X            AAA
Y            BBB
Z            CCC
X            .
X            AAA
Y            BBB
Z            .
Z            AAA
Z            CCC
Z            CCC
Assume also that the regioncode variable is correct but there are
some errors and missing values in the regionname variable.
1) Is there an efficient way to fix the entries in the regionname
variable? (For this, assume that the most frequent pairing of
regioncode and regionname is the correct one.)
I usually deal with this type of issue using several lines of code,
so I'm wondering whether there is a more efficient way, using Stata
commands I'm not familiar with.
Also, after correcting the mistakes, what is the most efficient way to
2) check whether the correspondence between the two variables is unique, and
3) create a table with regionname, regioncode and the frequency of
observations (but not a two-way table)?
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*