Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Re: correcting data inconsistencies
From: Rebecca Pope <[email protected]>
To: [email protected]
Subject: Re: st: Re: correcting data inconsistencies
Date: Mon, 11 Mar 2013 09:35:47 -0500
Nick has given you a solution for identifying your problem panels. I
think this is a good first step in the process, so you have an idea of
how widespread your problem is. There isn't a simple solution to
"correcting" the values, though, because you don't know what is correct.
You really have to make a judgment call and then document your
decision rules. For example, you could decide on the modal value
(with -minmode-, ties between modes go to the lowest value):
bys personid: egen myeduc = mode(educ), minmode
Alternatively, whether the individual reported 11, 12, or 13, they
definitely had at least 11, so you could take the minimum (erring on
the lower side):
bys personid: egen myeduc = min(educ)
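As a rough check, the two rules above can be tried on the five reports from the original question. This is only a sketch: the identifier value 1 and the variable names educ_mode and educ_min are made up for illustration.

```stata
* Example series from the original question; personid = 1 is a made-up identifier
clear
input personid year educ
1 2000 12
1 2002 11
1 2004 13
1 2006 12
1 2008 11
end

* Modal value: 11 and 12 are each reported twice, so minmode breaks the tie downwards
bysort personid: egen educ_mode = mode(educ), minmode

* Lowest value ever reported by the person
bysort personid: egen educ_min = min(educ)

list, sepby(personid)
```

Both rules return 11 here, but only because of the tie; with -maxmode- in place of -minmode- the modal rule would return 12.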
You could also "carry forward" the higher values. That is, once a person
has attained 12 years, (s)he has 12 years until acquiring the next
year of education. One has to wonder how valuable years 12 & 13 were
if they are forgotten so quickly though. :-)
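* flag the first time each education level is reported by a person
bys personid educ (year): gen change = (_n==1)
* drop the flag where the level is lower than the preceding report
bys personid (year): replace change= 0 if educ < educ[_n-1]
* start from the first report and add one year per remaining flag
bys personid (year): gen myedu = sum(cond(_n==1,educ,change))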
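The three commands above add one year for each newly reported higher level. That gives 12, 12, 13, 13, 13 for the example series, but a jump from 12 to 14 would be carried forward as 13. If the intended rule is simply "the highest level reported so far", a running maximum is one possible alternative. The following is a sketch under that assumption, not the approach posted above; the second person and the name educ_cf are invented for illustration.

```stata
* Example series (person 1) plus a made-up second person
clear
input personid year educ
1 2000 12
1 2002 11
1 2004 13
1 2006 12
1 2008 11
2 2000 10
2 2002 12
2 2004 11
end

* Start from the reported value, in time order within person
bysort personid (year): gen educ_cf = educ

* replace works through the observations in order, so each value sees the
* already-updated previous one; max() ignores a missing argument
by personid: replace educ_cf = max(educ_cf, educ_cf[_n-1]) if _n > 1

list, sepby(personid)
```

Person 1 ends up as 12, 12, 13, 13, 13 and person 2 as 10, 12, 12. Because max() ignores missing arguments, a missing report in a later year is filled with the previous maximum, which may or may not be what you want.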
These are just a few of the rules I can think of off the top of my
head. I'd certainly check to see if there is a common approach in the
educational research literature (or wherever you intend to publish) if
for no other reason than that you're less likely to get slammed by a
reviewer.
Regards,
Rebecca
On Mon, Mar 11, 2013 at 8:43 AM, Nick Cox <[email protected]> wrote:
> The simple program is called Stata....
>
> However, you have to tell it what you regard as inconsistent.
>
> In this case, you could flag any observation that doesn't have a higher
> -education- value than the previous observation in the same panel.
>
> bysort personid (year) : gen flag1 = educ[_n+1] <= educ
> by personid : gen flag2 = _n > 1 & educ <= educ[_n-1]
>
> list if flag1 | flag2
>
> You could also flag panels, like this:
>
> gen problem = 0
> bysort personid (year) : replace problem = sum(educ <= educ[_n-1]) if _n > 1
> by personid : replace problem = problem[_N]
>
> edit if problem
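To see how such flags behave, they can be run on the example series plus one consistent panel. This is a sketch with made-up identifiers (persons 1 and 2) and hypothetical variable names (notup, nproblem, first). The comparison with the previous observation is guarded by _n > 1 because educ[_n-1] is missing in the first observation of each panel, and Stata treats missing as larger than any number, so educ <= educ[_n-1] would otherwise be true there.

```stata
* Example series (person 1) plus a made-up consistent panel (person 2)
clear
input personid year educ
1 2000 12
1 2002 11
1 2004 13
1 2006 12
1 2008 11
2 2000 10
2 2002 11
2 2004 12
end

* 1 if this report is not higher than the previous one in the same panel
bysort personid (year): gen byte notup = _n > 1 & educ <= educ[_n-1]

* Number of such reports per panel, copied to every observation of the panel
by personid: egen nproblem = total(notup)

* Count problem panels by looking at one observation per person
egen byte first = tag(personid)
count if first & nproblem > 0

list if nproblem > 0, sepby(personid)
```

Person 1 has three reports that fail to rise (2002, 2006 and 2008) and person 2 has none, so one panel is counted. As in the quoted code, the <= means that an unchanged value is flagged too; use < if staying at the same level should count as consistent.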
>
> Fluency with -by:- gets you a long way.
>
> SJ-2-1 pr0004 . . . . . . . . . . Speaking Stata: How to move step by: step
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
> Q1/02 SJ 2(1):86--102 (no commands)
> explains the use of the by varlist : construct to tackle
> a variety of problems with group structure, ranging from
> simple calculations for each of several groups to more
> advanced manipulations that use the built-in _n and _N
>
> http://www.stata-journal.com/article.html?article=pr0004 leads to a .pdf.
>
> Nick
>
> On Mon, Mar 11, 2013 at 1:31 PM, David Jose <[email protected]> wrote:
>
>> I would like to correct self-reported data inconsistencies in a panel
>> data set. For example, if there is an education variable, which is
>> reported 5 times, say as follows:
>>
>> year  educ
>> 2000  12
>> 2002  11
>> 2004  13
>> 2006  12
>> 2008  11
>>
>> I wonder if anyone has a simple program that can be implemented to
>> correct such inconsistencies. Thanks in advance.
>>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/