Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Re: correcting data inconsistencies
From: [email protected]
To: "[email protected]" <[email protected]>
Subject: Re: st: Re: correcting data inconsistencies
Date: Mon, 11 Mar 2013 14:45:36 +0000
Good advice.
David: A quite separate detail in this example is the presumption that we all understand what the codes mean. I was guessing at some system in which increasing integers imply annual progress, but it's best not to presume on an international list that every country has the same codes as yours.
Nick
[email protected]
On 11 Mar 2013, at 14:35, Rebecca Pope <[email protected]> wrote:
> Nick has given you a solution to identifying your problem panels. I
> think this is a good first step in the process so you have an idea of
> how widespread your problem is. There isn't a simple solution to
> "correcting" the values though because you don't know what is correct.
> You really have to make a judgement call and then document your
> decision rules. For example, you could decide on the modal value:
>
> bys personid: egen myeduc = mode(educ), minmode
>
> Alternatively, take the minimum: whether the individual reported 11,
> 12, or 13, they definitely had at least 11 (erring on the lower side):
>
> bys personid: egen myeduc = min(educ)
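>
> A minimal sketch (an illustration, not part of the original reply) of
> how the modal and minimum rules behave on the example series from the
> question. The id value 1 and the names educ_mode and educ_min are
> assumptions made for the example. Here 11 and 12 are both modal, so
> without -minmode- (or -maxmode-) -egen, mode()- would return missing
> rather than pick one of them:
>
> ```stata
> clear
> input personid year educ
> 1 2000 12
> 1 2002 11
> 1 2004 13
> 1 2006 12
> 1 2008 11
> end
>
> * modal value per person; ties are broken toward the lower mode
> bysort personid: egen educ_mode = mode(educ), minmode
> * lowest value ever reported per person
> bysort personid: egen educ_min = min(educ)
>
> list, sepby(personid)
> ```
>
> For this person both rules give 11, which need not hold in general.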
>
> You could also "carry forward" the higher values. I.e. once a person
> has attained 12 years, (s)he has 12 years until acquiring the next
> year of education. One has to wonder how valuable years 12 & 13 were
> if they are forgotten so quickly though. :-)
>
> bys personid educ (year): gen change = (_n==1)
> bys personid (year): replace change = 0 if educ < educ[_n-1]
> bys personid (year): gen myeduc = sum(cond(_n==1,educ,change))
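>
> A related rule, sketched here as an illustration rather than as part of
> the original reply, is a running maximum: each year takes the highest
> value reported so far. It differs from the code above in that it jumps
> straight to the highest reported value instead of adding one year per
> change. The id value 1 and the name educ_max are assumptions:
>
> ```stata
> clear
> input personid year educ
> 1 2000 12
> 1 2002 11
> 1 2004 13
> 1 2006 12
> 1 2008 11
> end
>
> * start from the reported value
> gen educ_max = educ
> * -replace- works through observations in order, so educ_max[_n-1]
> * already holds the running maximum of the earlier years
> bysort personid (year): replace educ_max = max(educ_max, educ_max[_n-1]) if _n > 1
>
> list personid year educ educ_max, sepby(personid)
> ```
>
> On the example series this gives 12, 12, 13, 13, 13.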
>
> These are just a few of the rules I can think of off the top of my
> head. I'd certainly check to see if there is a common approach in the
> educational research literature (or wherever you intend to publish) if
> for no other reason than that you're less likely to get slammed by a
> reviewer.
>
> Regards,
> Rebecca
>
> On Mon, Mar 11, 2013 at 8:43 AM, Nick Cox <[email protected]> wrote:
>> The simple program is called Stata....
>>
>> However, you have to tell it what you regard as inconsistent.
>>
>> In this case, you could flag any observation that doesn't have a higher
>> -education- value than the previous observation in the same panel.
>>
>> bysort personid (year) : gen flag1 = educ[_n+1] <= educ
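>> * _n > 1 guards the first observation in each panel: educ[0] is
>> * missing, and missing counts as larger than any number
>> by personid : gen flag2 = _n > 1 & educ <= educ[_n-1]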
>>
>> list if flag1 | flag2
>>
>> You could also flag panels, like this:
>>
>> gen problem = 0
>> bysort personid (year) : replace problem = sum(educ <= educ[_n-1]) if _n > 1
>> by personid : replace problem = problem[_N]
>>
>> edit if problem
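>>
>> To see how many panels are affected, one option (a sketch added for
>> illustration, not part of the original reply) is to tag one
>> observation per person and count the tagged observations with a
>> nonzero -problem-. Person 1 is the series from the question; person 2
>> and the name pickone are hypothetical:
>>
>> ```stata
>> clear
>> input personid year educ
>> 1 2000 12
>> 1 2002 11
>> 1 2004 13
>> 1 2006 12
>> 1 2008 11
>> 2 2000 10
>> 2 2002 11
>> 2 2004 12
>> 2 2006 13
>> 2 2008 14
>> end
>>
>> * number of non-increasing steps per person, as in the code above
>> gen problem = 0
>> bysort personid (year) : replace problem = sum(educ <= educ[_n-1]) if _n > 1
>> by personid : replace problem = problem[_N]
>>
>> * one tagged observation per person, so each panel is counted once
>> egen pickone = tag(personid)
>> count if pickone & problem
>> tabulate problem if pickone
>> ```
>>
>> Here one of the two panels is flagged.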
>>
>> Fluency with -by:- gets you a long way.
>>
>> SJ-2-1 pr0004 . . . . . . . . . . Speaking Stata: How to move step by: step
>> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
>> Q1/02 SJ 2(1):86--102 (no commands)
>> explains the use of the by varlist : construct to tackle
>> a variety of problems with group structure, ranging from
>> simple calculations for each of several groups to more
>> advanced manipulations that use the built-in _n and _N
>>
>> http://www.stata-journal.com/article.html?article=pr0004 leads to a .pdf.
>>
>> Nick
>>
>> On Mon, Mar 11, 2013 at 1:31 PM, David Jose <[email protected]> wrote:
>>
>>> I would like to correct self-reported data inconsistencies in a panel
>>> data set. For example, if there is an education variable, which is
>>> reported 5 times, say as follows:
>>>
>>> year educ
>>> 2000 12
>>> 2002 11
>>> 2004 13
>>> 2006 12
>>> 2008 11
>>>
>>> I wonder if anyone has a simple program that can be implemented to
>>> correct such inconsistencies. Thanks in advance.
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/