Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Data Quality check of unbalanced panel data
From
Nick Cox <[email protected]>
To
"[email protected]" <[email protected]>
Subject
Re: st: Data Quality check of unbalanced panel data
Date
Mon, 20 Jan 2014 16:45:23 +0000
If you seek a p-value, you need an explicit model of the generating
process. No free lunch here.
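The point that a p-value needs a model can be made concrete with a minimal Python sketch. The model is an assumption chosen purely for illustration: ln Q follows a straight-line time trend with independent normal errors. Each year is dropped in turn, the trend is refitted on the remaining years, and the dropped value's prediction error is studentized. Under that assumed model the statistic follows Student's t with n - 3 degrees of freedom. The p-value uses the closed-form series for integer degrees of freedom (Abramowitz and Stegun 26.7.3-4). Because the most extreme of n years is picked out, the p-value is Bonferroni-adjusted. The function names loo_studentized and t_two_sided_p are hypothetical, not Stata commands.

```python
import math

# Country example from the question; 2006 is missing (unbalanced panel).
YEARS = [1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
         2000, 2001, 2002, 2003, 2004, 2005, 2007, 2008, 2009, 2010]
Q = [.539, .538, .598, .606, .606, .666, .681, .703, .733, .76,
     .782, .807, .819, .833, 1.341, .933, 1.023, 1.16, 1.19, 1.222]


def t_two_sided_p(t, df):
    """P(|T| >= |t|) for Student's t with integer df (Abramowitz & Stegun 26.7.3-4)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    c, s = math.cos(theta), math.sin(theta)
    total = 0.0
    if df % 2 == 1:
        # odd df: cos + (2/3)cos^3 + (2*4)/(3*5)cos^5 + ... up to cos^(df-2)
        term = c
        for j in range(1, (df - 1) // 2 + 1):
            total += term
            term *= c * c * (2 * j) / (2 * j + 1)
        a = (2 / math.pi) * (theta + s * total)
    else:
        # even df: 1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ... up to cos^(df-2)
        term = 1.0
        for j in range(1, df // 2 + 1):
            total += term
            term *= c * c * (2 * j - 1) / (2 * j)
        a = s * total
    return max(0.0, 1.0 - a)  # a = P(|T| < |t|)


def loo_studentized(years, q):
    """Studentized leave-one-out prediction errors of ln Q under an assumed linear trend."""
    x = [float(t) for t in years]
    y = [math.log(v) for v in q]
    n = len(x)
    stats = []
    for i in range(n):
        xs = [x[j] for j in range(n) if j != i]
        ys = [y[j] for j in range(n) if j != i]
        m = n - 1
        xbar, ybar = sum(xs) / m, sum(ys) / m
        sxx = sum((u - xbar) ** 2 for u in xs)
        b = sum((u - xbar) * (v - ybar) for u, v in zip(xs, ys)) / sxx
        sse = sum((v - ybar - b * (u - xbar)) ** 2 for u, v in zip(xs, ys))
        s = math.sqrt(sse / (m - 2))
        # standard error of predicting a new observation at x[i]
        se = s * math.sqrt(1 + 1 / m + (x[i] - xbar) ** 2 / sxx)
        stats.append((y[i] - ybar - b * (x[i] - xbar)) / se)
    return stats


t_stats = loo_studentized(YEARS, Q)
df = len(YEARS) - 3  # residual df of each leave-one-out fit
worst = max(range(len(YEARS)), key=lambda i: abs(t_stats[i]))
p = t_two_sided_p(t_stats[worst], df)
p_bonf = min(1.0, len(YEARS) * p)  # all years were screened and the largest kept
print(YEARS[worst], round(t_stats[worst], 2), p, p_bonf)
```

On the example country the largest statistic belongs to 2004, and even after the Bonferroni adjustment its p-value is far below conventional thresholds. That conclusion is only as good as the assumed straight-line model. The series need not be log-linear, and a genuine level shift would also produce large residuals.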
If your aim is just checking the quality, nothing beats a graphical
exploration.
Nick
[email protected]
On 20 January 2014 16:23, SIYAM, Amani <[email protected]> wrote:
> Dear Stata-Listers,
>
> I have a panel of 24 years (1990-2013) of a continuous variable Q measured in X countries. For each country, Q comes from one or more data sources that together fill the panel of years, and many countries have unbalanced panels (example below).
>
> I wish to detect possible outliers or odd values (which could be due to variability between data sources or to data entry errors) before proceeding with my analysis.
>
> To measure the average change over time, I calculated for each year t_n > 1990 the average exponential growth rate AEGR = ln(Q_n / Q_{n-1}) / (t_n - t_{n-1}), where t_{n-1} is the previous year with data.
>
> I also calculated, for each country, AEGR_ALL over all the years contributed: AEGR_ALL = ln(Q_last / Q_first) / (t_last - t_first), e.g. ln(Q_2010 / Q_1990) / 20 in the example below.
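A minimal Python sketch of the two rates follows. They are written as logs of ratios, which is what reproduces the AEGR and AEGR_ALL columns of the listing below (for 1991, ln(.538/.539) = -.001857). Dividing by the gap in years handles the missing 2006, so the 2007 value is an average over two years. The function names aegr and aegr_all are hypothetical, and the sketch assumes years are sorted and Q is positive.

```python
import math

# Country example from the listing; 2006 is missing (unbalanced panel).
YEARS = [1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
         2000, 2001, 2002, 2003, 2004, 2005, 2007, 2008, 2009, 2010]
Q = [.539, .538, .598, .606, .606, .666, .681, .703, .733, .76,
     .782, .807, .819, .833, 1.341, .933, 1.023, 1.16, 1.19, 1.222]


def aegr(years, q):
    """Year-on-year AEGR: ln(Q_n / Q_{n-1}) / (t_n - t_{n-1}); None for the first year."""
    out = [None]
    for i in range(1, len(years)):
        out.append(math.log(q[i] / q[i - 1]) / (years[i] - years[i - 1]))
    return out


def aegr_all(years, q):
    """Whole-span AEGR: ln(Q_last / Q_first) / (t_last - t_first)."""
    return math.log(q[-1] / q[0]) / (years[-1] - years[0])


for year, rate in zip(YEARS, aegr(YEARS, Q)):
    print(year, "." if rate is None else round(rate, 7))
print("AEGR_ALL", round(aegr_all(YEARS, Q), 7))
```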
>
> +----------------------------------------+
> | year Q AEGR AEGR_ALL |
> |----------------------------------------|
> 1. | 1990 .539 . .0409264 |
> 2. | 1991 .538 -.001857 .0409264 |
> 3. | 1992 .598 .1057322 .0409264 |
> 4. | 1993 .606 .0132893 .0409264 |
> 5. | 1994 .606 0 .0409264 |
> |----------------------------------------|
> 6. | 1995 .666 .0944097 .0409264 |
> 7. | 1996 .681 .0222726 .0409264 |
> 8. | 1997 .703 .0317946 .0409264 |
> 9. | 1998 .733 .0417888 .0409264 |
> 10. | 1999 .76 .0361727 .0409264 |
> |----------------------------------------|
> 11. | 2000 .782 .0285363 .0409264 |
> 12. | 2001 .807 .0314689 .0409264 |
> 13. | 2002 .819 .0147604 .0409264 |
> 14. | 2003 .833 .0169496 .0409264 |
> 15. | 2004 1.341 .4761372 .0409264 |
> |----------------------------------------|
> 16. | 2005 .933 -.3627656 .0409264 |
> 17. | 2007 1.023 .0460448 .0409264 |
> 18. | 2008 1.16 .1256805 .0409264 |
> 19. | 2009 1.19 .0255334 .0409264 |
> 20. | 2010 1.222 .0265355 .0409264 |
> +----------------------------------------+
>
> I am now stuck on how to find and "best-classify" the oddities. For example, I suspect an outlier Q value in 2004 (its AEGR is more than 10 times AEGR_ALL).
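One way to make "best-classify" operational is sketched here as a hypothetical heuristic, not an established test. First, flag years whose |AEGR| exceeds k times |AEGR_ALL|: this is the rule of thumb above, with k as an arbitrary tuning constant. Then look at the following year. A single bad value shows up as a large jump immediately undone by a large change of opposite sign (2004 then 2005 here). A persistent change in level, such as a switch of data source, is not undone. The 0.5 reversal fraction and the function names are assumptions for illustration. If AEGR_ALL is close to zero the ratio rule breaks down, and a robust scale such as the median absolute deviation of AEGR might be a better yardstick.

```python
import math

# Country example from the question; 2006 is missing (unbalanced panel).
YEARS = [1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
         2000, 2001, 2002, 2003, 2004, 2005, 2007, 2008, 2009, 2010]
Q = [.539, .538, .598, .606, .606, .666, .681, .703, .733, .76,
     .782, .807, .819, .833, 1.341, .933, 1.023, 1.16, 1.19, 1.222]


def growth_rates(years, q):
    """AEGR keyed by year: ln(Q_n / Q_{n-1}) / (t_n - t_{n-1}), so gaps are allowed."""
    return {years[i]: math.log(q[i] / q[i - 1]) / (years[i] - years[i - 1])
            for i in range(1, len(years))}


def classify(years, q, k=5.0):
    """Label years whose |AEGR| exceeds k * |AEGR_ALL| (k is an arbitrary cut-off).

    'spike': the next change has the opposite sign and at least half the size,
             so the series returns towards its old path (one suspect value).
    'level shift': the jump is not reversed (e.g. a change of data source).
    """
    rates = growth_rates(years, q)
    overall = math.log(q[-1] / q[0]) / (years[-1] - years[0])  # AEGR_ALL
    cutoff = k * abs(overall)
    later = years[1:]  # years that have an AEGR
    labels = {}
    i = 0
    while i < len(later):
        r = rates[later[i]]
        if abs(r) > cutoff:
            nxt = rates[later[i + 1]] if i + 1 < len(later) else None
            if nxt is not None and nxt * r < 0 and abs(nxt) >= 0.5 * abs(r):
                labels[later[i]] = "spike"
                i += 2  # the reversal belongs to the same suspect value
                continue
            labels[later[i]] = "level shift"
        i += 1
    return labels


print(classify(YEARS, Q))
print(classify(YEARS, Q, k=10.0))  # roughly the "10 times" rule of thumb
```

On the example, both k = 5 and k = 10 single out 2004 as a spike, with the 2005 drop treated as the return to trend rather than as a second oddity.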
>
> Is there a way to test this using the statistics I calculated (AEGR and AEGR_ALL), or are there better approaches to quality-checking unbalanced panel data?
>
> With all my thanks in advance.
>
> Amani
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/