Re: st: How to remove cross-sections with high number of missing values in panel data analysis
From: Eric Booth <[email protected]>
To: [email protected]
Subject: Re: st: How to remove cross-sections with high number of missing values in panel data analysis
Date: Sat, 20 Feb 2010 00:11:48 -0600
>
> Is there any quick way to convert the value of variable in
> consideration to missing in all those cross-sections with insufficient
> data.
Instead of converting them to missing or dropping them, you could leave the data
alone and exclude those cross-sections with an "if" qualifier, after creating a
variable that counts the nonmissing observations in each cross-section:
*------------------------BEGIN EXAMPLE
clear
inp case year v1
1 2000 80
1 2001 350
1 2002 2285
1 2003 2402
1 2004 480
1 2005 2135
1 2006 1862
1 2007 230
1 2008 1302
2 2000 118
2 2001 2427
2 2002 825
2 2003 326
2 2004 1111
3 2000 333
3 2001 853
3 2002 1294
3 2003 1137
3 2004 1011
3 2005 31
3 2006 750
3 2007 408
3 2008 1369
3 2009 198
3 2010 1476
3 2011 1609
3 2012 783
end
fillin case year
drop _f
*****
bys case: sum year
*count the nonmissing values of v1 in each case
bys case: egen ignore = count(v1)
*use only cases with more than 6 nonmissing observations
*("ignore" shouldn't be missing, but !mi(ignore) guards against it just in case)
tab v1 case if ignore>6 & !mi(ignore)
mean v1 if ignore>6 & !mi(ignore)
**now, your analysis here**
*------------------------END EXAMPLE
If you really want to convert v1 to missing when a case has 6 or fewer
nonmissing observations, you could type:
replace v1 = . if ignore<=6
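As a rough sketch of the same idea with the cutoff named in one place (the names cutoff, nobs, nmiss and toofew are my own illustrative choices, and the value 6 is only an example), something like this could follow the fillin step in the example, so that absent years already appear as rows with missing v1:

```stata
* Sketch only: cutoff, nobs, nmiss and toofew are hypothetical names,
* and 6 is an illustrative threshold; adjust to your own data.
local cutoff 6

* number of nonmissing and missing values of v1 in each case
bysort case: egen nobs  = count(v1)
bysort case: egen nmiss = total(missing(v1))

* 1 = the case has too few nonmissing observations, 0 = otherwise
gen byte toofew = nobs <= `cutoff'
tab case toofew

* either blank out the variable for the flagged cases ...
replace v1 = . if toofew

* ... or discard those cross-sections entirely
* drop if toofew
```

Flagging first and then choosing between replace and drop keeps the rule in one place. With drop, the flagged cross-sections leave the data altogether instead of staying as all-missing rows.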
~ Eric
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
[email protected]
Office: +979.845.6754
Fax: +979.845.0249
http://ppri.tamu.edu
On Feb 19, 2010, at 11:46 PM, Prabhat wrote:
> Dear members,
>
> I am new to Stata.
>
> While analyzing a panel with 23 cross-sections and 30 years, I am
> getting abnormal results in some cases due to the very small number of
> observations (fewer than 5) in each cross-section.
>
> Is there any quick way to convert the value of variable in
> consideration to missing in all those cross-sections with insufficient
> data.
>
> In summary,
>
> I have
> 3 observations of y for ID=25
> 2 observations of y for ID=58
>
> and so on
>
> where y can have up to 30 observations for each cross-section i.e. each ID.
>
> I need to set up some rule that automatically discards a
> cross-section if its number of missing values is very high.
>
> Any comment will be appreciated.
>
> Thank you.
>
> Regards,
> Prabhat
> International University of Japan
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/