Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Unbalancing a panel data set with country pairs
From
Gordon Hughes <[email protected]>
To
[email protected]
Subject
Re: st: Unbalancing a panel data set with country pairs
Date
Sat, 22 Jan 2011 10:18:31 +0000
The nature of your data isn't entirely clear, so we need to be clear
about this.
Your description seems to correspond to the following situation: we
have a series of N x N matrices of migration flows from country i to
country j so that m[i,j,t] is the total flow from i to j in year
t. You have converted the matrices from wide format to long format
so each observation contains i, j, t and m[i,j,t] plus a set of
independent variables - x[i,t], y[j,t], z[i,j,t] - describing (for
example) economic conditions in both origin and destination countries
or the distance between them. Then you want to delete panels for i,j
pairs for which either (a) m[i,j,t] is missing for some t or (b) some
of the x[], y[], z[] variables are missing.
If this is correct, then there are lots of ways of doing this and
some will be more efficient than others. However, clumsy but obvious
may be the best choice so you are clear what you are doing. If you
haven't already done so, create a numeric country pair identifier
cpair=(i-1)*N+j and then use:
sort cpair year
egen n_miss_var=rowmiss(<varlist>)
by cpair: egen max_miss_var=max(n_miss_var)
by cpair: egen n_obs=count(year)
drop if max_miss_var > 0 | n_obs < tmax
For each observation - i.e. country pair and time period - step 1
counts the number of missing values in <varlist> which would include
m[] and whichever of the independent variables you are interested
in. Step 2 identifies whether any of the observations in a
particular country pair panel contains missing data. Note that egen
adds the value of max_miss_var to all observations in the
panel. Step 3 counts the number of observations for each country
pair. Finally step 4 deletes all observations for any panel that
contains any observations with missing data plus any country pair
with a panel of less than tmax (=17) observations. This seems to be
what you want to do. As Nick Cox explained, the key is to use a
combination of -by ..- with -egen-.
However, I would question whether this is the right way to
proceed. It will generate a strongly balanced panel but for many
datasets it will throw away a large amount of data. You might be
better off looking for methods of analysis that do not require
strongly balanced panels.
Gordon Hughes
[email protected]
From: Matei Frunzetti <[email protected]>
Subject: st: Unbalancing a panel data set with country pairs
Dear Statalisters,
I am fairly new to Stata and therefore might have to bother you with
somewhat trivial questions in the near future. Please excuse.
Cutting to the chase:
I 'm working on a panel data set over 17 years. It's fairly unbalanced
und i need to drop all observations for country pairs that either lack
full length (as in years) or have missings in one of the independant
variables. The problem is that i have to delete all observations of
these country pairs for all years if only one or more variables have a
missing or if it is one or more years short. I ran into a dead end
trying to figure out how to imply the rest of the observations of a
"faulty" country pair into the drop command.
I hope that was enough information. Help would be much appreciated
Regards
Matei
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/