From | Laurie Molina <molinalaurie@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | st: Controlling for attrition in panel, or creating a panel without attrition |
Date | Wed, 22 Feb 2012 19:26:40 -0600 |
Dear all,

I have a database of 24 million observations, covering 10 time periods. Each observation has an id that is unique over time, which allows me to merge the 24 million observations in each of the 10 periods. However, I want to take a sample of the population, because this very large database is not easy to work with. I seem to have two options:

1. Take a random sample in the first period and then merge the databases to form a panel. In this case my panel will have attrition. In fact, by the last period the attrition rate (comparing period 1 observations with period 10 observations) will be almost 50%. The remaining sample, much smaller than the original sample, leads to representativeness issues.

2. Take a random sample of the observations that appear in the database in all the periods. If attrition is not random, this population differs from the original population, and hence my random sample may not be representative of the original population, but only of the population defined by the observations that appear in the database in every period.

Which option do you think I should take?

Thank you very much in advance, as always!
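
To make the two options concrete, here is a minimal sketch on synthetic data. It is written in Python rather than Stata, and everything in it is an assumption for illustration: the function names, the toy panel size, the 7% per-period dropout probability (chosen only so that roughly half of the ids are gone by period 10), and the simplification that an id that leaves never returns. It is not a recommendation for either option; it only shows what each sampling scheme produces.

```python
import random


def make_toy_panel(n_ids, n_periods, drop_prob, seed=0):
    """Build a synthetic panel as {period: set of ids present}.

    Assumption: attrition is permanent and purely random here, which
    the real data need not satisfy.
    """
    rng = random.Random(seed)
    present = set(range(n_ids))
    panel = {}
    for t in range(1, n_periods + 1):
        panel[t] = set(present)
        # Each id still present drops out before the next period with prob drop_prob.
        present = {i for i in present if rng.random() >= drop_prob}
    return panel


def sample_first_period(panel, k, seed=0):
    """Option 1: sample k ids from period 1 and follow them (unbalanced panel)."""
    rng = random.Random(seed)
    chosen = set(rng.sample(sorted(panel[1]), k))
    return {t: chosen & ids for t, ids in panel.items()}


def sample_balanced(panel, k, seed=0):
    """Option 2: sample k ids among those present in every period (balanced panel)."""
    rng = random.Random(seed)
    always_present = set.intersection(*panel.values())
    chosen = set(rng.sample(sorted(always_present), k))
    return {t: set(chosen) for t in panel}


# Toy usage: 10,000 ids instead of 24 million, 10 periods.
panel = make_toy_panel(n_ids=10_000, n_periods=10, drop_prob=0.07, seed=1)
opt1 = sample_first_period(panel, k=500, seed=2)
opt2 = sample_balanced(panel, k=500, seed=2)

print("Option 1 sample size by period:", {t: len(ids) for t, ids in opt1.items()})
print("Option 2 sample size by period:", {t: len(ids) for t, ids in opt2.items()})
```

Under option 1 the sample shrinks from period to period as ids leave, while under option 2 the sample size is constant but is drawn only from ids that survive all periods. Because dropout in this toy panel is random by construction, it cannot show the selection problem that arises when attrition is not random.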