Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: rectangulizing data
From: Dmitriy Krichevskiy <[email protected]>
To: [email protected]
Subject: Re: st: rectangulizing data
Date: Thu, 26 May 2011 13:44:57 -0400
Thanks for the links, Austin.
I am lucky to have found an expert on both the subject and the data. I am
basically finding results similar to those in your paper, though my interest
is in self-employment vs. wage work. I am still puzzled by the enormous
volatility the average individual faces. Among the small sample of
people I personally know, no one is subject to such fluctuations,
and certainly no one working for a wage. This makes me uncomfortable,
because either the people I know and I are not representative of the
average individual or (the more worrisome scenario) SIPP participants
are an unusual, self-selected group.
Back to the issue at hand: would you abandon attempts to calculate
annual income, impute the missing months, or drop people missing several months?
By abandoning, I mean switching to 4-month cumulative (or average) intervals.
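If the annual figure is abandoned in favour of 4-month intervals, a minimal Stata sketch might look like the following. It assumes the variable names ID, Month and Income from the example data in the original post quoted below, that Month counts from 1 across years, and that the 4-month blocks start at Months 1, 5, 9 and so on. The variable names interval, Income4 and nmonths are hypothetical; if the survey's reference periods do not line up with those blocks, the interval should instead come from the file's own wave information.

```stata
* Sketch: aggregate monthly income to 4-month intervals.
* Assumes one row per person-month, Month counted from 1 across years,
* and Income missing (.) when a month is not reported.
clear
input ID Month Income
1  1 1000
1  2  500
1  3 1000
1 13    0
1 14    0
1 15    0
1 16    0
1 17  600
1 18 1000
1 19 1000
1 20 1000
1 21 1000
1 22 1000
1 23  660
1 24  800
1 25 1200
2  1 2400
2  2 2400
2  5 2600
end

* Hypothetical interval index: Months 1-4 -> 1, 5-8 -> 2, 13-16 -> 4, ...
gen int interval = ceil(Month/4)

* Total income and number of months with non-missing income per interval.
* (sum) gives 0 when no month is observed, so the count is kept to
* tell a genuine zero apart from missing data.
collapse (sum) Income4 = Income (count) nmonths = Income, by(ID interval)

* Keep only intervals in which all 4 months are observed.
drop if nmonths < 4
```

On the example data this leaves three intervals for person 1 (Months 13 to 24), one of which has a genuine zero total rather than a missing value.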
On 5/26/11, Austin Nichols <[email protected]> wrote:
> Dmitriy Krichevskiy <[email protected]>:
> Nor is dropping cases harmless; there is some discussion at
> http://www.urban.org/publications/411971.html
> and slides 12-14 of
> http://www-personal.umich.edu/~nicholsa/an_dds.pdf
>
> On Thu, May 26, 2011 at 12:52 PM, Dmitriy Krichevskiy
> <[email protected]> wrote:
>> Thank you for your responses; I apologize for the confusion.
>>
>> So, a clarification:
>>
>> The data come from the Survey of Income and Program Participation (SIPP),
>> and my particular dataset combines 7 years of data. The data are
>> collected every four months and recorded monthly (via phone interviews). Hence
>> time=14 is the second month of the second year. Many people in this
>> sample miss interviews often, and income also exhibits a lot of volatility
>> (I still do not know why). My goal is to analyze income transitions
>> from quintile to quintile (via -xttrans-), and for annual income I need
>> to aggregate monthly income while distinguishing zero income
>> from missing income. Hence, I am trying to drop people who have only
>> a few months of income on record in the years where their information
>> is incomplete, while keeping the same people in other years in which
>> all of their income information is recorded. Given the very large
>> volatility and the many missed interviews, I am not sure imputing
>> income is harmless.
>>
>> On 5/26/11, Nick Cox <[email protected]> wrote:
>>> I think this might need to be
>>>
>>> bysort ID year: egen obs = count(Month)
>>>
>>> -- perhaps after some work --
>>>
>>> but as is agreed the example is unclear.
>>>
>>> On 26 May 2011, at 16:52, Oliver Jones <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>> your example data structure is a bit confusing since you have Month
>>>> values greater than 12... I'll assume you have at most 12 months per
>>>> person per year.
>>>>
>>>> Maybe this can help to drop people who have fewer than 12 observations
>>>> for one particular year. Let's assume this year is 2006.
>>>>
>>>> bysort ID: egen obs = count(Month)
>>>> drop if year == 2006 & obs < 12
>>>>
>>>> Does it work?
>>>>
>>>> Best
>>>> Oliver
>>>>
>>>> On 26.05.2011 17:19, Dmitriy Krichevskiy wrote:
>>>>> Dear Listers,
>>>>> I am trying to figure out the simplest way to convert a large panel
>>>>> dataset from monthly to annual income. The income is only reported
>>>>> monthly, and I want to clean the data of anyone missing a month
>>>>> in a particular year. I would like to drop observations for that
>>>>> person-year only and keep that person if they are fully present in
>>>>> some other year. Here is an equivalent data structure. As always,
>>>>> thanks a lot for your help.
>>>>> Dmitriy
>>>>>
>>>>> ID Month Income
>>>>> 1 1 1000
>>>>> 1 2 500
>>>>> 1 3 1000
>>>>> 1 13 0
>>>>> 1 14 0
>>>>> 1 15 0
>>>>> 1 16 0
>>>>> 1 17 600
>>>>> 1 18 1000
>>>>> 1 19 1000
>>>>> 1 20 1000
>>>>> 1 21 1000
>>>>> 1 22 1000
>>>>> 1 23 660
>>>>> 1 24 800
>>>>> 1 25 1200
>>>>> 2 1 2400
>>>>> 2 2 2400
>>>>> 2 5 2600
>>>>> *
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
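Putting Oliver's and Nick's suggestions together, a minimal Stata sketch of the annual route might look like this. It assumes the example data posted above with its variable names (ID, Month, Income), that Month counts from 1 across years, and that a missing month is either absent from the file or has Income missing. Deriving year from Month is one plausible reading of the "work" Nick mentions; the names year, nmonths and AnnualIncome are hypothetical.

```stata
* Sketch: keep only complete person-years, then sum to annual income.
* Month runs across years: 1-12 is year 1, 13-24 is year 2, and so on.
clear
input ID Month Income
1  1 1000
1  2  500
1  3 1000
1 13    0
1 14    0
1 15    0
1 16    0
1 17  600
1 18 1000
1 19 1000
1 20 1000
1 21 1000
1 22 1000
1 23  660
1 24  800
1 25 1200
2  1 2400
2  2 2400
2  5 2600
end

* Derive the year from the running month index.
gen int year = ceil(Month/12)

* Count months with non-missing income within each person-year.
* Using -bysort ID:- alone would count across all of a person's years.
bysort ID year: egen int nmonths = count(Income)

* Drop only incomplete person-years; a person stays in any year
* in which all 12 months are present.
drop if nmonths < 12

* One row per person-year; recorded zeros stay zeros.
collapse (sum) AnnualIncome = Income, by(ID year)
```

On the example data only person 1's second year (Months 13 to 24) is complete, so the result is a single row with AnnualIncome equal to 7060. The resulting ID-year file could then be -xtset ID year- and, once AnnualIncome is grouped into quintiles within each year (for instance by running -xtile- separately for each year), passed to -xttrans-.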
--
Dmitriy Krichevskiy Ph.D. Candidate
Economics Department
Florida International University
www.fiu.edu/~dkrichev
Research Associate, College of Education
Lumina Foundation Project