Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: rectangulizing data

From	Dmitriy Krichevskiy <[email protected]>
To	[email protected]
Subject	Re: st: rectangulizing data
Date	Thu, 26 May 2011 13:44:57 -0400

Thanks for the links, Austin.
I am lucky to find an expert on both the subject and the data. I am
finding results broadly similar to those in your paper, though my interest
is in self-employment vs. wage work. I am still puzzled by the enormous
volatility the average individual faces. Among the small sample of
people I personally know, no one is subject to such fluctuations,
certainly no one working for a wage. This makes me uncomfortable,
because either the people I know and I are not representative of the
average individual or (the more worrisome scenario) those participating
in SIPP are unusual individuals who self-select into participation.

Back to the issue at hand: would you abandon attempts to calculate
annual income, impute the missing months, or drop people missing several
months? By abandoning, I have in mind switching to 4-month cumulative
(or average) intervals.
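If annual totals are abandoned in favor of 4-month intervals, the aggregation might be sketched as below. The sketch types in the example data from the original post (ID, Month, Income) so that it runs on its own; the interval boundaries (months 1-4, 5-8, ...), the rule of keeping only intervals with all four months reported, and the names interval, nmonths, inc_total and inc_mean are assumptions for illustration, not anything prescribed in the thread. In the SIPP files the four-month reference periods may not line up with Month 1 for every respondent, because interviews are spread across rotation groups, so a wave identifier, if the file carries one, could be a better grouping variable than ceil(Month/4).

```stata
* Example data from the original post (ID, Month, Income).
clear
input ID Month Income
1  1 1000
1  2  500
1  3 1000
1 13    0
1 14    0
1 15    0
1 16    0
1 17  600
1 18 1000
1 19 1000
1 20 1000
1 21 1000
1 22 1000
1 23  660
1 24  800
1 25 1200
2  1 2400
2  2 2400
2  5 2600
end

* Hypothetical 4-month intervals: months 1-4 -> 1, 5-8 -> 2, and so on.
gen interval = ceil(Month/4)

* count() counts nonmissing values only: absent months and months with
* missing Income lower nmonths, while a reported 0 still counts.
bysort ID interval: egen nmonths = count(Income)

* Assumed rule: keep only intervals with all four months reported.
keep if nmonths == 4

* Cumulative and average income per person-interval (hypothetical names).
collapse (sum) inc_total = Income (mean) inc_mean = Income, by(ID interval)
list
```

On the example data only ID 1's intervals 4, 5 and 6 (months 13-24) survive, with totals of 0, 3600 and 3460; the 0 is a reported zero rather than a gap, because count() ignores missing values but not zeros.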

On 5/26/11, Austin Nichols <[email protected]> wrote:
> Dmitriy Krichevskiy <[email protected]>:
> Nor is dropping cases harmless; there is some discussion at
> http://www.urban.org/publications/411971.html
> and slides 12-14 of
> http://www-personal.umich.edu/~nicholsa/an_dds.pdf
>
> On Thu, May 26, 2011 at 12:52 PM, Dmitriy Krichevskiy
> <[email protected]> wrote:
>> Thank you for your responses; I apologize for the confusion.
>>
>> To clarify:
>>
>> The data come from the Survey of Income and Program Participation (SIPP),
>> and my particular dataset combines 7 years of data. The data are
>> collected every four months and recorded monthly (via phone interviews).
>> Hence time=14 is the second month of the second year. Many people in this
>> sample miss interviews often, and income also exhibits a lot of volatility
>> (I still do not know why). My goal is to analyze income transitions
>> from quintile to quintile (via -xttrans-), and for annual income I need
>> to aggregate monthly income while distinguishing zero income from
>> missing income. Hence, I am trying to drop people who have only a few
>> months of income on record in those years where their information
>> is incomplete, while keeping the same people in other years in which
>> they have all their income information recorded. Given the very large
>> volatility and the many missed interviews, I am not sure imputing
>> income is harmless.
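The indexing described above (time=14 is the second month of the second year) and the zero-versus-missing distinction can be handled together, as in the following sketch. It uses the example data from the original post, with Month standing in for time; year, moy, annual and nmonths are hypothetical names. Instead of dropping rows, it keeps every person-year and sets the annual total to missing when fewer than 12 months are reported, because -collapse-'s (sum) statistic skips missing values and would otherwise turn an incomplete year into a misleadingly low total.

```stata
* Example data from the original post; Month plays the role of time.
clear
input ID Month Income
1  1 1000
1  2  500
1  3 1000
1 13    0
1 14    0
1 15    0
1 16    0
1 17  600
1 18 1000
1 19 1000
1 20 1000
1 21 1000
1 22 1000
1 23  660
1 24  800
1 25 1200
2  1 2400
2  2 2400
2  5 2600
end

* Hypothetical year and month-of-year: time 14 -> year 2, month 2.
gen year = ceil(Month/12)
gen moy  = Month - 12*(year - 1)
assert year == 2 & moy == 2 if Month == 14

* Annual total plus the number of months with nonmissing Income.
collapse (sum) annual = Income (count) nmonths = Income, by(ID year)

* (sum) skipped the missing months, so an incomplete year's total would
* look like genuine (low) income; mark those person-years as missing.
replace annual = . if nmonths < 12
list

* With enough complete person-years, quintiles could then be formed with
* -xtile- within each year, followed by -xtset ID year- and -xttrans-.
```

Here only ID 1's second year is complete (annual income 7060); the other three person-years end up with annual missing, so they drop out of any quintile assignment made with -xtile-.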
>>
>> On 5/26/11, Nick Cox <[email protected]> wrote:
>>> I think this might need to be
>>>
>>> bysort ID year: egen obs = count(month)
>>>
>>> -- perhaps after some work --
>>>
>>> but as is agreed the example is unclear.
>>>
>>> On 26 May 2011, at 16:52, Oliver Jones <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>> your example data structure is a bit confusing since you have values of
>>>> Month greater than 12... I'll assume you have at most 12 months per
>>>> person per year.
>>>>
>>>> Maybe this can help you drop people who have fewer than 12 observations
>>>> in one particular year. Let's assume this year is 2006.
>>>>
>>>> bysort ID: egen obs = count(Month)
>>>> drop if year == 2006 & obs < 12
>>>>
>>>> Does it work?
>>>>
>>>> Best
>>>> Oliver
>>>>
>>>> On 26.05.2011 17:19, Dmitriy Krichevskiy wrote:
>>>>> Dear Listers,
>>>>> I am trying to figure out the simplest way to convert a large panel
>>>>> dataset from monthly to annual income. The income is only reported
>>>>> monthly, and I want to clean the data of anyone missing a month
>>>>> in a particular year. I would like to drop observations for that
>>>>> person-year only and keep that person if they are fully present in
>>>>> some other year. Here is an equivalent data structure. As always,
>>>>> thanks a lot for your help.
>>>>> Dmitriy
>>>>>
>>>>> ID  Month  Income
>>>>>  1      1    1000
>>>>>  1      2     500
>>>>>  1      3    1000
>>>>>  1     13       0
>>>>>  1     14       0
>>>>>  1     15       0
>>>>>  1     16       0
>>>>>  1     17     600
>>>>>  1     18    1000
>>>>>  1     19    1000
>>>>>  1     20    1000
>>>>>  1     21    1000
>>>>>  1     22    1000
>>>>>  1     23     660
>>>>>  1     24     800
>>>>>  1     25    1200
>>>>>  2      1    2400
>>>>>  2      2    2400
>>>>>  2      5    2600
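In the spirit of the thread's title, one way to make the gaps in this example explicit before counting is to rectangularize the panel with -tsfill, full-, which adds a row with missing Income for every ID-Month combination absent between the earliest and latest Month in the data. The sketch then derives year from Month (months 1-12 are year 1, 13-24 year 2, ...) and counts reported months per person-year with -bysort ID year: egen ... = count()-, the approach suggested earlier in the thread. The names year and nmonths are hypothetical, and counting Income rather than Month is a deliberate choice: after -tsfill- every month has a row, so only Income reveals which months are actually on record.

```stata
* The example data above, typed in.
clear
input ID Month Income
1  1 1000
1  2  500
1  3 1000
1 13    0
1 14    0
1 15    0
1 16    0
1 17  600
1 18 1000
1 19 1000
1 20 1000
1 21 1000
1 22 1000
1 23  660
1 24  800
1 25 1200
2  1 2400
2  2 2400
2  5 2600
end

* Declare the panel, then add a row (with Income missing) for every
* ID-Month pair absent between the first and last Month in the data.
xtset ID Month
tsfill, full
* Number of months with no income on record.
count if missing(Income)

* Hypothetical year variable: months 1-12 -> 1, 13-24 -> 2, ...
gen year = ceil(Month/12)

* Months per person-year with Income actually reported.
bysort ID year: egen nmonths = count(Income)

* Drop incomplete person-years only; complete years of the same person stay.
drop if nmonths < 12
list ID year Month Income, sepby(ID year)
```

On the example data, -tsfill, full- takes the panel from 19 to 50 rows, and only ID 1's months 13-24 survive the drop. Month 25 on its own forms a one-month year 3, so a final year that the data cut short is treated as incomplete as well.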
>>>>> *
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>


-- 
Dmitriy Krichevskiy Ph.D. Candidate
Economics Department
Florida International University
www.fiu.edu/~dkrichev

Research Associate, College of Education
Lumina Foundation Project


Follow-Ups:
- Re: st: rectangulizing data
  - From: Maarten Buis <[email protected]>
- Re: st: rectangulizing data
  - From: Austin Nichols <[email protected]>

References:
- st: rectangulizing data
  - From: Dmitriy Krichevskiy <[email protected]>
- Re: st: rectangulizing data
  - From: Oliver Jones <[email protected]>
- Re: st: rectangulizing data
  - From: Nick Cox <[email protected]>
- Re: st: rectangulizing data
  - From: Dmitriy Krichevskiy <[email protected]>
- Re: st: rectangulizing data
  - From: Austin Nichols <[email protected]>
