Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: How can I fill in missing values for the month or quarter in this data set?
From
Aaron Kirkman <[email protected]>
To
[email protected]
Subject
Re: st: How can I fill in missing values for the month or quarter in this data set?
Date
Wed, 14 Nov 2012 14:14:58 -0600
Nick,
Thank you for pointing me to the --fillin-- command. That does exactly
what I need. I plan to interpolate the data for missing expenditures
based on the average percent spent in each quarter in years where data
for all four quarters is present. I already completed that procedure,
so I simply needed the data this format in order to apply it.
Aaron
On Wed, Nov 14, 2012 at 2:04 PM, Nick Cox <[email protected]> wrote:
> -fillin- is the command of choice here.
>
> . fillin state year quarter
>
> That said, the larger issue is why you want to do this. -fillin-
> without any extras will just extend your dataset and add missings on
> -expenditure-, but necessarily no extra information.
>
> As elsewhere I've posted on interpolation, it may seem odd that I ask
> this, but what will you gain thereby?
>
> Nick
>
> On Wed, Nov 14, 2012 at 7:47 PM, Aaron Kirkman <[email protected]> wrote:
>
>> I have a dataset in this format:
>>
>> #####
>> input year str8 state quarter expenditure
>> 1998 Illinois 1 1000
>> 1998 Illinois 3 3000
>> 1998 Illinois 4 4000
>> 1998 Illinois 2 2000
>>
>> 2000 Illinois 1 5000
>> 2000 Illinois 4 11000
>> 2000 Illinois 2 6000
>> 2000 Illinois 3 8000
>>
>> 2001 Illinois 2 1575
>>
>> 2002 Illinois 4 5600
>>
>> 1998 Missouri 3 2100
>> 1998 Missouri 4 2100
>> 1998 Missouri 1 1900
>> 1998 Missouri 2 1900
>>
>> 2000 Missouri 1 8500
>>
>> 2001 Missouri 1 8000
>> 2001 Missouri 2 9800
>> 2001 Missouri 3 7500
>> 2001 Missouri 4 1800
>>
>> 2002 Missouri 3 1400
>> end
>> #####
>>
>> I'd like to fill in the missing quarters for each state/year group, so
>> that each state/year group has four observations that include quarters
>> 1 - 4. As long as the --expenditure-- value is preserved for the
>> quarter in which it occurs in the original dataset, it can be missing
>> for the rest.
>>
>> I attempted a solution using --expand--, but I'm not sure how to
>> proceed from here.
>>
>> #####
>> bysort state year: egen numquarters = count(expenditure)
>> gen quartermissing = numquarters < 4
>> drop numquarters
>> expand 4 if quartermissing, gen(new)
>> sort state year
>> #####
>>
>> This successfully fills in the missing quarters with the value of the
>> only existing quarter, and it's trivial to set these values to missing
>> instead. The variables --quartermissing-- and --new-- indicate if
>> quarters were missing and if new observations were created, but I'm
>> stuck as to how I can proceed from there.
>>
>> If there is a way to sort the observations by group, while ensuring
>> that the observation where quarter was not missing remained in its
>> correct position, setting quarter to the observation number by group
>> would work. Unfortunately, I'm not sure how to go about sorting in
>> this way, since logically, the sorting will be dealing with either
>> missing values or identical quarter numbers, which complicates the
>> situation.
>>
>> Is there a simple way to accomplish this that I'm missing?
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/