Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: Re: st: Re: st: Re: st: calculating percentage changes in an unbalanced panel data set
From: Nick Cox <[email protected]>
To: [email protected]
Subject: st: Re: st: Re: st: Re: st: calculating percentage changes in an unbalanced panel data set
Date: Wed, 6 Mar 2013 11:43:14 +0000
A more general point is embedded here which arises again and again
with date variables.
It is vital to realise that descriptions of the form
my dates are of format dd/mm/yyyy
do not distinguish between string variables with values such as
"25/12/2012" and numeric date variables with -format- (Stata's sense)
%tdd/n/Cy
That is,
1. The word "format" is overloaded.
2. The needed information is about _types_.
The output of -describe- for the variable concerned is informative.
Word descriptions using your own terminology are often ambiguous.
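To make the distinction concrete, here is a minimal sketch on made-up data. The variable name time_str and the two example values are assumptions for illustration only, not taken from the poster's dataset. It shows what -describe- reports for a string variable and for a numeric daily date, and why -mofd()- needs the latter.

```stata
clear
* hypothetical toy data: dates held as strings in dd/mm/yyyy form
input str10 time_str
"25/12/2012"
"01/10/2008"
end

* storage type str10 means string; -mofd(time_str)- would fail with r(109)
describe time_str

* convert to a numeric daily date, then attach a daily display format
gen time = date(time_str, "DMY")
format time %td

* now -describe- shows a numeric storage type with a %td display format
describe time

* -mofd()- accepts the numeric daily date and returns a monthly date
gen month = mofd(time)
format month %tm
list
```

Both variables may look like dates when listed, but only the numeric one can be passed to -mofd()- or used with -tsset-.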
On Wed, Mar 6, 2013 at 10:38 AM, Nick Cox <[email protected]> wrote:
> My guess is that -time- is a string variable.
>
> This contradicts the earlier output from -tsset-, which would not have
> worked if -time- were a string variable.
>
> So, we need another guess. Perhaps you are really using different
> variable names and translated them for Statalist for some reason, but
> forgot to take some difference into account. Or this is a slightly
> different version of the same data. Either way, there is something
> about your dataset which you are not telling us.
>
> Whatever the answer, -mofd()- should work if and only if the argument
> is a numeric daily date variable. If it's a string variable, you
> should use -date()- to convert it to a numeric daily date variable.
>
> Nick
>
> On Wed, Mar 6, 2013 at 9:08 AM, Tzaloupas Dimitrov
> <[email protected]> wrote:
>
>> Thanks for your reply, Rebecca. The dates that I have in my files are of this format: dd/mm/yyyy. So by applying the code you provided, specifically
>>
>> gen month=mofd(time)
>>
>> I get the following error:
>>
>> type mismatch
>> r(109);
>>
>> So I still cannot find the answer to my question. Is there any other suggestion?
>
> Rebecca Pope <[email protected]>
>
>> You have inflation measured on a daily basis? My guess is not. In all
>> likelihood, what you have is monthly data that happens to be coded
>> 01mmmYYYY. Stata, however, does not know this.
>>
>> gen month = mofd(time) // get date in month format
>> format month %tm
>> tsset id month
>>
>> Now Stata knows you have monthly changes, so it doesn't appear that
>> you have many missing observations within your panel simply due to
>> false "gaps" because of how your data is recorded.
>>
>> Once you have -tsset- your data, you can use the lag operator.
>> Otherwise, based on what you are doing, there isn't much point in
>> -tsset-. Using lags to calculate a change in the inflation rate would
>> be as so:
>>
>> // L. is Stata's lag operator (see -help tsvarlist- if unfamiliar)
>> gen p2 = (inf/L.inf-1)*100
>>
>> If you are wanting inflation since "baseline" rather than
>> period-to-period inflation:
>> // note here that the time variable is in ()
>> bys id (month): gen p2_alt = (inf[_n]/inf[1]-1)*100
>>
>> In your original code, you had "bys country time". The problem with
>> this is that Stata is looking within country _and_ time and counting
>> observations. Because each country-time group contains only one
>> observation, inf[2] does not exist within the group, so you get
>> missing values. Placing time in parentheses tells Stata to sort by
>> that value but not to form groups on it.
>>
>> p2 will result in missing values if your panel data are still
>> unbalanced after correcting for monthly observations. p2_alt will give
>> you a value at every point in your series. However, the two provide
>> fundamentally different information. Your example of 2/1 leaves the
>> ultimate question unclear so I've given you code for both.
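As a rough illustration of the two calculations described above, here is a small sketch on invented data. The values, the two-panel layout and the helper variables m and y are all assumptions; country 2 deliberately has no November observation, so the lagged version should return missing for its December row while the baseline version should not.

```stata
clear
* hypothetical toy panel: id, month number, year, inflation
input id m y inf
1 10 2008 2.0
1 11 2008 2.5
1 12 2008 3.0
2 10 2008 4.0
2 12 2008 5.0
end

* build a numeric daily date (first of each month), then a monthly date
gen time = mdy(m, 1, y)
format time %td
gen month = mofd(time)
format month %tm
tsset id month

* period-to-period change: missing for the first month of each panel
* and wherever the previous month is absent
gen p2 = (inf/L.inf - 1)*100

* change since the first observation in each panel:
* sort by month within id, but group on id only
bysort id (month): gen p2_alt = (inf/inf[1] - 1)*100

list id month inf p2 p2_alt, sepby(id)
```

The lag operator respects the calendar declared by -tsset-, whereas the subscript inf[1] only respects the sort order within each id, which is why the two variables treat the gap differently.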
>
> On Tue, Mar 5, 2013 at 5:27 PM, Tzaloupas Dimitrov
>
>>> I have some time series observations (inflation) for a set of countries
>>>
>>> The panel data set is unbalanced, that is,
>>>
>>> egen id = group(country), label
>>> tsset id time
>>> panel variable: id (unbalanced)
>>> time variable: time, 01oct2008 to 01nov2011, but with gaps
>>> delta: 1 day
>>>
>>> within each country I want to find the percentage change of inflation.
>>>
>>> I tried
>>>
>>> bysort country time : gen p2=(inf[2]-inf[1]/inf[1])*100
>>>
>>> but I get this message
>>> (500 missing values generated)
>>>
>>> Am I doing something wrong?
>>>
>>>
>>>
>>> I use Stata 11
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/