Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: Restructuring the time dimension in a dataset
From: Tunga Kantarcı <[email protected]>
To: [email protected]
Subject: st: Restructuring the time dimension in a dataset
Date: Fri, 11 Oct 2013 20:47:31 +0200
Hello,
I have a dataset in which ‘variable one’ is a unique identification
number for each individual. ‘Variable two’ is the start date of a
period, in day-month-year format (like 01-01-2010), and ‘variable
three’ is the end date of the same period (like 05-01-2010).
‘Variable four’ is a number between 0 and 1 (like 0.574) that was
realised during the period 01-01-2010 - 05-01-2010.
A snapshot of the data sheet for individual 4115111 looks like this:
4115111 01-01-2010 05-01-2010 0.574
4115111 05-01-2010 31-09-2011 0.321
In this dataset, as the snapshot also shows, the length of a period is
irregular. It can be as short as a day (like 01-01-2010 – 02-01-2010)
or as long as a year (like 01-01-2010 - 01-01-2011), or even longer.
Hence it is not clear how I should treat the time dimension of the
data. Variable four is not observed at a regular monthly or yearly
frequency. I plan to restructure the data. That is, I plan to
fragment each period into multiple periods with a length of one day
and then aggregate them to, say, a month. This means that the first
period, which is
4115111 01-01-2010 05-01-2010 0.574,
would be fragmented into
4115111 01-01-2010 02-01-2010 0.574
4115111 02-01-2010 03-01-2010 0.574
4115111 03-01-2010 04-01-2010 0.574
4115111 04-01-2010 05-01-2010 0.574,
and the second period, which is
4115111 05-01-2010 31-09-2011 0.321,
would be fragmented into
4115111 05-01-2010 06-01-2010 0.321
.
.
4115111 30-09-2011 31-09-2011 0.321.
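The fragmentation described above can be pinned down with a minimal
sketch. It is written in Python rather than Stata, purely to make the
intended logic explicit, and is not a definitive implementation. The
function name split_into_days and the tuple layout are hypothetical.
The sketch assumes that the end date of a period is exclusive, so the
day shared by two consecutive periods is counted once, with the later
period. It also uses 30-09-2011 as the end of the second period,
because 31-09-2011 is not a valid calendar date.

```python
from datetime import date, timedelta


def split_into_days(person_id, start, end, value):
    """Split the period from start to end into consecutive one-day periods.

    Assumption: the end date is exclusive, so the last daily row runs
    from (end - 1 day) to end.
    """
    rows = []
    day = start
    while day < end:
        next_day = day + timedelta(days=1)
        rows.append((person_id, day, next_day, value))
        day = next_day
    return rows


# The two periods from the snapshot (second end date adjusted to 30-09-2011).
periods = [
    (4115111, date(2010, 1, 1), date(2010, 1, 5), 0.574),
    (4115111, date(2010, 1, 5), date(2011, 9, 30), 0.321),
]

# One flat list of daily rows across all periods.
daily = [row for period in periods for row in split_into_days(*period)]

# Print the daily rows of the first period in the dd-mm-yyyy layout of the data.
for person_id, start, end, value in daily[:4]:
    print(person_id, start.strftime("%d-%m-%Y"), end.strftime("%d-%m-%Y"), value)
```

The four printed rows correspond to the fragmentation of the first
period shown above. Each daily row carries the value of the period it
came from unchanged.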
After this fragmentation, I plan to collapse the daily series to a
monthly series, so that variable four is averaged over the days of
each month to give one monthly number per individual, perhaps using
a command along the lines of “collapse (mean) variable four,
by(variable one month)”, where month would be a new variable derived
from the daily date. In the end I would like to have monthly data.
Given this explanation, I would like to ask two questions.
Question one: In Stata, how can I fragment each observation (that is,
each row in the data) into multiple observations (multiple rows) with
respect to variable two and variable three, as explained above?
Question two: If it were your own data, how would you treat it? Would
you take the same approach as I do?
Tunga