Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: converting high frequency data to low frequency
From: David Kantor <[email protected]>
To: [email protected]
Subject: RE: st: converting high frequency data to low frequency
Date: Fri, 05 Nov 2010 10:25:37 -0400
Thank you to Nick for the correction and for bringing me up-to-date.
--David
At 07:59 AM 11/5/2010, you wrote:
David's suggestion strikes me as right in principle, but I think
he's still thinking in terms of the bad old days before Stata 10,
when people had to work out their own awkward ways of handling
times of day. None of that is needed here.
As always, the _format_ of these data is a matter of how they are to
be displayed, and not a matter of how they are stored. (An article
on the most common misunderstandings of Stata would surely include this one.)
Dimitry's data look exactly like standard Stata date-times, supported
from Stata 10 on, meaning that underneath the cosmetic format they are
times in milliseconds (ms). Therefore, to bin in 5-minute intervals,
he wants to round in units of 1000 * 60 * 5 = 300000 ms.
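As a quick check of that unit, here is a minimal sketch. It assumes only the built-in function -msofminutes()-, which converts minutes to ms; the expected results are shown as comments.

```stata
* 5 minutes expressed in milliseconds, two ways
di 1000 * 60 * 5        // 300000
di msofminutes(5)       // 300000
```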
Here is a concrete example which covers everything needed to
understand this problem.
Using a %tc format for a -clock()- conversion of 11:31:00 today
gives us back, not surprisingly, the same information:
. di %tc clock("5 Nov 2010 11:31:00", "DMYhms")
05nov2010 11:31:00
But underneath all that, the precise date-time _really_ is just an
integer with units ms.
. di %20.0f clock("5 Nov 2010 11:31:00", "DMYhms")
1604575860000
(The "20" in the format is much more than I need but causes no problem here.)
You can round down or round up; which way you go is a matter of
taste or convention. I almost never round using -int()-. I almost
always round using -floor()- or -ceil()- because then I know
immediately that I am rounding down (-floor()-) or up
(-ceil()-; think ceiling). I also don't get bitten around 0: -int()-
truncates towards zero, so negative numbers are rounded up, which is
not what I usually want, and I might forget that or not foresee that
it could happen with my data.
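To see where -int()- can bite, here is a small sketch using a date-time just before the origin of Stata date-times (01jan1960 00:00:00), where the underlying value is negative. The particular date is only an illustration, not something from Dimitry's data.

```stata
* 23:53 on 31 Dec 1959 is 7 minutes before the origin, so the value is negative
di %20.0f clock("31 Dec 1959 23:53:00", "DMYhms")
* -420000

* -int()- truncates towards zero, so the "rounded" time is later than the original
di %tc 300000 * int(clock("31 Dec 1959 23:53:00", "DMYhms")/300000)
* 31dec1959 23:55:00

* -floor()- always rounds down, whatever the sign
di %tc 300000 * floor(clock("31 Dec 1959 23:53:00", "DMYhms")/300000)
* 31dec1959 23:50:00
```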
Now rounding down, for example, in units of 5 minutes is rounding
down in units of 300000 ms. There are three steps, although they
can be combined in one line:
1. Divide by 300000.
2. Round down to the nearest integer at or below, using -floor()-.
3. Multiply by 300000.
So, the result is another large integer,
. di %20.0f 300000 * floor(clock("5 Nov 2010 11:31:00", "DMYhms")/300000)
1604575800000
But we should check that we did it right:
. di %tc 300000 * floor(clock("5 Nov 2010 11:31:00", "DMYhms")/300000)
05nov2010 11:30:00
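Rounding up instead follows the same pattern with -ceil()-. A sketch with the same example time, with the expected result as a comment:

```stata
* round 11:31:00 up to the next 5-minute boundary
di %tc 300000 * ceil(clock("5 Nov 2010 11:31:00", "DMYhms")/300000)
* 05nov2010 11:35:00
```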
With a variable it's going to be
gen double binnedtime = 300000 * floor(ordertime/300000)
format binnedtime %tc
Never forget the -double-: a -float- cannot hold integers this large
exactly. Then you can -collapse- (or better
-contract-) in terms of the new variable. (If it's really just time
of day you care about, you must get there first by subtracting the
date part.)
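Here is a minimal end-to-end sketch on made-up data. The three observations and the variable names s and timeofday are hypothetical, and reading "subtraction" as subtracting midnight of the same day, via -cofd(dofc())-, is an assumption about what is meant.

```stata
clear
* three made-up order times, read in as strings
input str20 s
"5 Nov 2010 11:31:00"
"5 Nov 2010 11:33:30"
"5 Nov 2010 11:36:10"
end

* convert to a date-time in ms; -double- is essential
gen double ordertime = clock(s, "DMYhms")
format ordertime %tc

* round down to 5-minute bins
gen double binnedtime = 300000 * floor(ordertime/300000)
format binnedtime %tc

* time of day only: subtract midnight of the same day
gen double timeofday = ordertime - cofd(dofc(ordertime))
format timeofday %tcHH:MM:SS
list ordertime binnedtime timeofday

* one observation per bin, with the count in _freq
contract binnedtime
list
```

With these data the first two times fall in the 11:30:00 bin and the third in the 11:35:00 bin, so -contract- should leave two observations with _freq of 2 and 1.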
(I suggested generalising -floor()- and -ceil()- some years ago to
StataCorp so that with two arguments -floor(ordertime, 300000)-, say,
would do what is above, but the suggestion is still lurking in their
files. A good argument against would be that the long-winded way to
do it, as above, is easy enough.)
See also, if desired:
SJ-3-4  dm0002  Stata tip 2: Building with floors and ceilings
        N. J. Cox
        Q4/03   SJ 3(4):446--447   (no commands)
        tips for using floor() and ceil()
Nick
[email protected]
[...]