Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: converting high frequency data to low frequency
From
David Kantor <[email protected]>
To
[email protected]
Subject
Re: st: converting high frequency data to low frequency
Date
Thu, 04 Nov 2010 20:45:14 -0400
At 06:58 PM 11/4/2010, Dimitry wrote:
I have data on orders that looks like this:
store_id ordertime
1901 10oct2010 15:48:18
1901 10oct2010 15:50:36
1901 10oct2010 16:41:35
1901 10oct2010 16:53:37
1901 10oct2010 18:38:24
1901 10oct2010 19:09:22
1901 10oct2010 21:14:22
.....
1906 10oct2010 17:54:59
1906 12oct2010 19:21:27
1906 17oct2010 16:10:01
1906 17oct2010 19:34:39
1906 17oct2010 21:11:00
1906 17oct2010 22:18:55
1906 17oct2010 22:55:13
1906 19oct2010 22:23:10
1906 20oct2010 01:19:24
Ordertime is in %tc format. I would like to count the number of orders
in 5 minute buckets by store_id. I would like to use
collapse (count) ordertime, by(store_id bucket)
but I have no idea how to group the ordertime into 5 minute intervals
that are nicely labeled. I imagine I could calculate the elapsed time
since midnight for each day, divide that into 5 minute intervals (or
6000 unit intervals), and find some way to label, but I have not
gotten this to work yet. Is there's an easier way? I tried googling
and findit-ing, but I am not sure I know what this type of
transformation is called.
I believe you need to do just what you stated. In particular, if h, m
, and s represent the hours, minutes and seconds, then you want
gen long s2 = 3600 * h + 60 * m + s
then take int(s2/300)
(You may want to s2 to include date*24*3600 -- if you don't use the
date as a separate component of the identifier.)
There may be other methods available that I'm not aware of.
HTH
--David
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/