Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | David Kantor <kantor.d@att.net> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: converting high frequency data to low frequency |
Date | Thu, 04 Nov 2010 20:45:14 -0400 |
At 06:58 PM 11/4/2010, Dimitry wrote:
I have data on orders that looks like this: store_id ordertime 1901 10oct2010 15:48:18 1901 10oct2010 15:50:36 1901 10oct2010 16:41:35 1901 10oct2010 16:53:37 1901 10oct2010 18:38:24 1901 10oct2010 19:09:22 1901 10oct2010 21:14:22 ..... 1906 10oct2010 17:54:59 1906 12oct2010 19:21:27 1906 17oct2010 16:10:01 1906 17oct2010 19:34:39 1906 17oct2010 21:11:00 1906 17oct2010 22:18:55 1906 17oct2010 22:55:13 1906 19oct2010 22:23:10 1906 20oct2010 01:19:24 Ordertime is in %tc format. I would like to count the number of orders in 5 minute buckets by store_id. I would like to use collapse (count) ordertime, by(store_id bucket) but I have no idea how to group the ordertime into 5 minute intervals that are nicely labeled. I imagine I could calculate the elapsed time since midnight for each day, divide that into 5 minute intervals (or 6000 unit intervals), and find some way to label, but I have not gotten this to work yet. Is there's an easier way? I tried googling and findit-ing, but I am not sure I know what this type of transformation is called.
I believe you need to do just what you stated. In particular, if h, m , and s represent the hours, minutes and seconds, then you want
gen long s2 = 3600 * h + 60 * m + s then take int(s2/300)(You may want to s2 to include date*24*3600 -- if you don't use the date as a separate component of the identifier.)
There may be other methods available that I'm not aware of. HTH --David * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/