Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: descripive stats on %tc formatted variables
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: descripive stats on %tc formatted variables
Date: Thu, 28 Jun 2012 01:28:16 +0100
Format is a matter of how numbers are displayed, but the complaint
about format is undeserved:
1. What is biting here is not the format so much as the magnitudes you
have, given the units being used.
2. The relationship between a variable's display format and how
numbers are displayed by -summarize- is at best indirect. In this case
the display format %tc is not being used at all by -summarize-. If it
were, the mean would be displayed quite differently, as shown by
. di %tc 889268.9
01jan1960 00:14:49
which is not what you are seeing (and would be even less help).
#1 is the main point. A clock time is expressed in milliseconds, so
the numbers are right, as for example 3 hours is 10,800,000 ms.
. di 3 * 60 * 60 * 1000
10800000
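The same arithmetic can be run in the other direction on the -summarize- results quoted below. This is only a rough check, and it assumes that the displayed maximum of 1.80e+07 is exactly 18,000,000 ms:

```stata
* milliseconds in one minute and in one hour
display 60 * 1000
display 60 * 60 * 1000

* mean from the quoted output, in minutes (roughly 14.8)
display 889268.9 / 60000

* maximum from the quoted output, in hours
* (assumes the maximum is exactly 18,000,000 ms, i.e. 5 hours)
display 18000000 / 3600000
```

Those values agree with the reported range of waiting times, up to 5 hours.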
You don't say how you would prefer the numbers to be displayed, but
suppose that minutes are what you want. Then
gen double q27_min = q27 / 60000
Now try -summarize q27_min- to see results in minutes. If you want
hours, divide by 3600000 (60 * 60 * 1000) instead.
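A minimal sketch of how the statistics asked for (mean, median, range, quintiles) might then be obtained. It assumes q27_min has been created as above; the name q27_hr is made up for illustration:

```stata
* hours version of the same variable (hypothetical name q27_hr)
generate double q27_hr = q27 / 3600000

* mean, SD, min, max, plus median and other percentiles
summarize q27_min, detail

* compact table of selected statistics
tabstat q27_min q27_hr, statistics(mean median min max)

* quintile boundaries: 20th, 40th, 60th and 80th percentiles
centile q27_min, centile(20 40 60 80)
```

Storing the new variables as double avoids any loss of precision when dividing large millisecond values.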
Your idea of a different format won't help much or at all here, as #1
and #2 imply. Also, it is best not to think of assigning a different
-format- as converting a variable, as the values stored remain the
same: all you change is how they are displayed, but even that is not
directly relevant in this case.
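A quick way to see this for yourself, sketched on the assumption that q27 is the %tc variable from the original post:

```stata
* change only the display format of q27 (hours and minutes)
format q27 %tcHH:MM

* -list- honours the new display format
list q27 in 1/5

* -summarize- still reports the stored values, in milliseconds
summarize q27
```

The results from -summarize- should be the same before and after the -format- command, because the stored values have not changed.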
To sum up: As far as Stata is concerned, you are getting what you
asked for, results in milliseconds. All you need to do is change
the units, but that has nothing to do with -format- in Stata's
sense.
Nick
On Thu, Jun 28, 2012 at 12:00 AM, Kerry MacQuarrie
<[email protected]> wrote:
> I am struggling to run the most basic summary statistics on selected
> variables in my dataset because they are formatted as %tc (aka clock) data.
> For example, a certain variable for waiting time to see a provider is in the
> format HH:MM:SS, with a range of 1 minute to 5 hours. The seconds are
> always zero (i.e. always ending in :00) as the times were reported in
> minutes with much heaping at :05, :10, :30, and :00 minutes as one might
> expect in self-reported data.
>
> I simply want to run some summary statistics such as the mean/median, range,
> quintiles, etc. But I’m tripped up by the formatting. A straightforward
> command like sum varname returns this non-intuitive output:
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>          q27 |       766    889268.9     1644010          0   1.80e+07
>
> Do I need to convert the variable into a different format? Are there
> commands to produce the types of summary statistics I’m looking for that are
> specific to %tc variables?