Re: st: descriptive stats on %tc formatted variables
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: descriptive stats on %tc formatted variables
Date: Thu, 28 Jun 2012 01:40:07 +0100
Also, your sample results imply a very skewed distribution, so mean,
SD, min and max need to be supplemented with more summary statistics.
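
As a rough sketch of what "more summary statistics" could look like: the commands below assume the dataset containing q27 is in memory and reuse the minutes variable q27_min from the earlier reply quoted below. The particular statistics and percentiles chosen are only one possibility, not a recommendation.

```stata
* Assumes the dataset with the %tc variable q27 is already in memory.
* Convert milliseconds to minutes, as in the quoted reply below.
generate double q27_min = q27 / 60000

* Median, quartiles, further percentiles, skewness and kurtosis
summarize q27_min, detail

* Compact table of selected statistics
tabstat q27_min, statistics(n mean sd min p25 p50 p75 max)

* Quintile boundaries: 20th, 40th, 60th and 80th percentiles
centile q27_min, centile(20 40 60 80)
```

With heavily skewed waiting times, the median and quantiles will usually say more about typical waits than the mean and SD.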
On Thu, Jun 28, 2012 at 1:28 AM, Nick Cox <[email protected]> wrote:
> Format is a matter of how numbers are displayed, but the complaint
> about format is undeserved:
>
> 1. The format is not biting here, so much as the magnitudes you have,
> given the units being used.
>
> 2. The relationship between a variable's display format and how
> numbers are displayed by -summarize- is at best indirect. In this case
> the display format %tc is not being used at all by -summarize-. If it
> were, the mean would be displayed differently, as shown by
>
> . di %tc 889268.9
> 01jan1960 00:14:49
>
> which is not what you are seeing (and would be even less help).
>
> #1 is the main point. A clock time is expressed in milliseconds, so
> the numbers are right: for example, 3 hours is 10,800,000 ms.
>
> . di 3 * 60 * 60 * 1000
> 10800000
>
> You don't say how you would prefer the numbers to be displayed, but
> suppose that minutes are what you want. Then
>
> gen double q27_min = q27 / 60000
>
> Now try -summarize q27_min- to see results in minutes. If you want
> hours, divide by 3600000 instead.
>
> Your idea of a different format won't help much or at all here, as #1
> and #2 imply. Also, it is best not to think of assigning a different
> -format- as converting a variable, as the values stored remain the
> same: all you change is how they are displayed, but even that is not
> directly relevant in this case.
>
> To sum up: As far as Stata is concerned, you are getting what you
> asked for, results in milliseconds. But all you need to do is change
> the units. However, that is nothing to do with -format- in Stata's
> sense.
>
> Nick
>
> On Thu, Jun 28, 2012 at 12:00 AM, Kerry MacQuarrie
> <[email protected]> wrote:
>
>> I am struggling to run the most basic summary statistics on selected
>> variables in my dataset because they are formatted as %tc (aka clock) data.
>> For example, a certain variable for waiting time to see a provider is in the
>> format HH:MM:SS, with a range of 1 minute to 5 hours. The seconds are
>> always zero (i.e. always ending in :00) as the times were reported in
>> minutes with much heaping at :05, :10, :30, and :00 minutes as one might
>> expect in self-reported data.
>>
>> I simply want to run some summary statistics such as the mean/median, range,
>> quintiles, etc. But I’m tripped up by the formatting. A straightforward
>> command like sum varname returns this non-intuitive output:
>>
>>     Variable |       Obs        Mean    Std. Dev.       Min        Max
>> -------------+--------------------------------------------------------
>>          q27 |       766    889268.9     1644010          0   1.80e+07
>>
>> Do I need to convert the variable into a different format? Are there
>> commands to produce the types of summary statistics I’m looking for that are
>> specific to %tc variables?
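
On the question of converting to a different format, a small self-contained sketch may help. The four values are made up for illustration (1 minute, 5 minutes, 30 minutes and 5 hours, in milliseconds), and q27_hr is a hypothetical variable name. The sketch shows that changing the display format leaves the stored values alone, whereas dividing creates a new variable in different units.

```stata
* Hypothetical waiting times in milliseconds: 1 min, 5 min, 30 min, 5 hours
clear
input double q27
60000
300000
1800000
18000000
end

* Display as clock times; the stored values are untouched
format q27 %tcHH:MM:SS
list q27

* Same stored values, now displayed as plain milliseconds
format q27 %12.0f
list q27

* Changing units creates new values: here, hours
generate double q27_hr = q27 / 3600000
summarize q27_hr
```

Only the last two commands change what -summarize- reports. The two -format- commands affect how -list- shows q27 and nothing else.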