Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
st: Dates, strings, formats & labels: tab / tabstat / table
From
"Allan Reese (Cefas)" <[email protected]>
To
<[email protected]>
Subject
st: Dates, strings, formats & labels: tab / tabstat / table
Date
Mon, 13 Sep 2010 12:30:18 +0100
This is the type of behaviour that puts people off computers and causes
them to think folk who promote Stata are nerds - maybe I am for thinking
it worth writing up ;-)
The objective: summarize a data table as monthly statistics - in time
order. Usually the data comes as a spreadsheet with the date as a
string. I always recommend writing the month as a name to avoid
ambiguity of day and month.
Today's detonator: having converted the "month year" string to a date,
the following commands gave irritatingly inconsistent interpretations.
* Here's the input string variable, which sorts alphabetically
. tab month
month | Freq. Percent Cum.
------------+-----------------------------------
APR 10 | 38 6.92 6.92
AUG 09 | 37 6.74 13.66
DEC 09 | 38 6.92 20.58
FEB 10 | 38 6.92 27.50
JAN 10 | 37 6.74 34.24
JUL 09 | 38 6.92 41.17
JUL 10 | 37 6.74 47.91
JUN 09 | 37 6.74 54.64
JUN 10 | 34 6.19 60.84
MAR 10 | 39 7.10 67.94
MAY 09 | 32 5.83 73.77
MAY 10 | 38 6.92 80.69
NOV 09 | 30 5.46 86.16
OCT 09 | 37 6.74 92.90
SEP 09 | 39 7.10 100.00
------------+-----------------------------------
Total | 549 100.00
* -gen- a date and format it %tdMY
. desc date
storage display value
variable name type format label variable label
------------------------------------------------------------------------
-----------------------------------------------
date float %tdMY
* -tab- insists it was the 1st of each month
. tab date
date | Freq. Percent Cum.
------------+-----------------------------------
01may2009 | 32 5.83 5.83
01jun2009 | 37 6.74 12.57
01jul2009 | 38 6.92 19.49
01aug2009 | 37 6.74 26.23
01sep2009 | 39 7.10 33.33
01oct2009 | 37 6.74 40.07
01nov2009 | 30 5.46 45.54
01dec2009 | 38 6.92 52.46
01jan2010 | 37 6.74 59.20
01feb2010 | 38 6.92 66.12
01mar2010 | 39 7.10 73.22
01apr2010 | 38 6.92 80.15
01may2010 | 38 6.92 87.07
01jun2010 | 34 6.19 93.26
01jul2010 | 37 6.74 100.00
------------+-----------------------------------
Total | 549 100.00
* though -table- can handle the concept ...
. table date
------------------------
date | Freq.
------------+-----------
May09 | 32
June09 | 37
July09 | 38
August09 | 37
September09 | 39
October09 | 37
November09 | 30
December09 | 38
January10 | 37
February10 | 38
March10 | 39
April10 | 38
May10 | 38
June10 | 34
July10 | 37
------------------------
* but -table- will handle only five statistics at a time, and I wanted
six. So tried -tabstat- and got a shock
. tabstat row row , stat(mean) by(date)
Summary statistics: mean
by categories of: date
date | row row
----------+--------------------
18018 | 16.5 16.5
18049 | 54 54
18079 | 94.5 94.5
18110 | 135 135
18141 | 176 176
18171 | 217 217
18202 | 253.5 253.5
18232 | 290.5 290.5
18263 | 331 331
18294 | 371.5 371.5
18322 | 413 413
18353 | 454.5 454.5
18383 | 494.5 494.5
18414 | 532.5 532.5
18444 | 570 570
----------+--------------------
Total | 295.7687 295.7687
-------------------------------
* Why have I asked for the same variable in two columns? Because when I
simplified the table to send the report ...
. tabstat row , stat(mean) by(date)
Summary for variables: row
by categories of: date
date | mean
----------+----------
May09 | 16.5
June09 | 54
July09 | 94.5
August09 | 135
September09 | 176
October09 | 217
November09 | 253.5
December09 | 290.5
January10 | 331
February10 | 371.5
March10 | 413
April10 | 454.5
May10 | 494.5
June10 | 532.5
July10 | 570
----------+----------
Total | 295.7687
---------------------
With one statistic the -by- was output formatted, but I could not get
the header and total lines to align even using -labelwidth- and
-longstub-.
Having found -tabstat- deformatted the stub, another approach was to
-egen- the group(date) values to give monthindex numbers, but they need
the month-year labels. The only way to generate labels from an existing
variable appears to be -encode-, but that sorts the values
alphabetically.
I typed in the labels -label define month 1 "May 09" etc- and attached
those to the index.
Like the bank says, "There must be a better way".
R Allan Reese
Senior statistician, Cefas
The Nothe, Weymouth DT4 8UB
Tel: +44 (0)1305 206614 -direct
Fax: +44 (0)1305 206601
www.cefas.co.uk
***********************************************************************************
This email and any attachments are intended for the named recipient only. Its unauthorised use, distribution, disclosure, storage or copying is not permitted. If you have received it in error, please destroy all copies and notify the sender. In messages of a non-business nature, the views and opinions expressed are the author's own and do not necessarily reflect those of the organisation from which it is sent. All emails may be subject to monitoring.
***********************************************************************************
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/