Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
st: RE: AW: Dates, strings, formats & labels: tab / tabstat / table
From
"Lachenbruch, Peter" <[email protected]>
To
"'[email protected]'" <[email protected]>
Subject
st: RE: AW: Dates, strings, formats & labels: tab / tabstat / table
Date
Mon, 13 Sep 2010 09:47:00 -0700
I had a similar problem arise recently. Various dates of events (diagnosis, first weakness, first rash, first...) were recorded as MDY, but many were simply guesses as to the day, so were recorded as 1, 15, 30 - decided digit preference. When we looked at some circular statistics (courtesy of NJC's programs) they were decidedly not uniform throughout the year. We decided to group this by month and look at the graphs and tests and they became much more uniform. The command I used was gen mon=month(datevar,"MDY") and it worked like a charm.
Tony
Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Martin Weiss
Sent: Monday, September 13, 2010 5:48 AM
To: [email protected]
Subject: st: AW: Dates, strings, formats & labels: tab / tabstat / table
<>
You need to use the -monthly- function to get this right. Your "date" is
formatted "%tdMY" which is not what you want and leads to all the confusion:
*************
clear*
inp str6 month
"APR 10"
"AUG 09"
"DEC 09"
"FEB 10"
"JAN 10"
"JUL 09"
"JUL 10"
"JUN 09"
"JUN 10"
"MAR 10"
"MAY 09"
"MAY 10"
"NOV 09"
"OCT 09"
"SEP 09"
end
gen mymonthlydate=monthly(month, "M20Y")
format mymonthlydate %tmMon_CCYY
l
de
*************
HTH
Martin
-----Ursprüngliche Nachricht-----
Von: [email protected]
[mailto:[email protected]] Im Auftrag von Allan Reese
(Cefas)
Gesendet: Montag, 13. September 2010 13:30
An: [email protected]
Betreff: st: Dates, strings, formats & labels: tab / tabstat / table
This is the type of behaviour that puts people off computers and causes
them to think folk who promote Stata are nerds - maybe I am for thinking
it worth writing up ;-)
The objective: summarize a data table as monthly statistics - in time
order. Usually the data comes as a spreadsheet with the date as a
string. I always recommend writing the month as a name to avoid
ambiguity of day and month.
Today's detonator: having converted the "month year" string to a date,
the following commands gave irritatingly inconsistent interpretations.
* Here's the input string variable, which sorts alphabetically
. tab month
month | Freq. Percent Cum.
------------+-----------------------------------
APR 10 | 38 6.92 6.92
AUG 09 | 37 6.74 13.66
DEC 09 | 38 6.92 20.58
FEB 10 | 38 6.92 27.50
JAN 10 | 37 6.74 34.24
JUL 09 | 38 6.92 41.17
JUL 10 | 37 6.74 47.91
JUN 09 | 37 6.74 54.64
JUN 10 | 34 6.19 60.84
MAR 10 | 39 7.10 67.94
MAY 09 | 32 5.83 73.77
MAY 10 | 38 6.92 80.69
NOV 09 | 30 5.46 86.16
OCT 09 | 37 6.74 92.90
SEP 09 | 39 7.10 100.00
------------+-----------------------------------
Total | 549 100.00
* -gen- a date and format it %tdMY
. desc date
storage display value
variable name type format label variable label
------------------------------------------------------------------------
-----------------------------------------------
date float %tdMY
* -tab- insists it was the 1st of each month
. tab date
date | Freq. Percent Cum.
------------+-----------------------------------
01may2009 | 32 5.83 5.83
01jun2009 | 37 6.74 12.57
01jul2009 | 38 6.92 19.49
01aug2009 | 37 6.74 26.23
01sep2009 | 39 7.10 33.33
01oct2009 | 37 6.74 40.07
01nov2009 | 30 5.46 45.54
01dec2009 | 38 6.92 52.46
01jan2010 | 37 6.74 59.20
01feb2010 | 38 6.92 66.12
01mar2010 | 39 7.10 73.22
01apr2010 | 38 6.92 80.15
01may2010 | 38 6.92 87.07
01jun2010 | 34 6.19 93.26
01jul2010 | 37 6.74 100.00
------------+-----------------------------------
Total | 549 100.00
* though -table- can handle the concept ...
. table date
------------------------
date | Freq.
------------+-----------
May09 | 32
June09 | 37
July09 | 38
August09 | 37
September09 | 39
October09 | 37
November09 | 30
December09 | 38
January10 | 37
February10 | 38
March10 | 39
April10 | 38
May10 | 38
June10 | 34
July10 | 37
------------------------
* but -table- will handle only five statistics at a time, and I wanted
six. So tried -tabstat- and got a shock
. tabstat row row , stat(mean) by(date)
Summary statistics: mean
by categories of: date
date | row row
----------+--------------------
18018 | 16.5 16.5
18049 | 54 54
18079 | 94.5 94.5
18110 | 135 135
18141 | 176 176
18171 | 217 217
18202 | 253.5 253.5
18232 | 290.5 290.5
18263 | 331 331
18294 | 371.5 371.5
18322 | 413 413
18353 | 454.5 454.5
18383 | 494.5 494.5
18414 | 532.5 532.5
18444 | 570 570
----------+--------------------
Total | 295.7687 295.7687
-------------------------------
* Why have I asked for the same variable in two columns? Because when I
simplified the table to send the report ...
. tabstat row , stat(mean) by(date)
Summary for variables: row
by categories of: date
date | mean
----------+----------
May09 | 16.5
June09 | 54
July09 | 94.5
August09 | 135
September09 | 176
October09 | 217
November09 | 253.5
December09 | 290.5
January10 | 331
February10 | 371.5
March10 | 413
April10 | 454.5
May10 | 494.5
June10 | 532.5
July10 | 570
----------+----------
Total | 295.7687
---------------------
With one statistic the -by- was output formatted, but I could not get
the header and total lines to align even using -labelwidth- and
-longstub-.
Having found -tabstat- deformatted the stub, another approach was to
-egen- the group(date) values to give monthindex numbers, but they need
the month-year labels. The only way to generate labels from an existing
variable appears to be -encode-, but that sorts the values
alphabetically.
I typed in the labels -label define month 1 "May 09" etc- and attached
those to the index.
Like the bank says, "There must be a better way".
R Allan Reese
Senior statistician, Cefas
The Nothe, Weymouth DT4 8UB
Tel: +44 (0)1305 206614 -direct
Fax: +44 (0)1305 206601
www.cefas.co.uk
****************************************************************************
*******
This email and any attachments are intended for the named recipient only.
Its unauthorised use, distribution, disclosure, storage or copying is not
permitted. If you have received it in error, please destroy all copies and
notify the sender. In messages of a non-business nature, the views and
opinions expressed are the author's own and do not necessarily reflect those
of the organisation from which it is sent. All emails may be subject to
monitoring.
****************************************************************************
*******
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/