Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Lachenbruch, Peter" <Peter.Lachenbruch@oregonstate.edu> |
To | "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu> |
Subject | st: RE: AW: Dates, strings, formats & labels: tab / tabstat / table |
Date | Mon, 13 Sep 2010 09:47:00 -0700 |
I had a similar problem arise recently. Various dates of events (diagnosis, first weakness, first rash, first...) were recorded as MDY, but many were simply guesses as to the day, so were recorded as 1, 15, 30 - decided digit preference. When we looked at some circular statistics (courtesy of NJC's programs) they were decidedly not uniform throughout the year. We decided to group this by month and look at the graphs and tests and they became much more uniform. The command I used was gen mon=month(datevar,"MDY") and it worked like a charm. Tony Peter A. Lachenbruch Department of Public Health Oregon State University Corvallis, OR 97330 Phone: 541-737-3832 FAX: 541-737-4001 -----Original Message----- From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Martin Weiss Sent: Monday, September 13, 2010 5:48 AM To: statalist@hsphsun2.harvard.edu Subject: st: AW: Dates, strings, formats & labels: tab / tabstat / table <> You need to use the -monthly- function to get this right. Your "date" is formatted "%tdMY" which is not what you want and leads to all the confusion: ************* clear* inp str6 month "APR 10" "AUG 09" "DEC 09" "FEB 10" "JAN 10" "JUL 09" "JUL 10" "JUN 09" "JUN 10" "MAR 10" "MAY 09" "MAY 10" "NOV 09" "OCT 09" "SEP 09" end gen mymonthlydate=monthly(month, "M20Y") format mymonthlydate %tmMon_CCYY l de ************* HTH Martin -----Ursprüngliche Nachricht----- Von: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] Im Auftrag von Allan Reese (Cefas) Gesendet: Montag, 13. September 2010 13:30 An: statalist@hsphsun2.harvard.edu Betreff: st: Dates, strings, formats & labels: tab / tabstat / table This is the type of behaviour that puts people off computers and causes them to think folk who promote Stata are nerds - maybe I am for thinking it worth writing up ;-) The objective: summarize a data table as monthly statistics - in time order. Usually the data comes as a spreadsheet with the date as a string. I always recommend writing the month as a name to avoid ambiguity of day and month. Today's detonator: having converted the "month year" string to a date, the following commands gave irritatingly inconsistent interpretations. * Here's the input string variable, which sorts alphabetically . tab month month | Freq. Percent Cum. ------------+----------------------------------- APR 10 | 38 6.92 6.92 AUG 09 | 37 6.74 13.66 DEC 09 | 38 6.92 20.58 FEB 10 | 38 6.92 27.50 JAN 10 | 37 6.74 34.24 JUL 09 | 38 6.92 41.17 JUL 10 | 37 6.74 47.91 JUN 09 | 37 6.74 54.64 JUN 10 | 34 6.19 60.84 MAR 10 | 39 7.10 67.94 MAY 09 | 32 5.83 73.77 MAY 10 | 38 6.92 80.69 NOV 09 | 30 5.46 86.16 OCT 09 | 37 6.74 92.90 SEP 09 | 39 7.10 100.00 ------------+----------------------------------- Total | 549 100.00 * -gen- a date and format it %tdMY . desc date storage display value variable name type format label variable label ------------------------------------------------------------------------ ----------------------------------------------- date float %tdMY * -tab- insists it was the 1st of each month . tab date date | Freq. Percent Cum. ------------+----------------------------------- 01may2009 | 32 5.83 5.83 01jun2009 | 37 6.74 12.57 01jul2009 | 38 6.92 19.49 01aug2009 | 37 6.74 26.23 01sep2009 | 39 7.10 33.33 01oct2009 | 37 6.74 40.07 01nov2009 | 30 5.46 45.54 01dec2009 | 38 6.92 52.46 01jan2010 | 37 6.74 59.20 01feb2010 | 38 6.92 66.12 01mar2010 | 39 7.10 73.22 01apr2010 | 38 6.92 80.15 01may2010 | 38 6.92 87.07 01jun2010 | 34 6.19 93.26 01jul2010 | 37 6.74 100.00 ------------+----------------------------------- Total | 549 100.00 * though -table- can handle the concept ... . table date ------------------------ date | Freq. ------------+----------- May09 | 32 June09 | 37 July09 | 38 August09 | 37 September09 | 39 October09 | 37 November09 | 30 December09 | 38 January10 | 37 February10 | 38 March10 | 39 April10 | 38 May10 | 38 June10 | 34 July10 | 37 ------------------------ * but -table- will handle only five statistics at a time, and I wanted six. So tried -tabstat- and got a shock . tabstat row row , stat(mean) by(date) Summary statistics: mean by categories of: date date | row row ----------+-------------------- 18018 | 16.5 16.5 18049 | 54 54 18079 | 94.5 94.5 18110 | 135 135 18141 | 176 176 18171 | 217 217 18202 | 253.5 253.5 18232 | 290.5 290.5 18263 | 331 331 18294 | 371.5 371.5 18322 | 413 413 18353 | 454.5 454.5 18383 | 494.5 494.5 18414 | 532.5 532.5 18444 | 570 570 ----------+-------------------- Total | 295.7687 295.7687 ------------------------------- * Why have I asked for the same variable in two columns? Because when I simplified the table to send the report ... . tabstat row , stat(mean) by(date) Summary for variables: row by categories of: date date | mean ----------+---------- May09 | 16.5 June09 | 54 July09 | 94.5 August09 | 135 September09 | 176 October09 | 217 November09 | 253.5 December09 | 290.5 January10 | 331 February10 | 371.5 March10 | 413 April10 | 454.5 May10 | 494.5 June10 | 532.5 July10 | 570 ----------+---------- Total | 295.7687 --------------------- With one statistic the -by- was output formatted, but I could not get the header and total lines to align even using -labelwidth- and -longstub-. Having found -tabstat- deformatted the stub, another approach was to -egen- the group(date) values to give monthindex numbers, but they need the month-year labels. The only way to generate labels from an existing variable appears to be -encode-, but that sorts the values alphabetically. I typed in the labels -label define month 1 "May 09" etc- and attached those to the index. Like the bank says, "There must be a better way". R Allan Reese Senior statistician, Cefas The Nothe, Weymouth DT4 8UB Tel: +44 (0)1305 206614 -direct Fax: +44 (0)1305 206601 www.cefas.co.uk **************************************************************************** ******* This email and any attachments are intended for the named recipient only. Its unauthorised use, distribution, disclosure, storage or copying is not permitted. If you have received it in error, please destroy all copies and notify the sender. In messages of a non-business nature, the views and opinions expressed are the author's own and do not necessarily reflect those of the organisation from which it is sent. All emails may be subject to monitoring. **************************************************************************** ******* * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/