Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Allan Reese (Cefas)" <allan.reese@cefas.co.uk> |
To | <statalist@hsphsun2.harvard.edu> |
Subject | st: Dates, strings, formats & labels: tab / tabstat / table |
Date | Mon, 13 Sep 2010 12:30:18 +0100 |
This is the type of behaviour that puts people off computers and causes them to think folk who promote Stata are nerds - maybe I am for thinking it worth writing up ;-) The objective: summarize a data table as monthly statistics - in time order. Usually the data comes as a spreadsheet with the date as a string. I always recommend writing the month as a name to avoid ambiguity of day and month. Today's detonator: having converted the "month year" string to a date, the following commands gave irritatingly inconsistent interpretations. * Here's the input string variable, which sorts alphabetically . tab month month | Freq. Percent Cum. ------------+----------------------------------- APR 10 | 38 6.92 6.92 AUG 09 | 37 6.74 13.66 DEC 09 | 38 6.92 20.58 FEB 10 | 38 6.92 27.50 JAN 10 | 37 6.74 34.24 JUL 09 | 38 6.92 41.17 JUL 10 | 37 6.74 47.91 JUN 09 | 37 6.74 54.64 JUN 10 | 34 6.19 60.84 MAR 10 | 39 7.10 67.94 MAY 09 | 32 5.83 73.77 MAY 10 | 38 6.92 80.69 NOV 09 | 30 5.46 86.16 OCT 09 | 37 6.74 92.90 SEP 09 | 39 7.10 100.00 ------------+----------------------------------- Total | 549 100.00 * -gen- a date and format it %tdMY . desc date storage display value variable name type format label variable label ------------------------------------------------------------------------ ----------------------------------------------- date float %tdMY * -tab- insists it was the 1st of each month . tab date date | Freq. Percent Cum. ------------+----------------------------------- 01may2009 | 32 5.83 5.83 01jun2009 | 37 6.74 12.57 01jul2009 | 38 6.92 19.49 01aug2009 | 37 6.74 26.23 01sep2009 | 39 7.10 33.33 01oct2009 | 37 6.74 40.07 01nov2009 | 30 5.46 45.54 01dec2009 | 38 6.92 52.46 01jan2010 | 37 6.74 59.20 01feb2010 | 38 6.92 66.12 01mar2010 | 39 7.10 73.22 01apr2010 | 38 6.92 80.15 01may2010 | 38 6.92 87.07 01jun2010 | 34 6.19 93.26 01jul2010 | 37 6.74 100.00 ------------+----------------------------------- Total | 549 100.00 * though -table- can handle the concept ... . table date ------------------------ date | Freq. ------------+----------- May09 | 32 June09 | 37 July09 | 38 August09 | 37 September09 | 39 October09 | 37 November09 | 30 December09 | 38 January10 | 37 February10 | 38 March10 | 39 April10 | 38 May10 | 38 June10 | 34 July10 | 37 ------------------------ * but -table- will handle only five statistics at a time, and I wanted six. So tried -tabstat- and got a shock . tabstat row row , stat(mean) by(date) Summary statistics: mean by categories of: date date | row row ----------+-------------------- 18018 | 16.5 16.5 18049 | 54 54 18079 | 94.5 94.5 18110 | 135 135 18141 | 176 176 18171 | 217 217 18202 | 253.5 253.5 18232 | 290.5 290.5 18263 | 331 331 18294 | 371.5 371.5 18322 | 413 413 18353 | 454.5 454.5 18383 | 494.5 494.5 18414 | 532.5 532.5 18444 | 570 570 ----------+-------------------- Total | 295.7687 295.7687 ------------------------------- * Why have I asked for the same variable in two columns? Because when I simplified the table to send the report ... . tabstat row , stat(mean) by(date) Summary for variables: row by categories of: date date | mean ----------+---------- May09 | 16.5 June09 | 54 July09 | 94.5 August09 | 135 September09 | 176 October09 | 217 November09 | 253.5 December09 | 290.5 January10 | 331 February10 | 371.5 March10 | 413 April10 | 454.5 May10 | 494.5 June10 | 532.5 July10 | 570 ----------+---------- Total | 295.7687 --------------------- With one statistic the -by- was output formatted, but I could not get the header and total lines to align even using -labelwidth- and -longstub-. Having found -tabstat- deformatted the stub, another approach was to -egen- the group(date) values to give monthindex numbers, but they need the month-year labels. The only way to generate labels from an existing variable appears to be -encode-, but that sorts the values alphabetically. I typed in the labels -label define month 1 "May 09" etc- and attached those to the index. Like the bank says, "There must be a better way". R Allan Reese Senior statistician, Cefas The Nothe, Weymouth DT4 8UB Tel: +44 (0)1305 206614 -direct Fax: +44 (0)1305 206601 www.cefas.co.uk *********************************************************************************** This email and any attachments are intended for the named recipient only. Its unauthorised use, distribution, disclosure, storage or copying is not permitted. If you have received it in error, please destroy all copies and notify the sender. In messages of a non-business nature, the views and opinions expressed are the author's own and do not necessarily reflect those of the organisation from which it is sent. All emails may be subject to monitoring. *********************************************************************************** * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/