Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Stata bug? - Mixture of European/US dates using import excel
From
Phil Schumm <[email protected]>
To
Statalist Statalist <[email protected]>
Subject
Re: st: Stata bug? - Mixture of European/US dates using import excel
Date
Fri, 7 Feb 2014 16:45:07 -0600
On Feb 7, 2014, at 3:52 PM, Phillips, Patrick <[email protected]> wrote:
> If a spreadsheet has a column with a mixture of dates and strings, Stata will read this column in as a string when using the import excel command.
Correct -- what else could it do? If it reads the column as numeric, then it cannot preserve the string values.
> If a date is stored in excel as 'text' it is read into Stata directly as string.
Again, that makes sense. If the column is stored in Excel as text, Stata takes it at its word. I wouldn't want Stata looking at the data and trying to guess what the string values represent (Excel does this, and it can lead to serious errors, e.g., http://www.theregister.co.uk/2004/07/16/excel_vanishing_dna/).
> However, if a date is stored in excel as a 'date' it is converted to mm/dd/yyyy format and read into Stata as a string.
Yes, if Stata determines that a column contains text values, then it must read that column as text, translating any numeric values it finds into a reasonable text representation. For dates, it does this by converting the date into the human readable format %tdnn/dd/CCYY.
> You can see that if dates are formatted in excel as dd/mm/yyyy, they will then be read into Stata in a mixture of mm/dd/yyyy and dd/mm/yyyy formats depending on whether excel 'thinks' it is text or a date!
>
> Try copying the following three lines into excel (the white space at the front of the second row is important) and saving as an excel spreadsheet.
>
> 20/02/2013
> 20/02/2013
> Text
>
> When I read this into stata it looks like the following:
>
> +-------------+
> | A |
> |-------------|
> 1. | 2/20/2013 |
> 2. | 20/02/2013 |
> 3. | Text |
> +-------------+
>
> Very worrying as it is impossible to tease out which might be dd/mm/yyyy and which mm/dd/yyyy! Has anyone else had this same problem? Does anyone have any solutions?
This is exactly as expected. In your example, the first value is stored in Excel as a date, so Stata converts it to a string representation of that date. The next two values are strings (the space causes " 20/02/2013" to be interpreted by Excel as a string), and are read by Stata exactly as-is.
In sum, this isn't a bug, but rather merely a function of the fact that Stata treats an entire column/variable as a uniform data type (i.e., numeric or string) whereas in Excel, the cells within a column are unrestricted. If you are receiving Excel files in which string and numeric (including date) values are mixed within a single column, then you need to be *very* careful when importing them into Stata (the same would be true of R, Python, etc.). Best to see if you can change the way in which the original (raw) Excel files are being generated, if possible.
-- Phil
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/