Kevin Turner wrote:
Eva Poen ([email protected]) writes:
Symptoms: If a variable in the spreadsheet contains a certain number
of missings, it will be imported with _all_ values set to missing,
even if it has non-missing values in Excel. I work under Windows XP
SP2, Stata 9.2, and Excel 2003.
This is an issue with the ODBC Excel driver. This is what I know:
Excel is not a database, but a spreadsheet. This means every Excel cell in a
column can have a different type. Note: Excel 'Formats' are not
representative of Excel's underlying cell type. Your blank/missing cell is
not necessarily a blank number or a blank string, just a blank. Therefore,
when ODBC looks at a column, it needs to determine a type for the column.
That's a requirement for ODBC and databases in general. To solve this
problem, the driver apparently takes a sample of values from which to guess
the column type. This is where your magic number applies. In your case, if
it determines that the column is of blank type (unknown), it simply defaults
to string.
What comes next is really an unfortunate feature of the ODBC-Excel driver.
When it does finally decide on a type for the whole column, it will only
truly import cells that EXACTLY match that type. It does no type conversion.
For non-conforming types, Stata either gets an empty string, or a NULL
value, which we handle accordingly. In your case, when the driver defaulted
to string type for the column, all the numeric values were seen as blank
strings to Stata.
All of this is to say that the best solution might be to avoid Excel if you
know your columns have mixed types, or a lot of blank cells. Excel was never
designed to be a database even though people (myself included) use it as a
crude one.
--------------------------------------------------------------------------------
There's a workaround that I routinely use whenever I'm faced with this
problem of mixed data types in an Excel worksheet. You'll find it described
at
http://www.dicks-blog.com/archives/2004/06/03/external-data-mixed-data-types/ .
Scan down the blog to where he starts talking about ODBC. You'll need to
use the -odbc load, exec()- syntax of Stata's -odbc- command, because the
trick involves referring to your table with the DSN-like reference imbedded
in an SQL statement. An example is given in the blog and I reproduce it
here in order to show the syntax:
SELECT MyMixedCol FROM [Excel
8.0;HDR=YES;IMEX=1;Database=C:\Tempo\db.xls;].[Sheet1$];
This would be placed (in double quotation marks and with your Excel
workbook's information) inside the -exec()- option of the Stata command, so
that it would look something like:
local dir C:\Data/
local dsn Excel Files;DBQ=`dir'MyExcelWorkbook.xls;
odbc query "`dsn'"
odbc describe "Sheet1$"
local worksheet `dir'MyExcelWorkbook.xls;].[Sheet1$]
odbc load, ///
exec("SELECT * FROM [Excel 8.0;HDR=YES;IMEX=1;Database=`worksheet';")
[That last line probably got wrapped during e-mail processing, and so you'll
need to unwrap it manually. The contents of the -exec()- option cannot be
split with triple backslashes in actual use. I tried to put some of the
contents into the local macro *worksheet* in order to avoid wrapping, but it
might not have been successful.]
What it does is to import any mixed-data (string/numeric mix) as string.
You can then -destring- once your data are safely inside Stata.
The trick depends upon a certain configuration of a particular setting
(IMEX) of the Windows registry, but the configuration is the default, and so
shouldn't be a problem for the typical user.
I concur wholeheartedly with Kevin, however, about avoiding using Excel
whenever possible.
Joseph Coveney
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/