Benjamin A Lopman wrote:
I've got a question about getting data from our database into Stata for
analysis. Perhaps one of you has gone through a similar process.
Transferring the data itself is of course no problem through an ODBC
connection. The problem is handling the value labels. (The variable
labels are also tricky but not such a major issue). The problem is
mainly with the categorical variables and is as follows:
In Access (or any other database, as far as I am aware) the labels are
stored in a separate table. For example, 1 and 2 are stored in a data
table and the corresponding Male and Female labels are stored in a
second "lookup" table. That is not how Stata or SPSS stores labels, so
when the data is brought back into one of those packages, the value
labels are lost. I want to retain the numerical data and retain the
labels. I am concerned about this because the dataset is so large that
I don't want to have difficult to interpret numerical data when the
data is shared with other analysts.
Stat Transfer does not seem to handle value labels converting to/from
database files.
Has anyone overcome this problem? Is there a script to properly carry
the value labels into the stats package?
--------------------------------------------------------------------------------
With Microsoft Access you can handle value labels in two ways. One is via a so-
called lookup wizard that creates a lookup list at the table level. The other
way is the conventional way that relationial database management systems use,
which via use of a lookup table whose primary key is the foreign key for the
column in the table referencing it for value labels.
I'm not sure how to get the value labels if your particular Access database
uses the first method. The use of Microsoft's lookup wizard to create a lookup
list at the table level is highly discouraged, because of the all problems that
it creates down the road. (See numerous postings on the topic on
comp.databases.ms-access .)
If your particular database uses the conventional method, the one that you
describe in your post, then Phil Schumm is right in that the value labels need
to be retrieved with script that specifically uses the column names in each
case, that is, there is no practical universal script that automatically knows
that table A is a lookup table and table B is the table referencing it for
value labels.
However, the SQL statement to get the value labels is a simple join, which you
can execute from within Stata's -odbc load-, if you wanted to get the values at
once. Then it's a simple -encode- to generate value labels.
If you want to reconstruct the value labels using the same numerical
correspondence that the database uses, then you will need two trips to the
database: first to get the lookup table with its values and labels (which you
can save from Stata for use to manually reconstruct the value labels later) and
then to get the table referencing the lookup table (the data table). Then you
can construct the value labels within Stata to exactly match the number list
used in the database. Note that will be worthwhile only if the database uses
numbers and not string (characters) as the primary key-foreign key columns.
Joseph Coveney
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/