Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Joseph Coveney" <jcoveney@bigplanet.com> |
To | <statalist@hsphsun2.harvard.edu> |
Subject | st: Re: odbc load from SQL if ID is in my Stata master dataset? |
Date | Tue, 18 Sep 2012 21:35:53 +0900 |
Jen Zhen wrote: I have a list of person ID numbers for whom I would like to merge in information from a large SQL dataset (containing the IDs in the first list, plus many more). I assume that I cannot directly merge in from the SQL dataset, but must first load the SQL dataset into Stata, save as dta, and then I can merge that into my main dataset. However, I'm wondering how, when loading from SQL with --odbc load-- I can best specify that I'm only interested in observations with a fixed list of ID numbers? What would work is to write "WHERE id=51 OR id=332 OR ...", but since I have several hundred valid IDs this seems inefficient. Would anyone happen to know a better way? -------------------------------------------------------------------------------- Have you considered creating a temporary table in the database, uploading the Stata dataset of IDs into it (using -odbc insert-), and finally -odbc load, exec()- where the exec()'s SQL statement INNER JOINs the SQL table and your temporary table of IDs ON SQLdataset.ID = TemporaryStataTable.ID? This would probably be the most efficient and cleanest. On the downside, if it's a so-called production database, then you very likely don't have sufficient data definition language privileges to go this route. (You might want to talk to your database administrator and see whether you can get privileges to create, populate and drop temporary tables in an out-of-the way location or schema inside the database.) I doubt that the string length limits of ODBC's SQL statements will allow either your WHERE clause or the following, but you can try something like WHERE ID IN () as a shorthand for your WHERE clause, assuming that ID's a primary key and so never NULL. local SQL_statement SELECT * FROM MyTable WHERE ID IN ( drop if missing(ID) quietly count forvalues row = 1/`=`r(N)'-1' { local SQL_statement `SQL_statement' `=ID[`row']', } local SQL_statement `SQL_statement' `=ID[`r(N)']'); odbc load, exec("`SQL_statement'") . . . I recall StataCorp's mentioning an inside-the-database version of Stata. I'm not sure how that might work for you, but it's something worth looking into, too. Joseph Coveney * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/