Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
RE: st: ODBC vs. insheet variable format troubles (importing SQL data)
From
Abe N <[email protected]>
To
[email protected]
Subject
RE: st: ODBC vs. insheet variable format troubles (importing SQL data)
Date
Wed, 23 Jan 2013 18:42:44 -0800 (PST)
record_id was originally stored as a 19 character string, but I think the longest one in the data subset I'm working with is 15 characters. Does that make a difference?
--- On Wed, 1/23/13, Kieran McCaul <[email protected]> wrote:
> From: Kieran McCaul <[email protected]>
> Subject: RE: st: ODBC vs. insheet variable format troubles (importing SQL data)
> To: "[email protected]" <[email protected]>
> Date: Wednesday, January 23, 2013, 5:46 PM
> ...
>
>
> How big are these record ids?
>
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]
> On Behalf Of Abe N
> Sent: Thursday, 24 January 2013 8:44 AM
> To: [email protected]
> Subject: Re: st: ODBC vs. insheet variable format troubles
> (importing SQL data)
>
> Hi, thanks for the reply! The code I'm using is:
>
> odbc load, exec("select * from dbo.ResearchSource where
> PRINCIPAL_PROCEDURE_GROUP between 302034 and 302039")
> dsn("servername") lowercase
>
> This pulls roughly 30k records with 160 variables.
> Some of the variables are over 32 characters so it renames
> it to var35 for example and I manually rename them
> after. To double check that something was messing up,
> what I did was a duplicates tag command based on patient
> psuedo id, hospital, admission date, and diagnoses.
> Almost 10-15k of the records were some form of
> duplicate. I then looked at a set of 12 duplicate
> records, found their unique record_id's by using tab, and
> pulled up those records in SQL using
>
> select * from ResearchSource where (RECORD_ID=some# OR
> RECORD_ID=some# OR etc)
>
> I then cross checked using STATA's data browser and what was
> in SQL and they did not match. Basically the first
> row/record_id had accurate values, but then the other 11 had
> repeated values from the first row (except for record_id
> which remained unique). Not sure why it was doing
> this. I can't copy the output here, the security
> settings on the server I work on are quite stringent.
>
>
> Lastly, I'm not sure how to check what ODBC manager I'm
> using. Is there a way to check? From reading the
> help file I'm guessing iODBC is what it's using because I
> haven't changed it and I'm not on unix, but not 100% sure.
>
>
>
> --- On Wed, 1/23/13, Dimitriy V. Masterov <[email protected]>
> wrote:
>
> > From: Dimitriy V. Masterov <[email protected]>
> > Subject: Re: st: ODBC vs. insheet variable format
> troubles (importing SQL data)
> > To: "Statalist" <[email protected]>
> > Date: Wednesday, January 23, 2013, 2:08 PM
> > Abe,
> >
> > I load SQL data via odbc without any hiccups, and at
> first I
> > spent a
> > fair amount of time repeating calculations in SQL and
> Stata
> > to make
> > sure I was doing things correctly.
> >
> > Could you replicate the error with a simple query like
> > "select top 5
> > record_id from ..."? It might help if we see the code
> and
> > the output.
> > I would recommend solving this rather than the insheet
> > approach. Also,
> > which odbc manager are you using?
> >
> > DVM
> >
> > On Wed, Jan 23, 2013 at 1:06 PM, Abe N <[email protected]>
> > wrote:
> > > Hello Everyone,
> > >
> > > Basically wondering if there is a way I can
> conserve
> > variable data formats when I bring data manually from
> SQL to
> > STATA using insheet (see below for more explanation
> and
> > trouble I had with ODBC).
> > >
> > > I was originally told by the people maintaining
> the SQL
> > database to use:
> > >
> > > odbc load, exec("Select command here")
> > dsn("databasename") lowercase
> > >
> > > to bring data into STATA, which seemed to be
> working
> > great for my purposes until I did a duplicates report
> and
> > found that I had way too many duplicates. After
> > cross-checking by record_id (unique) with the original
> SQL
> > data I found the odbc command wasn't importing data
> > properly. I had already set up the variable list
> > though (some were too long so I had to manually rename
> > them), so I used describe and I think display
> > "`r(varlist)'", copied that varlist to a text file for
> what
> > I would do in my next step using insheet.
> > >
> > > Basically, ran my SQL query in SQL Server
> Management
> > Studio 2008 R2, saved the results/data as a tab
> delimited
> > text and used:
> > >
> > > insheet (list of 160 variables) using data.txt,
> tab
> > >
> > > Which seems to be working great. The only
> problem
> > I'm having now is that while odbc maintained the
> variable
> > data formats (string vs double, etc from SQL), this
> new
> > method doesn't and it would be much easier for me if
> it
> > didn't convert some of my strings into numbers, etc.
> > >
> > > So what I'm wondering is if there's a way I can
> sort of
> > generate the variables, their data formats, and labels
> using
> > my original odbc method, clear out that incorrect data,
> and
> > then pull in data from the text file afterward? If
> > not, is there a way I can carry over at least the
> variable
> > labels quickly rather than manually doing all 160 or
> so?
> > >
> > > Sorry for the long winded explanation, but hope
> the
> > situation/question gets across clearly.
> > >
> > > Best,
> > > Abe Noorbakhsh
> > > [email protected]
> > > *
> > > * For searches and help try:
> > > * http://www.stata.com/help.cgi?search
> > > * http://www.stata.com/support/faqs/resources/statalist-faq/
> > > * http://www.ats.ucla.edu/stat/stata/
> >
> > *
> > * For searches and help try:
> > * http://www.stata.com/help.cgi?search
> > * http://www.stata.com/support/faqs/resources/statalist-faq/
> > * http://www.ats.ucla.edu/stat/stata/
> >
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
>
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/