Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: st: ODBC vs. insheet variable format troubles (importing SQL data)

From	Kieran McCaul <[email protected]>
To	"[email protected]" <[email protected]>
Subject	RE: st: ODBC vs. insheet variable format troubles (importing SQL data)
Date	Thu, 24 Jan 2013 11:13:28 +0800

...

I was just checking that it wasn't numeric in which case you might have been encountering some sort of precision problem.


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Abe N
Sent: Thursday, 24 January 2013 10:43 AM
To: [email protected]
Subject: RE: st: ODBC vs. insheet variable format troubles (importing SQL data)

record_id was originally stored as a 19 character string, but I think the longest one in the data subset I'm working with is 15 characters.  Does that make a difference?

--- On Wed, 1/23/13, Kieran McCaul <[email protected]> wrote:

> From: Kieran McCaul <[email protected]>
> Subject: RE: st: ODBC vs. insheet variable format troubles (importing SQL data)
> To: "[email protected]" <[email protected]>
> Date: Wednesday, January 23, 2013, 5:46 PM
> ...
> 
> 
> How big are these record ids?
> 
> 
> 
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]
> On Behalf Of Abe N
> Sent: Thursday, 24 January 2013 8:44 AM
> To: [email protected]
> Subject: Re: st: ODBC vs. insheet variable format troubles
> (importing SQL data)
> 
> Hi, thanks for the reply!  The code I'm using is:
> 
> odbc load, exec("select * from dbo.ResearchSource where
> PRINCIPAL_PROCEDURE_GROUP between 302034 and 302039")
> dsn("servername") lowercase
> 
> This pulls roughly 30k records with 160 variables. 
> Some of the variables are over 32 characters so it renames
> it to var35 for example and I manually rename them
> after.  To double check that something was messing up,
> what I did was a duplicates tag command based on patient
> psuedo id, hospital, admission date, and diagnoses. 
> Almost 10-15k of the records were some form of
> duplicate.  I then looked at a set of 12 duplicate
> records, found their unique record_id's by using tab, and
> pulled up those records in SQL using
> 
> select * from ResearchSource where (RECORD_ID=some# OR
> RECORD_ID=some# OR etc)
> 
> I then cross checked using STATA's data browser and what was
> in SQL and they did not match.  Basically the first
> row/record_id had accurate values, but then the other 11 had
> repeated values from the first row (except for record_id
> which remained unique).  Not sure why it was doing
> this.  I can't copy the output here, the security
> settings on the server I work on are quite stringent. 
> 
> 
> Lastly, I'm not sure how to check what ODBC manager I'm
> using.  Is there a way to check?  From reading the
> help file I'm guessing iODBC is what it's using because I
> haven't changed it and I'm not on unix, but not 100% sure.
> 
> 
> 
> --- On Wed, 1/23/13, Dimitriy V. Masterov <[email protected]>
> wrote:
> 
> > From: Dimitriy V. Masterov <[email protected]>
> > Subject: Re: st: ODBC vs. insheet variable format
> troubles (importing SQL data)
> > To: "Statalist" <[email protected]>
> > Date: Wednesday, January 23, 2013, 2:08 PM
> > Abe,
> > 
> > I load SQL data via odbc without any hiccups, and at
> first I
> > spent a
> > fair amount of time repeating calculations in SQL and
> Stata
> > to make
> > sure I was doing things correctly.
> > 
> > Could you replicate the error with a simple query like
> > "select top 5
> > record_id from ..."? It might help if we see the code
> and
> > the output.
> > I would recommend solving this rather than the insheet
> > approach. Also,
> > which odbc manager are you using?
> > 
> > DVM
> > 
> > On Wed, Jan 23, 2013 at 1:06 PM, Abe N <[email protected]>
> > wrote:
> > > Hello Everyone,
> > >
> > > Basically wondering if there is a way I can
> conserve
> > variable data formats when I bring data manually from
> SQL to
> > STATA using insheet (see below for more explanation
> and
> > trouble I had with ODBC).
> > >
> > > I was originally told by the people maintaining
> the SQL
> > database to use:
> > >
> > > odbc load, exec("Select command here")
> > dsn("databasename") lowercase
> > >
> > > to bring data into STATA, which seemed to be
> working
> > great for my purposes until I did a duplicates report
> and
> > found that I had way too many duplicates.  After
> > cross-checking by record_id (unique) with the original
> SQL
> > data I found the odbc command wasn't importing data
> > properly.  I had already set up the variable list
> > though (some were too long so I had to manually rename
> > them), so I used describe and I think display
> > "`r(varlist)'", copied that varlist to a text file for
> what
> > I would do in my next step using insheet.
> > >
> > > Basically, ran my SQL query in SQL Server
> Management
> > Studio 2008 R2, saved the results/data as a tab
> delimited
> > text and used:
> > >
> > > insheet (list of 160 variables) using data.txt,
> tab
> > >
> > > Which seems to be working great.  The only
> problem
> > I'm having now is that while odbc maintained the
> variable
> > data formats (string vs double, etc from SQL), this
> new
> > method doesn't and it would be much easier for me if
> it
> > didn't convert some of my strings into numbers, etc.
> > >
> > > So what I'm wondering is if there's a way I can
> sort of
> > generate the variables, their data formats, and labels
> using
> > my original odbc method, clear out that incorrect data,
> and
> > then pull in data from the text file afterward?  If
> > not, is there a way I can carry over at least the
> variable
> > labels quickly rather than manually doing all 160 or
> so?
> > >
> > > Sorry for the long winded explanation, but hope
> the
> > situation/question gets across clearly.
> > >
> > > Best,
> > > Abe Noorbakhsh
> > > [email protected]
> > > *
> > > *   For searches and help try:
> > > *   http://www.stata.com/help.cgi?search
> > > *   http://www.stata.com/support/faqs/resources/statalist-faq/
> > > *   http://www.ats.ucla.edu/stat/stata/
> > 
> > *
> > *   For searches and help try:
> > *   http://www.stata.com/help.cgi?search
> > *   http://www.stata.com/support/faqs/resources/statalist-faq/
> > *   http://www.ats.ucla.edu/stat/stata/
> > 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
> 

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

References:
- RE: st: ODBC vs. insheet variable format troubles (importing SQL data)
  - From: Kieran McCaul <[email protected]>
- RE: st: ODBC vs. insheet variable format troubles (importing SQL data)
  - From: Abe N <[email protected]>

Prev by Date: Re: st: Command line syntax for optional and required numeric statements
Next by Date: st: How to save results as .dat file
Previous by thread: RE: st: ODBC vs. insheet variable format troubles (importing SQL data)
Next by thread: st: Select subset.
Index(es):
- Date
- Thread