Connecting Stata with databases has gotten even easier. jdbc allows us to exchange data with some of the most popular database vendors such as Oracle, MySQL, Amazon Redshift, Snowflake, Microsoft SQL Server, and much more. What's great about jdbc is that it's a cross-platform solution, so our JDBC setup works the same way for Windows, Mac, and Unix systems. Once you install a JDBC driver, that driver and your Stata code are all you need to switch from, say, your Mac laptop to your company's Windows cloud systems.
We have email data stored on Amazon Web Services in a Redshift cluster, and we need to load these data into Stata. We first log in to AWS and go to the Amazon Redshift configuration page to download the correct JDBC driver and get the correct connection information. We then place the downloaded JDBC JAR file along our Stata adopath. Now in the Stata Do-file Editor, we store our connection information by typing
. local jar "redshift-jdbc42-2.0.0.0.jar" . local driverc "com.amazon.redshift.jdbc42.Driver" . local url "jdbc:redshift://redshift-cluster-1.cziajbxjzi3e.us-west-2.redshift.amazonaws.com:5439/emails" . local user "admin" . local pass "secret" . jdbc connect, jar("`jar'") driverclass("`driverc'") url("`url'") user("`user'") password("`pass'")
If these database settings need to be used by others or you just want to make remembering them easier, we can store them by typing
. local jar "redshift-jdbc42-2.0.0.0.jar" . local driverc "com.amazon.redshift.jdbc42.Driver" . local url "jdbc:redshift://redshift-cluster-1.cziajbxjzi3e.us-west-2.redshift.amazonaws.com:5439/emails" . local user "admin" . local pass "secret" . jdbc add MyRed, jar("`jar'") driverclass("`driverc'") url("`url'") user("`user'") password("`pass'")
We can now add the above commands to profile.do to save these connection settings in between Stata sessions, and we now can connect to our Redshift database by typing
. jdbc connect MyRed
To see what tables are availiable to load from our connection, we type
. jdbc showtables
Database: emails |
Tables |
category |
response_info |
employees |
We can describe a table by typing
. jdbc describe response_info
Table: response_info | |
Column name | Column type |
id BIGINT UNSIGNED filename VARCHAR category_id BIGINT UNSIGNED employee_id BIGINT UNSIGNED datein TIMESTAMP dateout DATE screendate TIMESTAMP rid TEXT keywords TEXT assigntime TIMESTAMP resptime TIMESTAMP timeadded TIMESTAMP sversion DOUBLE correct BIT timetouched TIMESTAMP timemailed TIMESTAMP |
To load the data, we type
. jdbc load, table("response_info") clear (128 observations loaded)
Now we have a Stata dataset and can perform our analysis!
Learn more about setting up a JDBC DSN, executing SQL, loading data, and inserting data with in-depth examples in the Stata Data Management Reference Manual; see [D] jdbc.