Connect to SQL Server From R
Connect to SQL Server from R
What this article is about
Most of today’s data is stored in relational databases and R needs a way to access that data.
While you are like to get lot of data from data files like comma separated or tab separated files, many times your business data might be stored in a relational database like SQL Server or Oracle or MySQL. At times, your company may copy this data in your desired format like comma separated files and give it to you.
But many times, they can simply give you read access to the database and ask you to get that data from the database directly. In such a scenario, you need to be able to connect to the relational database storing the data. You also need to have at least some basic knowledge about Relational databases and T-SQL like Select , joins , where clause etc. Basically you need to be able to write SQL queries.
This article will just demonstrate how to connect to SQL Server from R and execute a simple query.
This is not an article about writing SQL queries.
What you need
To Connect to SQL Server, you need to know 3 bits of information even before you start.
- IP address of SQL Server:
- User name to connect to SQL Server
- Password for the user to connect
You can ask this information from the team of DBAs that manage the servers on which the data is residing.
Once you have this information ready, now we can start setting up our connection to SQL Server from R.
RODBC Package
There are many packages that help you connect to relational databases in R. For this demonstration, I am using a package named RODBC. First you need to install this package.
install.packages("RODBC")
Make sure that this installation is completed without any error. You can also confirm if that package has installed correctly or not by running the following command.
library(RODBC)
You should see the command prompt without any error. If this happens it means the library is loaded in R. As you can see that it is an ODBC driver for connecting to relational database. The same library can be used to connect to basically any data source that supports ODBC connections.
Create a data source in Windows
So to connect to SQL Server, first we will create a data source (DSN) in windows.
(I do understand that many people might be using R in Linux. I am afraid since I don’t have Linux machine myself, I can’t do this for Linux users.)
To create a data source, click on Start à Administrative Tools à Data Sources (ODBC)
This will open the following window where you will be able to create the ODBC data source that points to the SQL Server hosting the data.
As you can see, I have highlighted in yellow a data source that I have already created. I have named it as UAT.
Click on Add to create a new data source and select the option shown in the screen shot below and click on finish.
In the next window, you need to fill two fields. One is the name for this data source. This will be used in R. In my case, I have this data source name of UAT.
Description can be anything.
Server is the most important thing. This is where you enter the IP address of the SQL Server where the data is hosted. Please note that for confidentiality purpose, I have erased the IP address that I entered when I create this data source.
Once you have entered this information, click on Next.
On this screen, you need to tell the data source how to connect to SQL Server. This is where you will need to specify the user name and the password for the SQL Server that you may have got from the DBA.
Once again, I have scrubbed out my user ID from the next screen shot. Click on next after entering the user name and the password.
On this screen you don’t need to change anything and just click on next.
On the next screen too, don’t make any change and just click on next. Now you will come to the final screen.
Here you will see UAT instead of UST in my case. Since I already had a data source named, I had to change the name to UST. Also, in the server field marked in yellow in screen shot, I have scrubbed out the IP address of my Server. In your case, you will see the IP address of the server you entered at the start here.
Now to make sure that the data source is working, click on Test Data Source button. If everything goes right, you should see a screen like this.
Use DSN in R
Well, this was the hard part. Now, we just have to use this data source (DSN in R) to connect to it and run a simple query.
Now we will run the next commands in R. This first command will create a connection object pointing to the DSN that we created earlier. We need to use the same name UAT that we used for the DSN.
uat_conn = odbcConnect("UAT")
In the second step, we will use the connection object created in the previous step and get data from SQL server.
In the second step, we will use the connection object created in the previous step and get data from SQL server. This is the query that I am going to run. This will give me the list of databases from the server.
sqlQuery(uat_conn,"select name from master.sys.sysdatabases where dbid > 4")
Here the query that I am running is enclosed in quotes.
The data source against which this query needs to be run is uat_conn which we had created in the first step. So basically this R command connects to the data source (DSN) object named uat_conn and runs a query. The query is as shown below
select name from master.sys.sysdatabases where dbid > 4
Once you run this in R, you should see the following screen shot.
Summary
So this command from R will connect to the DSN name UAT (This points to the SQL Server which you used when you created the DSN) , gets a list of databases from that server and display it in R.
This article is accurate and true to the best of the author’s knowledge. Content is for informational or entertainment purposes only and does not substitute for personal counsel or professional advice in business, financial, legal, or technical matters.
© 2018 Harshvardhan Diwan
Comments
No comments yet.