Azure has recently added the ability to authenticate to Azure SQL Database and Azure SQL Data Warehouse using Azure Active Directory. This allows you to centrally manage identity to your database.

Azure Active Directory (AAD) authentication

As per Microsoft documentation, Azure Active Directory authentication is a mechanism of connecting to Microsoft Azure SQL Data Warehouse and Azure SQL Database by using identities in Azure Active Directory (Azure AD). With Azure Active Directory authentication, you can centrally manage the identities of database users and other Microsoft services in one central location. Central ID management provides a single place to manage SQL Data Warehouse users and simplifies permission management.

See here for a great document explaining the details of using Azure Active Directory Authentication for authentication with SQL

Why Service Principal?

Authenticating with your user name is practical when doing analysis and pulling data from external database sources, but not so much when you want to operationalize your pipeline. This is where an Azure Active Directory application registration (also called service principal) can be used to user accounts from execution accounts.

Prerequisites

This blog will concertrate on the the Databricks part and not the plumbing before hand as it’s already well documented.

But to help, you will need to perform the following actions:

Create an AAD App registrationYou can follow this Microsoft doc
Create a secret for your app registrationThis blogs covers the next 3 steps very well
Create an AAD GroupSee above
Add the App registration to the AAD group as a member
See above
Add the AAD group to Azure SQL Active Directory Admin You can follow this guide as well as the above
We’re connecting the service principal as a server admin but it could be linked to a user with specific rights in the database using the “CREATE USER FROM EXTERNAL PROVIDER;” statement.

Gathering some keys

In order to do the next part you will need a few keys:

Tenant IDYou can find this in your Azure Active Directory service in the Azure portal in the properties blade and directory id field.
App registration Application (client) IDYou can find this on the overview blade of the app regristration created earlier.

Secret keyThis was given to you when you initially created the secret. If you did not write it down, you can delete the key and create a new one.

Connecting to Azure SQL Database

Now that all the plumbing is done we’re ready to connect Azure Databricks to Azure SQL Database. In this section we’ll be using the keys we gathered to generate an access token which will be used to connect to Azure SQL Database.

Let’s look at the building blocks first:

Adding the required libraries

You will need to add the following libraries to your cluster:

com.microsoft.azure:adal4j:1.6.4 found here

com.microsoft.azure:azure-sqldb-spark:1.0.2 found here

Instantiate the ADAL AuthenticationContext object

//Instantiate the ADAL AuthenticationContext object
 val service = Executors.newFixedThreadPool(1)
 val context = new AuthenticationContext(authority, true, service);
Constructor and Description
AuthenticationContext(String authority, boolean validateAuthority, ExecutorService service) Constructor to create the context with the address of the authority.

Get the access token

//Get access token
 val ClientCred = new ClientCredential(ServicePrincipalId, ServicePrincipalPwd)
 val authResult = context.acquireToken(resourceAppIdURI, ClientCred, null)
 val accessToken = authResult.get().getAccessToken
Constructor and Description
ClientCredential(String clientId, String clientSecret) Constructor to create credential with client id and secret
acquireToken

public Future<AuthenticationResult> acquireToken(
String resource,
AsymmetricKeyCredential credential,
AuthenticationCallback callback)
throws AuthenticationException

Acquires security token from the authority.

Parameters:
resource – Identifier of the target resource that is the recipient of the requested token.
credential – object representing Private Key to use for token acquisition.
callback- optional callback object for non-blocking execution.

Returns:
Future object representing the AuthenticationResult of the call. It contains Access Token and the Access Token’s expiration time. Refresh Token property will be null for this overload.

Throws:
AuthenticationException – AuthenticationException

Putting it together

import com.microsoft.aad.adal4j.ClientCredential
import com.microsoft.aad.adal4j.AuthenticationContext
import java.util.concurrent.Executors

val TenantId = "<FILL HERE>"
val authority = "https://login.windows.net/" + TenantId
val resourceAppIdURI = "https://database.windows.net/"
val ServicePrincipalId = "<FILL HERE>"
val ServicePrincipalPwd = "<FILL HERE>"

//Instantiate the ADAL AuthenticationContext object
val service = Executors.newFixedThreadPool(1)
val context = new AuthenticationContext(authority, true, service);

//Get access token
val ClientCred = new ClientCredential(ServicePrincipalId, ServicePrincipalPwd)
val authResult = context.acquireToken(resourceAppIdURI, ClientCred, null)
val accessToken = authResult.get().getAccessToken

Querying Azure SQL Database

Once the token generated, you simply need to call the spark.read.sqlDB command and pass the token to authenticate.

import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._

val config = Config(Map(
   "url"            -> "<SERVER NAME>.database.windows.net",
   "databaseName"   -> "<Database Name>",
   "dbTable"        -> "<Table Name>",
   "accessToken"           -> accessToken,
   "hostNameInCertificate" -> "*.database.windows.net",
   "encrypt"               -> "true"
))

val collection = spark.read.sqlDB(config)

collection.show()

As you can see, once the token generated, it’s really easy to query Azure SQL Database.

Hope this helped!

29 thoughts on “Connect Azure Databricks to SQL Database & Azure SQL Data Warehouse using a Service Principal

  1. I installed com.microsoft.azure:adal4j:1.6.4, com.microsoft.azure:azure-sqldb-spark:1.0.2 both library in cluster but still getting error ” ClientCredential an AuthenticationCotext not found.
    import com.microsoft.aad.adal4j.ClientCredential
    import com.microsoft.aad.adal4j.AuthenticationContext

    can you please help me here to connect SQL DB using AAD.

      1. Thanks Benjamin.

        It resolved one error but I am getting another one

        java.lang.ClassCastException: java.util.Collections$SingletonList cannot be cast to java.lang.String
        at com.nimbusds.oauth2.sdk.util.URLUtils.serializeParameters(URLUtils.java:88)
        at com.microsoft.aad.adal4j.AdalTokenRequest.toOAuthRequest(AdalTokenRequest.java:160)
        at com.microsoft.aad.adal4j.AdalTokenRequest.executeOAuthRequestAndProcessResponse(AdalTokenRequest.java:86)
        at com.microsoft.aad.adal4j.AuthenticationContext.acquireTokenCommon(AuthenticationContext.java:930)
        at com.microsoft.aad.adal4j.AcquireTokenCallable.execute(AcquireTokenCallable.java:70)
        at com.microsoft.aad.adal4j.AcquireTokenCallable.execute(AcquireTokenCallable.java:38)
        at com.microsoft.aad.adal4j.AdalCallable.call(AdalCallable.java:47)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        1. Difficult to say without seeing the whole code and trying out for myself. But there seems to be a newer way of getting access tokens from AAD: MSAL4J I would check it out.

  2. Hi

    I am following your all Databricks article. I really like the way you explained.

    I am trying to connect Azure SQl DB using Service principle. I installed all libraries which you mentioned above. I detached and attached cluster after installing library

    But getting below error:

    notebook:2: error: object AuthenticationContext is not a member of package com.microsoft.aad.adal4j
    import com.microsoft.aad.adal4j.AuthenticationContext
    ^
    notebook:13: error: not found: type AuthenticationContext
    val context = new AuthenticationContext(authority, true, service);
    ^
    notebook:1: error: object ClientCredential is not a member of package com.microsoft.aad.adal4j
    import com.microsoft.aad.adal4j.ClientCredential
    ^

    Can you please help here…

    1. Hi, seems you might have library problems. Make sure you install both libraries to your cluster from the Maven repo:

      – com.microsoft.azure:adal4j:1.6.4
      – com.microsoft.azure:azure-sqldb-spark:1.0.2

  3. Hello Benjamin,

    Thank you for the write-up.
    I was wondering if you can explain how can we leverage ADAL token to connect to ADLS in Databricks. I tried the concept but I kept on getting authorization error.

    Thanks,
    hary232

  4. if “Refresh Token property will be null for this overload”, how do we Refresh the Token once it expires?
    Acquiring the Token works. I see that we get the expiration date. What is the best practice to refresh it? Schedule a job that refreshes the token before it expires?

    Thanks!

    1. I would check before using if expire and renew it then. Even better idea would be to wrap the usage of the token in a custom package / class in Azure Databricks which handles renewing the token.

  5. I also faced similar issue for one of my environment and it’s working when I am uninstalling and installing the package again.
    object aad is not a member of package com.microsoft
    import com.microsoft.aad.adal4j.ClientCredential

    1. I sometimes find, detaching / re-attaching the cluster works. Or change the runtime of your databricks cluster as you may have version conflicts. Look at the cluster logs for any errors

  6. Hello Benjamin,

    Thank you for this wonderful article. I am able to connect to the sql database and query a table. My question here is i want to be able to insert data into multiple tables. Is there a way i can connect to the database once and then execute the insert statements one after the other.

    Thanks
    Sujatha

  7. Thank you Benjamin Leroux. Very helpful.

    I tried to insert a record into SQL DB table using the same way but couldn’t. May I please know how can we use the config to execute an insert query ? It would be very grateful if you can share this information.

    Thanks
    Vin

  8. I’m getting timeout for a bulkcopy against a Azure SQL Database with service principal connection after 1 hour. I assume it is because of the token expired so the copy cannot finish.

    Is this an accurate guess and what can be done about it?

    1. Pretty sure yes. How much data are you loading? And what size is your Azure SQL DB? Can you confirm some data was loaded?

  9. Hello Benhamin,
    I´m following your procedure but when I try to query Azure SQL database I´m facing the following message:

    com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user ‘NT AUTHORITY\ANONYMOUS LOGON’. ClientConnectionId:dec6e2cf-0c07-45cc-84e6-f633e231b585

    I´m also ensuring AAD group as Azure SQL Active Directory Admin but the issue still happen. Do you know something to help ?

    1. Looks like you’re using the wrong authentication method. Can you show me your code? Or email it to me?

  10. Hi, when I run the line val accessToken = authResult.get().getAccessToken, I get the below exception. Has anyone encountered this before?

    java.lang.ClassCastException: java.util.Collections$SingletonList cannot be cast to java.lang.String
    at com.nimbusds.oauth2.sdk.util.URLUtils.serializeParameters(URLUtils.java:105)
    at com.microsoft.aad.adal4j.AdalTokenRequest.toOAuthRequest(AdalTokenRequest.java:160)
    at com.microsoft.aad.adal4j.AdalTokenRequest.executeOAuthRequestAndProcessResponse(AdalTokenRequest.java:86)
    at com.microsoft.aad.adal4j.AuthenticationContext.acquireTokenCommon(AuthenticationContext.java:930)
    at com.microsoft.aad.adal4j.AcquireTokenCallable.execute(AcquireTokenCallable.java:70)
    at com.microsoft.aad.adal4j.AcquireTokenCallable.execute(AcquireTokenCallable.java:38)
    at com.microsoft.aad.adal4j.AdalCallable.call(AdalCallable.java:47)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

    1. I’m having the same issue with this – haven’t been able to find anything on how to resolve it, or why it’s happening.

Leave a Reply

Your email address will not be published. Required fields are marked *