Using Azure Synapse Notebooks to Connect to Relational Databases

Apache Spark is a parallel processing framework that supports in-memory processing to boost the performance of big data analytics applications. Apache Spark in Azure Synapse Analytics is one of Microsoft's implementations of Apache Spark in the cloud. Azure Synapse makes it easy to create and configure a serverless Apache Spark pool in Azure. Spark pools in Azure Synapse are compatible with Azure Storage, Azure SQL Database, and Azure Data Lake Storage Gen2, so you can use them to process data you already store in Azure.

In this article, I will show you how to connect to a relational database such as Azure SQL Database and query its data from within a Synapse notebook. The authentication method in this exercise will be a service principal.

Prerequisites:

  1. An Azure Synapse workspace (see the Microsoft guide)
  2. An Azure Synapse Apache Spark pool (see the Microsoft guide)
  3. A service principal in Azure AD (see the Microsoft guide)

The very first step in this exercise is to add the service principal to the Azure SQL DB as an external user. To do this, log in to the Azure SQL DB with Azure Data Studio or SSMS using an Azure Active Directory account that has owner rights on the database, then execute the following T-SQL code on the source Azure SQL DB:

-- Create a contained database user for the service principal
CREATE USER [my-synapse-serv-prin] FROM EXTERNAL PROVIDER
GO
-- Grant read access by adding the user to the db_datareader role
EXEC sp_addrolemember 'db_datareader', 'my-synapse-serv-prin'
GO

Please note that the external user name is the same as the service principal's name in Azure AD.
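
If you want to confirm the user was created correctly, a quick sanity check (not part of the original steps, but using the standard sys.database_principals catalog view) is:

SELECT name, type_desc, authentication_type_desc
FROM sys.database_principals
WHERE name = 'my-synapse-serv-prin';

A user created from an external provider should appear with a type_desc of EXTERNAL_USER.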

Next, we need to create a linked service for the source Azure SQL DB. In the Synapse workspace, click on the Manage tab and create a new linked service of type Azure SQL Database, using service principal authentication. A sketch of the connection details is below.
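
The original post showed the connection details as a screenshot. As a rough substitute, here is a minimal sketch of what the underlying linked service definition might look like, assuming the JSON schema used by Azure Data Factory / Synapse linked services; the linked service name matches the one referenced from the notebook below, while the server, database, tenant, and secret values are placeholders:

{
    "name": "AzureSqlDatabase_service_principal",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": "Data Source=tcp:azuresqlserver.database.windows.net,1433;Initial Catalog=My_Azure_SQLDB_Name;",
            "servicePrincipalId": "<application-client-id>",
            "servicePrincipalKey": {
                "type": "SecureString",
                "value": "<client-secret>"
            },
            "tenant": "<tenant-id>"
        }
    }
}

The service principal's client ID and secret come from the Azure AD app registration created in the prerequisites. Remember to publish the linked service before referencing it from a notebook.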

Now we have set up everything needed to create a Synapse notebook to connect to, query, and work with data from relational databases.

From the Develop tab in the Synapse workspace, create a new notebook and attach it to the Spark pool you created in the earlier prerequisite steps. Choose Python as the programming language.

In the first cell, we need to create a connection to the Azure SQL DB using the linked service we created earlier.

%%pyspark
# Fetch the credential for the linked service created earlier. In PySpark,
# TokenLibrary is reached through the JVM gateway.
token_library = spark._jvm.com.microsoft.azure.synapse.tokenlibrary.TokenLibrary
access_token = token_library.getConnectionString("AzureSqlDatabase_service_principal")
# print(access_token)  # uncomment to inspect the credential while debugging

url = "jdbc:sqlserver://azuresqlserver.database.windows.net:1433"

# Read the table through the Apache Spark connector for SQL Server
customerDF = spark.read.format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", url) \
    .option("dbtable", "dbo.tb_Customer") \
    .option("databaseName", "My_Azure_SQLDB_Name") \
    .option("accessToken", access_token) \
    .option("encrypt", "true") \
    .option("hostNameInCertificate", "*.database.windows.net") \
    .load()

display(customerDF)
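
From here, customerDF behaves like any other Spark DataFrame. As a minimal illustration (the City and CustomerID columns are hypothetical, since the actual schema of dbo.tb_Customer isn't shown), you could register it as a temporary view and explore it with Spark SQL in the next cell:

%%pyspark
# Expose the DataFrame to Spark SQL as a temporary view
customerDF.createOrReplaceTempView("customer")

# Hypothetical aggregation: count customers per city
customer_counts = spark.sql("""
    SELECT City, COUNT(CustomerID) AS customer_count
    FROM customer
    GROUP BY City
    ORDER BY customer_count DESC
""")
display(customer_counts)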

Summary

In this article, we saw how to connect to relational data sources from within Azure Synapse notebooks. Notebooks are also widely used for data preparation, data visualization, machine learning, and other big data scenarios. A Synapse Studio notebook is a web interface for creating files that contain live code, visualizations, and narrative text. Notebooks are a good place to validate ideas and run quick experiments to get insights from your data.

With a Synapse Studio notebook, you can:

  • Get started with zero setup effort.
  • Keep data secure with built-in enterprise security features.
  • Analyze data across raw formats (CSV, txt, JSON, etc.), processed file formats (parquet, Delta Lake, ORC, etc.), and SQL tabular data files against Spark and SQL.
  • Be productive with enhanced authoring capabilities and built-in data visualization.

