Using Spark in Azure Databricks

This will be a multi part walk through and lab showing working with Azure Databricks using Spark and also interacting with Azure Synapse (formerly Azure SQL DW)

Part – 1 Understanding the SQL DW Connector

You can access Azure SQL Data Warehouse (SQL DW) from Azure Databricks using the SQL Data Warehouse connector (referred to as the SQL DW connector), a data source implementation for Apache Spark that uses Azure Blob storage, and PolyBase in SQL DW to transfer large volumes of data efficiently between a Databricks cluster and a SQL DW instance.

Both the Databricks cluster and the SQL DW instance access a common Blob storage container to exchange data between these two systems. In Databricks, Spark jobs are triggered by the SQL DW connector to read data from and write data to the Blob storage container. On the SQL DW side, data loading and unloading operations performed by PolyBase are triggered by the SQL DW connector through JDBC.

The SQL DW connector is more suited to ETL than to interactive queries, because each query execution can extract large amounts of data to Blob storage. If you plan to perform several queries against the same SQL DW table, we recommend that you save the extracted data in a format such as Parquet.

%d bloggers like this: