Hello world to Azure Databricks with Terraform

Databricks is a very popular data platform, and HashiCorp Terraform is a popular cloud infrastructure provisioning tool. I like to try out new things in a quick and easy way. Instead of manual provisioning, which is tedious and error-prone, it is better to have a one-click setup that provisions all the necessary resources. That is what the integration between Databricks and Terraform promises.

https://databricks.com/blog/2020/09/11/announcing-databricks-labs-terraform-integration-on-aws-and-azure.html

Currently we need to take two steps to provision Azure Databricks:

  1. Provision the Azure Databricks workspace
  2. Provision Databricks resources (e.g. cluster, job, notebook, etc.)

Create Azure Databricks workspace

Use Azure Cloud Shell, which already has Terraform installed. Apply the following Terraform template:
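A minimal sketch of such a template, using the azurerm provider (the resource group name, location, and SKU below are assumptions; adjust them to your environment):

```hcl
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
    }
  }
}

provider "azurerm" {
  features {}
}

# Assumed resource group name and location
resource "azurerm_resource_group" "this" {
  name     = "rg-databricks-demo"
  location = "East US"
}

# The Databricks workspace itself
resource "azurerm_databricks_workspace" "this" {
  name                = "dbw-demo"
  resource_group_name = azurerm_resource_group.this.name
  location            = azurerm_resource_group.this.location
  sku                 = "standard"
}

# Workspace URL, used later to configure the Databricks CLI and provider
output "databricks_host" {
  value = "https://${azurerm_databricks_workspace.this.workspace_url}/"
}
```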

az login
terraform init
terraform plan
terraform apply

Output

databricks_host = "https://<workspace url>.azuredatabricks.net/"

Add a role assignment on the storage account for the service principal.
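This step can also be expressed in Terraform (the storage account reference, the variable holding the service principal's object ID, and the role name are assumptions for illustration):

```hcl
# Assumed: the service principal's object ID is supplied as a variable
variable "service_principal_object_id" {
  type = string
}

# Grant the service principal data access on the storage account
resource "azurerm_role_assignment" "sp_storage" {
  scope                = azurerm_storage_account.demo.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = var.service_principal_object_id
}
```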

Log in to Azure Databricks and generate a PAT (personal access token) according to the following article:

Go back to Azure Cloud Shell and configure Databricks authentication:

pip install -U databricks-cli
databricks --version
databricks configure --token

Input the Databricks host and PAT. The authentication information is stored in ~/.databrickscfg.
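After `databricks configure --token` completes, `~/.databrickscfg` contains a profile similar to the following (placeholder values shown):

```ini
[DEFAULT]
host = https://<workspace url>.azuredatabricks.net/
token = <personal access token>
```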

Create Databricks resources

Again, the Databricks terraform-provider-databricks repo has a good quick-start template.
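A minimal sketch of what such a template looks like, assuming the Databricks provider picks up the host and token configured in ~/.databrickscfg (the cluster size, notebook path, and names are assumptions; the provider was published under Databricks Labs at the time of the announcement and is now available as `databricks/databricks`):

```hcl
terraform {
  required_providers {
    databricks = {
      source = "databricks/databricks"
    }
  }
}

# Reads host and token from ~/.databrickscfg by default
provider "databricks" {}

# Pick the latest long-term-support Spark runtime
data "databricks_spark_version" "latest_lts" {
  long_term_support = true
}

# Pick the smallest node type with local disk
data "databricks_node_type" "smallest" {
  local_disk = true
}

# A small demo cluster that shuts itself down when idle
resource "databricks_cluster" "demo" {
  cluster_name            = "demo-cluster"
  spark_version           = data.databricks_spark_version.latest_lts.id
  node_type_id            = data.databricks_node_type.smallest.id
  num_workers             = 1
  autotermination_minutes = 20
}

# A hello-world notebook deployed to the workspace
resource "databricks_notebook" "hello" {
  path           = "/Shared/hello"
  language       = "PYTHON"
  content_base64 = base64encode("print('hello world')")
}
```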

Run the test notebook

Change the tenant "microsoft.onmicrosoft.com" to your own tenant, along with the related client ID and secret.

Change "abfss://container01@demostore01.dfs.core.windows.net/" to your own storage path, such as the mount point /mnt/testblob/.
