Databricks is a very popular data platform, and HashiCorp Terraform is a popular infrastructure-as-code tool. I like to try out new things in a quick and easy way. Instead of manual provisioning, which is tedious and error-prone, it is better to have a single click that provisions all the necessary resources. That is what the integration between Databricks and Terraform promises.
Currently we need to take 2 steps to provision Azure Databricks:
- Provision the Azure Databricks workspace
- Provision Databricks resources (e.g. cluster, job, notebook, etc.)
Create Azure Databricks workspace
Use Azure Cloud Shell, which already has Terraform installed. Use the following Terraform template:
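A minimal sketch of such a template, based on the `azurerm` provider (the resource group name, workspace name, and region below are placeholder assumptions of mine, not the exact template from the quick-start):

```hcl
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
    }
  }
}

provider "azurerm" {
  features {}
}

# Assumed resource group name and region
resource "azurerm_resource_group" "this" {
  name     = "databricks-demo-rg"
  location = "eastus"
}

# Assumed workspace name; "standard" SKU keeps the demo cheap
resource "azurerm_databricks_workspace" "this" {
  name                = "databricks-demo-ws"
  resource_group_name = azurerm_resource_group.this.name
  location            = azurerm_resource_group.this.location
  sku                 = "standard"
}

output "databricks_host" {
  value = "https://${azurerm_databricks_workspace.this.workspace_url}/"
}
```

Then run the commands below; `terraform apply` prints the `databricks_host` output at the end.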
az login
terraform init
terraform plan
terraform apply
Output
databricks_host = "https://<workspace url>.azuredatabricks.net/"
Add a role assignment on the storage account for the service principal.
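This can be done from the CLI as well; a sketch, assuming the service principal's application ID and the storage account's resource ID are at hand (all angle-bracket values are placeholders):

```shell
# Grant the service principal data access on the storage account.
# <sp-app-id>, <sub-id>, <rg>, and <storage-account> must be replaced.
az role assignment create \
  --assignee "<sp-app-id>" \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
```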
Log in to Azure Databricks and generate a PAT (personal access token) according to the following article:
Go back to Azure Cloud Shell and configure Databricks authentication:
pip install -U databricks-cli
databricks --version
databricks configure --token
Enter the Databricks host and PAT. The authentication information is stored in ~/.databrickscfg.
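After `databricks configure --token` completes, ~/.databrickscfg contains a profile along these lines (host and token are placeholders):

```ini
[DEFAULT]
host = https://<workspace url>.azuredatabricks.net/
token = <personal access token>
```

The Databricks Terraform provider can read this file, so the next step needs no extra credentials.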
Create Databricks resources
Again, the Databricks terraform-provider-databricks repo has a good quick-start template.
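A minimal sketch of such a template, provisioning a single small cluster (the cluster name and sizing below are my assumptions; check the quick-start for current values):

```hcl
terraform {
  required_providers {
    databricks = {
      source = "databricks/databricks"
    }
  }
}

# Reads host and token from ~/.databrickscfg by default
provider "databricks" {}

# Pick the smallest node type with a local disk
data "databricks_node_type" "smallest" {
  local_disk = true
}

# Pick the latest long-term-support Spark runtime
data "databricks_spark_version" "latest_lts" {
  long_term_support = true
}

resource "databricks_cluster" "demo" {
  cluster_name            = "demo-cluster" # assumed name
  spark_version           = data.databricks_spark_version.latest_lts.id
  node_type_id            = data.databricks_node_type.smallest.id
  num_workers             = 1
  autotermination_minutes = 20
}
```

Run `terraform init`, `terraform plan`, and `terraform apply` in the template's directory, just as for the workspace.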
Run test notebook
Change the tenant "microsoft.onmicrosoft.com" to your tenant and the related client ID and secret.
Change "abfss://container01@demostore01.dfs.core.windows.net/" to /mnt/testblob/.
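For reference, a mount point like /mnt/testblob is typically created once from a notebook with `dbutils` (a sketch using OAuth with the service principal; the tenant ID, client ID, and secret are the values mentioned above and must be replaced — this only runs inside a Databricks notebook, where `dbutils` is provided by the runtime):

```python
# Runs in a Databricks notebook; dbutils comes from the runtime.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<client id>",
    "fs.azure.account.oauth2.client.secret": "<client secret>",
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant id>/oauth2/token",
}

# Mount the ADLS Gen2 container at /mnt/testblob
dbutils.fs.mount(
    source="abfss://container01@demostore01.dfs.core.windows.net/",
    mount_point="/mnt/testblob",
    extra_configs=configs,
)
```

After mounting, the notebook can read and write /mnt/testblob/ instead of the full abfss:// URI.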