Databricks Snowflake Summit

Latest updates from two major data platform providers

Xin Cheng
3 min read · Oct 16, 2023

Databricks

Keynote recap: Responsible AI; the Lakehouse’s role in democratizing AI (more support for unstructured data in the AI era, in addition to structured data); scaling AI with capacity and cost efficiency; LakehouseIQ, which queries data in English via an LLM with Unity Catalog; the MosaicML machine learning platform (Databricks’ GenAI strategy); and LakehouseAI (Delta Live Tables, Databricks Workflows, Databricks SQL, automatic data layout optimization) with Vector Search, Feature Serving, MLflow AI Gateway, Delta Sharing, and Lakehouse Apps.

Delta UniForm (Universal Format) unifies table formats, so Delta Lake tables can also be read by Apache Iceberg and Apache Hudi clients.

Databricks Vector Search automatically creates vector embeddings from files in Unity Catalog; Databricks AutoML securely fine-tunes LLMs using enterprise data; open source models, including MPT-7B and Falcon-7B, are available within Databricks Marketplace. Also covered: Databricks Model Serving, MLflow AI Gateway (centrally manages API keys for various LLMs, with rate limiting), MLflow Prompt Tools, and Databricks Lakehouse Monitoring.
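To make the gateway idea concrete (API keys held in one place, plus rate limiting), here is a minimal sketch in plain Python. It is not the MLflow AI Gateway API: `ToyGateway`, `RateLimitExceeded`, the route name, and the sliding-window policy are all hypothetical, chosen only to illustrate the pattern.

```python
import time


class RateLimitExceeded(Exception):
    """Raised when a route has used up its calls for the current window."""


class ToyGateway:
    """Hypothetical gateway: stores provider API keys centrally and
    applies a sliding-window rate limit per route."""

    def __init__(self, api_keys, max_calls, window_seconds, clock=time.monotonic):
        self._api_keys = dict(api_keys)   # route name -> provider API key
        self._max_calls = max_calls       # allowed calls per window
        self._window = window_seconds
        self._clock = clock               # injectable so the limit is testable
        self._calls = {}                  # route name -> timestamps of recent calls

    def authorize(self, route):
        """Return the API key for `route`, or raise if the limit is reached."""
        key = self._api_keys[route]       # KeyError for an unknown route
        now = self._clock()
        # Keep only the calls that still fall inside the window.
        recent = [t for t in self._calls.get(route, []) if now - t < self._window]
        self._calls[route] = recent
        if len(recent) >= self._max_calls:
            raise RateLimitExceeded(route)
        recent.append(now)
        return key
```

Callers only ever name a route such as `"chat"`; the key itself stays inside the gateway, which is the point of managing keys centrally.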

Alation is an active metadata platform that can power Databricks Unity Catalog.

Lakehouse Federation: a unified view of the data estate (e.g. MySQL, PostgreSQL, Amazon Redshift, Snowflake, Azure SQL Database, Azure Synapse, Google’s BigQuery) in one place; model serving.

LakehouseIQ is a large language model trained on your own data. For example, it understands the specific regions in your data and so generates more accurate SQL: instead of salesterritory = 'Europe', it generates salesterritory in ('EMEA Northern', 'EMEA Southern'). It integrates with LangChain. Also covered: Databricks Lakehouse Monitoring, model evaluation, and LLM evaluation.
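The salesterritory example can be sketched as a simple term expansion. This is only an illustration of the resulting SQL, not how LakehouseIQ works internally: the `REGION_TERMS` mapping and the helper functions are hypothetical, and a real system would learn such mappings from the data and its metadata rather than from a hard-coded dictionary.

```python
# Hypothetical mapping from a business term to the values stored in the data.
REGION_TERMS = {
    "Europe": ["EMEA Northern", "EMEA Southern"],
}


def quote(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def territory_predicate(term: str, column: str = "salesterritory") -> str:
    """Build a SQL predicate for `term`, expanding it when a mapping is known."""
    values = REGION_TERMS.get(term)
    if values is None:
        # No known mapping: fall back to a literal equality match.
        return f"{column} = {quote(term)}"
    return f"{column} IN ({', '.join(quote(v) for v in values)})"


print(territory_predicate("Europe"))
```

The naive predicate `salesterritory = 'Europe'` would match no rows if the column only stores the EMEA values, which is why the expanded form is the more accurate one.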

Liquid clustering to improve data layout, LLMOps

Snowflake

Secure Data Sharing: Snowflake is developing industry-specific “data clouds” to increase usage of this sharing feature by combining industry-relevant data with the participation of key players within the vertical supply chain. Databricks is also trying to build critical mass for its sharing service, announcing the week before the conference that Delta Sharing will be open and available across services external to Databricks.

Snowflake: expanded Iceberg table support (Unified Iceberg Tables, covering external and native tables); Snowflake Native Apps in the Marketplace (developers can package code or data using stored procedures, tasks, streams, UDFs, Snowpark, etc., with Streamlit, a low-code UX framework); Snowpark Container Services; Snowflake data sharing; Document AI; an improved Data Clean Room with differential privacy in SQL and Python; core engine cost optimization strategies; and Dynamic Tables, which simplify data ingestion with declarative data transformation pipelines.
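For readers new to differential privacy, the idea behind the clean room feature can be sketched with the classic Laplace mechanism applied to a count. This is a generic textbook illustration in plain Python, not Snowflake's implementation or API; `private_count` and its parameters are hypothetical names.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(rows, predicate, epsilon: float, rng=None) -> float:
    """Return a noisy count of rows matching `predicate`.

    Adding or removing one row changes a count by at most 1 (sensitivity 1),
    so Laplace noise with scale 1/epsilon gives epsilon-differential privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: purchase amounts shared between two parties.
purchases = [120, 45, 300, 80, 15]
print(private_count(purchases, lambda amount: amount > 50, epsilon=1.0))
```

A smaller epsilon means more noise and stronger privacy; a larger epsilon gives answers closer to the true count.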

Amazon S3-compatible storage; Snowpark ML with support for scikit-learn, XGBoost, and LightGBM; ML-powered functions.

Snowpark Container Services with an image registry (an OCIv2-compliant service); data apps (integration with Streamlit); Snowpark ML with Airflow orchestration.

Appendix

Databricks SQL talks

Databricks Unity Catalog

Governance for AI: Unity Catalog for AI (MLflow models, feature tables). AI for governance: Lakehouse Monitoring (PII detection, billing, audit, lineage, data quality).

Databricks Connect enables you to connect popular IDEs such as PyCharm, notebook servers, and other custom applications to Databricks clusters; built-in PySpark test framework

Written by Xin Cheng

Multi/Hybrid-cloud, Kubernetes, cloud-native, big data, machine learning, IoT developer/architect, 3x Azure-certified, 3x AWS-certified, 2x GCP-certified
