Databricks Snowflake Summit
Databricks
Keynote recap, e.g.: Responsible AI; the Lakehouse's role in democratizing AI (more support for unstructured data in the AI era, in addition to structured data); scaling AI with respect to capacity and cost efficiency; LakehouseIQ to query data in English via an LLM with Unity Catalog; MosaicML machine learning platform (Databricks’ GenAI strategy); LakehouseAI (Delta Live Tables, Databricks Workflows, Databricks SQL, automatic data layout optimization) with Vector Search, Feature Serving, and MLflow AI Gateway; Delta Sharing and Lakehouse Apps
Delta UniForm unifies table formats such as Delta Lake, Apache Iceberg, and Apache Hudi (video)
Databricks Vector Search automatically creates vector embeddings from files in Unity Catalog; Databricks AutoML securely fine-tunes LLMs using enterprise data; open source models are available within Databricks Marketplace, including MPT-7B and Falcon-7B; Databricks Model Serving; MLflow AI Gateway (centrally manages API keys for various LLMs, rate limiting); MLflow Prompt Tools; Databricks Lakehouse Monitoring
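The two gateway responsibilities noted above (keeping provider API keys in one place and rate limiting calls) can be illustrated with a small sketch. This is not the MLflow AI Gateway API: the class `ToyGateway`, the route names, and the 60-second sliding window are assumptions made purely for illustration.

```python
import time
from collections import deque


class ToyGateway:
    """Hypothetical stand-in for an LLM gateway; NOT the MLflow AI Gateway API."""

    def __init__(self, api_keys, calls_per_minute, clock=time.monotonic):
        self._api_keys = dict(api_keys)   # route name -> provider key, held centrally
        self._limit = calls_per_minute    # max calls per route in any 60-second window
        self._clock = clock               # injectable clock so behaviour can be tested
        self._calls = {route: deque() for route in self._api_keys}

    def authorize(self, route):
        """Return the provider key for `route`, or raise if the route is over its limit."""
        if route not in self._api_keys:
            raise KeyError(f"unknown route: {route}")
        now = self._clock()
        window = self._calls[route]
        # Drop timestamps that have aged out of the 60-second sliding window.
        while window and now - window[0] >= 60:
            window.popleft()
        if len(window) >= self._limit:
            raise RuntimeError(f"rate limit exceeded for route: {route}")
        window.append(now)
        return self._api_keys[route]
```

Callers only ever name a route; the key itself stays inside the gateway object, which is the point of managing keys centrally.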
Alation is an active metadata platform that can power Databricks Unity Catalog
Lakehouse Federation: unified view of the data estate (e.g. MySQL, PostgreSQL, Amazon Redshift, Snowflake, Azure SQL Database, Azure Synapse, Google BigQuery) in one place; model serving
LakehouseIQ is an LLM-based knowledge engine that learns from your own data (e.g. it understands the region names used in your data and generates more accurate SQL: instead of salesterritory = 'Europe', it generates salesterritory IN ('EMEA Northern', 'EMEA Southern'); integrates with LangChain); Databricks Lakehouse Monitoring; model evaluation; LLM evaluation
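The 'Europe' example in the LakehouseIQ note can be mimicked with a lookup table, as a rough illustration of why organization-specific knowledge changes the generated SQL. The mapping `REGION_SYNONYMS` and the helper `territory_filter` are hypothetical; the real feature presumably learns such mappings rather than reading them from a hard-coded dictionary.

```python
# Hypothetical mapping from a business term to the values actually stored in the column.
REGION_SYNONYMS = {
    "Europe": ["EMEA Northern", "EMEA Southern"],
}


def _quote(value):
    """Quote a string literal for SQL, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def territory_filter(term, column="salesterritory", synonyms=REGION_SYNONYMS):
    """Build a SQL predicate for `term`, expanding it when a mapping is known."""
    values = synonyms.get(term)
    if not values:
        # No organization-specific knowledge: fall back to a literal match.
        return f"{column} = {_quote(term)}"
    quoted = ", ".join(_quote(v) for v in values)
    return f"{column} IN ({quoted})"


print(territory_filter("Europe"))
# prints: salesterritory IN ('EMEA Northern', 'EMEA Southern')
```

A term without a mapping falls back to the naive equality filter, which is the less accurate query the note contrasts against.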
Liquid clustering to improve data layout, LLMOps
Snowflake
Secure Data Sharing: Snowflake is developing industry-specific “data clouds” to increase usage of this sharing feature by combining industry-relevant data with the participation of key players within the vertical supply chain. Databricks is also trying to build critical mass for its sharing service, announcing the week prior to the conference that Delta Sharing will be open and available across services external to Databricks.
Snowflake: expanded Iceberg table support (unified Iceberg Tables covering both external and native tables); Snowflake Native Apps in the Marketplace (allow developers to package code or data using stored procedures, tasks, streams, UDFs, Snowpark, etc., with Streamlit, a low-code UX framework); Snowpark Container Services; Snowflake data sharing; Document AI; improved Data Clean Rooms with differential privacy in SQL and Python; core engine cost optimization strategies; Dynamic Tables to simplify data ingestion with declarative data transformation pipelines.
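For readers unfamiliar with differential privacy, the idea behind a private aggregate can be sketched with the textbook Laplace mechanism applied to a count. This is a generic illustration, not Snowflake's implementation or API; the function `dp_count` and its parameters are hypothetical names.

```python
import random


def dp_count(values, predicate, epsilon, rng=random):
    """Return a count with Laplace noise calibrated to sensitivity 1.

    Adding or removing one row changes a count by at most 1, so the
    noise scale is 1 / epsilon (smaller epsilon means more noise).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    scale = 1.0 / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that same scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


ages = [23, 35, 41, 52, 67]
noisy = dp_count(ages, lambda age: age >= 40, epsilon=0.5)
```

The caller sees only the noisy value; repeated queries consume privacy budget, which a real clean room would need to track.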
Amazon S3-compatible storage; Snowpark ML supports scikit-learn, XGBoost, and LightGBM; ML-Powered Functions
Snowpark Container Services with image registry (OCIv2-compliant service); data apps (integration with Streamlit); Snowpark ML with Airflow orchestration
Appendix
Databricks SQL talks
Databricks Unity Catalog
Governance for AI: Unity Catalog for AI (MLflow models, feature tables); AI for governance: Lakehouse Monitoring (PII detection, billing, audit, lineage, data quality)
Databricks Connect enables you to connect popular IDEs such as PyCharm, notebook servers, and other custom applications to Databricks clusters; built-in PySpark test framework