Databricks vs Snowflake

Published on: 06 January 2026

Tags: #databricks #snowflake #storage


The Storage Mechanics: "Micro-Partitions" vs. "Delta Log"

flowchart TB
    %% ==========================================
    %% Left Side: Databricks (File-Based Logic)
    %% ==========================================
    subgraph DB_System ["Databricks: Distributed Intelligence"]
        direction TB
        style DB_System fill:#fff3e0,stroke:#e65100

        DB_Driver[("Spark Driver<br/>(The 'Brain' in the Cluster)")]
        style DB_Driver fill:#ffcc80,stroke:#e65100

        subgraph S3 ["User Cloud Storage (S3 / ADLS)"]
            style S3 fill:#fff,stroke:#e65100,stroke-dasharray: 5 5
            subgraph Delta_Log ["_delta_log Folder"]
                style Delta_Log fill:#ffe0b2,stroke:#ef6c00
                JSON["00001.json<br/>(Transaction Log)"]
            end
            subgraph Data_Files ["Parquet Data"]
                P1["File_A.parquet<br/>(Data: 2023)"]
                P2["File_B.parquet<br/>(Data: 2024)"]
            end
        end

        %% The Flow
        DB_Driver --"1. Reads Log First"--> JSON
        JSON --"2. Returns List:<br/>File_B is valid, File_A is skipped"--> DB_Driver
        DB_Driver --"3. Reads only required file"--> P2

        %% Visual styling to show skipped file
        linkStyle 2 stroke:#00C853,stroke-width:3px;
        style P1 opacity:0.5
    end

    %% ==========================================
    %% Right Side: Snowflake (Service-Based Logic)
    %% ==========================================
    subgraph SF_System ["Snowflake: Centralized Intelligence"]
        direction TB
        style SF_System fill:#e1f5fe,stroke:#01579b

        subgraph Cloud_Services ["Cloud Services Layer"]
            style Cloud_Services fill:#b3e5fc,stroke:#0277bd
            Meta_Store[("Global Metadata Store<br/>(FoundationDB)")]
        end

        SF_Compute[("Virtual Warehouse<br/>(Compute Nodes)")]
        style SF_Compute fill:#81d4fa,stroke:#0277bd

        subgraph SF_Storage ["Proprietary Storage Layer"]
            style SF_Storage fill:#fff,stroke:#0277bd,stroke-dasharray: 5 5
            MP1["Micro-Partition 1<br/>(Data: 2023)"]
            MP2["Micro-Partition 2<br/>(Data: 2024)"]
        end

        %% The Flow
        SF_Compute --"1. Compiles SQL & Queries Metadata"--> Meta_Store
        Meta_Store --"2. Says: Only MP2 contains logical match"--> SF_Compute
        SF_Compute --"3. Fetches Specific Block"--> MP2

        %% Visual styling to show skipped partition
        linkStyle 5 stroke:#00C853,stroke-width:3px;
        style MP1 opacity:0.5
    end

The Compute Model: "T-Shirt Sizing" vs. "Executor Control"

classDiagram
    note "Compute Model Comparison"
    direction LR

    class Snowflake_VW {
        <<T-Shirt Sizing>>
        SIZE : XS, S, M ... 4XL
        SCALING : Auto-Scale [Multi-Cluster]
        CACHE : Local SSD [Warmed on Query]
        %% Actions / Constraints
        -User_CANNOT_Pick_Instance()
        -User_CANNOT_Tune_JVM()
    }

    class Databricks_Cluster {
        <<Executor Control>>
        NODES : Driver + Workers
        COST : Spot Instances Supported
        ENGINE : Photon C++ / Spark JVM
        %% Actions / Capabilities
        +User_Selects_Instance_Type()
        +User_Tunes_Memory_Cores()
    }

    class Workload_Fit {
        <<Comparison>>
    }

    Snowflake_VW --> Workload_Fit : Best for Concurrent BI/SQL
    Databricks_Cluster --> Workload_Fit : Best for Heavy ETL/ML
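The class diagram's contrast becomes concrete when you compare what each platform actually lets you specify. Below are two hypothetical config payloads; the field names are illustrative and are not the exact Snowflake warehouse options or the Databricks Clusters API schema. The point is the difference in degrees of freedom: the warehouse exposes one size label, the cluster spec exposes hardware and engine internals.

```python
# Hypothetical config payloads -- field names are illustrative, not the
# real Snowflake or Databricks API schemas.
snowflake_warehouse = {
    "name": "BI_WH",
    "size": "MEDIUM",  # the only performance knob: XS ... 4XL
    "auto_scale": {"min_clusters": 1, "max_clusters": 10},
}

databricks_cluster = {
    "cluster_name": "etl-cluster",
    "node_type_id": "i3.2xlarge",   # user picks the exact VM type
    "num_workers": 8,
    "use_spot_instances": True,     # cost lever a warehouse never exposes
    "spark_conf": {                 # JVM / engine tuning is user-visible
        "spark.executor.memory": "48g",
        "spark.sql.shuffle.partitions": "400",
    },
}

# Degrees of freedom: the warehouse hides hardware entirely;
# the cluster spec exposes it knob by knob.
print(sorted(snowflake_warehouse))       # ['auto_scale', 'name', 'size']
print("node_type_id" in databricks_cluster)  # True
```

This asymmetry is deliberate on both sides: Snowflake trades control for zero-tuning operations, Databricks trades simplicity for workload-specific optimization.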

The Security & Governance Topology: "The Castle" vs. "The Passport"

graph TD
    %% ==========================================
    %% Left Side: Snowflake (The Castle)
    %% ==========================================
    subgraph SF_World ["Snowflake: Walled Garden"]
        direction TB
        style SF_World fill:#e1f5fe,stroke:#01579b

        SF_User[User]
        SF_Gateway["Access Control Layer<br/>(RBAC / Governance)"]
        style SF_Gateway fill:#b3e5fc,stroke:#01579b

        subgraph SF_Internals ["Snowflake Managed Storage"]
            style SF_Internals fill:#fff,stroke:#01579b,stroke-dasharray: 5 5
            Internal_Store[("Micro-Partitions")]
        end

        %% Straight Flow = Control
        SF_User -->|"1. Submit SQL"| SF_Gateway
        SF_Gateway -->|"2. Verify & Read (Proxy)"| Internal_Store
    end

    %% ==========================================
    %% Right Side: Databricks (The Passport)
    %% ==========================================
    subgraph DB_World ["Databricks: Passport Model"]
        direction TB
        style DB_World fill:#fff3e0,stroke:#e65100

        DB_User[User]
        DB_Cluster[("Compute Cluster<br/>(Spark/Photon)")]
        UC[("Unity Catalog<br/>(Policy Engine)")]

        subgraph Cloud_Storage ["Your Cloud Storage"]
            style Cloud_Storage fill:#fff,stroke:#e65100,stroke-dasharray: 5 5
            S3_Gold[("S3 Gold Bucket")]
        end

        %% The Triangle Flow
        DB_User -->|"1. Submit Code"| DB_Cluster
        DB_Cluster -->|"2. Request Access"| UC
        UC -.->|"3. Vends Temp Token"| DB_Cluster
        DB_Cluster -->|"4. Direct Access w/ Token"| S3_Gold
    end

    %% ==========================================
    %% Insights / Comparison
    %% ==========================================
    NoteSF[/"Security is a GATEWAY.<br/>No one touches data without passing<br/>through the Snowflake Engine."/]
    NoteDB[/"Security is a PERMISSION SLIP.<br/>Engine gets a token, then<br/>reads data directly from storage."/]
    style NoteSF fill:#81d4fa,stroke:none
    style NoteDB fill:#ffcc80,stroke:none
    SF_World --- NoteSF
    DB_World --- NoteDB

Summary

1. Performance Source

  • Snowflake (Intelligent Storage): Relies on global metadata pruning. Because it acts as a "Walled Garden," it enforces a well-clustered micro-partition layout during ingestion, so queries can surgically skip the vast majority of partitions without ever reading them.
  • Databricks (Intelligent Compute): Relies on Photon (a vectorized C++ engine) plus Delta data skipping. While it commits to an open file format, it uses the Delta log to skip files, then lets Photon process the remaining data far faster than JVM-based engines.

2. Concurrency & Workload

  • Snowflake (Best for BI): Wins on burst concurrency. Its compute nodes are stateless (local cache only), so a multi-cluster warehouse can spin up ten clusters almost instantly to serve a thousand dashboard users, then spin them back down the moment the burst ends.
  • Databricks (Best for AI/ETL): Wins on sustained complexity. Spark's stateful distributed memory model excels at massive shuffles (e.g., joining two 10 TB tables or training an iterative ML model), letting it handle workloads that would choke a conventional warehouse.
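A rough way to see the burst-concurrency claim: because queries against stateless nodes carry no shuffle state that ties them to a particular machine, a multi-cluster warehouse can simply add clusters as the queue grows, up to a cap. The numbers and the scale-out rule below are made up for illustration; real scaling policies are more nuanced.

```python
import math

def clusters_needed(queued_queries, per_cluster_slots=8, max_clusters=10):
    """Toy multi-cluster scale-out rule: add stateless clusters until the
    queue fits, capped at max_clusters. All values are illustrative."""
    return min(max_clusters,
               max(1, math.ceil(queued_queries / per_cluster_slots)))

print(clusters_needed(3))     # light load  -> 1 cluster
print(clusters_needed(1000))  # dashboard burst -> capped at 10
```

The same trick does not work for a 10 TB join: its shuffle state lives in the memory of specific workers, so you cannot absorb it by spinning up fresh stateless clusters mid-query.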

3. The Convergence

  • Databricks is hiding the engine: with "Databricks SQL Serverless," it abstracts the clusters away entirely to capture the Business Analyst market.
  • Snowflake is opening the garden: with "Snowpark" and "Iceberg Tables," it now runs Python code and reads external storage to capture the Data Science market.
