Databricks vs Snowflake
Published on: 06 January 2026
Tags: #databricks #snowflake #storage
The Storage Mechanics: "Micro-Partitions" vs. "Delta Log"
```mermaid
flowchart TB
    %% ==========================================
    %% Left Side: Databricks (File-Based Logic)
    %% ==========================================
    subgraph DB_System ["Databricks: Distributed Intelligence"]
        direction TB
        style DB_System fill:#fff3e0,stroke:#e65100
        DB_Driver[("Spark Driver<br/>(The 'Brain' in the Cluster)")]
        style DB_Driver fill:#ffcc80,stroke:#e65100
        subgraph S3 ["User Cloud Storage (S3 / ADLS)"]
            style S3 fill:#fff,stroke:#e65100,stroke-dasharray: 5 5
            subgraph Delta_Log ["_delta_log Folder"]
                style Delta_Log fill:#ffe0b2,stroke:#ef6c00
                JSON["00001.json<br/>(Transaction Log)"]
            end
            subgraph Data_Files ["Parquet Data"]
                P1["File_A.parquet<br/>(Data: 2023)"]
                P2["File_B.parquet<br/>(Data: 2024)"]
            end
        end
        %% The Flow
        DB_Driver --"1. Reads Log First"--> JSON
        JSON --"2. Returns List:<br/>File_B is valid<br/>File_A is skipped"--> DB_Driver
        DB_Driver --"3. Reads only required file"--> P2
        %% Visual styling to show skipped file
        linkStyle 2 stroke:#00C853,stroke-width:3px;
        style P1 opacity:0.5
    end
    %% ==========================================
    %% Right Side: Snowflake (Service-Based Logic)
    %% ==========================================
    subgraph SF_System ["Snowflake: Centralized Intelligence"]
        direction TB
        style SF_System fill:#e1f5fe,stroke:#01579b
        subgraph Cloud_Services ["Cloud Services Layer"]
            style Cloud_Services fill:#b3e5fc,stroke:#0277bd
            Meta_Store[("Global Metadata Store<br/>(FoundationDB)")]
        end
        SF_Compute[("Virtual Warehouse<br/>(Compute Nodes)")]
        style SF_Compute fill:#81d4fa,stroke:#0277bd
        subgraph SF_Storage ["Proprietary Storage Layer"]
            style SF_Storage fill:#fff,stroke:#0277bd,stroke-dasharray: 5 5
            MP1["Micro-Partition 1<br/>(Data: 2023)"]
            MP2["Micro-Partition 2<br/>(Data: 2024)"]
        end
        %% The Flow
        SF_Compute --"1. Compiles SQL &<br/>Queries Metadata"--> Meta_Store
        Meta_Store --"2. Says: Only MP2<br/>contains logical match"--> SF_Compute
        SF_Compute --"3. Fetches Specific Block"--> MP2
        %% Visual styling to show skipped partition
        linkStyle 5 stroke:#00C853,stroke-width:3px;
        style MP1 opacity:0.5
    end
```
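The log-replay step on the Databricks side can be sketched in a few lines: a Delta reader replays the JSON actions in `_delta_log` in order to decide which Parquet files are still part of the table. This is a minimal sketch; real log entries carry many more fields (column stats, partition values, commit info).

```python
# Minimal sketch of Delta transaction-log replay: replay add/remove actions
# in commit order to derive the "live" file list. Entries are simplified.
import json

log_entries = [
    '{"add": {"path": "File_A.parquet"}}',
    '{"add": {"path": "File_B.parquet"}}',
    '{"remove": {"path": "File_A.parquet"}}',  # File_A logically deleted
]

def live_files(entries):
    """Replay add/remove actions to find files still part of the table."""
    files = set()
    for line in entries:
        action = json.loads(line)
        if "add" in action:
            files.add(action["add"]["path"])
        elif "remove" in action:
            files.discard(action["remove"]["path"])
    return sorted(files)

print(live_files(log_entries))  # → ['File_B.parquet']
```

This is why the driver in the diagram never touches `File_A.parquet`: the log says it was removed, so the file is skipped without a single byte being read.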
The Compute Model: "T-Shirt Sizing" vs. "Executor Control"
```mermaid
classDiagram
    note "Compute Model Comparison"
    direction LR
    class Snowflake_VW {
        <<Managed Service>>
        SIZE : XS, S, M ... 4XL
        SCALING : Auto-Scale [Multi-Cluster]
        CACHE : Local SSD [Warmed on Query]
        %% Actions / Constraints
        -User_CANNOT_Pick_Instance()
        -User_CANNOT_Tune_JVM()
    }
    class Databricks_Cluster {
        <<Self-Managed Cluster>>
        NODES : Driver + Workers
        COST : Spot Instances Supported
        ENGINE : Photon C++ / Spark JVM
        %% Actions / Capabilities
        +User_Selects_Instance_Type()
        +User_Tunes_Memory_Cores()
    }
    class Workload_Fit {
        <<Verdict>>
    }
    Snowflake_VW --> Workload_Fit : Best for Concurrent BI/SQL
    Databricks_Cluster --> Workload_Fit : Best for Heavy ETL/ML
```
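The contrast above shows up directly in the configuration surface each platform exposes. The sketch below is illustrative only: the field names loosely follow the public Snowflake warehouse options and the Databricks cluster-spec API, but are simplified, not exact schemas.

```python
# Illustrative comparison of configuration surfaces (field names simplified).
# Snowflake: coarse T-shirt knobs only; the service picks the hardware.
snowflake_warehouse = {
    "warehouse_size": "MEDIUM",     # XS ... 4XL is the only "hardware" knob
    "min_cluster_count": 1,
    "max_cluster_count": 10,        # multi-cluster auto-scale for concurrency
}

# Databricks: node-level control, down to instance type and engine tuning.
databricks_cluster = {
    "node_type_id": "i3.xlarge",    # user picks the exact cloud instance
    "num_workers": 8,
    "spark_conf": {"spark.executor.memory": "24g"},   # user tunes the engine
    "aws_attributes": {"availability": "SPOT"},       # spot instances for cost
}

def tunable_keys(config):
    """Count top-level knobs: a rough proxy for control vs. simplicity."""
    return len(config)

print(tunable_keys(snowflake_warehouse) < tunable_keys(databricks_cluster))  # → True
```

The design trade-off is deliberate: fewer knobs means less to misconfigure for BI teams; more knobs means ETL/ML engineers can shape the cluster to the job.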
The Security & Governance Topology: "The Castle" vs. "The Passport"
```mermaid
graph TD
    %% ==========================================
    %% Left Side: Snowflake (The Castle)
    %% ==========================================
    subgraph SF_World ["Snowflake: Walled Garden"]
        direction TB
        style SF_World fill:#e1f5fe,stroke:#01579b
        SF_User[User]
        SF_Gateway["Access Control Layer<br/>(RBAC / Governance)"]
        style SF_Gateway fill:#b3e5fc,stroke:#01579b
        subgraph SF_Internals ["Snowflake Managed Storage"]
            style SF_Internals fill:#fff,stroke:#01579b,stroke-dasharray: 5 5
            Internal_Store[("Micro-Partitions")]
        end
        %% Straight Flow = Control
        SF_User -->|"1. Submit SQL"| SF_Gateway
        SF_Gateway -->|"2. Verify & Read (Proxy)"| Internal_Store
    end
    %% ==========================================
    %% Right Side: Databricks (The Passport)
    %% ==========================================
    subgraph DB_World ["Databricks: Passport Model"]
        direction TB
        style DB_World fill:#fff3e0,stroke:#e65100
        DB_User[User]
        DB_Cluster[("Compute Cluster<br/>(Spark/Photon)")]
        UC[("Unity Catalog<br/>(Policy Engine)")]
        subgraph Cloud_Storage ["Your Cloud Storage"]
            style Cloud_Storage fill:#fff,stroke:#e65100,stroke-dasharray: 5 5
            S3_Gold[("S3 Gold Bucket")]
        end
        %% The Triangle Flow
        DB_User -->|"1. Submit Code"| DB_Cluster
        DB_Cluster -->|"2. Request Access"| UC
        UC -.->|"3. Vends Temp Token"| DB_Cluster
        DB_Cluster -->|"4. Direct Access w/ Token"| S3_Gold
    end
    %% ==========================================
    %% Insights / Comparison
    %% ==========================================
    NoteSF[/"Security is a GATEWAY.<br/>No one touches data without passing<br/>through the Snowflake Engine."/]
    NoteDB[/"Security is a PERMISSION SLIP.<br/>Engine gets a token, then<br/>reads data directly from storage."/]
    style NoteSF fill:#81d4fa,stroke:none
    style NoteDB fill:#ffcc80,stroke:none
    SF_World --- NoteSF
    DB_World --- NoteDB
```
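The "passport" flow in the right-hand diagram can be modeled as a toy policy check followed by a short-lived credential. Everything below is illustrative: the policy table, function names, and token shape are invented for the sketch and are not the Unity Catalog API.

```python
# Toy model of credential vending: the catalog checks its policy table,
# then hands the engine a short-lived token for direct storage access.
# All names here are illustrative, not a real Unity Catalog interface.
import time

POLICIES = {("alice", "s3://gold"): "READ"}  # (principal, path) -> privilege

def vend_token(user, path, action):
    """Return a short-lived token if policy allows, else raise."""
    if POLICIES.get((user, path)) != action:
        raise PermissionError(f"{user} may not {action} {path}")
    return {"path": path, "action": action, "expires": time.time() + 900}

token = vend_token("alice", "s3://gold", "READ")
print(token["action"])  # → READ
```

The contrast with the castle model is the last step: here the compute engine holds the credential and reads storage itself, whereas in the gateway model no credential ever leaves the service and every read is proxied.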
Summary
1. Performance Source
- Snowflake (Intelligent Storage): Relies on Global Metadata Pruning. Because it acts as a "Walled Garden," it enforces an optimized micro-partition layout during ingestion, which lets queries surgically skip the vast majority of the data without ever reading it.
- Databricks (Intelligent Compute): Relies on Photon (a vectorized C++ engine) plus Delta data skipping. While it respects the open file format, it uses the Delta Log to skip irrelevant files, then processes the remaining data with Photon, which substantially outperforms JVM-based engines.
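Both pruning mechanisms reduce to the same idea: keep min/max statistics per storage unit and read only units whose range can match the predicate. A minimal sketch, with invented partition names and stats:

```python
# Sketch of min/max metadata pruning, the common idea behind Snowflake
# micro-partition pruning and Delta data skipping. Stats are illustrative.
partitions = [
    {"name": "MP1", "year_min": 2023, "year_max": 2023},
    {"name": "MP2", "year_min": 2024, "year_max": 2024},
]

def prune(stats, predicate_value):
    """Keep only units whose [min, max] range can contain the predicate."""
    return [p["name"] for p in stats
            if p["year_min"] <= predicate_value <= p["year_max"]]

print(prune(partitions, 2024))  # → ['MP2']
```

The difference between the platforms is not the pruning math but where the statistics live: in Snowflake's global metadata store versus in the Delta log files sitting next to the data.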
2. Concurrency & Workload
- Snowflake (Best for BI): Wins on Burst Concurrency. Its compute nodes are "Stateless" (local cache only). This allows it to spin up 10 clusters instantly to serve 1,000 dashboard users, then shut down immediately.
- Databricks (Best for AI/ETL): Wins on Sustained Complexity. Spark's "Stateful" distributed memory model excels at massive shuffles (e.g., joining two 10TB tables or training an iterative ML model), which allows it to solve problems that would choke a standard warehouse.
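The burst-concurrency claim is just multi-cluster arithmetic: scale out until the query load fits, up to a configured cap. The capacity and cap below are illustrative numbers, not Snowflake defaults.

```python
import math

# Back-of-envelope sketch of Snowflake-style multi-cluster scale-out.
# per_cluster_capacity and max_clusters are illustrative, not real defaults.
def clusters_needed(concurrent_queries, per_cluster_capacity=100, max_clusters=10):
    """ceil(load / capacity), capped at the warehouse's max cluster count."""
    return min(max_clusters,
               max(1, math.ceil(concurrent_queries / per_cluster_capacity)))

print(clusters_needed(1000))  # → 10: ten clusters absorb 1,000 dashboard users
```

Because the nodes hold no persistent state, clusters added this way are productive as soon as their local cache warms, and can be dropped the moment load subsides.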
3. The Convergence
- Databricks is hiding the engine: With "Databricks SQL Serverless," they are abstracting the clusters away to capture the Business Analyst market.
- Snowflake is opening the garden: With "Snowpark" and "Iceberg Tables," they are allowing Python code and external storage to capture the Data Science market.