Databricks vs Snowflake
Published on: 06 January 2026
Tags: #databricks #snowflake #storage
The Storage Mechanics: "Micro-Partitions" vs. "Delta Log"
```mermaid
flowchart TB
    %% ==========================================
    %% Left Side: Databricks (File-Based Logic)
    %% ==========================================
    subgraph DB_System ["Databricks: Distributed Intelligence"]
        direction TB
        style DB_System fill:#fff3e0,stroke:#e65100
        DB_Driver[("Spark Driver<br/>(The 'Brain' in the Cluster)")]
        style DB_Driver fill:#ffcc80,stroke:#e65100
        subgraph S3 ["User Cloud Storage (S3 / ADLS)"]
            style S3 fill:#fff,stroke:#e65100,stroke-dasharray: 5 5
            subgraph Delta_Log ["_delta_log Folder"]
                style Delta_Log fill:#ffe0b2,stroke:#ef6c00
                JSON["00001.json<br/>(Transaction Log)"]
            end
            subgraph Data_Files ["Parquet Data"]
                P1["File_A.parquet<br/>(Data: 2023)"]
                P2["File_B.parquet<br/>(Data: 2024)"]
            end
        end
        %% The Flow
        DB_Driver --"1. Reads Log First"--> JSON
        JSON --"2. Returns List:<br/>File_B is valid<br/>File_A is skipped"--> DB_Driver
        DB_Driver --"3. Reads only required file"--> P2
        %% Visual styling to show skipped file
        linkStyle 2 stroke:#00C853,stroke-width:3px;
        style P1 opacity:0.5
    end
    %% ==========================================
    %% Right Side: Snowflake (Service-Based Logic)
    %% ==========================================
    subgraph SF_System ["Snowflake: Centralized Intelligence"]
        direction TB
        style SF_System fill:#e1f5fe,stroke:#01579b
        subgraph Cloud_Services ["Cloud Services Layer"]
            style Cloud_Services fill:#b3e5fc,stroke:#0277bd
            Meta_Store[("Global Metadata Store<br/>(FoundationDB)")]
        end
        SF_Compute[("Virtual Warehouse<br/>(Compute Nodes)")]
        style SF_Compute fill:#81d4fa,stroke:#0277bd
        subgraph SF_Storage ["Proprietary Storage Layer"]
            style SF_Storage fill:#fff,stroke:#0277bd,stroke-dasharray: 5 5
            MP1["Micro-Partition 1<br/>(Data: 2023)"]
            MP2["Micro-Partition 2<br/>(Data: 2024)"]
        end
        %% The Flow
        SF_Compute --"1. Compiles SQL &<br/>Queries Metadata"--> Meta_Store
        Meta_Store --"2. Says: Only MP2<br/>contains logical match"--> SF_Compute
        SF_Compute --"3. Fetches Specific Block"--> MP2
        %% Visual styling to show skipped partition
        linkStyle 5 stroke:#00C853,stroke-width:3px;
        style MP1 opacity:0.5
    end
```
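The log-replay step on the Databricks side can be sketched in a few lines: a Delta reader replays the JSON actions in `_delta_log` in order to decide which Parquet files are still part of the table. This is a minimal sketch; real log entries carry many more fields (column stats, partition values, commit info).

```python
# Minimal sketch of Delta transaction-log replay: replay add/remove actions
# in commit order to derive the "live" file list. Entries are simplified.
import json

log_entries = [
    '{"add": {"path": "File_A.parquet"}}',
    '{"add": {"path": "File_B.parquet"}}',
    '{"remove": {"path": "File_A.parquet"}}',  # File_A logically deleted
]

def live_files(entries):
    """Replay add/remove actions to find files still part of the table."""
    files = set()
    for line in entries:
        action = json.loads(line)
        if "add" in action:
            files.add(action["add"]["path"])
        elif "remove" in action:
            files.discard(action["remove"]["path"])
    return sorted(files)

print(live_files(log_entries))  # → ['File_B.parquet']
```

This is why the driver in the diagram never touches `File_A.parquet`: the log says it was removed, so the file is skipped without a single byte being read.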
The Compute Model: "T-Shirt Sizing" vs. "Executor Control"
```mermaid
classDiagram
    note "Compute Model Comparison"
    direction LR
    class Snowflake_VW {
        <<Managed Service>>
        SIZE : XS, S, M ... 4XL
        SCALING : Auto-Scale [Multi-Cluster]
        CACHE : Local SSD [Warmed on Query]
        %% Actions / Constraints
        -User_CANNOT_Pick_Instance()
        -User_CANNOT_Tune_JVM()
    }
    class Databricks_Cluster {
        <<Self-Managed Cluster>>
        NODES : Driver + Workers
        COST : Spot Instances Supported
        ENGINE : Photon C++ / Spark JVM
        %% Actions / Capabilities
        +User_Selects_Instance_Type()
        +User_Tunes_Memory_Cores()
    }
    class Workload_Fit {
        <<Verdict>>
    }
    Snowflake_VW --> Workload_Fit : Best for Concurrent BI/SQL
    Databricks_Cluster --> Workload_Fit : Best for Heavy ETL/ML
```
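The contrast above shows up directly in the configuration surface each platform exposes. The sketch below is illustrative only: the field names loosely follow the public Snowflake warehouse options and the Databricks cluster-spec API, but are simplified, not exact schemas.

```python
# Illustrative comparison of configuration surfaces (field names simplified).
# Snowflake: coarse T-shirt knobs only; the service picks the hardware.
snowflake_warehouse = {
    "warehouse_size": "MEDIUM",     # XS ... 4XL is the only "hardware" knob
    "min_cluster_count": 1,
    "max_cluster_count": 10,        # multi-cluster auto-scale for concurrency
}

# Databricks: node-level control, down to instance type and engine tuning.
databricks_cluster = {
    "node_type_id": "i3.xlarge",    # user picks the exact cloud instance
    "num_workers": 8,
    "spark_conf": {"spark.executor.memory": "24g"},   # user tunes the engine
    "aws_attributes": {"availability": "SPOT"},       # spot instances for cost
}

def tunable_keys(config):
    """Count top-level knobs: a rough proxy for control vs. simplicity."""
    return len(config)

print(tunable_keys(snowflake_warehouse) < tunable_keys(databricks_cluster))  # → True
```

The design trade-off is deliberate: fewer knobs means less to misconfigure for BI teams; more knobs means ETL/ML engineers can shape the cluster to the job.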
The Security & Governance Topology: "The Castle" vs. "The Passport"
```mermaid
graph TD
    %% ==========================================
    %% Left Side: Snowflake (The Castle)
    %% ==========================================
    subgraph SF_World ["Snowflake: Walled Garden"]
        direction TB
        style SF_World fill:#e1f5fe,stroke:#01579b
        SF_User[User]
        SF_Gateway["Access Control Layer<br/>(RBAC / Governance)"]
        style SF_Gateway fill:#b3e5fc,stroke:#01579b
        subgraph SF_Internals ["Snowflake Managed Storage"]
            style SF_Internals fill:#fff,stroke:#01579b,stroke-dasharray: 5 5
            Internal_Store[("Micro-Partitions")]
        end
        %% Straight Flow = Control
        SF_User -->|"1. Submit SQL"| SF_Gateway
        SF_Gateway -->|"2. Verify & Read (Proxy)"| Internal_Store
    end
    %% ==========================================
    %% Right Side: Databricks (The Passport)
    %% ==========================================
    subgraph DB_World ["Databricks: Passport Model"]
        direction TB
        style DB_World fill:#fff3e0,stroke:#e65100
        DB_User[User]
        DB_Cluster[("Compute Cluster<br/>(Spark/Photon)")]
        UC[("Unity Catalog<br/>(Policy Engine)")]
        subgraph Cloud_Storage ["Your Cloud Storage"]
            style Cloud_Storage fill:#fff,stroke:#e65100,stroke-dasharray: 5 5
            S3_Gold[("S3 Gold Bucket")]
        end
        %% The Triangle Flow
        DB_User -->|"1. Submit Code"| DB_Cluster
        DB_Cluster -->|"2. Request Access"| UC
        UC -.->|"3. Vends Temp Token"| DB_Cluster
        DB_Cluster -->|"4. Direct Access w/ Token"| S3_Gold
    end
    %% ==========================================
    %% Insights / Comparison
    %% ==========================================
    NoteSF[/"Security is a GATEWAY.<br/>No one touches data without passing<br/>through the Snowflake Engine."/]
    NoteDB[/"Security is a PERMISSION SLIP.<br/>Engine gets a token, then<br/>reads data directly from storage."/]
    style NoteSF fill:#81d4fa,stroke:none
    style NoteDB fill:#ffcc80,stroke:none
    SF_World --- NoteSF
    DB_World --- NoteDB
```
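The "passport" flow in the right-hand diagram can be modeled as a toy policy check followed by a short-lived credential. Everything below is illustrative: the policy table, function names, and token shape are invented for the sketch and are not the Unity Catalog API.

```python
# Toy model of credential vending: the catalog checks its policy table,
# then hands the engine a short-lived token for direct storage access.
# All names here are illustrative, not a real Unity Catalog interface.
import time

POLICIES = {("alice", "s3://gold"): "READ"}  # (principal, path) -> privilege

def vend_token(user, path, action):
    """Return a short-lived token if policy allows, else raise."""
    if POLICIES.get((user, path)) != action:
        raise PermissionError(f"{user} may not {action} {path}")
    return {"path": path, "action": action, "expires": time.time() + 900}

token = vend_token("alice", "s3://gold", "READ")
print(token["action"])  # → READ
```

The contrast with the castle model is the last step: here the compute engine holds the credential and reads storage itself, whereas in the gateway model no credential ever leaves the service and every read is proxied.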
Summary
1. Performance Source
- Snowflake (Intelligent Storage): Relies on Global Metadata Pruning. Because it acts as a "Walled Garden," it enforces an optimized micro-partition layout during ingestion, which lets queries surgically skip the vast majority of the data without ever reading it.
- Databricks (Intelligent Compute): Relies on Photon (a vectorized C++ engine) plus Delta data skipping. While it respects the open file format, it uses the Delta Log to skip irrelevant files, then processes the remaining data with Photon, which substantially outperforms JVM-based engines.
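Both pruning mechanisms reduce to the same idea: keep min/max statistics per storage unit and read only units whose range can match the predicate. A minimal sketch, with invented partition names and stats:

```python
# Sketch of min/max metadata pruning, the common idea behind Snowflake
# micro-partition pruning and Delta data skipping. Stats are illustrative.
partitions = [
    {"name": "MP1", "year_min": 2023, "year_max": 2023},
    {"name": "MP2", "year_min": 2024, "year_max": 2024},
]

def prune(stats, predicate_value):
    """Keep only units whose [min, max] range can contain the predicate."""
    return [p["name"] for p in stats
            if p["year_min"] <= predicate_value <= p["year_max"]]

print(prune(partitions, 2024))  # → ['MP2']
```

The difference between the platforms is not the pruning math but where the statistics live: in Snowflake's global metadata store versus in the Delta log files sitting next to the data.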
2. Concurrency & Workload
- Snowflake (Best for BI): Wins on Burst Concurrency. Its compute nodes are "Stateless" (local cache only). This allows it to spin up 10 clusters instantly to serve 1,000 dashboard users, then shut down immediately.
- Databricks (Best for AI/ETL): Wins on Sustained Complexity. Spark's "Stateful" distributed memory model excels at massive shuffles (e.g., joining two 10TB tables or training an iterative ML model), which allows it to solve problems that would choke a standard warehouse.
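The burst-concurrency claim is just multi-cluster arithmetic: scale out until the query load fits, up to a configured cap. The capacity and cap below are illustrative numbers, not Snowflake defaults.

```python
import math

# Back-of-envelope sketch of Snowflake-style multi-cluster scale-out.
# per_cluster_capacity and max_clusters are illustrative, not real defaults.
def clusters_needed(concurrent_queries, per_cluster_capacity=100, max_clusters=10):
    """ceil(load / capacity), capped at the warehouse's max cluster count."""
    return min(max_clusters,
               max(1, math.ceil(concurrent_queries / per_cluster_capacity)))

print(clusters_needed(1000))  # → 10: ten clusters absorb 1,000 dashboard users
```

Because the nodes hold no persistent state, clusters added this way are productive as soon as their local cache warms, and can be dropped the moment load subsides.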
3. The Convergence
- Databricks is hiding the engine: With "Databricks SQL Serverless," they are abstracting the clusters away to capture the Business Analyst market.
- Snowflake is opening the garden: With "Snowpark" and "Iceberg Tables," they are allowing Python code and external storage to capture the Data Science market.