Snowflake Micro Partitions Partition Pruning, Column Pruning

Platform: Snowflake · Parquet · Delta Lake · Iceberg · BigQuery · Redshift · ClickHouse

Topics: Micro-partitions · No partial partition · Partition pruning · Column pruning · Columnar storage · Cross-engine comparison

Why this matters before we start

Every query you run against a large Snowflake table is, at its core, a question the engine has to answer: which parts of this table do I actually need to read? Get that answer right and a query against a 10 TB table reads 500 GB. Get it wrong and it reads all 10 TB. The mechanism that makes the right answer possible — and the mistakes that produce the wrong one — all live inside the micro-partition.

🗂️ Section 01 — What is a micro-partition?

A micro-partition is Snowflake's fundamental physical storage unit. Every table is physically stored as a collection of micro-partitions — immutable compressed files written to object storage (S3, Azure Blob, or GCS). You never see these files directly. There is no filesystem path, no file to download, no size to inspect on disk. The micro-partition is entirely internal to Snowflake's storage layer.

Each micro-partition holds 50 MB to 500 MB of uncompressed data. After ZSTD compression the on-disk size shrinks to roughly 5–50 MB. A narrow, highly compressible table might pack 1–2 million rows per partition. A wide 100-column table with VARIANT columns might hold only 150,000–200,000 rows in the same 500 MB ceiling.

The two parts of every micro-partition

Every micro-partition has a fixed two-part structure:

Part 1 — The metadata header. A small block of statistics stored alongside the partition. It contains, for every column in the table: the minimum value, the maximum value, the total row count, and the null count. This header is only a few kilobytes regardless of how wide the table is. It is always read first — before any data is decompressed, before any column chunks are opened.

Part 2 — The column chunks. Each column's data is stored as a separate compressed block. The data for order_date lives in one chunk, amount in another, status in a third. These chunks are completely independent. Reading one column chunk does not require reading any other.

Immutable writes — why Time Travel is free

Snowflake never modifies an existing micro-partition. Every UPDATE, DELETE, MERGE, or INSERT creates new partitions and marks old ones as superseded. Those superseded partitions remain intact and queryable for exactly DATA_RETENTION_TIME_IN_DAYS days. Time Travel costs nothing extra at the storage layer — the historical data is already there. The cost comes from how long you choose to retain it.

⚠️ Section 02 — no partial micro-partitions

This is the single most important structural fact about micro-partitions — and the one most commonly misunderstood.

Columnar storage does not mean partial micro-partitions. The term "columnar" describes how data is stored and read within a partition — each column's values are grouped together in an independent compressed chunk. It does not mean that a partition can hold some columns and skip others. The columnar layout lives inside the partition. The partition itself always contains every column of the table, for every row it holds.

This has concrete cost consequences. A 100-column table where you insert 10 GB of changed data daily accumulates 10 GB × 90 days = 900 GB of Time Travel storage — across every column, not just the changed ones. The wider the table, the more expensive each write becomes in storage terms.

A row is an indivisible unit. All its columns always live in the same micro-partition — and every write to that row creates a brand-new partition carrying all of them.

🔍 Section 03 — Partition pruning

Partition pruning is how Snowflake decides which micro-partitions to read and which to skip entirely. It operates exclusively on the metadata header — the per-column MIN/MAX values — before any column chunk data is ever decompressed or transferred.

VALIDATE PRUNING AFTER EVERY SCHEMA CHANGE

SELECT partitions_scanned, partitions_total, ROUND(100.0 * partitions_scanned / partitions_total, 1) AS pct_scanned FROM snowflake.account_usage.query_history WHERE start_time >= DATEADD('minute', -10, CURRENT_TIMESTAMP) ORDER BY start_time DESC LIMIT 10;

Target: below 10% for selective date queries. A value of 100% on a date-filtered query means zero pruning occurred.

Five patterns that silently break partition pruning

-- ❌ Pattern 1: function-wrapping the cluster key — always a full scan
WHERE YEAR(order_date) = 2024
-- ✅ Correct: raw range predicate
WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31'

-- ❌ Pattern 2: implicit type casting (TIMESTAMP literal vs DATE column)
WHERE order_date = '2024-01-15 00:00:00'
-- ✅ Correct: typed literal, no cast required
WHERE order_date = DATE '2024-01-15'

-- ❌ Pattern 3: OR with non-cluster column neutralises pruning entirely
WHERE order_date = '2024-01-15' OR customer_tier = 'GOLD'
-- ✅ Correct: UNION ALL gives each branch independent pruning

-- ❌ Pattern 4: filter applied outside UNION ALL
SELECT * FROM (
  SELECT * FROM fact_2023 UNION ALL SELECT * FROM fact_2024
) WHERE order_date = '2024-01-15'
-- ✅ Correct: push filter inside each branch

-- ❌ Pattern 5: high-cardinality cluster key (order_id, customer_id)
-- Every partition has near-unique MIN/MAX — date queries still scan 100%
-- Pay all reclustering credit cost, get zero query benefit

📋 Section 04 — Column pruning

Partition pruning decides which partitions to open. Column pruning decides which column chunks inside those partitions to actually decompress and read. They are independent mechanisms that compound together.

Because every column in a micro-partition is stored as a separate compressed chunk, Snowflake can read any subset of columns without touching the others. A query selecting 3 columns from a 100-column table opens the relevant partitions but decompresses only 3 column chunks. The other 97 are never touched.

The most expensive habit in data engineering:

SELECT * on a 100-column fact table reads 33× more data than SELECT 3 columns. That 33× difference multiplies across every query execution, every warehouse credit, and every billing cycle. Make explicit column lists a team standard.;

📈 Section 05 — Partition pruning + column pruning: the compound reduction

The two mechanisms are independent and they stack. A query exploiting both achieves reductions neither can reach alone.

Neither mechanism alone achieves this. Partition pruning alone: 500 GB. Column pruning alone (no date filter): 300 GB. Together: 0.75 GB.

🌐 Section 06 — How micro-partitions compare across platforms

The concept behind micro-partitions is universal — every columnar engine groups rows into physical storage units, stores per-column MIN/MAX statistics, and uses those statistics to skip units at query time. What varies is the unit's name, size, who controls it, and who manages its lifecycle.

Apache Parquet — row groups

Parquet organizes data into row groups, each containing a contiguous set of rows. Statistics (MIN, MAX, NULL_COUNT) are stored in the file footer. The key difference from Snowflake: you control row group size via parquet.block.size at write time. A pipeline writing thousands of tiny Parquet files pays significant overhead on every scan. Snowflake manages this automatically.

Delta Lake and Apache Iceberg — file-level statistics in a transaction log

Both build on top of Parquet files. Delta Lake maintains a transaction log (JSON commit files); Iceberg maintains manifest files. Both record per-file statistics — functionally equivalent to Snowflake's metadata headers. The critical operational difference: you must manage compaction. Without periodic OPTIMIZE (Delta) or rewrite_data_files (Iceberg), small files accumulate and pruning degrades. Snowflake's automatic partition management eliminates this.

Google BigQuery — Capacitor shards

BigQuery's internal Capacitor format is entirely opaque — you cannot inspect shard boundaries or count shards. The user-visible concept is the partition (by DATE, RANGE, or ingestion time). BigQuery shows estimated bytes scanned before execution (via the UI and dry-run API), letting you validate pruning before paying. Snowflake shows bytes scanned only after execution in QUERY_HISTORY.

Amazon Redshift — zone maps and 1 MB blocks

Redshift uses fixed 1 MB blocks. Per-block MIN/MAX statistics are called zone maps — the concept is identical to micro-partition metadata. The 1 MB block size is much smaller than Snowflake's 50–500 MB, giving finer skip granularity but more metadata to check. Redshift requires periodic VACUUM to reclaim space. Snowflake's immutable partition model eliminates this entirely.

ClickHouse — granules and MergeTree

ClickHouse's granule defaults to 8,192 rows — orders of magnitude smaller than any other engine listed. This gives extremely fine-grained pruning and very fast point lookups. The trade-off: you must declare ORDER BY at table creation. Changing it requires recreating the table. ClickHouse also supports explicit secondary skipping indexes (bloom filter, min-max, set) as DDL. Snowflake's Search Optimization Service is the closest equivalent but is fully managed and billed per table.

✅ Five things to do this week

1) Audit your column access

Query ACCOUNT_USAGE.ACCESS_HISTORY to find ghost columns — zero accesses in 90 days. Check the data model and take decision on those columns. Moving the cold columns to a separate table helps increase the partition capacity and reduces the total partitions.

2) Ban SELECT * in production

On a 100-column fact table, SELECT * reads 33× more data than SELECT 3 columns. Make explicit column lists a team standard and enforce it in SQL review.

3) Validate pruning on your top queries

Check partitions_scanned / partitions_total in QUERY_HISTORY. Any selective date query scanning 100% of partitions has a pruning failure — type mismatch, function wrapping, or wrong cluster key.

4) Fix your cluster key predicates

WHERE YEAR(order_date) = 2024 is a full table scan. WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31' is not. Same results. 20× different cost.

5) Right-size your table types

Wide tables with VARIANT columns create larger partitions, hold fewer rows per partition, and accumulate more Time Travel storage per write. Review column types quarterly using the heat classification query above.

Tags: #Snowflake #DataEngineering #DataPlatform #MicroPartitions #PartitionPruning #ColumnPruning #CloudFinOps #Parquet #DeltaLake #Iceberg #BigQuery #Redshift #ClickHouse #StorageOptimization

Snowflake Micro-Partitions: The storage unit that powers everything

Why this matters before we start