Data skipping index examples
This page consolidates ClickHouse data skipping index examples, showing how to declare each type, when to use them, and how to verify they're applied. All features work with MergeTree-family tables.
Index syntax:
See a worked example with EXPLAIN
and pruning.
Set index
Use the set
index when local (per-block) cardinality is low; not helpful if each block has many distinct values.
A creation/materialization workflow and the before/after effect are shown in the basic operation guide.
Generic Bloom filter (scalar)
The bloom_filter
index is good for "needle in a haystack" equality/IN membership. It accepts an optional parameter which is the false-positive rate (default 0.025).
N-gram Bloom filter (ngrambf_v1) for substring search
The ngrambf_v1
index splits strings into n-grams. It works well for LIKE '%...%'
queries. It supports String/FixedString/Map (via mapKeys/mapValues), as well as tunable size, hash count, and seed. See the documentation for N-gram bloom filter for further details.
This guide shows practical examples and when to use token vs ngram.
Parameter optimization helpers:
The four ngrambf_v1 parameters (n-gram size, bitmap size, hash functions, seed) significantly impact performance and memory usage. Use these functions to calculate optimal bitmap size and hash function count based on your expected n-gram volume and desired false positive rate:
See parameter docs for complete tuning guidance.
Token Bloom filter (tokenbf_v1) for word-based search
tokenbf_v1
indexes tokens separated by non-alphanumeric characters. You should use it with hasToken
, LIKE
word patterns or equals/IN. It supports String
/FixedString
/Map
types.
See Token bloom filter and Bloom filter types pages for more details.
See observability examples and guidance on token vs ngram here.
Add indexes during CREATE TABLE (multiple examples)
Skipping indexes also support composite expressions and Map
/Tuple
/Nested
types. This is demonstrated in the example below:
Materializing on existing data and verifying
You can add an index to existing data parts using MATERIALIZE
, and inspect pruning with EXPLAIN
or trace logs, as shown below:
This worked minmax example demonstrates EXPLAIN output structure and pruning counts.
When to use and when to avoid skipping indexes
Use skip indexes when:
- Filter values are sparse within data blocks
- Strong correlation exists with
ORDER BY
columns or data ingestion patterns group similar values together - Performing text searches on large log datasets (
ngrambf_v1
/tokenbf_v1
types)
Avoid skip indexes when:
- Most blocks likely contain at least one matching value (blocks will be read regardless)
- Filtering on high-cardinality columns with no correlation to data ordering
If a value appears even once in a data block, ClickHouse must read the entire block. Test indexes with realistic datasets and adjust granularity and type-specific parameters based on actual performance measurements.
Temporarily ignore or force indexes
Disable specific indexes by name for individual queries during testing and troubleshooting. Settings also exist to force index usage when needed. See ignore_data_skipping_indices
.
Notes and caveats
- Skipping indexes are only supported on MergeTree-family tables; pruning happens at the granule/block level.
- Bloom-filter-based indexes are probabilistic (false positives cause extra reads but won't skip valid data).
- Bloom filters and other skip indexes should be validated with
EXPLAIN
and tracing; adjust granularity to balance pruning vs. index size.