This post walks you through installing ClickHouse, creating your first table, inserting data, and writing queries that actually return useful results fast.
Reading about ClickHouse is one thing. Running your first query and seeing it return in milliseconds is another. By the end of this post, you will have a working ClickHouse table, real data inside it, and a handful of queries that show you what ClickHouse can do. No prior ClickHouse experience needed.
How do you install ClickHouse?
There are two straightforward ways to get ClickHouse running locally. Pick the one that fits your setup.
Option 1: Install directly on Linux or macOS
On Ubuntu or Debian:
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg
curl -fsSL 'https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key' | sudo gpg --dearmor -o /usr/share/keyrings/clickhouse-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb stable main" | sudo tee /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update
sudo apt-get install -y clickhouse-server clickhouse-client
sudo service clickhouse-server start
On macOS using Homebrew:
brew install clickhouse
Option 2: Run with Docker
If you prefer to keep things contained, Docker is the fastest way to get a ClickHouse instance running without touching your system packages:
docker run -d --name clickhouse-server \
-p 8123:8123 -p 9000:9000 \
clickhouse/clickhouse-server
Once it is running, open the ClickHouse client:
clickhouse-client
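If you went the Docker route, the client is not installed on your host, so run it inside the container instead (this assumes the container name clickhouse-server from the command above):

```shell
# Open an interactive ClickHouse client session inside the running container
docker exec -it clickhouse-server clickhouse-client
```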
You should see the ClickHouse prompt. You are in.
What scenario will we learn with?
Throughout this post, we will work with a subscription events table for a fictional SaaS product called Trackly. Trackly is a simple project management tool. Every time a user does something related to their subscription, Trackly logs an event.
The events we care about are:
- trial_started: a new user signs up for a free trial
- upgraded: a trial user converts to a paid plan
- churned: a paid user cancels their subscription
- renewed: a paid user’s subscription renews for another month
Companies like Trackly generate millions of these events per year, and answering basic questions like “how many users upgraded last month” or “what is the churn rate by plan type” requires scanning every row in the events table. This is exactly the kind of workload ClickHouse handles well.
How do you create your first ClickHouse table?
Here is the CREATE TABLE statement for Trackly’s subscription events:
CREATE TABLE subscription_events (
user_id UInt64,
plan String,
country String,
event_type String,
revenue Float64,
event_date Date
) ENGINE = MergeTree()
ORDER BY (country, event_date)
PARTITION BY toYYYYMM(event_date);
Three things to understand here before moving on:
- ENGINE = MergeTree(). This tells ClickHouse to use the MergeTree storage engine. For any production table storing real data, this is almost always the right choice. The previous post in this series covers all engine types if you want more context.
- ORDER BY (country, event_date). This is the sorting key. ClickHouse physically sorts all data on disk by country first, then by date within each country. When you run a query filtered by country or date, ClickHouse skips over rows that cannot possibly match your filter. The closer your ORDER BY matches your most common query filters, the faster your queries will be. If you do not specify a PRIMARY KEY separately, ClickHouse uses ORDER BY as the primary key by default.
- PARTITION BY toYYYYMM(event_date). This groups data into monthly partitions. If you want to delete all events older than 12 months, ClickHouse drops the entire partition in one fast operation instead of scanning and deleting individual rows.
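To make that retention point concrete, here is roughly what dropping an old month looks like. With toYYYYMM, the partition ID for March 2024 is 202403; this is a sketch, so check the partition list in system.parts before dropping anything:

```sql
-- List this table's partitions and their sizes first
SELECT partition, count() AS parts, sum(rows) AS rows
FROM system.parts
WHERE table = 'subscription_events' AND active
GROUP BY partition;

-- Drop the whole March 2024 partition in one fast operation
ALTER TABLE subscription_events DROP PARTITION 202403;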
How do you insert data into ClickHouse?
For testing and learning, you can insert rows directly using VALUES:
INSERT INTO subscription_events VALUES
(1001, 'pro', 'Ireland', 'trial_started', 0.00, '2024-03-01'),
(1002, 'starter', 'USA', 'trial_started', 0.00, '2024-03-01'),
(1001, 'pro', 'Ireland', 'upgraded', 49.00, '2024-03-08'),
(1003, 'pro', 'Germany', 'trial_started', 0.00, '2024-03-10'),
(1002, 'starter', 'USA', 'churned', 0.00, '2024-03-15'),
(1004, 'starter', 'UK', 'trial_started', 0.00, '2024-03-18'),
(1003, 'pro', 'Germany', 'upgraded', 49.00, '2024-03-22'),
(1004, 'starter', 'UK', 'upgraded', 19.00, '2024-03-29');
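For anything beyond hand-typed test rows, you would normally load data from a file. As a sketch (the filename events.csv is an assumption, and the CSV columns must match the table's column order), the client can stream a file straight into the table:

```shell
# Stream a CSV file into the table in one batch insert
clickhouse-client --query "INSERT INTO subscription_events FORMAT CSV" < events.csv
```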
What do your first ClickHouse queries look like?
Now for the part that makes ClickHouse worth understanding: your first queries on real data.
Query 1: How many events happened in total?
SELECT count()
FROM subscription_events;
Result: 8. Eight events in the table. Simple, but confirms your data loaded correctly.
Query 2: How many users upgraded by country?
SELECT
country,
count() AS upgrades
FROM subscription_events
WHERE event_type = 'upgraded'
GROUP BY country
ORDER BY upgrades DESC;
Result:
| country | upgrades |
|---|---|
| Germany | 1 |
| Ireland | 1 |
| UK | 1 |
This tells you that in March 2024, one user upgraded in each of Germany, Ireland, and the UK. In a real dataset with millions of rows, this same query returns in under a second because ClickHouse reads only the country and event_type columns and skips the rest.
Query 3: What was total revenue by plan in March?
SELECT
plan,
sum(revenue) AS total_revenue
FROM subscription_events
WHERE event_date >= '2024-03-01'
AND event_date < '2024-04-01'
AND event_type = 'upgraded'
GROUP BY plan
ORDER BY total_revenue DESC;
Result:
| plan | total_revenue |
|---|---|
| pro | 98.00 |
| starter | 19.00 |
Pro plan generated 98 euros in upgrades during March. Starter plan generated 19. At eight rows this is trivial, but imagine running this same query across 200 million rows of historical subscription events. ClickHouse reads only four columns out of six, applies the date filter using the partition boundary, and returns the result in milliseconds.
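One more query worth trying on the same data: conditional aggregates. ClickHouse's countIf lets you compute a trial-to-paid conversion rate in a single pass, without subqueries:

```sql
SELECT
    countIf(event_type = 'trial_started') AS trials,
    countIf(event_type = 'upgraded') AS upgrades,
    round(upgrades / trials, 2) AS conversion_rate
FROM subscription_events;
```

On the eight sample rows above, this reports 4 trials, 3 upgrades, and a conversion rate of 0.75.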
What should your ORDER BY be based on?
This is the most important schema design decision you will make when creating a ClickHouse table. Getting it right from the start saves a lot of pain later.
The rule is simple: your ORDER BY columns should match the filters you use most often in your WHERE clauses.
For Trackly’s events table, the most common queries filter by country and event_date. So ORDER BY (country, event_date) means ClickHouse can skip entire chunks of data for queries that filter on those columns.
If instead you put ORDER BY user_id but always filter by country, ClickHouse cannot skip anything useful. It reads more data than it needs to, and queries slow down.
Three practical guidelines for choosing ORDER BY:
- Put your most selective filter column first. If most queries filter by country, country goes first.
- Put date or time columns after lower-cardinality columns. Date ranges work well as a second or third sort key.
- Do not put high-cardinality columns like user_id first unless most queries filter by a specific user. High cardinality first means the index is less useful for range queries.
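You can check whether your ORDER BY is actually helping a given query. EXPLAIN with indexes = 1 shows how the primary key index is used and how much data is selected versus skipped (the exact output format varies by ClickHouse version, so treat this as a diagnostic sketch):

```sql
EXPLAIN indexes = 1
SELECT count()
FROM subscription_events
WHERE country = 'Ireland';
```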
What are the most common mistakes beginners make?
After setting up your first table and running your first queries, these are the mistakes that slow most people down:
- Choosing ORDER BY based on what looks logical, not what queries actually filter on. Your sort key should match your access patterns, not your instinct about what the “natural” key is.
- Over-partitioning. Partitioning by day when you have millions of rows per day creates too many small parts and slows things down. Partition by month unless you have a strong reason not to.
- Inserting one row at a time in production. ClickHouse is designed for batch inserts. Single-row inserts create one part per insert and force constant background merging. Always batch your inserts.
- Treating ClickHouse like PostgreSQL. Frequent updates, single-row lookups, and joins on large tables behave very differently in ClickHouse. Design your schema around append-heavy, scan-heavy workloads.
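To make the batching point concrete: one INSERT carrying many rows creates one data part, while N single-row INSERTs create N parts that ClickHouse must merge in the background. If your application cannot buffer rows itself, recent ClickHouse versions can batch small inserts on the server side with asynchronous inserts (a sketch; confirm these settings exist in your version before relying on them):

```sql
-- Preferred: one INSERT with many rows produces a single part
INSERT INTO subscription_events VALUES
    (2001, 'pro', 'France', 'trial_started', 0.00, '2024-04-01'),
    (2002, 'pro', 'Spain', 'trial_started', 0.00, '2024-04-01');

-- Alternative: let the server buffer small inserts into batches
SET async_insert = 1;
SET wait_for_async_insert = 1;
```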
Looking for ClickHouse consulting or implementation support? Book a free ClickHouse consultation.
