Interactive walkthrough
Learn Duckle. The whole machine.
A hands-on tour of the entire product and codebase - one idea per screen. Step through with the arrows. No scrolling.
What you'll learn
Ten chapters, start to finish.
Press the right arrow or Next to begin.
01 Local-first
Your data never leaves the machine in front of you.
Duckle runs DuckDB in-process. There is no server to upload rows to, and no warehouse bill for a preview. The compute is the chip already on your desk.
- Read from files, databases and APIs straight into memory on your own CPU.
- Transform with DuckDB's vectorized engine - a billion rows on a laptop, no cluster.
- Write results wherever you choose. The bytes only travel if you send them.
01 The canvas is the program
Boxes and wires. That's a real program.
You build a pipeline by dragging nodes onto a canvas and wiring them together.
- A node is one step - read a table, filter it, write it.
- A wire is data flowing from one step to the next.
- The graph is the execution order. Duckle reads it top to bottom.
01 It compiles to SQL
No magic. Just SQL you can see.
CREATE VIEW orders AS SELECT * FROM read_csv_auto('orders.csv');
CREATE VIEW keep_paid AS SELECT * FROM orders WHERE status='paid';
COPY keep_paid TO 'paid.parquet' (FORMAT parquet);
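As a rough sketch of what compiling a graph to SQL could look like, the snippet below turns a three-node pipeline into the statements shown above. The node dictionaries, their field names (`kind`, `name`, `path`, `input`, `where`) and the `compile_pipeline` function are hypothetical illustrations, not Duckle's actual pipeline format or compiler.

```python
def compile_pipeline(nodes):
    """Emit one SQL statement per node, in the order given (illustrative only)."""
    statements = []
    for node in nodes:
        kind = node["kind"]
        if kind == "source_csv":
            # A source becomes a view over the file; nothing is read yet.
            statements.append(
                f"CREATE VIEW {node['name']} AS "
                f"SELECT * FROM read_csv_auto('{node['path']}');"
            )
        elif kind == "filter":
            # A transform becomes a view over the upstream view.
            statements.append(
                f"CREATE VIEW {node['name']} AS "
                f"SELECT * FROM {node['input']} WHERE {node['where']};"
            )
        elif kind == "sink_parquet":
            # A sink is the statement that actually pulls the data through.
            statements.append(
                f"COPY {node['input']} TO '{node['path']}' (FORMAT parquet);"
            )
        else:
            raise ValueError(f"unknown node kind: {kind}")
    return statements


# Hypothetical pipeline description: read, filter, write.
pipeline = [
    {"kind": "source_csv", "name": "orders", "path": "orders.csv"},
    {"kind": "filter", "name": "keep_paid", "input": "orders", "where": "status='paid'"},
    {"kind": "sink_parquet", "input": "keep_paid", "path": "paid.parquet"},
]

sql = compile_pipeline(pipeline)
print("\n".join(sql))
```

A real compiler would also need to quote identifiers and escape string literals; the sketch skips both to keep the mapping from node to statement visible.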
With Duckle, where does your data get processed?
02 The parts
components ship today · 341 available now, plus preview and planned
Five families do the work; Code is the escape hatch. Learn the five verbs and you can read any pipeline.
Sources
A source pulls rows into the pipeline. Files, warehouses, app databases, object stores, streams, REST APIs and SaaS tools - 109 of them.
The next pages list every source. Green dot = available now, grey = planned, purple = preview. Step through them.
Transforms
Transforms change data. This is where most of the work lives - 133 of them. First, let's actually watch the important ones work.
Every transform is one promise: a table in, a table out. Watch the rows.
| id | name | amt | status |
|---|---|---|---|
| 1 | Ada | 120 | paid |
| 2 | Lin | 45 | pending |
| 3 | Omar | 300 | paid |
| 4 | Zoe | 90 | refunded |

| id | name | amt | status |
|---|---|---|---|
| 1 | Ada | 120 | paid |
| 3 | Omar | 300 | paid |

SELECT * FROM orders WHERE status = 'paid' AND amt >= 100;
| region | product | units |
|---|---|---|
| West | Widget | 10 |
| West | Gadget | 5 |
| East | Widget | 8 |
| West | Widget | 3 |
| East | Gadget | 12 |

| region | total_units | orders |
|---|---|---|
| West | 18 | 3 |
| East | 20 | 2 |

SELECT region, SUM(units) AS total_units, COUNT(*) AS orders FROM sales GROUP BY region;
| order | cust | amt |
|---|---|---|
| 101 | 1 | 120 |
| 102 | 2 | 45 |
| 103 | 4 | 300 |

| cust | name |
|---|---|
| 1 | Ada |
| 2 | Lin |

| order | cust | amt | name |
|---|---|---|---|
| 101 | 1 | 120 | Ada |
| 102 | 2 | 45 | Lin |

Order 103 has no customer 4, so it drops.

SELECT o.order, o.cust, o.amt, c.name FROM orders o JOIN customers c ON o.cust = c.cust;
| day | rev |
|---|---|
| 1 | 100 |
| 2 | 140 |
| 3 | 90 |
| 4 | 200 |

| day | rev | running | prev | rank |
|---|---|---|---|---|
| 1 | 100 | 100 | - | 3 |
| 2 | 140 | 240 | 100 | 2 |
| 3 | 90 | 330 | 140 | 4 |
| 4 | 200 | 530 | 90 | 1 |

SELECT day, rev, SUM(rev) OVER (ORDER BY day) AS running, LAG(rev) OVER (ORDER BY day) AS prev, RANK() OVER (ORDER BY rev DESC) AS rank FROM daily;
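To check the window query by hand, here is a small sketch that runs the same three window functions on the same four rows. It uses SQLite from the Python standard library as a stand-in for DuckDB (an assumption made only so the example runs with no installs; it needs a SQLite build with window-function support, 3.25 or later). The rank column is aliased `rev_rank` here purely as a precaution.

```python
import sqlite3

# In-memory database holding the four daily revenue rows from the table above.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE daily (day INTEGER, rev INTEGER)")
con.executemany(
    "INSERT INTO daily VALUES (?, ?)",
    [(1, 100), (2, 140), (3, 90), (4, 200)],
)

rows = con.execute(
    """
    SELECT day,
           rev,
           SUM(rev) OVER (ORDER BY day)      AS running,   -- cumulative total
           LAG(rev) OVER (ORDER BY day)      AS prev,      -- previous day's value
           RANK()   OVER (ORDER BY rev DESC) AS rev_rank   -- 1 = highest revenue
    FROM daily
    ORDER BY day
    """
).fetchall()

for row in rows:
    print(row)
```

The first day has no previous row, so `prev` comes back as NULL (`None` in Python), which the table above shows as a dash.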
| region | quarter | sales |
|---|---|---|
| West | Q1 | 100 |
| West | Q2 | 150 |
| East | Q1 | 80 |
| East | Q2 | 120 |

| region | Q1 | Q2 |
|---|---|---|
| West | 100 | 150 |
| East | 80 | 120 |

Reverse with xf.unpivot.

PIVOT sales ON quarter USING SUM(sales) GROUP BY region;
04 The same pattern, everywhere
Five more, at a glance.
- "19.99" → 19.99
- ORDER BY rev DESC LIMIT 3
- 1 per email, keep latest
- WHERE updated_at > mark
- valid_to set + insert
- md5(row) + loaded_at
04 AI transforms, local
Prepare text and vectors, in the flow.
- 1 row → many chunks
- 555-0142 → [PHONE]
- text → [0.02, -0.88, ...]
- note → category
Group by turns many input rows into...
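Two of the text-prep steps above, redaction and chunking, are simple enough to sketch in a few lines. This is a minimal illustration under stated assumptions, not how Duckle's AI transforms are implemented: the phone pattern only covers the `555-0142` shape, chunks are fixed-size word windows, and the function names are hypothetical.

```python
import re

# Assumed pattern: three digits, a hyphen, four digits (e.g. 555-0142).
PHONE = re.compile(r"\b\d{3}-\d{4}\b")


def redact_phones(text):
    """Replace anything that looks like a short phone number with a placeholder."""
    return PHONE.sub("[PHONE]", text)


def chunk_words(text, size):
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


note = "Call Ada at 555-0142 about the late refund on order 101"
clean = redact_phones(note)
chunks = chunk_words(clean, 4)

print(clean)
for chunk in chunks:
    print(repr(chunk))
```

Redacting before chunking means a phone number can never be split across two chunks and slip past the pattern.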
All 133 transforms
You've seen the pattern. Here is the full set - reshape, combine, enrich, quality-fix and AI, page by page.
Sinks
A sink lands the result - Parquet, Iceberg, a database table, an object store, MotherDuck, even an email. 66 of them.
Every sink supports append; most support upsert and delete propagation.
Quality
Assert what must be true before bad data spreads: expectations, uniqueness, referential integrity, profiling, reconciliation. 25 of them.
A failing check can stop the run, or route the bad rows aside to a dead-letter path.
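The two behaviours, stop the run or route bad rows aside, can be sketched as a check with two outputs. The function names, the `on_fail` parameter and the sample rows are all hypothetical; this is a picture of the idea, not Duckle's quality-node API.

```python
def split_rows(rows, predicate):
    """Return (passed, rejected): the main output and the reject output."""
    passed, rejected = [], []
    for row in rows:
        if predicate(row):
            passed.append(row)
        else:
            rejected.append(row)
    return passed, rejected


def run_check(rows, predicate, on_fail="route"):
    """Apply a check; either route failures aside or stop on the first bad batch."""
    passed, rejected = split_rows(rows, predicate)
    if rejected and on_fail == "stop":
        raise RuntimeError(f"{len(rejected)} row(s) failed the check")
    return passed, rejected


def amount_is_valid(row):
    # Expectation: amt must be present and non-negative.
    return row["amt"] is not None and row["amt"] >= 0


orders = [
    {"id": 1, "amt": 120},
    {"id": 2, "amt": -5},
    {"id": 3, "amt": 300},
    {"id": 4, "amt": None},
]

passed, dead_letter = run_check(orders, amount_is_valid, on_fail="route")
print("passed:", [r["id"] for r in passed])
print("dead letter:", [r["id"] for r in dead_letter])
```

With `on_fail="stop"` the same input raises instead of returning, which is the "stop the run" branch.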
Control
Wire logic between steps: loop over items, call a child pipeline, run branches in parallel, log, warn, or stop the run on a condition. 19 of them.
Control is what turns a pipeline into a program.
Code
When a box will not do, drop to code inline: raw SQL, Python, JavaScript, a shell command, WASM, DuckDB. 7 of them - so you never hit a wall.
05 Under the hood
The canvas is not a metaphor. It is a plan.
Press Run and Duckle topologically sorts your graph into an ordered plan of DuckDB SQL stages. Five ideas keep it fast and honest.
- Plan of stages. The graph becomes a numbered 1 to N order. Each node is one SQL step.
- Lazy views. Most stages compile to a CREATE VIEW, not a table. Nothing computes until a sink pulls.
- ATTACH, not copy. A database source runs ATTACH ... AS duckle_src and queries in place, then detaches.
- Batched vs per-stage. One fast pass by default; flip materialization on a node for a safe on-disk checkpoint.
- Reject ports. Quality nodes have a second output - failing rows fall to a dead-letter path.
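The first idea, turning a graph into a numbered plan, is a topological sort. Here is a minimal sketch using Python's standard `graphlib` module (Python 3.9+). The node names and the dependency dictionary are made up for illustration, and Duckle's engine is written in Rust, so this shows the technique rather than the actual planner.

```python
from graphlib import TopologicalSorter

# Hypothetical canvas: each node maps to the set of nodes it reads from.
graph = {
    "orders": set(),
    "customers": set(),
    "joined": {"orders", "customers"},
    "keep_paid": {"joined"},
    "write_parquet": {"keep_paid"},
}


def plan_stages(dependencies):
    """Return [(1, node), (2, node), ...] with every node after its inputs."""
    order = TopologicalSorter(dependencies).static_order()
    return list(enumerate(order, start=1))


stages = plan_stages(graph)
for number, node in stages:
    print(number, node)
```

The two sources have no dependency on each other, so either may come first; the only guarantee is that every node appears after everything it reads from. A wire that closes a loop makes `static_order()` raise `graphlib.CycleError`, which is how a cyclic graph would be rejected in this sketch.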
Most pipeline stages compile to a...
06 The codebase
Seven pieces. One engine they share.
Duckle is a Rust workspace. Every surface calls into the same engine, so a pipeline runs identically everywhere. Click a piece to explore.
07 Complex pipelines
The ones that used to need a team.
Join four systems
Blend a warehouse, an app database, a CSV and an API - no staging, no copies.
postgres · snowflake · csv · rest → xf.map · xf.join → snk.parquet
A billion rows, locally
Roll up a billion-row fact table on your laptop; ship only the small result to the cloud.
src.parquet 1e9 → xf.groupby → snk.motherduck
Lakehouse time-travel
Read a DuckLake table as of two moments and diff them to see exactly what changed.
src.ducklake.diff → xf.diffsummary → snk.csv
07 What people build
Eight jobs, one tool.
Warehouse cost cut
Pre-aggregate locally; send only the rollup to Snowflake or BigQuery.
Database migration
Move Oracle or SQL Server to Postgres with upsert and CDC, verified to the row.
Lakehouse ingestion
Land files into Iceberg or DuckLake with schema control and time travel.
Reverse ETL
Push modeled data back out to a SaaS app - Slack, HubSpot, a webhook.
Data quality gate
Block a load when expectations, uniqueness or referential integrity fail.
AI / RAG prep
Chunk, redact PII and embed documents into a vector store for retrieval.
Observability rollups
Turn raw logs and metrics into SLOs, p95s and forecasts for dashboards.
Scheduled reports
Run nightly on a cron and drop a fresh Parquet or emailed CSV.
08 Ship it
Build it once. Run it five ways.
The same pipeline file runs from the studio, a terminal, a browser, a single binary, or an AI agent. Same engine, same result.
- Runner. duckle run pipeline.json - a lean headless CLI for cron, systemd or CI.
- Serve. duckle serve hosts a web console with run history and a cron scheduler.
- Build a binary. Export a self-contained executable with secrets bundled.
- MCP. An LLM can list components, generate a validated pipeline, run it, read the logs.
- Duckie. A local copilot (Qwen via llama.cpp) that builds pipelines for you, on your machine.
09 Know what changed
Data you can trust, and prove.
Drag the timeline. The result rewrites itself to the table's state as of that moment - real time travel, no backup to restore.
| city | revenue |
|---|---|
09 The trust suite
Three more guarantees.
- Column lineage. Trace any output column backward through every node to the exact source table and column.
- .ducklock. A signed lockfile pins the schema and shape a run expects, so drift is caught, not discovered.
- Contracts & review. Gate a pipeline on data contracts and run duckle review before you promote a change.
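To make the lockfile idea concrete, here is a sketch of pinning a schema with a fingerprint and a keyed signature, then checking a later run against it. Everything in it is an assumption made for illustration: the lock layout, the HMAC-SHA256 signing, the key and the function names are hypothetical, and the real .ducklock format is not described here.

```python
import hashlib
import hmac
import json


def schema_fingerprint(columns):
    """Hash an ordered list of [name, type] pairs into a stable hex digest."""
    canonical = json.dumps(columns, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sign(fingerprint, key):
    """Keyed signature over the fingerprint (HMAC-SHA256, assumed scheme)."""
    return hmac.new(key, fingerprint.encode("utf-8"), hashlib.sha256).hexdigest()


def make_lock(columns, key):
    fingerprint = schema_fingerprint(columns)
    return {"fingerprint": fingerprint, "signature": sign(fingerprint, key)}


def matches_lock(lock, columns, key):
    """True only if the lock is untampered and the schema has not drifted."""
    signature_ok = hmac.compare_digest(lock["signature"], sign(lock["fingerprint"], key))
    schema_ok = hmac.compare_digest(lock["fingerprint"], schema_fingerprint(columns))
    return signature_ok and schema_ok


key = b"example-signing-key"  # placeholder key for the example
pinned = [["id", "INTEGER"], ["amt", "DOUBLE"], ["status", "VARCHAR"]]
drifted = [["id", "INTEGER"], ["amt", "VARCHAR"], ["status", "VARCHAR"]]

lock = make_lock(pinned, key)
print(matches_lock(lock, pinned, key))
print(matches_lock(lock, drifted, key))
```

Because column order is part of the fingerprint, a reordered or retyped column counts as drift in this sketch.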
What lets Duckle show a table exactly as it was last month?
10 You've seen the whole machine
Now go draw the boxes.
Sources read, transforms reshape, sinks write, quality guards, control orchestrates - and it all compiles to SQL you can see.