Duckle v0.4.1 is out - DuckDB 1.5.4, in-app updates, Custom SQL for duck sources, and proxy support for REST.
Use cases

Real pipelines, built and run in Duckle

Every example below is an actual pipeline on the Duckle canvas, executed locally on DuckDB. The numbers are real run results, not benchmarks on someone else's hardware.

Screenshot (mega_enrich_parallel): four sources joined through a visual Map, filtered, then fanned out by Parallelize.
Cross-system enrichment

Join four systems in one pass, no warehouse

The problem. The data you need lives in different places: customers in a CSV export, orders in Parquet, the product catalog in a DuckDB file, regions in SQLite. Normally you would load all four into a warehouse just to join them.

In Duckle. Drop all four as source nodes and wire them into one visual Map - a 3-way join with per-output typed expressions (margin, tier, region rollups) and an inline filter status = 'ACTIVE' AND amount > 30. A Filter trims the result, then Parallelize fans it into category- and region-tier aggregates that write to DuckDB (upsert), Parquet and CSV simultaneously.
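
The fan-out step can be pictured as one filtered result feeding several independent aggregates at once. The following is a minimal Python sketch of that shape, not Duckle's implementation: the sample rows, the hypothetical `total_by` helper and the use of a thread pool are assumptions made purely for illustration, and the aggregates are simplified to plain totals.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical joined rows, standing in for the output of the Map node.
joined = [
    {"category": "books", "region": "EU", "status": "ACTIVE", "amount": 45.0},
    {"category": "books", "region": "US", "status": "ACTIVE", "amount": 12.0},
    {"category": "games", "region": "EU", "status": "INACTIVE", "amount": 80.0},
    {"category": "games", "region": "US", "status": "ACTIVE", "amount": 60.0},
    {"category": "books", "region": "US", "status": "ACTIVE", "amount": 35.0},
]

# The inline filter from the Map: status = 'ACTIVE' AND amount > 30.
filtered = [r for r in joined if r["status"] == "ACTIVE" and r["amount"] > 30]


def total_by(rows, key):
    """One aggregate branch (hypothetical helper): sum amount per value of `key`."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[key]] += r["amount"]
    return dict(totals)


# Fan the same filtered rows out to two branches concurrently; in a real
# pipeline each branch would then write to its own sink (DuckDB, Parquet, CSV).
with ThreadPoolExecutor(max_workers=2) as pool:
    by_category = pool.submit(total_by, filtered, "category")
    by_region = pool.submit(total_by, filtered, "region")
    category_totals = by_category.result()
    region_totals = by_region.result()

print(category_totals)
print(region_totals)
```

The thread pool only shows the structure of the fan-out: both branches read the same filtered input, and neither depends on the other's result.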

4 systems: CSV + Parquet + DuckDB + SQLite
16 nodes: one run, parallel branches
~3 s: 279 rows written
Visual transformation

Map, join and derive columns without writing SQL

The problem. Multi-table joins with derived columns are the heart of ETL, and writing them by hand is where bugs hide.

In Duckle. The Visual Mapper shows the main input and each lookup on the left, every output column with its type and expression in the middle, and a function reference on the right. Drag to map a column, or write an expression like ROUND(main.amount - lookup_1.unit_cost, 2) for a margin. An inline filter applies to the whole join. Every mapping is type-checked live, and the node still emits plain SQL you can read in the Plan tab.
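
Because the node emits plain SQL, a mapping like the margin example reduces to an ordinary join with a derived column and a WHERE clause. The sketch below runs that kind of query in Python's built-in sqlite3 rather than DuckDB so it needs no extra packages. The table and column names (`orders`, `products`, `product_id`, `unit_cost`) and the sample data are hypothetical, and the SQL Duckle actually generates in the Plan tab may look different.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (order_id INTEGER, product_id INTEGER, status TEXT, amount REAL);
    CREATE TABLE products (product_id INTEGER, unit_cost REAL);
    INSERT INTO orders VALUES
        (1, 10, 'ACTIVE', 50.0),
        (2, 11, 'ACTIVE', 20.0),
        (3, 10, 'CLOSED', 90.0),
        (4, 11, 'ACTIVE', 75.5);
    INSERT INTO products VALUES (10, 31.25), (11, 40.0);
""")

# Roughly the shape a main-input-plus-lookup mapping could compile to:
# one join, a derived column, and the inline filter applied to the joined rows.
rows = con.execute("""
    SELECT main.order_id,
           ROUND(main.amount - lookup_1.unit_cost, 2) AS margin
    FROM orders AS main
    INNER JOIN products AS lookup_1 ON lookup_1.product_id = main.product_id
    WHERE main.status = 'ACTIVE' AND main.amount > 30
    ORDER BY main.order_id
""").fetchall()

print(rows)
```

Switching the lookup to a left join would be a one-word change (`LEFT JOIN`), which is the kind of choice the "inner or left joins" option exposes.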

1 main + 3 lookups: inner or left joins
Typed: per-column expressions
Inline filter: on the joined rows
Screenshot (Visual Mapper): the visual Map editor with a main input, two lookups, typed output expressions and a filter.
Screenshot (snowflake_costsave_heavy): a billion-row Parquet file enriched through two Mappers and aggregated into a Snowflake upsert.
Warehouse cost savings

Crunch a billion rows locally, write only the rollup to Snowflake

The problem. Warehouses bill for every scan. Producing a daily revenue summary from a billion-row fact table means paying for a billion-row scan, every day.

In Duckle. Read a 1-billion-row orders Parquet file, with products from SQLite, accounts over ADBC and regions as dimension sources. Two Mappers enrich the orders and join regions; an Aggregate produces the daily revenue summary; the final node upserts only that summary into Snowflake (DEMO_REVENUE_SUMMARY) over the SQL API with PAT/JWT auth. The billion-row scan happens on your machine; Snowflake stores and bills for kilobytes.
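
The saving comes from the row-count reduction between the local scan and the remote write. The sketch below illustrates that reduction in Python with sqlite3 as a stand-in for DuckDB: it builds a small synthetic fact table (far smaller than a billion rows), rolls it up by day and region, and counts what would be sent on. The table, columns and data are assumptions, and the Snowflake upsert itself is deliberately not shown.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_day TEXT, region TEXT, amount REAL)")

# Synthetic fact table: 7 days x 3 regions x 1,000 orders each = 21,000 rows.
days = [f"2024-01-0{d}" for d in range(1, 8)]
regions = ["EU", "US", "APAC"]
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    ((day, region, 10.0) for day in days for region in regions for _ in range(1000)),
)

# The heavy scan and GROUP BY run locally...
summary = con.execute("""
    SELECT order_day, region, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM orders
    GROUP BY order_day, region
    ORDER BY order_day, region
""").fetchall()

fact_rows = con.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
# ...and only the rollup would be upserted to the warehouse.
print(f"scanned {fact_rows} rows, sending {len(summary)} summary rows")
```

The output size depends only on the number of distinct (day, region) groups, not on the number of orders, which is why the warehouse side stays small as the fact table grows.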

1B rows: scanned on DuckDB
5 sources: Parquet, SQLite, ADBC, CSV
Aggregate only: upserted to Snowflake
Change data capture

Mirror a change feed with upsert and delete propagation

The problem. Keeping a downstream copy in sync means applying inserts, updates and deletes - and reprocessing everything each run is wasteful.

In Duckle. Read a DuckLake CDC change feed and a customers feed, keep the latest change per order_id, enrich, and mirror into a DuckDB table with a universal MERGE. Deletes in the feed are propagated via the change-type column, so removed rows disappear downstream too. Watermark state advances only on a fully successful run, so a run that fails midway is replayed from the same point next time; because re-applying the same latest changes through the MERGE gives the same result, the mirror stays exactly-once.
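
The core logic, keeping only the latest change per key and then applying it as an upsert or a delete, can be sketched in plain Python. This is an illustration under stated assumptions, not Duckle's MERGE: the feed is assumed to carry a monotonically increasing `seq` column and a `change_type` of 'I', 'U' or 'D' (both names are hypothetical), and a dict stands in for the DuckDB target table.

```python
def latest_per_key(feed, key="order_id"):
    """Keep only the change with the highest seq for each key."""
    latest = {}
    for change in feed:
        k = change[key]
        if k not in latest or change["seq"] > latest[k]["seq"]:
            latest[k] = change
    return latest


def apply_changes(target, feed, key="order_id"):
    """MERGE-style apply: upsert 'I'/'U' rows, delete keys whose last change is 'D'."""
    for k, change in latest_per_key(feed, key).items():
        if change["change_type"] == "D":
            target.pop(k, None)  # delete propagation
        else:
            target[k] = {"order_id": k, "amount": change["amount"]}  # upsert
    return target


mirror = {1: {"order_id": 1, "amount": 10.0}, 2: {"order_id": 2, "amount": 20.0}}
feed = [
    {"seq": 1, "order_id": 1, "change_type": "U", "amount": 11.0},
    {"seq": 2, "order_id": 3, "change_type": "I", "amount": 30.0},
    {"seq": 3, "order_id": 1, "change_type": "U", "amount": 12.0},
    {"seq": 4, "order_id": 2, "change_type": "D", "amount": None},
]

apply_changes(mirror, feed)
# Re-applying the same feed (as after a failed run) leaves the mirror unchanged.
apply_changes(mirror, feed)
print(mirror)
```

The second `apply_changes` call is the important design point: because the apply step is idempotent, replaying a batch whose watermark never advanced cannot double-count or resurrect deleted rows.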

100k rows: change feed mirrored
1.7 s: upsert + delete
Exactly-once: state on success only
Screenshot (ducklake_cdc_mirror): a DuckLake CDC change feed mirrored into a DuckDB table with upsert and delete propagation.
Screenshot (incremental_load): a watermark incremental load reading 5 million rows and appending only new rows.
Incremental loading

Load 5 million rows in about a second, then only the new ones

The problem. Re-reading a whole table every run does not scale, and naive incremental logic silently skips rows when a run fails.

In Duckle. Point the Incremental node at a watermark column (here order_id). The first run loads everything and saves the high-water mark to the workspace; later runs read only rows past it and append to the DuckDB target. Crucially, the watermark advances only on a fully successful run - a crash or a partial preview never moves it, so you never lose rows.
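
A minimal sketch of the watermark pattern, using sqlite3 for both source and target and a JSON file for the saved state. The state file location, the table names and the `run_incremental` function are hypothetical; the point is the ordering, where the new high-water mark is written only after the append has committed.

```python
import json
import os
import sqlite3
import tempfile

STATE_PATH = os.path.join(tempfile.mkdtemp(), "watermark.json")  # hypothetical state location


def load_watermark():
    if not os.path.exists(STATE_PATH):
        return None  # first run: load everything
    with open(STATE_PATH) as f:
        return json.load(f)["order_id"]


def run_incremental(con):
    mark = load_watermark()
    if mark is None:
        rows = con.execute("SELECT order_id, amount FROM src ORDER BY order_id").fetchall()
    else:
        rows = con.execute(
            "SELECT order_id, amount FROM src WHERE order_id > ? ORDER BY order_id", (mark,)
        ).fetchall()
    if not rows:
        return 0
    with con:  # one transaction: commits on success, rolls back on an exception
        con.executemany("INSERT INTO tgt VALUES (?, ?)", rows)
    # Only reached if the append committed, so a failed run never moves the mark.
    with open(STATE_PATH, "w") as f:
        json.dump({"order_id": rows[-1][0]}, f)
    return len(rows)


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE src (order_id INTEGER PRIMARY KEY, amount REAL)")
con.execute("CREATE TABLE tgt (order_id INTEGER, amount REAL)")
con.executemany("INSERT INTO src VALUES (?, ?)", [(i, i * 1.5) for i in range(1, 6)])
con.commit()

first = run_incremental(con)   # loads all 5 rows
con.executemany("INSERT INTO src VALUES (?, ?)", [(6, 9.0), (7, 10.5)])
con.commit()
second = run_incremental(con)  # loads only rows 6 and 7
third = run_incremental(con)   # nothing new
print(first, second, third)
```

One trade-off of this ordering: if the state write itself fails after the commit, the next run re-reads the same rows, so an append-only target can see duplicates. Rows are never skipped, which is the property the text emphasizes.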

5M rows: first load
~1 s: end to end
Safe: state on success only
Local AI

Describe a pipeline, or prep data for RAG, on device

The problem. You want an assistant to draft pipelines and to clean data for AI, without shipping your data or prompts to a third party.

In Duckle. Duckie runs Qwen 2.5 Coder locally through llama.cpp. Describe what you need; it streams a valid pipeline and a click drops it onto the canvas. For AI data prep, chain xf.ai.chunk -> xf.ai.pii -> xf.ai.embed -> xf.ai.dedupe and land vectors in pgvector or Pinecone - three of those transforms run with no API at all. Retrieve with local Vector Similarity Search and BM25 full-text search.
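
For readers unfamiliar with the full-text half of hybrid retrieval, here is a compact BM25 scorer in pure Python. It is a textbook Okapi BM25 sketch with commonly used defaults (k1 = 1.2, b = 0.75) and a smoothed, non-negative IDF. It is not Duckle's or DuckDB's implementation, the whitespace tokenizer is a deliberate simplification, and how Duckle fuses these scores with vector similarity is not shown.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            freq = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = freq + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


chunks = [
    "duckdb runs analytical sql in process",
    "pgvector stores embeddings in postgres",
    "bm25 ranks chunks for full text search in duckdb",
]
scores = bm25_scores("duckdb full text search", chunks)
best = max(range(len(chunks)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

Keyword scoring like this catches exact terms (identifiers, product names) that embeddings can blur, which is the usual motivation for pairing it with vector search.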

0 API keys: for the assistant
On device: prompts + data stay local
Hybrid: vector + full-text retrieval
Screenshot (Duckie AI): the Duckie AI assistant panel beside the canvas and component palette.
Screenshot (MCP server): a pipeline with a 4-way join Mapper built through the MCP server.
Automation & agents

Build and run pipelines from Claude over MCP

The problem. You want an LLM agent to author and operate real pipelines, not just talk about them.

In Duckle. The built-in MCP server connects Claude (or any MCP client) in one click. Over the protocol the model can list components, fetch a component's schema, create a pipeline (validated before it is written), validate, run it headlessly, read run logs, and even build a standalone executable. Secrets stay as ${ENV:KEY} placeholders, never hardcoded. The screenshot shows a 4-way-join Mapper pipeline driven through MCP.
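
The ${ENV:KEY} convention means a pipeline definition stores only a reference, and the value is looked up from the environment when the pipeline runs. Below is a minimal resolver sketch. The regex, the hypothetical `resolve_secrets`/`resolve_config` functions, the fail-on-missing behavior and the `SNOWFLAKE_PAT` variable name are all assumptions about how such placeholders could be handled, not Duckle's documented behavior.

```python
import os
import re

# Matches ${ENV:NAME}, where NAME looks like a typical environment variable name.
PLACEHOLDER = re.compile(r"\$\{ENV:([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_secrets(value, env=os.environ):
    """Replace every ${ENV:KEY} in `value` with env[KEY]; fail loudly if one is missing."""
    def lookup(match):
        key = match.group(1)
        if key not in env:
            raise KeyError(f"environment variable {key!r} is not set")
        return env[key]
    return PLACEHOLDER.sub(lookup, value)


def resolve_config(config, env=os.environ):
    """Walk a nested config (dicts, lists, strings) and resolve placeholders in strings."""
    if isinstance(config, dict):
        return {k: resolve_config(v, env) for k, v in config.items()}
    if isinstance(config, list):
        return [resolve_config(v, env) for v in config]
    if isinstance(config, str):
        return resolve_secrets(config, env)
    return config


# What gets saved with the pipeline: only the placeholder, never the secret itself.
stored = {"snowflake": {"user": "etl", "token": "${ENV:SNOWFLAKE_PAT}", "port": 443}}
resolved = resolve_config(stored, env={"SNOWFLAKE_PAT": "example-token"})
print(resolved["snowflake"]["token"])
```

Resolving into a new structure leaves the stored definition untouched, so whatever an agent writes or reads back over MCP never contains the secret value.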

One click: connect Claude
Create / run: validated pipelines
Build: standalone executables
DuckDB + DuckLake + Quack

The whole DuckDB stack in one pipeline

The idea. DuckDB is the engine, DuckLake is the lakehouse, Quack puts it on the wire. Duckle is the tool that turns all three into a pipeline you can run and ship.

In Duckle. Raw podcast plays in an embedded DuckDB file are joined to an episode list (CSV) on a visual Map. The enriched stream lands in a DuckLake lakehouse (versioned Parquet, time travel), and a SQL mart is published two ways at once: to a remote DuckDB over Quack that apps can query live, and to a local DuckDB serving file. One DuckDB family, three storage shapes (embedded, lakehouse, remote), wired by drag-and-drop. Nothing left the laptop.

3 engines: DuckDB + DuckLake + Quack
7 nodes: one local run
2,548 ms: 6,080 rows, 0 errors
Screenshot (Podcast Lakehouse): a pipeline joining a DuckDB source and a CSV on a visual Map, landing in a DuckLake lakehouse and publishing a mart to a remote DuckDB over Quack plus a local DuckDB file.