Duckle use cases

Four sources joined through a visual Map, filtered, then fanned out by Parallelize

Cross-system enrichment

Join four systems in one pass, no warehouse

The problem. The data you need lives in different places: customers in a CSV export, orders in Parquet, the product catalog in a DuckDB file, regions in SQLite. Normally you would load all four into a warehouse just to join them.

In Duckle. Drop all four as source nodes and wire them into one visual Map - a 3-way join with per-output typed expressions (margin, tier, region rollups) and an inline filter status = 'ACTIVE' AND amount > 30. A Filter trims the result, then Parallelize fans it into category- and region-tier aggregates that write to DuckDB (upsert), Parquet and CSV simultaneously.
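The filter-then-fan-out step can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how Duckle executes it: the sample rows, the `keep` and `total_by` helpers and the use of a thread pool are all assumptions made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical joined rows, as they might leave the Map node.
rows = [
    {"status": "ACTIVE", "amount": 120.0, "category": "books", "region": "EU"},
    {"status": "ACTIVE", "amount": 25.0, "category": "books", "region": "EU"},
    {"status": "CLOSED", "amount": 90.0, "category": "toys", "region": "US"},
    {"status": "ACTIVE", "amount": 45.5, "category": "toys", "region": "US"},
]

def keep(row):
    # Same predicate as the inline filter: status = 'ACTIVE' AND amount > 30
    return row["status"] == "ACTIVE" and row["amount"] > 30

def total_by(key, data):
    # Sum amount per distinct value of `key`.
    totals = defaultdict(float)
    for row in data:
        totals[row[key]] += row["amount"]
    return dict(totals)

filtered = [r for r in rows if keep(r)]

# Fan out: both aggregates read the same filtered rows concurrently.
with ThreadPoolExecutor(max_workers=2) as pool:
    by_category = pool.submit(total_by, "category", filtered)
    by_region = pool.submit(total_by, "region", filtered)
    category_totals = by_category.result()
    region_totals = by_region.result()
```

Each branch only reads the shared filtered rows, which is what makes it safe to run the aggregates side by side.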

4 systems: CSV + Parquet + DuckDB + SQLite

16 nodes: one run, parallel branches

~3 s: 279 rows written

Visual transformation

Map, join and derive columns without writing SQL

The problem. Multi-table joins with derived columns are the heart of ETL, and writing them by hand is where bugs hide.

In Duckle. The Visual Mapper shows the main input and each lookup on the left, every output column with its type and expression in the middle, and a function reference on the right. Drag to map a column, or write an expression like ROUND(main.amount - lookup_1.unit_cost, 2) for a margin. An inline filter applies to the whole join. Every mapping is type-checked live, and the node still emits plain SQL you can read in the Plan tab.
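To make the margin expression concrete, here is a minimal sketch that runs the same expression over a left join. It uses Python's built-in sqlite3 module as a stand-in engine, and the table names, sample values and join key (`product_id`) are assumptions for illustration; the SQL Duckle actually shows in the Plan tab may look different.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (order_id INTEGER, product_id INTEGER, amount REAL);
    CREATE TABLE products (product_id INTEGER, unit_cost REAL);
    INSERT INTO orders VALUES (1, 10, 49.99), (2, 11, 20.00), (3, 99, 15.00);
    INSERT INTO products VALUES (10, 30.50), (11, 12.25);
""")

# `main` and `lookup_1` are aliases that mirror the names used in the Mapper.
margins = con.execute("""
    SELECT main.order_id,
           ROUND(main.amount - lookup_1.unit_cost, 2) AS margin
    FROM orders AS main
    LEFT JOIN products AS lookup_1
           ON lookup_1.product_id = main.product_id
    ORDER BY main.order_id
""").fetchall()
```

With a left join, an order whose product has no match in the lookup keeps its row and gets a NULL margin; an inner join would drop that row instead.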

1 main + 3 lookups: inner or left joins

Typed: per-column expressions

Inline filter: on the joined rows

The visual Map editor with a main input, two lookups, typed output expressions and a filter

A billion-row Parquet enriched through two Mappers and aggregated into a Snowflake upsert

Warehouse cost savings

Crunch a billion rows locally, write only the rollup to Snowflake

The problem. Warehouses bill for every scan. Producing a daily revenue summary from a billion-row fact table means paying for a billion-row scan, every day.

In Duckle. Read a 1-billion-row orders Parquet file, with products from SQLite, accounts over ADBC and regions as dimension tables. Two Mappers enrich the orders and join regions; an Aggregate produces the daily revenue summary; the final node upserts only that summary into Snowflake (DEMO_REVENUE_SUMMARY) over the SQL API with PAT/JWT auth. The billion-row scan happens on your machine; Snowflake stores and bills for kilobytes.
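The cost argument comes down to collapsing order-level rows into one row per day before anything is sent to the warehouse. A minimal sketch of that rollup, with made-up sample rows and a hypothetical `daily_revenue` helper standing in for the Aggregate node:

```python
from collections import defaultdict

# Hypothetical enriched order rows; the real input would be the Parquet scan.
orders = [
    ("2024-05-01", 19.99),
    ("2024-05-01", 5.01),
    ("2024-05-02", 42.00),
    ("2024-05-02", 8.00),
    ("2024-05-02", 10.00),
]

def daily_revenue(rows):
    # Collapse order-level rows into one (day, revenue, order_count) row per day.
    totals = defaultdict(lambda: [0.0, 0])
    for day, amount in rows:
        totals[day][0] += amount
        totals[day][1] += 1
    return [(day, round(rev, 2), n) for day, (rev, n) in sorted(totals.items())]

summary = daily_revenue(orders)
# Only `summary` would be sent to the warehouse; `orders` stays on the machine.
```

The summary grows with the number of days, not the number of orders, which is why the upload stays small however large the fact table gets.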

1B rows: scanned on DuckDB

5 sources: Parquet, SQLite, ADBC, CSV

Aggregate only: upserted to Snowflake

Change data capture

Mirror a change feed with upsert and delete propagation

The problem. Keeping a downstream copy in sync means applying inserts, updates and deletes - and reprocessing everything each run is wasteful.

In Duckle. Read a DuckLake CDC change feed and a customers feed, keep the latest change per order_id, enrich, and mirror into a DuckDB table with a universal MERGE. Deletes in the feed are propagated via the change-type column, so removed rows disappear downstream too. Watermark state advances only on a fully successful run, so the mirror is exactly-once even if a run fails midway.
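The "latest change per order_id, then upsert or delete" logic can be sketched in plain Python. The field names `seq` and `op` and the dictionary used as the mirror are assumptions for illustration; they are not Duckle's or DuckLake's actual change-feed schema.

```python
def latest_per_key(changes):
    # Keep only the newest change for each order_id (highest sequence number wins).
    latest = {}
    for change in changes:
        key = change["order_id"]
        if key not in latest or change["seq"] > latest[key]["seq"]:
            latest[key] = change
    return latest

def apply_changes(mirror, changes):
    # Upsert inserts/updates and remove rows whose latest change is a delete.
    for key, change in latest_per_key(changes).items():
        if change["op"] == "delete":
            mirror.pop(key, None)
        else:
            mirror[key] = change["row"]
    return mirror

mirror = {1: {"status": "NEW"}, 2: {"status": "NEW"}}
feed = [
    {"order_id": 1, "seq": 1, "op": "update", "row": {"status": "PAID"}},
    {"order_id": 1, "seq": 2, "op": "update", "row": {"status": "SHIPPED"}},
    {"order_id": 2, "seq": 3, "op": "delete", "row": None},
    {"order_id": 3, "seq": 4, "op": "insert", "row": {"status": "NEW"}},
]
apply_changes(mirror, feed)
```

Reducing the feed to one change per key first means an intermediate update (here, PAID) is never written, and a deleted key is removed no matter how many earlier changes it had.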

100k rows: change feed mirrored

1.7 s: upsert + delete

Exactly-once: state on success only

A DuckLake CDC change feed mirrored into a DuckDB table with upsert and delete propagation

A watermark incremental load reading 5 million rows and appending only new rows

Incremental loading

Load 5 million rows in about a second, then only the new ones

The problem. Re-reading a whole table every run does not scale, and naive incremental logic silently skips rows when a run fails.

In Duckle. Point the Incremental node at a watermark column (here order_id). The first run loads everything and saves the high-water mark to the workspace; later runs read only rows past it and append to the DuckDB target. Crucially, the watermark advances only on a fully successful run - a crash or a partial preview never moves it, so you never lose rows.
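The "advance the watermark only on success" rule is easy to get wrong, so here is a minimal sketch of it. The `run_incremental` function, the in-memory `state` dictionary and the `fail` flag are hypothetical stand-ins for the Incremental node, the workspace state and a crashed run.

```python
def run_incremental(source, target, state, fail=False):
    # Read only rows past the saved watermark (0 means "nothing loaded yet").
    watermark = state.get("watermark", 0)
    new_rows = [r for r in source if r["order_id"] > watermark]
    staged = list(target) + new_rows
    if fail:
        # Simulated crash: neither the target nor the watermark is touched.
        raise RuntimeError("run failed before commit")
    # Commit the data and the watermark together, only after success.
    target[:] = staged
    if new_rows:
        state["watermark"] = max(r["order_id"] for r in new_rows)
    return len(new_rows)

source = [{"order_id": i} for i in range(1, 6)]
target, state = [], {}
first = run_incremental(source, target, state)   # first run loads everything

source += [{"order_id": 6}, {"order_id": 7}]
try:
    run_incremental(source, target, state, fail=True)
except RuntimeError:
    pass
after_failure = dict(state)                      # watermark has not moved

second = run_incremental(source, target, state)  # retry picks up the same rows
```

Because the failed run leaves the watermark where it was, the retry reads rows 6 and 7 again instead of skipping them.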

5M rows: first load

~1 s: end to end

Safe: state on success only

Local AI

Describe a pipeline, or prep data for RAG, on device

The problem. You want an assistant to draft pipelines and to clean data for AI, without shipping your data or prompts to a third party.

In Duckle. Duckie runs Qwen 2.5 Coder locally through llama.cpp. Describe what you need; it streams a valid pipeline and a click drops it onto the canvas. For AI data prep, chain xf.ai.chunk -> xf.ai.pii -> xf.ai.embed -> xf.ai.dedupe and land vectors in pgvector or Pinecone - three of those transforms run with no API at all. Retrieve with local Vector Similarity Search and BM25 full-text search.
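As a rough illustration of two of the prep steps, the sketch below splits text into overlapping chunks and drops duplicates. It is not the implementation of xf.ai.chunk or xf.ai.dedupe: the function names, the character-based windows and the exact-match hashing are assumptions chosen to keep the example small.

```python
import hashlib

def chunk(text, size=40, overlap=10):
    # Split text into fixed-size character windows that overlap by `overlap`.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def dedupe(chunks):
    # Drop chunks whose whitespace- and case-normalized text was already seen.
    seen, unique = set(), []
    for c in chunks:
        normalized = " ".join(c.lower().split())
        digest = hashlib.sha256(normalized.encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(c)
    return unique

pieces = chunk("a" * 100)
kept = dedupe(["Hello  world", "hello world", "other"])
```

Both steps are pure string processing, so they can run on device without calling any external service.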

0 API keys: for the assistant

On device: prompts + data stay local

Hybrid: vector + full-text retrieval

The Duckie AI assistant panel beside the canvas and component palette

A pipeline with a 4-way join Mapper built through the MCP server

Automation & agents

Build and run pipelines from Claude over MCP

The problem. You want an LLM agent to author and operate real pipelines, not just talk about them.

In Duckle. The built-in MCP server connects Claude (or any MCP client) in one click. Over the protocol the model can list components, fetch a component's schema, create a pipeline (validated before it is written), validate, run it headlessly, read run logs, and even build a standalone executable. Secrets stay as ${ENV:KEY} placeholders, never hardcoded. The screenshot shows a 4-way-join Mapper pipeline driven through MCP.
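Keeping secrets as ${ENV:KEY} placeholders means the stored pipeline never contains the secret itself; the value is looked up at run time. A minimal sketch of such a resolver follows. The `resolve` function, the regular expression and the `SNOWFLAKE_TOKEN` variable name are assumptions for illustration, not Duckle's actual resolution code.

```python
import os
import re

# Matches ${ENV:KEY} where KEY is a conventional environment variable name.
PLACEHOLDER = re.compile(r"\$\{ENV:([A-Za-z_][A-Za-z0-9_]*)\}")

def resolve(value, env=None):
    # Replace each ${ENV:KEY} with the value of KEY; fail loudly if it is unset.
    env = os.environ if env is None else env

    def lookup(match):
        key = match.group(1)
        if key not in env:
            raise KeyError(f"missing environment variable: {key}")
        return env[key]

    return PLACEHOLDER.sub(lookup, value)

config = {"account": "demo", "token": "${ENV:SNOWFLAKE_TOKEN}"}
resolved = {k: resolve(v, {"SNOWFLAKE_TOKEN": "s3cret"}) for k, v in config.items()}
```

Raising on a missing variable, rather than substituting an empty string, surfaces a misconfigured environment before the pipeline tries to connect.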

One click: connect Claude

Create / run: validated pipelines

Build: standalone executables

DuckDB + DuckLake + Quack

The whole DuckDB stack in one pipeline

The idea. DuckDB is the engine, DuckLake is the lakehouse, Quack puts it on the wire. Duckle is the tool that turns all three into a pipeline you can run and ship.

In Duckle. Raw podcast plays in an embedded DuckDB file are joined to an episode list (CSV) on a visual Map. The enriched stream lands in a DuckLake lakehouse (versioned Parquet, time travel), and a SQL mart is published two ways at once: to a remote DuckDB over Quack that apps can query live, and to a local DuckDB serving file. One DuckDB family, three storage shapes (embedded, lakehouse, remote), wired by drag-and-drop. Nothing left the laptop.

3 engines: DuckDB + DuckLake + Quack

7 nodes: one local run

2,548 ms: 6,080 rows, 0 errors

A pipeline joining a DuckDB source and a CSV on a visual Map, landing in a DuckLake lakehouse and publishing a mart to a remote DuckDB over Quack plus a local DuckDB file

Real pipelines, built and run in Duckle