Skip to content
LogoLogo

SQL Transforms

A sql dataset is a saved DuckDB query over your other datasets. It's where data shaping lives: the sums, joins, and filters an island must never carry. The manifest stays declarative because the arithmetic happens here, in a file you own, and the island only ever names columns that already exist.

Loading diagram...

Why shaping belongs in SQL

An island config names fields; it never computes them. You don't put a SUM, a GROUP BY, or a join inside an island, because that buries logic in a layer that can't validate it and can't be reused. Move the computation into a transform and two things follow: the shaped table can feed several islands, and validate can check every binding against real, materialized columns.

Writing one

Drop a .sql file under models/ (the convention is models/transforms/) and register it as a dataset by its sql path:

"datasets": {
  "transactions": { "source": "data/transactions.csv" },
  "monthly":      { "sql": "models/transforms/monthly.sql" }
}

The file is a single DuckDB SELECT. It reads your other datasets by their dataset name, as if each were a table:

-- models/transforms/monthly.sql
SELECT
  date_trunc('month', ts) AS month,
  SUM(amount_eur)         AS spend_eur,
  COUNT(*)                AS txns
FROM transactions
GROUP BY 1
ORDER BY 1;

Now a timeseries.line binds to monthly with x: "month" and y: "spend_eur", and every value it reads is a column DuckDB computed once.

Each dataset registers as a DuckDB view named after the dataset, so a transform can use the full dialect: joins, CTEs, window functions, date math, anything DuckDB supports. A trailing semicolon is fine. The file must be one statement, a SELECT optionally led by a WITH.

Transforms can build on transforms

Because every dataset is a view, a transform can read another transform, not only a file. The one rule is order. Datasets register in the order they appear under datasets, so a transform has to be declared after everything it reads:

"datasets": {
  "transactions": { "source": "data/transactions.csv" },
  "monthly":      { "sql": "models/transforms/monthly.sql" },     // reads transactions
  "monthly_net":  { "sql": "models/transforms/monthly_net.sql" }  // reads monthly, declared after it
}

Derived and read-only

A sql dataset is derived. It's a query, not a file, so nothing can write to it: no action and no connector may target it. Data flows one way, from the source files it reads, through the SELECT, into the islands. To change what a transform returns, change its SQL or the files beneath it.

A transform renders to islands and takes no parameters. For a parameterized, on-demand read over a dataset — one an agent calls and gets rows back from, not a chart — declare a query instead. A query is a declarative spec the compiler turns into SQL, and it often points at a transform: the transform does the joins and shaping, the query parameterizes the read.

The binding check still applies

Islands bind to a transform's output columns, and validate checks them against the live result. The compiler materializes the view, asks DuckDB for its columns and types, and rejects any island bound to a column the SELECT doesn't produce, naming the page, the island, and the missing field. Rename a column in the SQL and forget to update the island, and the build fails loudly instead of rendering an empty chart. It works upstream too: drop a column from a source CSV that the transform selects, and the error surfaces the moment you validate.

Where to go next

  • Data Contracts: the three dataset shapes and the contract check.
  • Actions: typed writes into the source files a transform reads.
  • The Manifest: how datasets and islands fit together.