Dataset Schema
Each dataset has exactly one schema.
The schema declares the top-level typed fields that every record must contain. `timestamp` and `context` are standard fields and are always required; do not declare them in `fields`.
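A record therefore combines the declared fields with the two standard ones. The TypeScript sketch below models that shape. `DatasetRecord` and `makeRecord` are hypothetical names. The sketch also makes two assumptions this page does not specify: `timestamp` is an RFC 3339 string, and `context` is a free-form JSON object.

```typescript
// JSON-compatible values, used for the free-form context object (assumed shape).
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };

// Values allowed for declared fields (timestamps are sent as strings).
type DeclaredValue = string | number | boolean;

// A record is the declared fields plus the two standard fields.
type DatasetRecord<F extends Record<string, DeclaredValue>> = F & {
  timestamp: string; // event time, assumed RFC 3339
  context: { [key: string]: JsonValue }; // assumed free-form JSON object
};

// Places the standard fields next to the declared ones.
function makeRecord<F extends Record<string, DeclaredValue>>(
  fields: F,
  timestamp: string,
  context: { [key: string]: JsonValue } = {},
): DatasetRecord<F> {
  return { ...fields, timestamp, context };
}

const order = makeRecord(
  { country: "DE", plan: "pro", amount: 49, active: true, signup_at: "2026-06-01T09:00:00Z" },
  "2026-06-10T10:58:00Z",
  { request_id: "req_8f2c" },
);
console.log(order);
```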
Create dataset metadata
Create metadata first. This gives you the datasetId used by schema registration, ingest, and query.
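In code, creating the metadata is a plain JSON POST. The sketch below only builds the request (method, URL, headers, body), so you can inspect it or pass it to any HTTP client. `HttpRequest`, `buildCreateDatasetRequest`, and the base URL are hypothetical, and the token is a placeholder.

```typescript
// Hypothetical request shape; hand it to fetch or any other HTTP client.
interface HttpRequest {
  method: "GET" | "POST";
  url: string;
  headers: Record<string, string>;
  body?: string;
}

// Builds POST /api/datasets/:projectSlug. The response is expected to carry
// the dataset id needed for schema registration, ingest, and query.
function buildCreateDatasetRequest(
  baseUrl: string, // assumption: your Loguro host
  projectSlug: string,
  token: string, // personal access token, e.g. "pat_..."
  name: string,
  description?: string,
): HttpRequest {
  return {
    method: "POST",
    url: `${baseUrl}/api/datasets/${encodeURIComponent(projectSlug)}`,
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    // Omit description entirely when it is not provided.
    body: JSON.stringify(description === undefined ? { name } : { name, description }),
  };
}

const createReq = buildCreateDatasetRequest(
  "https://loguro.example", // placeholder host
  "shop",
  "pat_xxx",
  "Orders",
  "Paid orders from billing",
);
console.log(createReq.method, createReq.url);
```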
```http
POST /api/datasets/:projectSlug
Authorization: Bearer pat_xxx
Content-Type: application/json

{
  "name": "Orders",
  "description": "Paid orders from billing"
}
```

Before schema registration, the dataset has:
```json
{
  "schema": null,
  "status": "pending_schema"
}
```

Register schema
```http
POST /api/datasets/:projectSlug/:datasetId/schema
Authorization: Bearer pat_xxx
Content-Type: application/json

{
  "fields": {
    "country": "string",
    "plan": "string",
    "amount": "number",
    "active": "boolean",
    "signup_at": "timestamp"
  }
}
```

Response:
```json
{
  "dataset": {
    "id": 13,
    "name": "Orders",
    "slug": "orders",
    "schema": {
      "version": 1,
      "fields": {
        "country": "string",
        "plan": "string",
        "amount": "number",
        "active": "boolean",
        "signup_at": "timestamp"
      }
    },
    "status": "active"
  },
  "schema": {
    "version": 1,
    "fields": {
      "country": "string",
      "plan": "string",
      "amount": "number",
      "active": "boolean",
      "signup_at": "timestamp"
    }
  }
}
```

The API also accepts the fields object directly:
```json
{
  "country": "string",
  "amount": "number"
}
```

Using `{ "fields": { ... } }` is recommended because it leaves room for future schema metadata.
For TypeScript projects, @loguro/datasets can define the schema once and register it through DatasetQueryClient. See Datasets — TypeScript SDK.
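The SDK's own API is not reproduced here, so the sketch below uses plain HTTP. It builds the registration request with the recommended `fields` wrapper. `buildRegisterSchemaRequest`, `SchemaFields`, and the base URL are hypothetical.

```typescript
// Field types from the "Supported field types" table.
type FieldType = "string" | "number" | "boolean" | "timestamp";
type SchemaFields = Record<string, FieldType>;

// Hypothetical request shape; hand it to fetch or any other HTTP client.
interface HttpRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Builds POST /api/datasets/:projectSlug/:datasetId/schema with the
// recommended { "fields": { ... } } wrapper instead of the bare object.
function buildRegisterSchemaRequest(
  baseUrl: string, // assumption: your Loguro host
  projectSlug: string,
  datasetId: number | string, // the id obtained when the metadata was created
  token: string, // personal access token, e.g. "pat_..."
  fields: SchemaFields,
): HttpRequest {
  const path = `/api/datasets/${encodeURIComponent(projectSlug)}/${encodeURIComponent(String(datasetId))}/schema`;
  return {
    method: "POST",
    url: baseUrl + path,
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ fields }),
  };
}

const ordersSchema: SchemaFields = {
  country: "string",
  plan: "string",
  amount: "number",
  active: "boolean",
  signup_at: "timestamp",
};

const registerReq = buildRegisterSchemaRequest("https://loguro.example", "shop", 13, "pat_xxx", ordersSchema);
console.log(registerReq.url);
```

Sending `JSON.stringify(ordersSchema)` as the body instead would use the bare form, which the API also accepts.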
Read schema
```http
GET /api/datasets/:projectSlug/:datasetId/schema
Authorization: Bearer pat_xxx
```

If the dataset has no schema yet, the response is:

```json
{ "error": "Dataset schema not found" }
```

Legacy default schema
The legacy project-level endpoint still exists for default-dataset compatibility:
```text
POST /api/datasets/:projectSlug/schema
GET /api/datasets/:projectSlug/schema
```

The project-level endpoint maps to the project’s default dataset. Use the dataset-specific endpoints for new code.
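Here is a small client sketch that always targets the dataset-specific read endpoint. `schemaUrl` and `parseSchemaResponse` are hypothetical helpers. It makes two assumptions. First, a successful read returns an object with a `schema` key, shaped like the one in the registration response. Second, the client recognizes the not-found case by the error message shown under "Read schema", because this page does not document the status code.

```typescript
type FieldType = "string" | "number" | "boolean" | "timestamp";
interface DatasetSchema {
  version: number;
  fields: Record<string, FieldType>;
}

// Dataset-specific read endpoint, preferred over the legacy project-level one.
function schemaUrl(baseUrl: string, projectSlug: string, datasetId: number | string): string {
  return `${baseUrl}/api/datasets/${encodeURIComponent(projectSlug)}/${encodeURIComponent(String(datasetId))}/schema`;
}

// Interprets a parsed JSON body from GET .../schema.
// Returns null when the dataset has no schema yet; throws on other errors.
function parseSchemaResponse(body: unknown): DatasetSchema | null {
  if (typeof body !== "object" || body === null) {
    throw new Error("Unexpected schema response");
  }
  const obj = body as { error?: unknown; schema?: DatasetSchema };
  if (obj.error !== undefined) {
    if (obj.error === "Dataset schema not found") return null; // still pending_schema
    throw new Error(`Schema read failed: ${String(obj.error)}`);
  }
  if (obj.schema === undefined) throw new Error("Response has no schema key (assumed shape)");
  return obj.schema;
}

console.log(schemaUrl("https://loguro.example", "shop", 13));
console.log(parseSchemaResponse({ error: "Dataset schema not found" }));
```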
Supported field types
| Type | JSON value accepted on ingest | Query behavior |
|---|---|---|
| `string` | JSON string | filter, group, select, count distinct, min, max |
| `number` | finite JSON number | filter, select, sum, avg, min, max |
| `boolean` | JSON boolean | filter, select, group |
| `timestamp` | timestamp string | filter, select, group, min, max |
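A client can pre-check records against the ingest column of this table before sending them. This is a sketch, not the server's validation. `acceptsValue` and `validateRecord` are hypothetical. The timestamp pattern is a loose RFC 3339 check that follows the ISO 8601 / RFC 3339 guidance in this section, and the server may accept more or fewer formats.

```typescript
type FieldType = "string" | "number" | "boolean" | "timestamp";

// Loose RFC 3339 shape: date, "T", time, optional fraction, then "Z" or an offset.
const RFC3339 = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

// Checks one value against its declared type.
function acceptsValue(type: FieldType, value: unknown): boolean {
  switch (type) {
    case "string":
      return typeof value === "string";
    case "number":
      // JSON.parse never yields NaN or Infinity, but values built in code can.
      return typeof value === "number" && Number.isFinite(value);
    case "boolean":
      return typeof value === "boolean";
    case "timestamp":
      return typeof value === "string" && RFC3339.test(value) && !Number.isNaN(Date.parse(value));
    default:
      return false;
  }
}

// Lists problems for one record: missing or mistyped declared fields,
// plus missing standard fields (timestamp, context).
function validateRecord(schema: Record<string, FieldType>, record: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const name of Object.keys(schema)) {
    if (!Object.prototype.hasOwnProperty.call(record, name)) {
      errors.push(`missing field "${name}"`);
    } else if (!acceptsValue(schema[name], record[name])) {
      errors.push(`"${name}" is not a valid ${schema[name]}`);
    }
  }
  for (const standard of ["timestamp", "context"]) {
    if (!Object.prototype.hasOwnProperty.call(record, standard)) errors.push(`missing ${standard}`);
  }
  return errors;
}

const problems = validateRecord(
  { amount: "number", signup_at: "timestamp" },
  { amount: "49.00", signup_at: "2026-06-10T10:58:00Z", timestamp: "2026-06-10T10:58:00Z", context: {} },
);
console.log(problems);
```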
Timestamp strings should be ISO 8601 / RFC 3339 compatible, for example `"2026-06-10T10:58:00Z"`.

Field names
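Field names must match `^[A-Za-z_][A-Za-z0-9_]{0,63}$`: a letter or underscore, followed by up to 63 letters, digits, or underscores (64 characters at most).

Valid:

- `country`
- `plan`
- `amount`
- `signup_at`
- `feature_1`

Invalid:

- `country-code` (hyphen)
- `1st_seen` (starts with a digit)
- `plan name` (space)
- `context.user` (dot)

Reserved fields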
Do not declare these names:
| Reserved field |
|---|
| `id` |
| `timestamp` |
| `ingested_at` |
| `context` |
| `__traceId` |
timestamp and context are required on every record, but they are standard fields, not declared fields.
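Both rules can be checked locally before registration. In this sketch, `fieldNameProblem` and `checkFieldNames` are hypothetical helpers. Reserved names are compared with exact case, which is an assumption: this page does not say whether the comparison ignores case.

```typescript
// Pattern from the field-name rule: a letter or underscore, then up to
// 63 letters, digits, or underscores (64 characters at most).
const FIELD_NAME = /^[A-Za-z_][A-Za-z0-9_]{0,63}$/;

// Names that must not be declared (exact-case match is an assumption).
const RESERVED = new Set(["id", "timestamp", "ingested_at", "context", "__traceId"]);

// Returns a description of the problem, or null if the name is usable.
function fieldNameProblem(name: string): string | null {
  if (!FIELD_NAME.test(name)) return `"${name}" does not match ${FIELD_NAME.source}`;
  if (RESERVED.has(name)) return `"${name}" is reserved`;
  return null;
}

// Checks every declared name in a fields object.
function checkFieldNames(fields: Record<string, string>): string[] {
  const problems: string[] = [];
  for (const name of Object.keys(fields)) {
    const problem = fieldNameProblem(name);
    if (problem !== null) problems.push(problem);
  }
  return problems;
}

console.log(checkFieldNames({ country: "string", "country-code": "string", context: "string" }));
```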
Immutability
After a schema is registered, do not change it. Data is stored in typed Parquet columns, so changing a field's type later would make historical files ambiguous.
For a new shape, create a new dataset in the same project and register a different schema there.
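A deploy script can detect a schema change before it tries to re-register. This sketch compares a registered schema with a proposed one and treats any difference as a signal to create a new dataset. `schemaChanges` and `needsNewDataset` are hypothetical helpers.

```typescript
type FieldType = "string" | "number" | "boolean" | "timestamp";
type SchemaFields = Record<string, FieldType>;

// Own-property check, so names such as "constructor" are handled correctly.
const has = (obj: object, key: string): boolean => Object.prototype.hasOwnProperty.call(obj, key);

// Lists fields that were added, removed, or retyped, in sorted order.
function schemaChanges(registered: SchemaFields, proposed: SchemaFields): string[] {
  const names = Object.keys(registered).concat(Object.keys(proposed).filter((n) => !has(registered, n)));
  return names
    .filter((n) => !has(registered, n) || !has(proposed, n) || registered[n] !== proposed[n])
    .sort();
}

// A registered schema is not changed in place, so any difference means
// the new shape belongs in a new dataset with its own schema.
function needsNewDataset(registered: SchemaFields, proposed: SchemaFields): boolean {
  return schemaChanges(registered, proposed).length > 0;
}

const current: SchemaFields = { plan: "string", amount: "number" };
const next: SchemaFields = { plan: "string", amount: "string", trial: "boolean" };
console.log(schemaChanges(current, next), needsNewDataset(current, next));
```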
Design tips
- Use stable business names, not UI labels.
- Prefer `snake_case` field names.
- Keep high-cardinality debug data in `context`.
- Put fields you want to filter, group, or aggregate at the top level.
- Use `timestamp` for event time, not ingest time.
- Use explicit numeric fields for money or durations.
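The split between top-level fields and `context` can be made mechanical. The hypothetical `splitEvent` helper below keeps keys declared in the schema at the top level and moves everything else into `context`. It assumes `context` accepts an arbitrary JSON object, which this page does not spell out.

```typescript
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };
type FieldType = "string" | "number" | "boolean" | "timestamp";

interface SplitEvent {
  fields: { [key: string]: JsonValue }; // declared: filterable, groupable, aggregatable
  context: { [key: string]: JsonValue }; // everything else, e.g. debug data
}

// Routes each key of a raw event by whether the schema declares it.
function splitEvent(schema: Record<string, FieldType>, event: { [key: string]: JsonValue }): SplitEvent {
  const fields: { [key: string]: JsonValue } = {};
  const context: { [key: string]: JsonValue } = {};
  for (const key of Object.keys(event)) {
    if (Object.prototype.hasOwnProperty.call(schema, key)) fields[key] = event[key];
    else context[key] = event[key];
  }
  return { fields, context };
}

const split = splitEvent(
  { plan: "string", amount: "number" },
  { plan: "pro", amount: 49, request_id: "req_8f2c", user_agent: "Mozilla/5.0" },
);
console.log(split);
```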
Example schema for subscription events:
```json
{
  "fields": {
    "event": "string",
    "customer_id": "string",
    "plan": "string",
    "amount": "number",
    "currency": "string",
    "trial": "boolean",
    "subscription_started_at": "timestamp"
  }
}
```