Dataset Schema

Each dataset has exactly one schema.

The schema declares the top-level typed fields that every record must contain. The standard fields timestamp and context are always required on every record; do not list them in the fields object.
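As a rough illustration of this rule, the sketch below lists which required keys are missing from a record before it is sent. The helper name `missingRecordKeys` is hypothetical and not part of any Loguro API. It also assumes a key counts as present whenever it exists on the record, which this page does not specify.

```typescript
// Standard fields that every record needs in addition to the declared ones.
const STANDARD_FIELDS: string[] = ["timestamp", "context"];

// Hypothetical helper: returns the required keys that are absent from a record.
function missingRecordKeys(
  record: Record<string, unknown>,
  declaredFields: string[],
): string[] {
  const required = [...STANDARD_FIELDS, ...declaredFields];
  return required.filter(
    (key) => !Object.prototype.hasOwnProperty.call(record, key),
  );
}

// Illustrative record: it lacks the standard context field and the declared amount field.
const missingKeys = missingRecordKeys(
  { timestamp: "2026-06-10T10:58:00Z", country: "DE" },
  ["country", "amount"],
);
console.log(missingKeys);
```

A check like this only covers presence; value types are a separate concern.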

Create dataset metadata

Create the dataset metadata first. The response gives you the datasetId used by schema registration, ingest, and query.

POST /api/datasets/:projectSlug
Authorization: Bearer pat_xxx
Content-Type: application/json

{
  "name": "Orders",
  "description": "Paid orders from billing"
}

Until a schema is registered, the dataset has no schema and a pending status:

{
  "schema": null,
  "status": "pending_schema"
}
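A minimal sketch of building the create request in TypeScript. `BASE_URL`, `HttpRequest`, and `buildCreateDatasetRequest` are illustrative names, and the host is a placeholder. Only the path, headers, and body fields come from the request shown above.

```typescript
// Illustrative request description; not a type from any SDK.
interface HttpRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Assumption: replace with the real API host.
const BASE_URL = "https://example.invalid";

// Hypothetical helper that assembles POST /api/datasets/:projectSlug.
function buildCreateDatasetRequest(
  projectSlug: string,
  token: string,
  name: string,
  description: string,
): HttpRequest {
  return {
    method: "POST",
    url: `${BASE_URL}/api/datasets/${encodeURIComponent(projectSlug)}`,
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ name, description }),
  };
}

const createReq = buildCreateDatasetRequest(
  "my-project",
  "pat_xxx",
  "Orders",
  "Paid orders from billing",
);
console.log(createReq.method, createReq.url);
```

The returned object can be passed to `fetch(createReq.url, createReq)` or to any other HTTP client.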

Register schema

POST /api/datasets/:projectSlug/:datasetId/schema
Authorization: Bearer pat_xxx
Content-Type: application/json

{
  "fields": {
    "country": "string",
    "plan": "string",
    "amount": "number",
    "active": "boolean",
    "signup_at": "timestamp"
  }
}

Response:

{
  "dataset": {
    "id": 13,
    "name": "Orders",
    "slug": "orders",
    "schema": {
      "version": 1,
      "fields": {
        "country": "string",
        "plan": "string",
        "amount": "number",
        "active": "boolean",
        "signup_at": "timestamp"
      }
    },
    "status": "active"
  },
  "schema": {
    "version": 1,
    "fields": {
      "country": "string",
      "plan": "string",
      "amount": "number",
      "active": "boolean",
      "signup_at": "timestamp"
    }
  }
}

The API also accepts the fields object directly:

{
  "country": "string",
  "amount": "number"
}

Using { "fields": { ... } } is recommended because it leaves room for future schema metadata.
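Client code that receives either form can normalize to the recommended wrapper before sending. The sketch below shows one way to do that. `toSchemaBody` is a hypothetical helper, and it assumes a bare fields object never maps the key `fields` to a nested object.

```typescript
type FieldType = "string" | "number" | "boolean" | "timestamp";
type SchemaFields = Record<string, FieldType>;

// Hypothetical helper: always returns the { "fields": { ... } } form.
function toSchemaBody(
  input: SchemaFields | { fields: SchemaFields },
): { fields: SchemaFields } {
  const maybeWrapped = (input as { fields?: unknown }).fields;
  if (typeof maybeWrapped === "object" && maybeWrapped !== null) {
    // Already wrapped: keep the inner fields object.
    return { fields: maybeWrapped as SchemaFields };
  }
  // Bare fields object: wrap it.
  return { fields: input as SchemaFields };
}

const fromBare = toSchemaBody({ country: "string", amount: "number" });
const fromWrapped = toSchemaBody({ fields: { country: "string", amount: "number" } });
console.log(JSON.stringify(fromBare) === JSON.stringify(fromWrapped));
```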

For TypeScript projects, @loguro/datasets can define the schema once and register it through DatasetQueryClient. See Datasets — TypeScript SDK.

Read schema

GET /api/datasets/:projectSlug/:datasetId/schema
Authorization: Bearer pat_xxx

If the dataset has no schema yet, the response is:

{ "error": "Dataset schema not found" }
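A caller may want to tell this case apart from other failures. The sketch below checks only the response body, because the HTTP status code for this case is not stated here. `isSchemaNotFound` is a hypothetical helper name.

```typescript
// Hypothetical helper: true when a parsed response body is the
// "schema not found" error shown above.
function isSchemaNotFound(body: unknown): boolean {
  return (
    typeof body === "object" &&
    body !== null &&
    (body as { error?: unknown }).error === "Dataset schema not found"
  );
}

const notFoundBody: unknown = JSON.parse('{ "error": "Dataset schema not found" }');
console.log(isSchemaNotFound(notFoundBody));
```

When the check returns true, the usual next step is to register a schema for the dataset.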

Legacy default schema

The legacy project-level endpoint still exists for default-dataset compatibility:

POST /api/datasets/:projectSlug/schema
GET /api/datasets/:projectSlug/schema

Use dataset-specific endpoints for new code. The project-level endpoint maps to the project’s default dataset.

Supported field types

| Type | JSON value accepted on ingest | Query behavior |
|---|---|---|
| `string` | JSON string | filter, group, select, count distinct, `min`, `max` |
| `number` | finite JSON number | filter, select, `sum`, `avg`, `min`, `max` |
| `boolean` | JSON boolean | filter, select, group |
| `timestamp` | timestamp string | filter, select, group, `min`, `max` |
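The ingest column of the table can be sketched as a client-side check per value. This is an approximation, not the server's validator: `matchesFieldType` is a hypothetical helper, and the timestamp branch assumes an RFC 3339 style date-time with an explicit `Z` or numeric offset.

```typescript
type FieldType = "string" | "number" | "boolean" | "timestamp";

// Approximate RFC 3339 date-time shape; the real ingest rules may differ.
const RFC3339_LIKE =
  /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

// Hypothetical helper: does a JSON value fit the declared field type?
function matchesFieldType(value: unknown, type: FieldType): boolean {
  switch (type) {
    case "string":
      return typeof value === "string";
    case "number":
      // Only finite numbers are accepted, so NaN and Infinity fail.
      return typeof value === "number" && Number.isFinite(value);
    case "boolean":
      return typeof value === "boolean";
    case "timestamp":
      return (
        typeof value === "string" &&
        RFC3339_LIKE.test(value) &&
        !Number.isNaN(Date.parse(value))
      );
    default:
      return false;
  }
}

console.log(matchesFieldType("2026-06-10T10:58:00Z", "timestamp"));
console.log(matchesFieldType(Number.NaN, "number"));
```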

Timestamp strings should be ISO 8601 / RFC 3339 compatible:

"2026-06-10T10:58:00Z"

Field names

Field names must match:

^[A-Za-z_][A-Za-z0-9_]{0,63}$

Valid:

country
plan
amount
signup_at
feature_1

Invalid:

country-code
1st_seen
plan name
context.user
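The pattern can be applied directly before registering a schema. A small sketch follows, where `isValidFieldName` is a hypothetical helper name and the sample names are the ones listed above.

```typescript
// Pattern copied from the rule above: a letter or underscore,
// followed by up to 63 letters, digits, or underscores.
const FIELD_NAME_PATTERN = /^[A-Za-z_][A-Za-z0-9_]{0,63}$/;

function isValidFieldName(name: string): boolean {
  return FIELD_NAME_PATTERN.test(name);
}

const candidateNames = [
  "country",
  "signup_at",
  "feature_1",
  "country-code",
  "1st_seen",
  "plan name",
  "context.user",
];

// Collect the names that would be rejected.
const rejectedNames = candidateNames.filter((name) => !isValidFieldName(name));
console.log(rejectedNames);
```

The pattern also caps names at 64 characters, so a 65-character name fails even if every character is allowed.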

Reserved fields

Do not declare these names:

Reserved field
`id`
`timestamp`
`ingested_at`
`context`
`__traceId`

timestamp and context are required on every record, but they are standard fields, not declared fields.
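A schema definition can be screened for reserved names before it is sent. In the sketch below, `findReservedFields` is a hypothetical helper, and exact, case-sensitive matching is an assumption.

```typescript
// Reserved names from the list above.
const RESERVED_FIELDS = new Set<string>([
  "id",
  "timestamp",
  "ingested_at",
  "context",
  "__traceId",
]);

// Hypothetical helper: returns declared field names that collide with reserved ones.
function findReservedFields(fields: Record<string, string>): string[] {
  return Object.keys(fields).filter((name) => RESERVED_FIELDS.has(name));
}

const collisions = findReservedFields({
  country: "string",
  timestamp: "timestamp",
  id: "string",
});
console.log(collisions);
```

A name such as `__traceId` satisfies the field-name pattern, so this check is needed in addition to pattern validation.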

Immutability

After a schema is registered, do not change it. The physical storage uses typed Parquet columns, so changing a field's type later would make historical files ambiguous.

For a new shape, create a new dataset in the same project and register a different schema there.
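One way to apply this in client code is to compare the registered fields with the shape you intend to write, and create a new dataset when they differ. The sketch below assumes field order is irrelevant. `sameSchema` is a hypothetical helper, and this page does not say how the API itself responds to a second registration.

```typescript
type SchemaFields = Record<string, string>;

// Hypothetical helper: true when both schemas declare the same names with the same types.
function sameSchema(a: SchemaFields, b: SchemaFields): boolean {
  const aKeys = Object.keys(a);
  const bKeys = Object.keys(b);
  return (
    aKeys.length === bKeys.length &&
    aKeys.every(
      (key) =>
        Object.prototype.hasOwnProperty.call(b, key) && a[key] === b[key],
    )
  );
}

const registered: SchemaFields = { country: "string", amount: "number" };
const proposed: SchemaFields = { country: "string", amount: "string" };

// A differing shape means a new dataset is needed rather than a schema change.
const needsNewDataset = !sameSchema(registered, proposed);
console.log(needsNewDataset);
```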

Design tips

Use stable business names, not UI labels.
Prefer snake_case field names.
Keep high-cardinality debug data in context.
Put fields you want to filter, group, or aggregate at the top level.
Use timestamp for event time, not ingest time.
Use explicit numeric fields for money or durations.

Example schema for subscription events:

{
  "fields": {
    "event": "string",
    "customer_id": "string",
    "plan": "string",
    "amount": "number",
    "currency": "string",
    "trial": "boolean",
    "subscription_started_at": "timestamp"
  }
}