Schema Management

coodie provides a multi-tier approach to schema management, from automatic table syncing during development to versioned migration files for production deployments.

Tier 1 — Automatic `sync_table()` (Phase A)

Every Document class has a sync_table() class method that idempotently creates or updates the table to match the model definition.

Basic Usage

from coodie.aio import Document, init_coodie
from coodie.fields import PrimaryKey, Indexed
from typing import Annotated
from uuid import UUID, uuid4
from pydantic import Field

class Product(Document):
    id: Annotated[UUID, PrimaryKey()] = Field(default_factory=uuid4)
    name: str
    brand: Annotated[str, Indexed()] = "Unknown"
    price: float = 0.0

    class Settings:
        name = "products"
        keyspace = "catalog"

# At application startup:
await init_coodie(hosts=["127.0.0.1"], keyspace="catalog")
await Product.sync_table()

sync_table() performs the following operations:

Operation	Description
`CREATE TABLE IF NOT EXISTS`	Creates the table with full primary key and options
`ALTER TABLE … ADD`	Adds columns present in the model but missing in the DB
Schema drift warning	Logs a warning when DB has columns not in the model
`ALTER TABLE … WITH`	Updates table options (TTL, compaction, etc.) when changed
`CREATE INDEX IF NOT EXISTS`	Creates secondary indexes for `Indexed()` columns
`DROP INDEX IF EXISTS`	Drops stale indexes (opt-in via flag)

Return Value

sync_table() returns a list of CQL statements that were executed:

planned = await Product.sync_table()
for cql in planned:
    print(cql)
# CREATE TABLE IF NOT EXISTS catalog.products (...)
# CREATE INDEX IF NOT EXISTS products_brand_idx ON catalog.products ("brand")

Dry-Run Mode

Preview what CQL would be executed without touching the database:

planned = await Product.sync_table(dry_run=True)
for cql in planned:
    print(f"[WOULD EXECUTE] {cql}")

This is useful for CI/CD pipelines where you want to review planned schema changes before applying them.

Schema Drift Detection

When the database has columns that are not defined in the model, sync_table() logs a warning:

WARNING coodie: Schema drift detected: columns {'legacy_col'} exist in
catalog.products but are not defined in the model

This helps identify abandoned columns that should be cleaned up or added back to the model.

Table Option Changes

If your model defines table options that differ from the current database state, sync_table() automatically applies the changes:

class Session(Document):
    token: Annotated[str, PrimaryKey()]
    user_id: UUID

    class Settings:
        name = "sessions"
        keyspace = "catalog"
        __default_ttl__ = 7200  # 2 hours

If the table already exists with a different TTL, sync_table() runs:

ALTER TABLE catalog.sessions WITH default_time_to_live = 7200

Dropping Stale Indexes

By default, sync_table() only creates indexes — it never drops them. To remove indexes that are no longer defined in the model, use the drop_removed_indexes flag:

await Product.sync_table(drop_removed_indexes=True)

Warning

Dropping an index is a destructive operation. Only use drop_removed_indexes=True when you are certain the index is no longer needed.

When to Use

Local development — run sync_table() at application startup
CI test environments — create tables automatically before tests
Simple additive production changes — adding a column or index

Tier 2 — Migration Files (Phase B)

For production deployments, coodie provides a versioned migration framework inspired by Django, Alembic, and Beanie. Migration files are Python modules that define schema changes as upgrade() / downgrade() methods.

Migration File Structure

Migration files live in a directory (default: migrations/) and follow a naming convention:

migrations/
├── 20260115_001_add_rating_column.py
├── 20260203_002_create_reviews_index.py
└── 20260220_003_drop_legacy_status.py

Naming pattern: YYYYMMDD_NNN_<description>.py

YYYYMMDD — date prefix for chronological ordering
NNN — sequence number within the same date
<description> — human-readable description

Writing a Migration

Each migration file defines a ForwardMigration class that inherits from coodie.migrations.Migration:

# migrations/20260115_001_add_rating_column.py
from coodie.migrations import Migration


class ForwardMigration(Migration):
    description = "Add rating column to reviews table"
    allow_destructive = False  # default

    async def upgrade(self, ctx):
        """Apply the migration."""
        await ctx.execute(
            'ALTER TABLE catalog.reviews ADD "rating" int'
        )
        await ctx.execute(
            "CREATE INDEX IF NOT EXISTS reviews_rating_idx "
            'ON catalog.reviews ("rating")'
        )

    async def downgrade(self, ctx):
        """Reverse the migration (optional)."""
        await ctx.execute(
            "DROP INDEX IF EXISTS catalog.reviews_rating_idx"
        )
        await ctx.execute(
            'ALTER TABLE catalog.reviews DROP "rating"'
        )

Migration Attributes

Attribute	Type	Default	Description
`description`	`str`	`""`	Human-readable description
`allow_destructive`	`bool`	`False`	Allow DROP operations
`reversible`	`bool`	`True`	Whether `downgrade()` is implemented

MigrationContext

The ctx parameter passed to upgrade() and downgrade() is a MigrationContext that wraps the coodie driver:

# Execute arbitrary CQL
await ctx.execute("ALTER TABLE ks.tbl ADD \"col\" int")

# With parameters
await ctx.execute(
    "INSERT INTO ks.tbl (id, name) VALUES (?, ?)",
    [some_uuid, "Widget"],
)

In dry-run mode, ctx.execute() records the CQL but does not send it to the database.

Applying Migrations

Using the CLI

# Apply all pending migrations
coodie migrate --keyspace catalog

# Preview what would be applied (dry-run)
coodie migrate --keyspace catalog --dry-run

# Apply up to a specific migration
coodie migrate --keyspace catalog --target 20260203_002

# Show migration status
coodie migrate --keyspace catalog --status

# Rollback the last migration
coodie migrate --keyspace catalog --rollback --steps 1

CLI Options

Option	Default	Description
`--keyspace`	(required)	Target keyspace
`--hosts`	`127.0.0.1`	Cassandra/ScyllaDB contact points
`--migrations-dir`	`migrations`	Path to migration files
`--dry-run`	`False`	Show planned CQL without executing
`--status`	`False`	Show migration status
`--rollback`	`False`	Rollback migrations
`--steps`	`1`	Number of migrations to rollback
`--target`	(none)	Stop after this migration prefix
`--driver-type`	`scylla`	Driver type (`scylla` or `cassandra`)

Using the Python API

from coodie.drivers import init_coodie
from coodie.migrations import MigrationRunner

driver = init_coodie(hosts=["127.0.0.1"], keyspace="catalog")
runner = MigrationRunner(
    driver=driver,
    keyspace="catalog",
    migrations_dir="migrations",
)

# Apply all pending
results = await runner.apply()

# Dry-run
results = await runner.apply(dry_run=True)

# Apply up to a target
results = await runner.apply(target="20260203_002")

# Rollback
results = await runner.rollback(steps=1)

# Check status
statuses = await runner.status()

# Get pending migrations
pending = await runner.pending()

State Tracking

Applied migrations are tracked in a _coodie_migrations table:

CREATE TABLE IF NOT EXISTS <keyspace>."_coodie_migrations" (
    migration_name text PRIMARY KEY,
    applied_at     timestamp,
    description    text,
    checksum       text
);

migration_name — the filename without .py
applied_at — when the migration was applied
description — from the ForwardMigration.description attribute
checksum — SHA-256 hash of the migration file

Safety Features

LWT-Based Lock

Before applying or rolling back migrations, the runner acquires a distributed lock using a Cassandra lightweight transaction (LWT):

INSERT INTO <keyspace>."_coodie_migrations_lock"
    (lock_id, acquired_at, owner)
    VALUES (?, ?, ?)
    IF NOT EXISTS
    USING TTL 300

This prevents concurrent migration runs from multiple application instances. The lock auto-expires after 5 minutes (configurable via lock_ttl).

Schema Agreement Wait

Before executing DDL statements, the runner waits for all cluster nodes to agree on the current schema version. This prevents split-brain scenarios where concurrent DDL causes schema disagreement.

Data Migrations

For complex data transformations, write a migration that reads and rewrites data:

from coodie.migrations import Migration


class ForwardMigration(Migration):
    description = "Split name into first_name and last_name"
    reversible = False  # data migrations are often irreversible

    async def upgrade(self, ctx):
        # Step 1: Add new columns
        await ctx.execute(
            'ALTER TABLE catalog.users ADD "first_name" text'
        )
        await ctx.execute(
            'ALTER TABLE catalog.users ADD "last_name" text'
        )

        # Step 2: Read and transform existing data
        rows = await ctx.execute(
            "SELECT id, name FROM catalog.users"
        )
        for row in rows:
            parts = row["name"].split(" ", 1)
            first = parts[0]
            last = parts[1] if len(parts) > 1 else ""
            await ctx.execute(
                'UPDATE catalog.users SET "first_name" = ?, '
                '"last_name" = ? WHERE "id" = ?',
                [first, last, row["id"]],
            )

Comparison with Other ORMs

The following table compares coodie’s schema management capabilities with other popular Python ORMs and ODMs:

Feature	coodie	cqlengine	Beanie (MongoDB)	Django ORM	Alembic (SQLAlchemy)
Database	Cassandra / ScyllaDB	Cassandra	MongoDB	PostgreSQL, MySQL, SQLite	PostgreSQL, MySQL, SQLite
Auto schema sync	✅ `sync_table()`	✅ `sync_table()`	✅ `init_beanie()`	✅ `migrate`	❌ (manual only)
Dry-run preview	✅ `dry_run=True`	❌	❌	✅ `--plan`	✅ `--sql`
Schema drift detection	✅ warns on extra DB columns	❌	❌	❌	❌
Versioned migration files	✅ `YYYYMMDD_NNN_*.py`	❌	✅ `*.py`	✅ auto-generated	✅ auto-generated
Rollback / downgrade	✅ `downgrade()` method	❌	✅ `backward()` method	✅ `migrate <app> <N>`	✅ `downgrade()`
Auto-generate migrations	❌ (planned Phase C)	❌	❌ (manual)	✅ `makemigrations`	✅ `--autogenerate`
State tracking table	✅ `_coodie_migrations`	❌	❌ (in-app tracking)	✅ `django_migrations`	✅ `alembic_version`
Distributed lock	✅ LWT with TTL	❌	❌	❌	❌
Schema agreement wait	✅ polls `system.peers`	❌	N/A (schemaless)	N/A	N/A
Data migrations	✅ arbitrary CQL in `upgrade()`	❌	✅ via iterative docs	✅ `RunPython`	✅ `op.execute()`
CLI tool	✅ `coodie migrate`	❌	❌	✅ `manage.py migrate`	✅ `alembic upgrade`
Checksum verification	✅ SHA-256 per file	❌	❌	❌	❌
Table option changes	✅ `ALTER TABLE … WITH`	❌	N/A	❌	❌
Index management	✅ create + drop stale	✅ create only	✅ (MongoDB indexes)	✅ (via migrations)	✅ (via migrations)
Async support	✅ native async	❌	✅ native async	❌ (sync only)	❌ (sync only)

Note

cqlengine’s sync_table() only creates tables and adds new columns — it cannot detect drift, change table options, or drop stale indexes. coodie’s sync_table() is a superset of cqlengine’s, adding dry-run, drift detection, option changes, and index cleanup.

Django and Alembic have mature auto-generation that detects model changes and produces migration files automatically. coodie’s auto-generation is planned for Phase C.

Choosing the Right Tier

Scenario	Recommended Approach
Local development / prototyping	`sync_table()` at startup
CI test environments	`sync_table()` in fixtures
Adding a column in production	Migration file or `sync_table()`
Dropping a column	Migration file with `allow_destructive = True`
Changing table options	`sync_table()` or migration file
Data transformation	Migration file with data logic
Primary key change	Migration file (new table + data copy)

What’s Next?

Defining Documents — learn how Document classes work
Drivers & Initialization — driver initialization and configuration
Sync vs Async API — choosing between sync and async APIs