Batch Operations

Cassandra batches group multiple CQL statements into a single request. coodie provides BatchQuery (sync) and AsyncBatchQuery (async) as context managers that accumulate statements and execute them together on exit.

Warning

Cassandra batches are not like SQL transactions. They guarantee atomicity (all-or-nothing) only for logged batches, and they work best when all statements target the same partition. Batching across partitions adds coordinator overhead and rarely improves performance.

Setup

from datetime import datetime
from typing import Annotated
from uuid import UUID, uuid4

from coodie import Document
from coodie.fields import PrimaryKey

class Event(Document):
    user_id: Annotated[UUID, PrimaryKey()]
    ts: Annotated[datetime, PrimaryKey(clustering=True)]
    event_type: str

    class Settings:
        name = "events"

uid = uuid4()  # sample partition key reused in the examples below

Sync Batches

Use BatchQuery as a context manager. Pass batch=batch to each save(), insert(), update(), or delete() call to defer execution:

from coodie.sync import BatchQuery

with BatchQuery() as batch:
    Event(user_id=uid, ts=datetime.now(), event_type="login").save(batch=batch)
    Event(user_id=uid, ts=datetime.now(), event_type="page_view").save(batch=batch)
# All statements execute as one batch when the `with` block exits

Generated CQL:

BEGIN BATCH
  INSERT INTO events (user_id, ts, event_type) VALUES (?, ?, ?);
  INSERT INTO events (user_id, ts, event_type) VALUES (?, ?, ?);
APPLY BATCH;
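
Under the hood, the context manager just accumulates (cql, params) pairs and joins them into one request on exit. A minimal sketch of that assembly step, using hypothetical names — this is illustrative, not coodie's actual BatchQuery implementation:

```python
# Illustrative sketch of how a batch wrapper can assemble CQL.
# SketchBatch is a hypothetical stand-in, not coodie's BatchQuery.

class SketchBatch:
    def __init__(self, logged: bool = True):
        self.keyword = "BEGIN BATCH" if logged else "BEGIN UNLOGGED BATCH"
        self.statements: list[tuple[str, list]] = []

    def add(self, cql: str, params: list) -> None:
        # Queue a statement instead of sending it immediately.
        self.statements.append((cql, params))

    def render(self) -> tuple[str, list]:
        # Join every queued statement into one BEGIN ... APPLY BATCH request;
        # parameters are concatenated in statement order.
        body = "\n".join(f"  {cql};" for cql, _ in self.statements)
        params = [p for _, ps in self.statements for p in ps]
        return f"{self.keyword}\n{body}\nAPPLY BATCH", params


batch = SketchBatch()
batch.add("INSERT INTO events (user_id, ts, event_type) VALUES (?, ?, ?)",
          ["u1", "t1", "login"])
batch.add("INSERT INTO events (user_id, ts, event_type) VALUES (?, ?, ?)",
          ["u1", "t2", "page_view"])
cql, params = batch.render()
```

The key property is that nothing reaches the coordinator until exit: each save(batch=batch) only queues a statement.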

Async Batches

The async equivalent is AsyncBatchQuery:

from coodie.aio import AsyncBatchQuery

async with AsyncBatchQuery() as batch:
    await Event(user_id=uid, ts=datetime.now(), event_type="login").save(batch=batch)
    await Event(user_id=uid, ts=datetime.now(), event_type="page_view").save(batch=batch)

Batch Types

Cassandra supports three batch types:

Type             | CQL                  | Use Case
-----------------|----------------------|--------------------------------------------
Logged (default) | BEGIN BATCH          | Atomicity across rows in the same partition
Unlogged         | BEGIN UNLOGGED BATCH | Performance when atomicity is not needed
Counter          | BEGIN COUNTER BATCH  | Batching counter updates

Logged Batch (Default)

Logged batches guarantee that either all statements succeed or none do. This is the default — matching both CQL semantics and cqlengine convention:

with BatchQuery(logged=True) as batch:  # logged=True is the default
    ...

Warning

Logged batches have important limitations:

  • Batch log overhead: Before executing, the coordinator writes the entire batch to a distributed batch log. This adds latency and I/O on every logged batch.

  • Multi-partition atomicity is limited: Across partitions, logged batches guarantee that all mutations eventually apply (durability), but other clients may see partial results during execution — there is no isolation.

  • Single-partition batches don’t need logging: If all statements target the same partition, use logged=False. Single-partition writes are already atomic in Cassandra/ScyllaDB, so the batch log adds cost with no benefit.

  • Size limits: ScyllaDB rejects batches exceeding batch_size_fail_threshold_in_kb (default 1 MB) and warns above batch_size_warn_threshold_in_kb (default 128 KB).

For more details on batch semantics and limits, see the Cassandra and ScyllaDB documentation on the BATCH statement.

Unlogged Batch

Unlogged batches skip the batch log, reducing overhead. Use them when all statements target the same partition and you don’t need the atomicity guarantee:

with BatchQuery(logged=False) as batch:
    ...
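
For comparison with the logged example above, the same two inserts would render with the UNLOGGED keyword:

```cql
BEGIN UNLOGGED BATCH
  INSERT INTO events (user_id, ts, event_type) VALUES (?, ?, ?);
  INSERT INTO events (user_id, ts, event_type) VALUES (?, ?, ?);
APPLY BATCH;
```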

Counter Batch

Counter updates must use a counter batch:

with BatchQuery(batch_type="COUNTER") as batch:
    ...
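
Counter columns can only be incremented or decremented, so a counter batch contains only counter UPDATE statements. An illustrative fragment — the page_views table and its columns are hypothetical:

```cql
BEGIN COUNTER BATCH
  UPDATE page_views SET views = views + 1 WHERE page_id = ?;
  UPDATE page_views SET visitors = visitors + 1 WHERE page_id = ?;
APPLY BATCH;
```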

Manual Execution

You don’t have to use the context manager. Call execute() directly when you want to fire the batch at a specific point:

batch = BatchQuery()
batch.add("INSERT INTO events (user_id, ts, event_type) VALUES (?, ?, ?)", [uid, now, "login"])
batch.add("INSERT INTO events (user_id, ts, event_type) VALUES (?, ?, ?)", [uid, now, "logout"])
batch.execute()

Error Handling

If an exception is raised inside the `with` block, the batch is not executed. The context manager only calls execute() when the block exits cleanly:

with BatchQuery() as batch:
    Event(user_id=uid, ts=datetime.now(), event_type="login").save(batch=batch)
    raise ValueError("oops")
    # batch.execute() is NOT called — no statements are sent
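
This is plain context-manager protocol: __exit__ receives the exception (if any) and skips execution when one is present. A minimal sketch of that check, with a hypothetical stand-in class rather than coodie's real code:

```python
# Illustrative sketch: execute on clean exit, skip when an exception escapes.
# BatchExitSketch is hypothetical; it only models the __exit__ decision.

class BatchExitSketch:
    def __init__(self):
        self.statements = []
        self.executed = False

    def add(self, cql):
        self.statements.append(cql)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:       # only send the batch on a clean exit
            self.executed = True   # real code would call execute() here
        return False               # never swallow the exception


ok = BatchExitSketch()
with ok:
    ok.add("INSERT ...")

failed = BatchExitSketch()
try:
    with failed:
        failed.add("INSERT ...")
        raise ValueError("oops")
except ValueError:
    pass
```

After this runs, ok has executed while failed has queued a statement that was never sent.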

Best Practices

  1. Same partition: Keep all statements in a batch targeting the same partition key. Cross-partition batches add coordinator log overhead.

  2. Small batches: Keep batches small (tens of statements, not thousands). Large batches can cause pressure on the coordinator node.

  3. Don’t use batches for bulk loading: For inserting thousands of rows, use individual inserts with async concurrency. Batches are for atomicity, not throughput.

  4. Counter batches are separate: You cannot mix counter and non-counter statements in the same batch.

  5. LWT in batches: A batch can contain at most one conditional (LWT) statement, and all statements must target the same partition.
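
Practice 3 in code: instead of batching thousands of rows, issue individual inserts with bounded async concurrency. A hedged sketch of the pattern — the insert_one coroutine is a dummy standing in for an awaitable call such as Event(**row).save(), and the concurrency limit of 100 is an arbitrary example:

```python
import asyncio

async def insert_one(row, sem):
    # Stand-in for a single-row async insert; the semaphore caps
    # how many inserts are in flight at once.
    async with sem:
        await asyncio.sleep(0)  # placeholder for the real driver call
        return row["event_type"]

async def bulk_load(rows, concurrency=100):
    sem = asyncio.Semaphore(concurrency)
    # gather preserves input order, so results line up with rows.
    return await asyncio.gather(*(insert_one(r, sem) for r in rows))

rows = [{"event_type": f"evt-{i}"} for i in range(250)]
results = asyncio.run(bulk_load(rows))
```

Each insert is an independent single-partition write, which lets the driver route it to the right replica without coordinator batch overhead.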

Low-Level: build_batch()

For advanced use cases, you can build batch CQL directly:

from coodie.cql_builder import build_batch, build_insert

statements = [
    build_insert("events", "myapp", {"user_id": uid, "ts": now, "event_type": "login"}),
    build_insert("events", "myapp", {"user_id": uid, "ts": now, "event_type": "page_view"}),
]

cql, params = build_batch(statements, logged=True)
# cql = "BEGIN BATCH\n  INSERT INTO ...;\n  INSERT INTO ...;\nAPPLY BATCH"

What’s Next?