Batch Operations
Cassandra batches group multiple CQL statements into a single request.
coodie provides BatchQuery (sync) and AsyncBatchQuery (async) as
context managers that accumulate statements and execute them together
on exit.
Warning
Cassandra batches are not like SQL transactions. They guarantee atomicity (all-or-nothing) only for logged batches, and they work best when all statements target the same partition. Batching across partitions adds coordinator overhead and rarely improves performance.
Setup
from typing import Annotated
from uuid import UUID, uuid4
from datetime import datetime

from coodie import Document  # import path assumed; adjust to your coodie version
from coodie.fields import PrimaryKey

class Event(Document):
    user_id: Annotated[UUID, PrimaryKey()]
    ts: Annotated[datetime, PrimaryKey(clustering=True)]
    event_type: str

    class Settings:
        name = "events"

uid = uuid4()  # sample partition key used in the examples below
now = datetime.now()  # sample timestamp used under Manual Execution
Sync Batches
Use BatchQuery as a context manager. Pass batch=batch to each
save(), insert(), update(), or delete() call to defer execution:
from coodie.sync import BatchQuery

with BatchQuery() as batch:
    Event(user_id=uid, ts=datetime.now(), event_type="login").save(batch=batch)
    Event(user_id=uid, ts=datetime.now(), event_type="page_view").save(batch=batch)
# All statements execute as one batch when the `with` block exits
Generated CQL:
BEGIN BATCH
  INSERT INTO events (user_id, ts, event_type) VALUES (?, ?, ?);
  INSERT INTO events (user_id, ts, event_type) VALUES (?, ?, ?);
APPLY BATCH;
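save() is not the only call that accepts the batch: insert(), update(), and delete() take batch=batch as well (see above). A short sketch queuing a write and a delete in one same-partition batch; stale is assumed to be an Event instance fetched earlier:

with BatchQuery() as batch:
    Event(user_id=uid, ts=datetime.now(), event_type="login").save(batch=batch)
    stale.delete(batch=batch)  # `stale` assumed fetched earlier; queued, not executed yet
# Both statements are deferred and sent as one batch on exit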
Async Batches
The async equivalent is AsyncBatchQuery:
from coodie.aio import AsyncBatchQuery

async with AsyncBatchQuery() as batch:
    await Event(user_id=uid, ts=datetime.now(), event_type="login").save(batch=batch)
    await Event(user_id=uid, ts=datetime.now(), event_type="page_view").save(batch=batch)
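If you need a runnable entry point for the snippet above, wrap it in a coroutine and hand it to asyncio.run(); this sketch assumes the async driver has already been initialized (see Drivers & Initialization):

import asyncio

async def record_session(uid):
    # Queue two writes for the same partition and send them on exit
    async with AsyncBatchQuery() as batch:
        await Event(user_id=uid, ts=datetime.now(), event_type="login").save(batch=batch)
        await Event(user_id=uid, ts=datetime.now(), event_type="logout").save(batch=batch)

asyncio.run(record_session(uid))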
Batch Types
Cassandra supports three batch types:
| Type | CQL | Use Case |
|---|---|---|
| Logged (default) | BEGIN BATCH | Atomicity across rows in the same partition |
| Unlogged | BEGIN UNLOGGED BATCH | Performance when atomicity is not needed |
| Counter | BEGIN COUNTER BATCH | Batching counter updates |
Logged Batch (Default)
Logged batches guarantee that either all statements succeed or none do. This is the default, matching both CQL semantics and cqlengine convention:
with BatchQuery(logged=True) as batch:  # logged=True is the default
    ...
Warning
Logged batches have important limitations:
- Batch log overhead: Before executing, the coordinator writes the entire batch to a distributed batch log. This adds latency and I/O on every logged batch.
- Multi-partition atomicity is limited: Across partitions, logged batches guarantee that all mutations eventually apply (durability), but other clients may see partial results during execution; there is no isolation.
- Single-partition batches don’t need logging: If all statements target the same partition, use logged=False. Single-partition writes are already atomic in Cassandra/ScyllaDB, so the batch log adds cost with no benefit.
- Size limits: ScyllaDB rejects batches exceeding batch_size_fail_threshold_in_kb (default 1 MB) and warns above batch_size_warn_threshold_in_kb (default 128 KB). If a write set approaches these thresholds, split it into several small batches (see the sketch below).
For more details on batch semantics and limits, see the Cassandra and ScyllaDB documentation on batch statements.
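A minimal chunking sketch, assuming events is a list of Event instances that all share one partition key:

from itertools import islice

def chunks(items, size):
    # Yield successive fixed-size slices of an iterable
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk

for group in chunks(events, 20):
    with BatchQuery(logged=False) as batch:  # same partition, so unlogged is safe
        for event in group:
            event.save(batch=batch)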
Unlogged Batch
Unlogged batches skip the batch log, reducing overhead. Use them when all statements target the same partition and you don’t need the atomicity guarantee:
with BatchQuery(logged=False) as batch:
    ...
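A fuller sketch: several events for one user land in the same partition, so skipping the batch log costs nothing:

with BatchQuery(logged=False) as batch:
    for event_type in ("login", "page_view", "logout"):
        # All rows share user_id, so this batch stays within one partition
        Event(user_id=uid, ts=datetime.now(), event_type=event_type).save(batch=batch)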
Counter Batch
Counter updates must use a counter batch:
with BatchQuery(batch_type="COUNTER") as batch:
    ...
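Modeling counter columns is out of scope here, but raw counter updates can be queued with batch.add() (covered under Manual Execution below). A sketch, assuming a hypothetical page_views counter table:

with BatchQuery(batch_type="COUNTER") as batch:
    # page_views is a hypothetical counter table; counters only support +/- updates
    batch.add("UPDATE page_views SET views = views + 1 WHERE page_id = ?", [page_id])
    batch.add("UPDATE page_views SET clicks = clicks + 1 WHERE page_id = ?", [page_id])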
Manual Execution
You don’t have to use the context manager. Call execute() directly
when you want to fire the batch at a specific point:
batch = BatchQuery()
batch.add("INSERT INTO events (user_id, ts, event_type) VALUES (?, ?, ?)", [uid, now, "login"])
batch.add("INSERT INTO events (user_id, ts, event_type) VALUES (?, ?, ?)", [uid, now, "logout"])
batch.execute()
Error Handling
If an exception occurs inside the with block, the batch is not
executed. The context manager only calls execute() when no exception
is raised:
with BatchQuery() as batch:
    Event(user_id=uid, ts=datetime.now(), event_type="login").save(batch=batch)
    raise ValueError("oops")
# batch.execute() is NOT called; no statements are sent
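The same guarantee works without the context manager: accumulate, validate, and call execute() only on success. check_quota and QuotaExceeded here are hypothetical placeholders:

batch = BatchQuery()
batch.add("INSERT INTO events (user_id, ts, event_type) VALUES (?, ?, ?)", [uid, now, "login"])
try:
    check_quota(uid)  # hypothetical validation that may raise QuotaExceeded
except QuotaExceeded:
    pass  # execute() is never called; nothing is sent
else:
    batch.execute()  # fires only when validation succeeded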
Best Practices
- Same partition: Keep all statements in a batch targeting the same partition key. Cross-partition batches add batch-log overhead on the coordinator.
- Small batches: Keep batches small (tens of statements, not thousands). Large batches put pressure on the coordinator node.
- Don’t use batches for bulk loading: For inserting thousands of rows, use individual inserts with async concurrency (see the sketch after this list). Batches are for atomicity, not throughput.
- Counter batches are separate: You cannot mix counter and non-counter statements in the same batch.
- LWT in batches: A batch can contain at most one conditional (LWT) statement, and all statements must target the same partition.
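As referenced above, a minimal bulk-loading sketch that uses individual async inserts rather than batches; the concurrency limit of 100 is an assumption to tune per cluster:

import asyncio

async def bulk_insert(events, concurrency=100):
    # Bound in-flight requests so the coordinator is not overwhelmed
    sem = asyncio.Semaphore(concurrency)

    async def insert_one(event):
        async with sem:
            await event.save()

    await asyncio.gather(*(insert_one(e) for e in events))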
Low-Level: build_batch()
For advanced use cases, you can build batch CQL directly:
from coodie.cql_builder import build_batch, build_insert
statements = [
build_insert("events", "myapp", {"user_id": uid, "ts": now, "event_type": "login"}),
build_insert("events", "myapp", {"user_id": uid, "ts": now, "event_type": "page_view"}),
]
cql, params = build_batch(statements, logged=True)
# cql = "BEGIN BATCH\n INSERT INTO ...;\n INSERT INTO ...;\nAPPLY BATCH"
What’s Next?
Lightweight Transactions (LWT) — conditional writes with IF NOT EXISTS / IF EXISTS
Sync vs Async API — choosing between sync and async APIs
Drivers & Initialization — driver configuration and multi-cluster setups
Exceptions & Error Handling — error handling patterns