# Batch Operations Cassandra batches group multiple CQL statements into a single request. coodie provides `BatchQuery` (sync) and `AsyncBatchQuery` (async) as context managers that accumulate statements and execute them together on exit. ```{warning} Cassandra batches are **not** like SQL transactions. They guarantee atomicity (all-or-nothing) only for **logged** batches, and they work best when all statements target the **same partition**. Batching across partitions adds coordinator overhead and rarely improves performance. ``` ## Setup ```python from coodie.fields import PrimaryKey from typing import Annotated from uuid import UUID, uuid4 from datetime import datetime from pydantic import Field class Event(Document): user_id: Annotated[UUID, PrimaryKey()] ts: Annotated[datetime, PrimaryKey(clustering=True)] event_type: str class Settings: name = "events" ``` ## Sync Batches Use `BatchQuery` as a context manager. Pass `batch=batch` to each `save()`, `insert()`, `update()`, or `delete()` call to defer execution: ```python from coodie.sync import BatchQuery with BatchQuery() as batch: Event(user_id=uid, ts=datetime.now(), event_type="login").save(batch=batch) Event(user_id=uid, ts=datetime.now(), event_type="page_view").save(batch=batch) # All statements execute as one batch when the `with` block exits ``` Generated CQL: ```sql BEGIN BATCH INSERT INTO events (user_id, ts, event_type) VALUES (?, ?, ?); INSERT INTO events (user_id, ts, event_type) VALUES (?, ?, ?); APPLY BATCH; ``` ## Async Batches The async equivalent is `AsyncBatchQuery`: ```python from coodie.aio import AsyncBatchQuery async with AsyncBatchQuery() as batch: await Event(user_id=uid, ts=datetime.now(), event_type="login").save(batch=batch) await Event(user_id=uid, ts=datetime.now(), event_type="page_view").save(batch=batch) ``` ## Batch Types Cassandra supports three batch types: | Type | CQL | Use Case | |------|-----|----------| | Logged (default) | `BEGIN BATCH` | Atomicity across rows in the same partition | | Unlogged | `BEGIN UNLOGGED BATCH` | Performance when atomicity is not needed | | Counter | `BEGIN COUNTER BATCH` | Batching counter updates | ### Logged Batch (Default) Logged batches guarantee that either all statements succeed or none do. This is the default — matching both CQL semantics and cqlengine convention: ```python with BatchQuery(logged=True) as batch: # logged=True is the default ... ``` ```{warning} Logged batches have important limitations: - **Batch log overhead**: Before executing, the coordinator writes the entire batch to a distributed batch log. This adds latency and I/O on every logged batch. - **Multi-partition atomicity is limited**: Across partitions, logged batches guarantee that all mutations *eventually* apply (durability), but other clients may see partial results during execution — there is no isolation. - **Single-partition batches don't need logging**: If all statements target the same partition, use `logged=False`. Single-partition writes are already atomic in Cassandra/ScyllaDB, so the batch log adds cost with no benefit. - **Size limits**: ScyllaDB rejects batches exceeding `batch_size_fail_threshold_in_kb` (default 1 MB) and warns above `batch_size_warn_threshold_in_kb` (default 128 KB). ``` For more details on batch semantics and limits, see: - [ScyllaDB — BATCH statement](https://opensource.docs.scylladb.com/stable/cql/dml/batch.html) - [Cassandra — BATCH statement](https://cassandra.apache.org/doc/latest/cassandra/cql/dml.html#batch) ### Unlogged Batch Unlogged batches skip the batch log, reducing overhead. Use them when all statements target the same partition and you don't need the atomicity guarantee: ```python with BatchQuery(logged=False) as batch: ... ``` ### Counter Batch Counter updates must use a counter batch: ```python with BatchQuery(batch_type="COUNTER") as batch: ... ``` ## Manual Execution You don't have to use the context manager. Call `execute()` directly when you want to fire the batch at a specific point: ```python batch = BatchQuery() batch.add("INSERT INTO events (user_id, ts, event_type) VALUES (?, ?, ?)", [uid, now, "login"]) batch.add("INSERT INTO events (user_id, ts, event_type) VALUES (?, ?, ?)", [uid, now, "logout"]) batch.execute() ``` ## Error Handling If an exception occurs inside the `with` block, the batch is **not** executed. The context manager only calls `execute()` when no exception is raised: ```python with BatchQuery() as batch: Event(user_id=uid, ts=datetime.now(), event_type="login").save(batch=batch) raise ValueError("oops") # batch.execute() is NOT called — no statements are sent ``` ## Best Practices 1. **Same partition**: Keep all statements in a batch targeting the same partition key. Cross-partition batches add coordinator log overhead. 2. **Small batches**: Keep batches small (tens of statements, not thousands). Large batches can cause pressure on the coordinator node. 3. **Don't use batches for bulk loading**: For inserting thousands of rows, use individual inserts with async concurrency. Batches are for atomicity, not throughput. 4. **Counter batches are separate**: You cannot mix counter and non-counter statements in the same batch. 5. **LWT in batches**: A batch can contain at most **one** conditional (LWT) statement, and all statements must target the same partition. ## Low-Level: build_batch() For advanced use cases, you can build batch CQL directly: ```python from coodie.cql_builder import build_batch, build_insert statements = [ build_insert("events", "myapp", {"user_id": uid, "ts": now, "event_type": "login"}), build_insert("events", "myapp", {"user_id": uid, "ts": now, "event_type": "page_view"}), ] cql, params = build_batch(statements, logged=True) # cql = "BEGIN BATCH\n INSERT INTO ...;\n INSERT INTO ...;\nAPPLY BATCH" ``` ## What's Next? - {doc}`lwt` — conditional writes with IF NOT EXISTS / IF EXISTS - {doc}`sync-vs-async` — choosing between sync and async APIs - {doc}`drivers` — driver configuration and multi-cluster setups - {doc}`exceptions` — error handling patterns