Changelog
Unreleased
Settings: Stop flagging gateway.recover_after_time as a difference when both gateway.expected_nodes and gateway.expected_data_nodes are unset (-1).
Admin: Added XMover, a CrateDB shard analyzer and movement tool. Thanks, @WalBeh.
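The Settings rule above can be sketched as follows. This is a minimal illustration, not the toolkit's code: the names recover_after_time_is_relevant and diff_settings are hypothetical, settings are assumed to arrive as a flat dictionary keyed by dotted names, and the default value used in the usage note is made up.

```python
def recover_after_time_is_relevant(settings: dict) -> bool:
    """Hypothetical helper: decide whether gateway.recover_after_time matters.

    Assumption: -1 means "unset" for both expected-nodes settings, and
    recover_after_time is not worth comparing when neither of them is set.
    """
    expected_nodes = int(settings.get("gateway.expected_nodes", -1))
    expected_data_nodes = int(settings.get("gateway.expected_data_nodes", -1))
    return expected_nodes != -1 or expected_data_nodes != -1


def diff_settings(current: dict, defaults: dict) -> dict:
    """Hypothetical comparison: report settings whose value differs from the default."""
    differences = {}
    for name, value in current.items():
        if name == "gateway.recover_after_time" and not recover_after_time_is_relevant(current):
            continue  # Not a meaningful difference, so do not flag it.
        if name in defaults and str(defaults[name]) != str(value):
            differences[name] = (defaults[name], value)
    return differences
```

With both expected-nodes settings at -1, a non-default recover_after_time such as "5m" produces no entry in the result; once either setting is given a real node count, the same value is reported as a difference.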
2025/08/19 v0.0.41
I/O: Updated to influxio-0.6.0. Thanks, @ZillKhan.
The target table is now the measurement name when importing ILP files.
The InfluxDB source URL accepts a timeout query parameter (seconds) to configure the network timeout when talking to the InfluxDB API.
For ILP imports, the CrateDB URL no longer needs a table component; you can point it at the schema only (the measurement determines the table).
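To illustrate the two v0.0.41 behaviors, the sketch below extracts the measurement name from an InfluxDB Line Protocol line (the candidate target table name) and reads the timeout query parameter from a source URL. It is a hypothetical stand-in, not influxio's implementation: ilp_measurement and source_timeout are made-up names, and the example URLs are assumptions.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def ilp_measurement(line: str) -> str:
    """Return the measurement name of one InfluxDB Line Protocol line.

    The measurement ends at the first unescaped comma (start of the tag set)
    or space (start of the field set). Only "\\," and "\\ " are unescaped here.
    """
    chars = []
    i = 0
    while i < len(line):
        char = line[i]
        if char == "\\" and i + 1 < len(line) and line[i + 1] in ", ":
            chars.append(line[i + 1])
            i += 2
            continue
        if char in ", ":
            break
        chars.append(char)
        i += 1
    return "".join(chars)


def source_timeout(url: str, default: Optional[float] = None) -> Optional[float]:
    """Read the timeout query parameter (seconds) from a source URL, if present."""
    values = parse_qs(urlparse(url).query).get("timeout")
    return float(values[0]) if values else default
```

For example, ilp_measurement("cpu,host=server01 usage=0.5") yields "cpu", which would become the table name inside the schema given by the CrateDB URL.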
2025/08/19 v0.0.40
I/O: Fixed MongoDB CDC invocation. Thanks, Mỹ Duyên.
2025/08/19 v0.0.39
OCI: Started producing image ghcr.io/crate/cratedb-toolkit-ingest
I/O: Added drivers for ODBC and Oracle to cratedb-toolkit-ingest
I/O: Updated BSON library to support ARM64
2025/08/14 v0.0.38
I/O: Updated to ingestr>=0.13.61
CFR: Improved log output
CFR: Fixed double quoting of table name. Thanks, @karynzv.
CFR: When importing, started using replace strategy instead of append
CFR: Improved importing data with regard to type mapping without NumPy
CFR: Started truncating the target table before importing and switched back to the append strategy, because replace doesn't produce the right DDL.
I/O: Tuned down ingestr, because it masked native I/O adapters
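The double-quoting fix can be illustrated with an identifier quoter that is safe to apply more than once. This is a generic sketch, not the CFR code: quote_identifier and full_table_name are hypothetical names, and treating any string that starts and ends with a double quote as already quoted is a simplifying assumption.

```python
def quote_identifier(name: str) -> str:
    """Quote a SQL identifier exactly once.

    Already-quoted input is returned unchanged, so applying the function twice
    cannot produce double quoting such as ""name"".
    """
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name
    # Embedded double quotes are escaped by doubling them.
    return '"' + name.replace('"', '""') + '"'


def full_table_name(schema: str, table: str) -> str:
    """Build a fully qualified, quoted table name."""
    return f"{quote_identifier(schema)}.{quote_identifier(table)}"
```

Making quoting idempotent trades a small ambiguity (a raw name that literally begins and ends with a quote) for robustness when names pass through several layers.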
2025/07/01 v0.0.37
Settings: Fixed comparison of 0s vs. 0ms. Thanks, @hlcianfagna.
DMS: Provided a recipe file to relay primary key and column type map information
DMS: Provided a recipe option to ignore DMS control DDL events
DMS: Started using the “direct” column mapping by default, keeping the “universal” column mapping as an option.
Dependencies: Updated to commons-codec>=0.0.23
I/O: Added adapter for PostgreSQL full-load using ingestr
I/O: Added documentation about ingestr adapter
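The 0s vs. 0ms fix boils down to comparing durations by value rather than by their string form. A minimal sketch, assuming only the units ms, s, m, h, and d need handling (CrateDB accepts more); to_millis and same_duration are hypothetical names.

```python
import re

# Assumption: only a subset of CrateDB time units is handled here.
_MILLIS_PER_UNIT = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}
_DURATION = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(ms|s|m|h|d)\s*$")


def to_millis(value: str) -> float:
    """Convert a duration string like "0s" or "250ms" to milliseconds."""
    match = _DURATION.match(value)
    if match is None:
        raise ValueError(f"Unsupported duration: {value!r}")
    number, unit = match.groups()
    return float(number) * _MILLIS_PER_UNIT[unit]


def same_duration(a: str, b: str) -> bool:
    """Compare durations by value instead of by their string representation."""
    return to_millis(a) == to_millis(b)
```

With this normalization, "0s" and "0ms" compare equal, and so do "1m" and "60s".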
2025/06/23 v0.0.36
Dependencies: Migrated from zyp to tikray. It's effectively the same, but now provided as a dedicated package
Dependencies: Updated to croud-1.14
Dependencies: Updated to async-kinesis-2.0.0. Thanks, @hampsterx.
CDC: Added canonical SQL example for PostgreSQL from Ibis
CDC: Enabled loading DMS events from Kinesis streams and stream-dump files
CDC: Added subcommand ctk dms table-mappings
2025/05/13 v0.0.35
Added missing pytest dependencies to cratedb-toolkit[testing]
2025/05/13 v0.0.34
Downgraded to sqlalchemy-cratedb 0.41, because version 0.42 is not GA yet
2025/05/12 v0.0.33
CFR: Enhanced job statistics with optional reporting database support. Thanks, @WalBeh.
Settings: Added settings comparison utility. Thanks, @WalBeh.
Meta: Added parser for https://cratedb.com/releases.json file. Thanks, @WalBeh.
CFR: Added the ability to anonymize queries recorded by collect
Cloud API: Added SDK and CLI for the CrateDB Cloud Cluster and Import APIs. They support headless/unattended operations on CrateDB Cloud clusters, covering deploy/start/resume and data import procedures through a fluent API and the CLI.
Cloud API: Added JWT authentication to client API and ctk shell.
Cloud API: Added health and ping subcommands to ctk cluster
CLI: Downgraded to Click 8.1, as the code is not compatible with 8.2 yet
Breaking changes
Renamed CLI options and environment variables:
Converged the --cratedb-sqlalchemy-url and --cratedb-http-url options into a single --cluster-url option
Converged the CRATEDB_SQLALCHEMY_URL and CRATEDB_HTTP_URL environment variables into a single CRATEDB_CLUSTER_URL
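A common way to anonymize recorded SQL, as mentioned for collect in v0.0.33, is to replace literal values with placeholders while keeping the statement shape. The following is a generic regex-based sketch of that technique, not necessarily what collect does; anonymize_sql is a hypothetical name, and edge cases such as digits inside quoted identifiers or comments are not handled.

```python
import re

# Single-quoted SQL string literals, with '' as an escaped quote inside.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
# Standalone integer or decimal numbers; digits inside identifiers like t1 are kept.
_NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")


def anonymize_sql(statement: str) -> str:
    """Replace string and numeric literals with ? placeholders."""
    # Strings first, so digits inside string literals are already gone.
    statement = _STRING_LITERAL.sub("?", statement)
    return _NUMBER_LITERAL.sub("?", statement)
```

Replacing literals with ? keeps statements groupable by shape (useful for slow-query statistics) while dropping the values that might carry sensitive data.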
2025/04/23 v0.0.32
MCP: Add subsystem providing a few server and client utilities through the ctk query mcp {list,inquire,launch} subcommands.
Docs API: Added extractors for CrateDB functions and settings
Connect: Respect sslmode URI parameter when converting SQLAlchemy connection URLs to http(s)://
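The sslmode conversion can be sketched as below. This is an illustration under stated assumptions, not the toolkit's converter: to_http_url and TLS_SSLMODES are hypothetical, the set of sslmode values that imply TLS is an assumption, and IPv6 hosts are not handled. Port 4200 is CrateDB's default HTTP port.

```python
from urllib.parse import parse_qs, urlparse

# Assumption: these sslmode values imply TLS; other values (or none) mean plain HTTP.
TLS_SSLMODES = {"require", "verify-ca", "verify-full"}


def to_http_url(sqlalchemy_url: str, default_port: int = 4200) -> str:
    """Convert a crate:// SQLAlchemy URL into an http(s):// URL."""
    parsed = urlparse(sqlalchemy_url)
    sslmode = parse_qs(parsed.query).get("sslmode", ["disable"])[0]
    scheme = "https" if sslmode in TLS_SSLMODES else "http"
    # Carry over credentials, if any.
    credentials = ""
    if parsed.username:
        credentials = parsed.username
        if parsed.password:
            credentials += ":" + parsed.password
        credentials += "@"
    host = parsed.hostname or "localhost"
    port = parsed.port or default_port
    return f"{scheme}://{credentials}{host}:{port}"
```

For instance, crate://admin:secret@example.org:4200/?sslmode=require would map to https://admin:secret@example.org:4200, while a URL without sslmode stays on http.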
2025/01/31 v0.0.31
Fixed connectivity for jobstats collect
Refactored code and improved CLI interface of ctk info and ctk cfr
Dependencies: Updated to crate-2.0.0, which uses orjson for JSON marshalling
CFR: Added job statistics and slow-query exploration using a Marimo notebook. Thanks, @WalBeh.
2025/01/13 v0.0.30
Dependencies: Minimize dependencies of core installation, defer polars to cratedb-toolkit[io].
Fixed ctk cfr info record for overly large values of ulimit_hard
Improved ctk shell to also talk to CrateDB standalone databases
Added basic utility command ctk tail, for tailing a database table, and optionally following the tail
Table Loader: Added capability to load InfluxDB Line Protocol (ILP) files
Query Collector: Now respects CRATEDB_CLUSTER_URL environment variable
2024/10/13 v0.0.29
MongoDB: Added Zyp transformations to the CDC subsystem, making it more symmetric to the full-load procedure.
Query Converter: Added very basic expression converter utility with CLI interface
DynamoDB: Added query expression converter for relocating object references, to support query migrations after the breaking change to the SQL DDL schema in v0.0.27.
2024/10/09 v0.0.28
IO: Improved BulkProcessor when running per-record operations by also checking rowcount for handling INSERT OK, 0 rows responses
MongoDB: Fixed BSON decoding of {"$date": 1180690093000} timestamps by updating to commons-codec 0.0.21.
Testcontainers: Don't always pull the OCI image before starting, because pulling is problematic in disconnected situations.
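The {"$date": ...} fix concerns decoding MongoDB Extended JSON dates. The sketch below handles three common shapes: a bare millisecond number (the case named in the entry), the canonical {"$numberLong": ...} wrapper, and an ISO-8601 string. It is a hypothetical helper, not the commons-codec implementation.

```python
from datetime import datetime, timezone


def decode_bson_date(value: dict) -> datetime:
    """Decode a MongoDB Extended JSON {"$date": ...} value into an aware datetime."""
    raw = value["$date"]
    if isinstance(raw, dict) and "$numberLong" in raw:
        # Canonical form: {"$date": {"$numberLong": "<milliseconds>"}}
        raw = int(raw["$numberLong"])
    if isinstance(raw, (int, float)):
        # Bare number: milliseconds since the Unix epoch.
        return datetime.fromtimestamp(raw / 1000, tz=timezone.utc)
    # Relaxed form: ISO-8601 string such as "2007-06-01T09:28:13Z".
    return datetime.fromisoformat(raw.replace("Z", "+00:00"))
```

The timestamp from the entry, 1180690093000 milliseconds, decodes to 2007-06-01 09:28:13 UTC.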
2024/10/01 v0.0.27
MongoDB: Updated to pymongo 4.9
DynamoDB: Change CrateDB data model to use (pk, data, aux) columns. Attention: This is a breaking change.
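One way to picture the (pk, data, aux) data model is to split each decoded DynamoDB item into three parts. This is a hypothetical sketch: split_item is a made-up name, and routing lists with mixed element types into aux is an assumption based on the v0.0.21 entry about storing varied lists in a separate OBJECT(IGNORED) column.

```python
def split_item(item: dict, key_names: list) -> dict:
    """Split a decoded DynamoDB item into hypothetical pk/data/aux parts.

    Assumption: aux receives lists whose elements have varying types;
    everything that is not a key attribute otherwise goes into data.
    """
    pk = {name: item[name] for name in key_names}
    data, aux = {}, {}
    for name, value in item.items():
        if name in pk:
            continue
        is_varied_list = isinstance(value, list) and len({type(v) for v in value}) > 1
        (aux if is_varied_list else data)[name] = value
    return {"pk": pk, "data": data, "aux": aux}
```

Keeping key attributes apart from the payload gives every row a stable primary key column, while heterogeneous values that CrateDB cannot type consistently are parked in a column that is stored but not indexed.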
2024/09/26 v0.0.26
MongoDB: Configure MongoDBCrateDBConverter after updating to commons-codec 0.0.18
DynamoDB CDC: Fix MODIFY operation to also propagate deleted attributes
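Propagating deleted attributes on MODIFY requires knowing which attributes disappeared. A minimal sketch, assuming the DynamoDB stream delivers both images (StreamViewType NEW_AND_OLD_IMAGES); removed_attributes is a hypothetical name, and the CDC handler may detect removals differently.

```python
def removed_attributes(record: dict) -> list:
    """List attribute names present in OldImage but missing from NewImage."""
    if record.get("eventName") != "MODIFY":
        return []
    images = record.get("dynamodb", {})
    old_image = images.get("OldImage", {})
    new_image = images.get("NewImage", {})
    return sorted(set(old_image) - set(new_image))
```

The resulting names are what an update against CrateDB would need to remove, in addition to applying the values from NewImage.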
2024/09/22 v0.0.25
Table Loader: Improved conditional handling of “transformation” parameter
Table Loader: Improved status reporting and error logging in BulkProcessor
MongoDB: Improve error reporting
MongoDB Full: Polars’ read_ndjson doesn't load MongoDB JSON data well, use fsspec and orjson instead
MongoDB Full: Improved initialization of transformation subsystem
MongoDB Adapter: Improved performance when computing collection cardinality by using collection.estimated_document_count()
MongoDB Full: Optionally use limit parameter as number of total records
MongoDB Adapter: Evaluate _id filter field by upcasting to bson.ObjectId, to convey a filter that makes ctk load table process a single document, identified by its OID
MongoDB Dependencies: Update to commons-codec 0.0.17
2024/09/19 v0.0.24
MongoDB Full: Refactor transformation subsystem to commons-codec
MongoDB: Update to commons-codec v0.0.16
2024/09/16 v0.0.23
MongoDB: Unlock processing multiple collections, either from server database, or from filesystem directory
MongoDB: Unlock processing JSON files from HTTP resource, using https+bson://
MongoDB: Optionally filter server collection using MongoDB query expression
MongoDB: Improve error handling wrt. bulk operations vs. usability
DynamoDB CDC: Add ctk load table interface for processing CDC events
DynamoDB CDC: Accept a few more options for the Kinesis Stream: batch-size, create, create-shards, start, seqno, idle-sleep, buffer-time
DynamoDB Full: Improve error handling wrt. bulk operations vs. usability
2024/09/10 v0.0.22
MongoDB: Rename columns with leading underscores to use double leading underscores
MongoDB: Add support for UUID types
MongoDB: Improve reading timestamps in previous BSON formats
MongoDB: Fix processing empty arrays/lists. By default, assume TEXT as inner type.
MongoDB: For ctk load table, use “partial” scan for inferring the collection schema, based on the first 10,000 documents.
MongoDB: Prevent UNKNOWN fields from leaking into SQL DDL. Column definitions for such fields are omitted from the SQL DDL.
MongoDB: Make ctk load table use the data OBJECT(DYNAMIC) mapping strategy.
MongoDB: Sanitize lists of varying objects
MongoDB: Add treatment option for applying special treatments to certain items on real-world data
MongoDB: Use pagination on source collection, for creating batches towards CrateDB
MongoDB: Unlock importing MongoDB Extended JSON files using file+bson://...
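The schema inference entries above (partial scan over the first 10,000 documents, TEXT as the inner type of empty arrays, no UNKNOWN columns in the DDL) can be sketched as follows. This is a simplified stand-in, not the toolkit's type mapper: infer_type and infer_schema are hypothetical names, integers map to BIGINT for brevity, and the first type seen for a field wins.

```python
from itertools import islice

SAMPLE_SIZE = 10_000  # Only the first 10,000 documents are inspected.


def infer_type(value) -> str:
    """Map a Python value decoded from BSON to a CrateDB column type (simplified)."""
    if isinstance(value, bool):  # Check bool before int: bool is an int subclass.
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "TEXT"
    if isinstance(value, dict):
        return "OBJECT(DYNAMIC)"
    if isinstance(value, list):
        # Empty lists carry no type information, so assume TEXT as inner type.
        inner = infer_type(value[0]) if value else "TEXT"
        return f"ARRAY({inner})"
    return "UNKNOWN"


def infer_schema(documents, sample_size: int = SAMPLE_SIZE) -> dict:
    """Partial scan: infer column types from the first sample_size documents."""
    schema = {}
    for document in islice(documents, sample_size):
        for name, value in document.items():
            column_type = infer_type(value)
            if "UNKNOWN" in column_type:
                continue  # Keep UNKNOWN types out of the generated SQL DDL.
            schema.setdefault(name, column_type)  # First seen type wins.
    return schema
```

Sampling a bounded prefix keeps schema inference cheap on large collections, at the cost of missing fields that only appear later.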
2024/09/02 v0.0.21
DynamoDB: Add special decoding for varied lists. Store them into a separate OBJECT(IGNORED) column in CrateDB.
DynamoDB: Add pagination support for full-load table loader
2024/08/27 v0.0.20
DMS/DynamoDB: Fix table name quoting within CDC processor handler
2024/08/26 v0.0.19
MongoDB: Fix and verify Zyp transformations
DMS/DynamoDB/MongoDB I/O: Use SQL with parameters instead of inlining values
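Using SQL with parameters means the statement text only carries placeholders, and values travel separately. The sketch below builds such an INSERT and the request body for CrateDB's HTTP endpoint, which accepts "stmt" and "args". build_insert and quote_identifier are hypothetical helpers, and the table and record are example data.

```python
import json


def quote_identifier(name: str) -> str:
    """Quote a column name, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_insert(table: str, record: dict):
    """Build a parameterized INSERT statement plus its argument list.

    Values are passed as arguments, so they are never inlined into SQL text.
    """
    columns = list(record)
    column_list = ", ".join(quote_identifier(column) for column in columns)
    placeholders = ", ".join("?" for _ in columns)
    statement = f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"
    return statement, [record[column] for column in columns]


statement, args = build_insert('"doc"."demo"', {"id": 1, "name": "O'Brien"})
# Request body for CrateDB's HTTP endpoint.
payload = json.dumps({"stmt": statement, "args": args})
```

The value O'Brien needs no manual escaping, because it never becomes part of the SQL string.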
2024/08/21 v0.0.18
Dependencies: Unpin commons-codec, to always use the latest version
Dependencies: Unpin lorrystream, to always use the latest version
MongoDB: Improve type mapper by discriminating between INTEGER and BIGINT
MongoDB: Improve type mapper by supporting BSON DatetimeMS, Decimal128, and Int64 types
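Discriminating between INTEGER (32-bit) and BIGINT (64-bit) can be done by value range, as sketched below. This is one possible approach, not necessarily the toolkit's: the type mapper may instead rely on the BSON int32 vs. Int64 type. integer_column_type is a hypothetical name, and deciding over all sampled values of a field is an assumption.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def integer_column_type(values) -> str:
    """Pick INTEGER if all observed values fit into 32 bits, else BIGINT."""
    if all(INT32_MIN <= value <= INT32_MAX for value in values):
        return "INTEGER"
    return "BIGINT"
```

A single value outside the 32-bit range, for example 2**31, is enough to promote the whole column to BIGINT.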
2024/08/19 v0.0.17
Processor: Updated Kinesis Lambda processor to understand AWS DMS
MongoDB: Fix missing output on STDOUT for migr8 export
MongoDB: Improve timestamp parsing by using python-dateutil
MongoDB: Converge _id input field to id column instead of dropping it
MongoDB: Make user interface use stderr, so stdout is for data only
MongoDB: Make migr8 extract write to stdout by default
MongoDB: Make migr8 translate read from stdin by default
MongoDB: Improve user interface messages
MongoDB: Strip single leading underscore character from all top-level fields
MongoDB: Map OID types to CrateDB TEXT columns
MongoDB: Make migr8 extract and migr8 export accept the --limit option
MongoDB: Fix indentation in prettified SQL output of migr8 translate
MongoDB: Add capability to give type hints and add transformations
Dependencies: Adjust code for lorrystream version 0.0.3
Dependencies: Update to lorrystream 0.0.4 and commons-codec 0.0.7
DynamoDB: Add table loader for full-load operations
2024/07/25 v0.0.16
ctk load table: Added support for MongoDB Change Streams
Fixed dependency on the kaggle package by downgrading to kaggle==1.6.14
DynamoDB CDC: Add demo to support reading DynamoDB change data capture
2024/07/08 v0.0.15
IO: Added the if-exists query parameter by updating to influxio 0.4.0.
Rockset: Added CrateDB Rockset Adapter, an HTTP API emulation layer
MongoDB: Added adapter amalgamating PyMongo to use CrateDB as backend
SQLAlchemy: Clean up and refactor SQLAlchemy polyfills to cratedb_toolkit.util.sqlalchemy
CFR: Build as a self-contained program using PyInstaller
CFR: Publish self-contained application bundle to GitHub Workflow Artifacts
2024/06/18 v0.0.14
Add ctk info and ctk cfr diagnostics programs
Remove support for Python 3.7
SQLAlchemy dialect: Use sqlalchemy-cratedb>=0.37.0. This includes the fix to the get_table_names() reflection method.
2024/06/11 v0.0.13
Dependencies: Migrate from crate[sqlalchemy] to sqlalchemy-cratedb
2024/05/30 v0.0.12
Fix InfluxDB Cloud <-> CrateDB Cloud connectivity by using ssl=true query argument also for influxdb2:// source URLs.
2024/05/30 v0.0.11
Fix InfluxDB Cloud <-> CrateDB Cloud connectivity by propagating ssl=true query argument. Update dependencies to influxio>=0.2.1,<1.
2024/04/10 v0.0.10
Dependencies: Unpin upper version bound of dask. Otherwise, compatibility issues cannot be resolved quickly, like with Python 3.11.9. https://github.com/dask/dask/issues/11038
2024/03/22 v0.0.9
Dependencies: Use dask[dataframe]
2024/03/11 v0.0.8
datasets: Fix compatibility with Python 3.7
2024/03/07 v0.0.7
datasets: Fix dataset loader
2024/03/07 v0.0.6
Added cratedb_toolkit.datasets subsystem, for acquiring datasets from cratedb-datasets and Kaggle.
2024/02/12 v0.0.5
Do not always activate the pytest11 entrypoint for the pytest fixture cratedb_service, as it depends on the testcontainers package, which is not always installed.
2024/02/10 v0.0.4
Packaging: Use cloud extra to install relevant packages
Dependencies: Add testing extra, which installs testcontainers only
Testing: Export cratedb_service fixture as pytest11 entrypoint
Sandbox: Reduce number of extras by just using all
2024/01/18 v0.0.3
Add SQL runner utility primitives to io.sql namespace
Add import_csv_pandas and import_csv_dask utility primitives
data: Add subsystem for “loading” data.
Add SDK and CLI for CrateDB Cloud Data Import APIs (ctk load table ...)
Add migr8 program from previous repository
InfluxDB: Add adapter for influxio
MongoDB: Add migr8 program from previous repository
MongoDB: Improve UX by using ctk load table mongodb://...
load table: Refactor to use more OO
Add examples/cloud_import.py
Adapt testcontainers to be agnostic of the testing framework. Thanks, @pilosus.
2023/11/06 v0.0.2
CLI: Upgrade to click-aliases>=1.0.2, fixing an error when no group aliases are specified.
Add support for Python 3.12
SQLAlchemy: Improve UNIQUE constraints polyfill to accept multiple column names, for emulating unique composite keys.
2023/10/10 v0.0.1
SQLAlchemy: Add a few patches and polyfills, which do not fit well into the vanilla Python driver / SQLAlchemy dialect.
Retention: Refactor strategies delete, reallocate, and snapshot into standalone variants.
Retention: Bundle configuration and runtime settings into Settings entity, and use more OO instead of weak dictionaries: Add RetentionStrategy, TableAddress, and Settings entities, to improve information passing throughout the application and the SQL templates.
Retention: Add --schema option and CRATEDB_EXT_SCHEMA environment variable to configure the database schema used to store the retention policy table. The default value is ext.
Retention: Use fully qualified table names everywhere.
Retention: Fix: Compensate for DROP REPOSITORY now returning RepositoryMissingException when the repository does not exist. With previous versions of CrateDB, it was RepositoryUnknownException.
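Compensating for the renamed exception amounts to treating both error names as "repository does not exist". A minimal sketch, assuming the exception name appears in the error message; drop_repository is a hypothetical helper that takes any callable executing SQL, and the error messages in the usage note are made up.

```python
# Names CrateDB has used for the "repository does not exist" error.
MISSING_REPOSITORY_ERRORS = ("RepositoryMissingException", "RepositoryUnknownException")


def drop_repository(execute, name: str) -> None:
    """Drop a snapshot repository, tolerating that it does not exist.

    execute is any callable that runs one SQL statement and raises on error.
    """
    statement = 'DROP REPOSITORY "{}"'.format(name.replace('"', '""'))
    try:
        execute(statement)
    except Exception as error:
        # Swallow only the "missing repository" case; re-raise anything else.
        if not any(marker in str(error) for marker in MISSING_REPOSITORY_ERRORS):
            raise
```

Matching both names keeps the retention code working against CrateDB versions on either side of the rename, while other failures still surface.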
2023/06/27 v0.0.0
Import “data retention” implementation from https://github.com/crate/crate-airflow-tutorial. Thanks, @hammerhead.