Automated tracking of feature launches, pricing changes, partnerships, and architecture shifts across data integration and ingestion companies — updated daily.
500 of 1,843 total (showing most recent 500)
Feature Launch - Fivetran
Jun 29, 2026
Agents Schema
An open standard for making business context readable by AI agents, allowing teams to designate a schema in their warehouse or lake as a shared context layer that agents can query before acting. The schema contains metric definitions, semantic models, dbt lineage, and custom business documentation in plain SQL tables.
Proprietary cloud data warehouses and raw data lakes → Open Data Infrastructure (lakehouse architecture with open storage, file formats, table metadata, and flexible compute)
Shift from tightly coupled proprietary warehouse systems or uncontrolled raw data lakes toward an unbundled, standards-based architecture separating storage, file formats, table catalogs, compute, and governance layers to enable workload flexibility while maintaining data integrity.
Fivetran and dbt Labs are enabling organizations to adopt Open Data Infrastructure, an open, governed approach to enterprise data that, combined with Snowflake's AI Data Cloud, gives organizations ownership of their data, transformations, and AI stack.
Fivetran and dbt Labs are working together to help organizations build Open Data Infrastructure, combining automated data movement with trusted transformations to create governed, AI-ready data foundations.
Fivetran named a Leader for the fifth consecutive year in Snowflake's 2026 Modern Marketing Data Stack report, recognized for its role in helping organizations build AI-ready data foundations through integration and data modeling.
[dbt-fusion] Added --empty support in seed command. Homebrew distribution now available with 'brew install dbt'. GET /api/v1/models now includes catalog field with row_count_stat, bytes_stat, last_modified_stat, and materialized field.
Batch transformations → Streaming transformations with Confluent Cloud Flink
Organizations are shifting from batch-based data transformations to streaming transformations positioned at the source, a pattern called 'shifting left' that enables fresher data delivery, reduces pipeline latency, and simplifies architectural complexity.
Confluent announced the release of a dbt adapter for Confluent Cloud, enabling dbt users to manage Flink SQL transformations with the same familiar dbt interface and CI/CD workflows they use across other data platforms.
A dbt adapter that enables data engineers to define SQL transformations as models, write tests, generate documentation, and deploy through CI/CD for Confluent Cloud Flink SQL. The adapter includes streaming-native materializations (view, streaming_table, streaming_source) and deterministic testing capabilities for streaming pipelines.
Extended data observability coverage into AI observability, providing visibility across all four components where AI systems break (data, system, code, and model) plus AI-specific signals.
Monte Carlo launched Agent Observability to help teams build reliable AI with visibility across data, systems, code, and model components specifically for AI agents.
Monte Carlo interfaces with NASDAQ's fleet of specialized agents for data observability and diagnostics, helping route context to the right agent at the right time.
Monte Carlo participated in a fireside chat at the Databricks Data + AI Summit, where Monte Carlo's leadership discussed AI governance and observability challenges with NASDAQ executives.
Multi-tool fragmented data stack (dlt ingestion, Airflow, hand-maintained transformation layer, semantic models in individual contributors' heads) → Unified agentic platform with canonical modeling toolkit, version-controlled specifications, and AI-generated transformation layers
dltHub is shifting from a fragmented five-tool, five-role data stack to a unified agentic architecture where AI agents generate pipelines, models, and dashboards from semantic specifications, with humans authoring meaning and reviewing implementations.
dltHub Pro offers agentic pipeline generation at approximately $2-3 of agent time per pipeline, representing a new pricing model based on AI-assisted infrastructure automation.
A spec-first semantic modeling system where users author data definitions and meanings, dlt infers schema and types from raw data, and agents generate models from the specifications for reusable agentic retrieval.
dltHub Pro enables users to build pipelines, ingest from sources, transform data, deploy to production, and manage deployments through conversational AI agents, with generated code stored in version-controlled repositories.
Stateless, chat-style agent frameworks → Stateful, event-driven stream processing architecture with Apache Kafka and Apache Flink
Shift from stateless chat-based agents to deterministic stream processing with immutable event logs, seven-dimensional state management, and policy gates for regulated AI compliance.
Tool calling coordination for Streaming Agents through the Model Context Protocol, enabling safe exposure of external APIs and other agents to the reasoning engine.
A managed Kafka engine delivered through Confluent Cloud that provides 99.99% uptime SLA and holds SOC 2, ISO 27001, PCI DSS, and HIPAA compliance attestation.
Managed stream processing with RocksDB state backends for maintaining seven critical states across multi-step agent workflows, supporting exactly-once processing semantics and multi-phase commit sink functions.
A structured event stream that logs every step of the agent workflow with reason codes, evidence references, and rule citations, using tamper-evident cryptographic chains with SHA-256 hashing and digital signatures.
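The tamper-evident chain described in this entry can be illustrated with a minimal sketch. It assumes each event stores the SHA-256 hash of the previous event, so changing any earlier record invalidates every later hash. The function names and event fields are hypothetical, not Confluent's schema, and the digital-signature step is omitted.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first event


def _digest(prev_hash, payload):
    # Hash a canonical JSON encoding so key order cannot change the result.
    body = json.dumps({"prev_hash": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_event(chain, payload):
    """Append an event whose hash covers the payload and the previous hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    event = {"prev_hash": prev_hash, "payload": payload, "hash": _digest(prev_hash, payload)}
    chain.append(event)
    return event


def verify_chain(chain):
    """Return True only if every link and every stored hash still matches."""
    prev_hash = GENESIS_HASH
    for event in chain:
        if event["prev_hash"] != prev_hash:
            return False
        if event["hash"] != _digest(prev_hash, event["payload"]):
            return False
        prev_hash = event["hash"]
    return True
```

In a production audit log each hash would typically also be signed, so that an attacker who can rewrite the whole chain still cannot forge it.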
Native agents running as Flink jobs within Confluent Cloud that automate business processes with AI, featuring stateful workflow management and policy gate enforcement for regulated environments.
A real-time, context-aware AI engine that runs Streaming Agents directly as Flink jobs, with tool calling coordinated through the Model Context Protocol (MCP) and agent-to-agent coordination using the emerging A2A protocol.
Native integration enabling Monte Carlo to observe agents built on Databricks Agent Bricks platform by reading traces directly from Unity Catalog Delta tables through existing Databricks connections. Supports both Knowledge Assistant agents and custom agents built via Mosaic AI Agent Framework.
Native observability support for agents built on Agent Bricks that reads MLflow trace data directly from Unity Catalog Delta tables without requiring SDK installation, pipeline configuration, or deployment. Provides span-level traces, conversation history, eval monitors, and incident management across the agent stack.
Add a direct_parents attribute to model nodes carrying the nearest public ancestors only, emitted in dbt ls --output=json for models. Lineage consumers can now render DAG edges from direct_parents instead of depends_on.nodes.
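A lineage consumer might use the new attribute roughly as follows. This is a sketch under assumptions: it treats the output of dbt ls --output=json as one JSON object per line, assumes direct_parents is a list of unique_id strings, and falls back to depends_on.nodes when the attribute is absent. The helper name is hypothetical.

```python
import json


def edges_from_ls_output(lines):
    """Build (parent, child) edges, preferring direct_parents when present."""
    edges = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        node = json.loads(line)
        parents = node.get("direct_parents")
        if parents is None:
            # Assumed fallback: use the full dependency list if the
            # attribute is missing from this node.
            parents = node.get("depends_on", {}).get("nodes", [])
        edges.extend((parent, node["unique_id"]) for parent in parents)
    return edges
```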
Cloud-native engine powering Confluent Cloud that decouples compute from storage to deliver GBps+ throughput, 10x faster autoscaling, 10x lower tail latencies, and 99.99% SLA with full Apache Kafka protocol compatibility.
Extends Kafka topics into open table formats (Apache Iceberg and Delta Lake) to form bronze and silver layers of an analytics medallion stack with integrated governance.
A fully managed service that serves structured context to AI apps and agents over the Model Context Protocol with built-in authentication, RBAC, and audit logging.
Agents that run as Flink jobs inside the stream processing pipeline with always-on state, tool calling via MCP and Agent2Agent (A2A), and replayable, governed event flows.
AI-native layer for data streaming platform that ships Streaming Agents, Real-Time Context Engine, and built-in ML functions for agentic AI applications.
Agent within the Monte Carlo platform that diagnoses agentic issues, helping troubleshoot problems with AI agents and their underlying data in production.
Agent Observability capability for monitoring and troubleshooting AI agents in production, including output quality, latency, token usage, tool call accuracy, and trajectory metrics.
Manual Kafka connector management and JAR file installation → Pre-built fully managed cloud connectors
Shift from manual sourcing, installation, and management of Kafka Connect plugins to 120+ pre-built, fully managed connectors integrated into Confluent Cloud.
Kafka Streams with custom Java/Scala microservices → Managed Apache Flink and ksqlDB
Transition from building and running custom Kafka Streams microservices to using fully managed stream processing engines that support SQL-based transformations.
MirrorMaker 2 for multi-region replication → Confluent Cluster Linking
Replacement of independent MirrorMaker 2 cluster deployments with native Cluster Linking that mirrors topics and preserves message offsets across regions.
Shift from managing underlying Kafka broker instances, manual upgrades, and infrastructure provisioning to fully abstracted cloud infrastructure with serverless and dedicated deployment models.
Cost-effective serverless cluster tailored for high-throughput, latency-insensitive workloads like logging and AI/ML data ingestion, offering up to 90% throughput savings.
Confluent replaced external MirrorMaker 2 deployments with native Cluster Linking for multi-region disaster recovery, eliminating the need to deploy and monitor an independent cluster.
Confluent Cloud offers consumption-based pricing for lighter workloads (Basic and Standard tiers) and a dedicated capacity model based on Confluent Units for Kafka (CKUs) for heavy production environments.
A highly cost-effective, serverless cluster type tailored for high-throughput, latency-insensitive workloads like logging, observability, batch pipelines, and AI/ML data ingestion, offering up to 90% throughput savings.
Enforces strict data contracts using Avro, Protobuf, and JSON Schema to prevent producers from breaking downstream applications with arbitrary payload changes.
A self-managed, comprehensive data streaming platform built on Apache Kafka that includes enterprise-grade features, management tools, and ecosystem integrations.
A fully managed, cloud-native Kafka service available globally across AWS, GCP, and Microsoft Azure with multiple cluster types (Basic, Standard, Dedicated, Enterprise, Freight) and consumption-based or capacity-based pricing models.
Integration and positioning around Snowflake's agentic enterprise vision, including Snowflake CoCo and Datastream announcements discussed in the context of Estuary's real-time data infrastructure.
Specialized Airflow knowledge integration for AI coding tools like Claude Code, Cursor, and VS Code, enabling access to Airflow intelligence in local workflows.
Enterprise orchestration and scheduling platform in your environment with control plane and data plane separation, enhanced security, and Apache Airflow 3 support.
Production-ready, generally available API for programmatically managing Astro at scale, providing a stable foundation for automation and migration from the beta API.
Enterprise resilience feature for cross-region disaster recovery on Astro with database replication, warm standby compute, and one-click failover, now generally available.
Self-service DAG authoring feature enabling anyone in an organization to create Airflow pipelines through a drag-and-drop no-code interface without requiring Python or Airflow knowledge.
Data engineering agent built for Airflow that investigates pipeline failures automatically and uses AI to convert Control-M, AutoSys, and Automic job definitions into production-ready Airflow DAGs.
Warehouse-centric architecture → Open Data Infrastructure (ODI) with decoupled storage and compute
Shift from tightly coupled, warehouse-centric models to Open Data Infrastructure grounded in open standards, separating storage and compute, supporting multiple engines without data duplication, and enabling AI agents with consistent governance.
Integration enabling Claude AI agents to leverage Monte Carlo's data observability capabilities for building and maintaining reliable data products with embedded quality assurance.
A capability allowing monitoring definitions to be version-controlled and deployed through CI/CD pipelines alongside data assets, ensuring reliability ships with code changes.
A Model Context Protocol server integration with the Agent Toolkit that enables agents to assess monitoring coverage and blast radius, validate fields and tables against the live workspace, and emit deployable monitors-as-code YAML.
A safety-first loop that uses editor hooks to surface downstream blast radius, active alerts, and monitor coverage before edits, generate monitors-as-code after edits, and validate changes before merge.
Traditional monolithic knowledge work tools → FastMCP with composable, version-controlled Python workflows
FastMCP enables transformation from traditional knowledge work tools like PowerPoint to composable, version-controlled Python workflows for data engineers.
Seven.One Entertainment uses Prefect to orchestrate data pipelines connecting Snowflake, dbt, and AWS across extraction, transformation, quality testing, and distribution.
Real-time event-driven workflow capability using Prefect and Debezium Change Data Capture (CDC) for modernizing legacy systems with instant automated workflows.
Prefect - Decomposed Durability for Data Workflows
Prefect's approach to durable execution that decouples results from workflow identity, enabling cross-workflow caching and exactly-once semantics through composable primitives.
Patented two-component hybrid architecture for Prefect Cloud that isolates code and data to meet FedRAMP, HIPAA, and PCI-DSS compliance requirements with three deployment options: Hybrid, PrivateLink, and Customer-Managed.
An agentic security questionnaire workflow feature on Prefect Cloud with full observability, a self-improving knowledge base, and human review built in.
FastMCP replaces traditional knowledge work tools with composable, version-controlled workflows for data engineers, enabling transformation from PowerPoint to Python-based workflows.
Anthropic Fable 5 benchmarking results show impressive document understanding with 90.02% content faithfulness and 72.62% semantic formatting, leading competitors by 12+ points in key metrics on ParseBench.
Open-source project tackling enterprise document processing with four primitives (Parse, Classify, Split, Extract) in a drag-and-drop interface powered by LlamaAgents workflows.
New word, line, and cell-level coordinates for every extracted value, providing complete audit trails from extracted data back to exact source locations in documents for compliance and verification workflows.
Fragmented ERP systems across 19 countries → SAP S/4HANA Cloud private edition via RISE with centralized data in Snowflake
ANASAC consolidated its fragmented multi-country ERP environment onto a unified SAP S/4HANA RISE platform and moved SAP data into Snowflake for centralized analytics and governance.
ANASAC used Fivetran to extract critical data from SAP S/4HANA and deliver it into Snowflake, creating a scalable foundation for centralized analytics.
Evolve Decision Science acted as a strategic partner to ANASAC, identifying Fivetran as the right platform for its unified data strategy and helping drive the implementation.
Vendor-locked, tightly bundled data platforms with proprietary formats and coupled compute/storage → Open Data Infrastructure with open formats, table formats, modular components, and interoperable engines
Fivetran advocates for a shift from vendor lock-in to Open Data Infrastructure (ODI), an architectural approach that stores data once in open formats and enables use across multiple tools, compute engines, and AI systems without vendor dependency.
Single deployment model → Three deployment models: Cloud, Cloud with Customer-hosted Data Store, Hybrid
Monte Carlo introduced flexible deployment options allowing customers to choose where the agent and data store live, from fully hosted to hybrid environments with maximum customer control.
Monolithic native integrations → Composable integration building blocks framework
Integration layer transformed from individual native connectors into composable building blocks (native connectors, Push API, Custom SQL) with per-capability configuration and AI extensibility.
Cloud-native agents only → Multi-deployment model with Generic Agent
Monte Carlo evolved from cloud-native only agents (AWS, Azure, GCP) to support a containerized Generic Agent enabling on-prem, hybrid, and multi-cloud deployments with egress-only architecture.
Isolated platform-layer deployments offering each customer their own subdomain, isolated AWS account, and core infrastructure with optional multi-region disaster recovery and PrivateLink connectivity.
A declarative framework for configuring authentication across self-hosted credential integrations, allowing new auth types to be added as configuration changes running entirely inside the customer's environment.
REST endpoints for pushing metadata, lineage, and query logs into Monte Carlo to fill coverage gaps in existing integrations or land new sources without a dedicated connector.
AI-assisted connectors for SQL sources not in Monte Carlo's native catalog, generated through Claude conversations to produce scaffolding, SQL templates, and deployable images that run on the Generic Agent.
A new integration framework with composable building blocks including native connectors, Push Ingest API, and Custom SQL Connectors that can be mixed and matched per capability and extended via AI workflows with Claude.
A containerized, egress-only agent extending Monte Carlo into on-premises, multi-cloud, and hybrid environments where cloud-native agents cannot reach. Supports Docker deployment with no inbound ports and reads source credentials from AWS Secrets Manager, GCP Secret Manager, Azure Key Vault, or environment variables.
GraphRAG approach requiring extraction of knowledge graphs from unstructured text → Virtual knowledge graph from existing canonical data models
Instead of extracting knowledge graphs from raw text, leverage the structure already present in canonical data models as native knowledge graphs, eliminating a separate extraction step while still applying ontology-driven querying principles.
Sequential query processing through multiple handoffs (request → load → transform → analyst → dashboard) → Question-driven direct model updates with shifted canonical model
Moving the canonical model development left in the pipeline so new questions directly drive ingestion and modeling changes, eliminating handoffs and allowing direct query answering from the model when coverage exists.
Traditional data modeling with semantic layers added post-hoc → Spec-first canonical knowledge layer driving both data modeling and agent queries
Shift from building raw data models first and then adding semantic layers on top, to writing canonical knowledge layers (combining structure, taxonomy, and ontology) first and using them to generate both the data model and drive agent-based querying. This eliminates the need to maintain separate artifacts.
Educational offering available on dltHub's learning platform covering hands-on implementation of ontology-driven data modeling and spec-first development approaches.
A toolkit for building and managing canonical knowledge layers (taxonomies and ontologies) that define data meaning alongside structure for use in text-to-SQL and agentic data engineering.
Data-engineering agents that generate ingestion and modeling from a canonical knowledge layer specification. The system allows developers to bootstrap an ontology and use it to generate canonical models with human curation of judgment calls.
Making data dependencies explicit and part of the platform's operational model rather than encoding them in code paths, logs, and engineers' knowledge, enabling impact analysis and safer re-runs.
Transitioning from time-based scheduling (e.g., 'run at 7:00 AM') to declarative automation that responds to data state, dependency status, missing partitions, and freshness expectations, eliminating hidden ordering assumptions.
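The difference from a fixed schedule can be sketched as a decision function that inspects data state instead of the clock. The class and field names below are hypothetical and do not correspond to any orchestrator's API; the three conditions mirror the missing-partition, dependency-status, and freshness signals named in the entry.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class AssetState:
    last_materialized: Optional[datetime]  # None means never built
    max_staleness: timedelta               # freshness expectation
    upstream_updated: List[datetime] = field(default_factory=list)


def should_run(state, now):
    """Decide from data state, not wall-clock time, whether to rebuild."""
    if state.last_materialized is None:
        return True  # missing data: build it
    if any(ts > state.last_materialized for ts in state.upstream_updated):
        return True  # a dependency has newer data
    # Otherwise rebuild only when the freshness expectation is breached.
    return now - state.last_materialized > state.max_staleness
```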
Job-centric orchestration systems → Asset-aware orchestration (Dagster)
Moving from job-centric systems that track task execution status to asset-aware systems that model data as first-class objects with explicit dependencies, quality checks, and freshness SLAs. This architectural shift enables teams to reason about data state rather than just job completion.
New feature enabling line, word, and cell-level bounding box tracking across documents for precise citation attribution and redaction in agentic document AI workflows. Available in beta across all paid tiers.
Coalesce MCPs provide Claude and other AI clients direct access to transformations, governance metadata, and quality monitors for integrated data engineering workflows.
MCP capability enabling teams to set read vs. write token scopes, chain Coalesce MCPs with Slack and GitHub, and add per-call approval gates for controlled governance.
MCP capability for debugging anomalies with live root-cause analysis by tracing anomalies upstream, reviewing recent commits, and pinpointing exact changes that broke pipelines.
MCP capability enabling users to run impact analysis before field changes by pulling Catalog metadata and checking Quality status in a single prompt, giving a full view of downstream impact.
AI-enabled feature giving Claude and other AI clients direct access to Coalesce transformations, governance metadata, and quality monitors for unified impact analysis, root-cause debugging, and owner assignment within a single conversation.
Hevo Data - Webhook Integration for Real-time Updates
Real-time data ingestion capability that allows Facebook to push updates directly to Hevo whenever campaigns, ads, or metrics change, enabling near-real-time data streams to BigQuery.
No-code data pipeline that automatically syncs Facebook Ads data to BigQuery with built-in schema handling, incremental syncs, API change management, and automatic retries.
AI Workbench is a tool that builds CDM (Conceptual Data Models) by asking clarifying questions about a domain and capturing answers as structured business rules, generating ontologies from the modeling workflow.
Coalesce offers AI capabilities to accelerate data engineering workflows, including natural language prompts for data transformation and modeling, AI-generated production-ready pipelines, and rich column-level metadata support for Snowflake.
Multiple disparate tools for transformations, cataloguing, and observability → Unified Coalesce data operating layer with integrated Transform, Catalog, and Quality
Moving from a fragmented tooling approach with separate tools for transformations, cataloguing, and observability to an integrated data operating layer with built-in observability capabilities alongside Transform and Catalog.
A built-in observability capability integrated into the Coalesce data operating layer alongside Transform and Catalog. It enables teams to automatically monitor critical data assets, detect unexpected issues in real time, and investigate incidents with full lineage and metadata context.
Alliant Insurance migrated 200 data pipelines from SQL Server/SSIS to Snowflake and Coalesce, completing the legacy modernization in five weeks with 55-60% cost reductions.
Systech's DBShift automation platform was used to facilitate Alliant Insurance's migration of 200 SQL Server/SSIS pipelines to Snowflake + Coalesce, demonstrating integration between Coalesce and Systech's migration tooling.
Migration scenario demonstrating transition from legacy on-premise SQL Server systems to cloud-native Snowflake architecture, including automated ingestion with Snowflake OpenFlow and transformation via Coalesce.
Joint demo walkthrough event showcasing SQL Server to Snowflake migration using Coalesce for transformation and modelling, with Snowflake OpenFlow for automated ingestion and downstream workflow orchestration.
AI agent capability that analyzes pipeline failures and performs triage by reasoning across schemas, queries, and job metadata to surface root causes and propose actionable fixes directly inside the data environment.
Snowflake integration via Model Context Protocol enabling Claude Code to run targeted investigative SQL queries and trace data lineage for incident diagnosis.
Claude Code integrated with Monte Carlo's platform via Model Context Protocol (MCP) to enable AI-assisted root cause analysis, source code reading, and SQL query execution against Snowflake for data incident investigation.
Integration enabling Claude Code to read dbt model source code and run targeted investigative SQL queries connected to Snowflake via Model Context Protocol for advanced root cause analysis.
An agentic observability tool that automates fast triage of data incidents by pattern-matching against alert history, table metadata, and lineage to identify root causes in seconds rather than 15-30 minutes of manual analyst work.
dlt supports using Pydantic BaseModel classes as authoritative schema contracts for tables, enabling code review of contract changes and automatic validation of field names and types.
dlt automatically creates variant columns (e.g., amount__v_text) alongside original columns when data types change, preserving both the original and new type without breaking existing queries.
dlt provides a schema_update payload that can be routed to Slack, PagerDuty, Linear, or other channels as part of the pipeline run, enabling real-time notifications when schema changes occur.
dlt supports schema evolution policies with four contract modes (evolve, freeze, discard_row, discard_value) applied at the resource level, enabling granular control over schema changes across different data pipeline layers (raw, silver, gold).
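The four modes can be illustrated with a small simulation in plain Python. This is not dlt's API: the function name is hypothetical, it handles only unexpected columns, and the behavior assigned to each mode is an assumption based on the mode names.

```python
def apply_column_contract(rows, known_columns, mode):
    """Apply one contract mode to rows that may carry unexpected columns."""
    accepted = []
    columns = set(known_columns)
    for row in rows:
        extra = set(row) - columns
        if not extra:
            accepted.append(dict(row))
        elif mode == "evolve":
            columns |= extra  # the schema grows to fit the data
            accepted.append(dict(row))
        elif mode == "freeze":
            raise ValueError(f"schema is frozen, unexpected columns: {sorted(extra)}")
        elif mode == "discard_row":
            continue  # drop the whole offending row
        elif mode == "discard_value":
            # Keep the row but strip the values the schema does not know.
            accepted.append({k: v for k, v in row.items() if k in columns})
        else:
            raise ValueError(f"unknown contract mode: {mode}")
    return accepted, columns
```

A layered setup might, for example, use evolve on raw tables and freeze on gold tables, so upstream drift is absorbed early and surfaces as an error before it reaches consumers.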
Manual, siloed data analysis across disconnected systems (Gong, Salesforce, Zendesk, billing, product usage) → Centralized data warehouse in BigQuery with AI-powered agent for automated churn analysis
Fivetran enabled transformation from manual multi-system churn analysis to centralized data architecture with AI agents, moving from reactive post-churn reviews to proactive retention management at scale.
Fivetran built an AI agent using Claude to analyze unified customer data and identify churn risks, demonstrating integration of Anthropic's LLM for enterprise retention use cases.
Fivetran centralized Salesforce interaction history and QBR notes as part of the unified customer data foundation used for churn pattern analysis and AI-powered retention management.
Fivetran integrated Gong call transcript data as a primary signal for churn analysis, incorporating 12 months of call transcripts into the centralized BigQuery data foundation for AI-driven retention insights.
Fivetran utilized Zendesk support ticket data as a key signal for churn analysis, combining it with data from other sources to build a unified customer data foundation for AI-powered retention analysis.
Now generally available as a Fivetran Partner SDK destination built and maintained by the MotherDuck team, enabling direct data loading into MotherDuck.
Now available as an open-source foundation under the Apache 2.0 license, bringing the dbt Fusion engine runtime to the broader community with a faster, more scalable execution engine.
AI coding agent purpose-built for analytics engineering, now available to all dbt Core users, leveraging dbt metadata to help build, troubleshoot, and optimize dbt workflows.
Now available for all dbt users (Core and Fusion), this feature caches model and source state to reduce warehouse spend by an average of 30% by skipping or cloning unchanged models.
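The general idea of skipping unchanged models can be sketched with content fingerprints. This illustrates the technique only and is not how dbt implements the feature: the function names are hypothetical, source freshness is ignored, and a model counts as changed when its SQL or any upstream fingerprint changes.

```python
import hashlib


def fingerprint(model_sql, upstream_fingerprints):
    """Hash a model's SQL together with the fingerprints of its inputs."""
    h = hashlib.sha256(model_sql.encode("utf-8"))
    for fp in sorted(upstream_fingerprints):
        h.update(fp.encode("utf-8"))
    return h.hexdigest()


def plan_run(models, cached):
    """Return (models to rebuild, new state); models must be in dependency order."""
    current, to_build = {}, []
    for name, (sql, parents) in models.items():
        current[name] = fingerprint(sql, [current[p] for p in parents])
        if cached.get(name) != current[name]:
            to_build.append(name)  # changed itself or has a changed ancestor
    return to_build, current
```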
Open-source standard designating a schema in the data warehouse as a shared context layer for AI agents, providing a flexible, interoperable foundation while avoiding vendor lock-in.
New feature that locks column data types after initial sync to prevent automatic data type changes during future syncs, reducing unexpected downstream impacts.
New truncate operation support enables easier replication of sources that only return active records, with a transaction-style sync model ensuring all changes fully commit or roll back together.
Now in Beta, this feature enables users to create custom connectors in minutes without coding by providing REST API documentation, automatically generating and configuring data pipelines.
A two-layer architecture combining orchestration (routing, context injection, guardrails) with a trust layer that continuously monitors agent performance through span-level telemetry, anomaly detection, and business-grounded evaluation pipelines.
Monte Carlo launched Agent Observability to help teams build reliable AI agents in production by monitoring context retrieval, efficiency, intended behavior, and output fitness for purpose.
dltHub was named 2026 Snowflake Startup Program Product Partner of the Year at Snowflake Summit 2026, recognized for helping over 1,000 organizations ingest data into the Snowflake AI Data Cloud with Python-native pipelines.
Agentic data engineering platform built on dlt that enables developers working with AI coding agents (Claude Code, Cursor, Codex) to find a source, build the pipeline, validate it locally, and deploy to production in one command.
Snowflake Native App for MSSQL, Oracle, MySQL, and PostgreSQL replication
Shipped a Snowflake Native App enabling full pipeline execution inside the customer's Snowflake account for replicating MSSQL, Oracle, MySQL, and PostgreSQL databases with no external orchestrator required.
Federated data access across multiple systems → Centralized data warehouse with real-time updates to Snowflake
Organizations are consolidating data into Snowflake as a centralized AI Data Cloud rather than accessing multiple federated source systems, enabling up-to-date information for AI workloads.
Estuary Build enables fully managed real-time data pipelines with deployment options including Public, Private, and BYOC, featuring less than 100ms latency on streaming sinks/sources.
Salesforce introduced Headless 360 in April 2026, exposing its platform's core capabilities through API, MCP (Model Context Protocol) and CLI for agent access.
Snowflake introduced semantic views to store data definitions and descriptions, which support text-to-SQL generation and Cortex Analyst for accurate query interpretation.
Parse-Flow implements document processing pipelines using the llama-agents event-driven workflow framework, where document intelligence operations are orchestrated through a state machine composed of bootstrap, worker, and router steps.
Open-source visual document intelligence workflow designer that enables users to drag-and-drop document processing steps (parse, extract, classify, split) onto a canvas and monitor execution with a live event dashboard.
Fivetran expanded its deployment availability to Google Cloud Platform in Saudi Arabia (GCP Dammam region), enabling in-region data processing and storage for organizations requiring data sovereignty compliance.
Fivetran on Google Cloud Platform (GCP Dammam region)
Fivetran is now available on Google Cloud Platform in Saudi Arabia (GCP Dammam region), enabling organizations to keep data in-region while maintaining compliance with the Kingdom's Personal Data Protection Law (PDPL) and accelerating AI adoption.
Open-source workflow designer for visual document intelligence workflows with parsing, extraction, classification, and splitting primitives on a visual canvas, backed by an async worker and a live event dashboard.
Fivetran and dbt Labs announced a merger on June 1, 2026, integrating Fivetran's data movement capabilities with dbt's transformation and governance layer to support Open Data Infrastructure architecture with reliable data movement, model contracts, and semantic layer definitions.
dbt-state as separate plugin → dbt-state bundled with dbt-core
Bundle dbt-state plugin (>=2.18,<3.0) as an optional install dependency of dbt-core, opt-in via --manage-state flag, DBT_ENGINE_MANAGE_STATE env var, or manage_state in dbt_project.yml.
Add --use-v2-parser flag to delegate parsing to the fusion parser, load its manifest.json into runtime Manifest, and bypass dbt-core's parser. Configurable via CLI or dbt_project.yml.
Make MAXIMUM_SEED_SIZE_MIB configurable. Automatically create latest-version pointer for versioned models. Add support for private git packages in packages.yml and dependencies.yml.
Kafka Streams - Local State Cleanup on Startup (KIP-1259)
Adds state.cleanup.dir.max.age.ms configuration to automatically delete state directories that have not been modified for the specified duration on startup.
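The age-based rule described above can be illustrated with a short standalone sketch. This is not Kafka Streams code and makes no claim about how KIP-1259 is implemented; `stale_state_dirs` and its parameters are hypothetical names chosen for illustration, and only the selection step is shown (nothing is deleted).

```python
import os
import time


def stale_state_dirs(state_root, max_age_ms, now_ms=None):
    """Return subdirectories of state_root not modified within max_age_ms.

    Illustration only: the real startup cleanup in Kafka Streams may use
    different criteria (locks, task ownership, and so on).
    """
    if now_ms is None:
        now_ms = time.time() * 1000
    stale = []
    for entry in sorted(os.listdir(state_root)):
        path = os.path.join(state_root, entry)
        if not os.path.isdir(path):
            continue  # only directories are candidates
        age_ms = now_ms - os.path.getmtime(path) * 1000
        if age_ms > max_age_ms:
            stale.append(path)
    return stale
```

A caller would then remove the returned paths, for example with `shutil.rmtree`, before the application starts processing.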
Kafka Broker - OAuth Client Assertion Support (KIP-1258)
Adds client assertion authentication to the OAuth client_credentials grant type to enhance security and compatibility with OAuth providers.
Introduces cordoned.log.dirs configuration to cordon log directories, preventing new partitions from being placed on cordoned directories for scaling and decommissioning operations.
Major release containing 25 KIPs and over 600 commits with new features including Share Group Controls, Broker Isolation metrics, OAuth Client Assertion support, Headers-Aware State Stores in Kafka Streams, and improved group coordinator assignment logic.
Schema-aware data quality checks that bootstrap from dlt's existing schema, sample columns before rules ship, write checks as decorators into pipelines, and route failures to appropriate toolkits (ingestion, transformations, exploration). Includes four primitives (is_unique, is_not_null, is_in, case) and column-level metrics for drift detection.
dltHub shifted from per-row billing to compute-hour-based billing, arguing that users should pay for compute time used rather than the number of rows moved (which varies by data format and source type).
Brooklyn Data, a data consulting firm, deployed Dagster Compass on top of Snowflake to enable self-service analytics for their delivery excellence organization, using it to expose modeled PSA (Professional Services Automation) data and improve operational efficiency in Slack.
Separate dual-engine architecture (dbt Core and dbt Fusion) → Unified single-engine architecture
dbt Core v2.0 and Fusion are now built on a shared foundation, ending the two-engine era. The previously separate ELv2-licensed Fusion code is now open-sourced as Apache 2.0 licensed dbt Core v2.0.
dbt Core v2.0 introduces Parquet as a high-performance alternative to large JSON files for artifacts, enabling direct querying by DuckDB and by agents.
dbt Core v2.0 is now built on the same Rust-based foundations as the dbt Fusion engine, replacing the previous Python implementation as the baseline for all users. The high-performance Rust implementation reduces complexity and becomes the foundation for future innovation.
Fivetran and dbt Labs announced a strategic combination with a joint leadership fireside chat on June 25, 2026. dbt Core v2.0 is built on foundations shared between Fivetran and dbt Labs.
New beta version of dbt Core v1.12.0 released, which enforces behavior changes that will be fully removed in v2.0 and ships with a Fusion-powered project parser.
Major release of dbt Core rebuilt on the same foundations as the dbt Fusion engine with significant parse time improvements, a tightly defined language spec, new Parquet artifacts for high-performance storage, a revamped local documentation experience, streamlined adapter building, and a simplified installation process.
Major architectural shift to open source the Fusion runtime and move to a Rust-based engine in dbt Core v2.0, enabling up to 10x faster parse times and better scalability while aligning commercial investment directly with open source improvements.
Joint product innovations between Fivetran and dbt Labs to build open, interoperable, AI-ready data infrastructure including dbt Core v2.0, dbt State, dbt Wizard, and integrations enabling coordinated product strategy.
Open standard context layer stored in plain SQL tables that allows agents to read metric definitions, semantic models, dbt lineage, and custom business documentation from warehouse or lake schemas.
Beta tool that generates Fivetran-managed connectors for API sources directly from API documentation in minutes by crawling, parsing, and validating API structures.
Purpose-built AI agent for analytics engineering that understands dbt projects natively, helping with investigations, code changes, testing, and validation while maintaining project context and contracts.
Plug-in that brings state awareness, orchestration, and caching across dbt Core and the dbt platform, reducing dbt-generated compute by over 30% while checking warehouse metadata and model SQL for changes.
Extension to dbt Core v2.0 with richer capabilities including SQL comprehension, column-level lineage, instant feedback, and high-performance SQL linting.
Open sourcing the Fusion runtime under Apache 2.0 license with a Rust-based engine offering up to 10x faster parse times, better scalability, cleaner adapter contribution model, and modern docs experience.
Proprietary data warehouses → Open Data Infrastructure with Apache Iceberg and Apache Polaris
Movement toward open, AI-ready data foundations using Apache Iceberg and Fivetran-hosted Apache Polaris catalog to provide flexibility, interoperability, and vendor neutrality for multiple compute engines.
Traditional static dashboards and manual analyst queries → Agentic AI with natural language interfaces powered by Fivetran, dbt, and Google Cloud
Shift from static dashboards to agentic AI that enables business users to ask complex questions in natural language and receive actionable answers, transforming how teams interact with data.
Collaboration demonstrating how dbt standardizes and models data into consistent, reusable business semantics that AI agents can understand and trust within the Fivetran and Google Cloud ecosystem.
Partnership showcasing how Fivetran, dbt, and Google Cloud create a data foundation for powering AI agents with fresh, traceable, and well-governed data.
Single LLM-as-judge scoring system with monthly dashboard reviews → Multi-dimensional time-series evaluation framework with automatic anomaly detection via OpenTelemetry collectors and Monte Carlo orchestration
Axios shifted from a reactive, single-score evaluation approach to a proactive, multi-dimensional observability stack that tracks task completion, helpfulness, groundedness, and accuracy with continuous anomaly detection.
Design partnership between Monte Carlo and Axios (media company) in early 2025 to advance Axios's observability maturity and build agent observability infrastructure for AI auto-tagging systems.
Integration of OpenTelemetry collectors with Monte Carlo as the orchestration layer for capturing and monitoring AI agent spans and metrics across production systems.
Monte Carlo's Agent Observability platform provides end-to-end visibility across context, performance, behavior, and outputs for AI agents in production, with multi-dimensional evaluation tracking and anomaly detection capabilities.
Partnership to create a data quality cost calculator and conduct research on the Total Economic Impact of Monte Carlo's Data + AI Observability Platform, as well as surveys of data professionals on incident resolution times.
Siloed, fragmented customer data across multiple systems → Centralized streaming platform with cross-system signal correlation and semantic embeddings
Customer signals are now ingested from multiple sources into Kafka topics, normalized and enriched with Flink, vectorized for AI, and served through a unified intelligence layer with account-level authorization.
Batch-driven and static dashboard-based customer intelligence → Real-time streaming-first architecture using Apache Kafka, Apache Flink, and vectorized AI grounding
Confluent shifted from traditional batch and static reporting approaches to a real-time streaming architecture for customer intelligence, using Kafka for durable event ingestion, Flink for stream processing and enrichment, and vector search with LLM grounding for AI insights.
An integrated capability within CIH that automates the generation of executive account summaries for key meetings, quarterly business reviews, and internal reviews.
An alternative Account Center layout in CIH that identifies expansion opportunities by showing data streaming platform (DSP) adoption and pipeline by product, helping sellers and customer success teams understand where customers are not yet using DSP capabilities.
A generative AI capability within CIH that allows users to ask natural-language questions about customer data and receive contextual summaries grounded in CIH data, such as engagement summaries, support issue analysis, and usage growth patterns.
An internal Confluent application providing a single prioritized view of customer accounts with change detection, AI-driven insights, and operational workflows. It centralizes customer signals from multiple systems (Salesforce, Zendesk, product telemetry, billing, Jira, CommonRoom, Quartr, Crossbeam) and surfaces real-time activity feeds, priority signals, and generative AI capabilities (AccountIQ) for GTM teams.
Manual orchestration of Snowflake Dynamic Tables → Virtual assets with Dagster freshness sensors
Shift from manual handling of Snowflake Dynamic Tables as black-box objects to modeling them as virtual assets in Dagster's asset graph with automated freshness monitoring and downstream triggering.
Dagster integrates with Snowflake to provide orchestration, lineage, automation, and cost visibility across data platforms. The partnership enables native SQL-defined assets and cost attribution through Dagster+ Insights.
A feature in Dagster+ Insights that attributes Snowflake query costs directly to assets that incurred them, providing cost visibility across the entire Snowflake footprint including direct assets and dbt models.
A sensor pattern that monitors Snowflake Dynamic Tables' last_completed_refresh and triggers downstream asset runs only when new data has actually landed, preventing stale reads.
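The trigger-only-on-new-data rule can be sketched in plain Python. This is a minimal illustration of the decision logic, not Dagster's API: `evaluate_refresh` is a hypothetical helper, and it assumes the caller has already read `last_completed_refresh` from Snowflake and stored the previously handled timestamp as a string cursor.

```python
from datetime import datetime
from typing import Optional, Tuple


def evaluate_refresh(
    last_completed_refresh: Optional[datetime],
    cursor: Optional[str],
) -> Tuple[bool, Optional[str]]:
    """Decide whether downstream runs should be triggered.

    cursor is the ISO timestamp of the last refresh already acted on.
    Returns (trigger, new_cursor).
    """
    if last_completed_refresh is None:
        return False, cursor  # the table has never completed a refresh
    seen = datetime.fromisoformat(cursor) if cursor else None
    if seen is not None and last_completed_refresh <= seen:
        return False, cursor  # nothing new has landed since the last run
    return True, last_completed_refresh.isoformat()
```

In a real sensor the returned cursor would be persisted between evaluations so that each completed refresh triggers downstream assets at most once.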
Virtual assets feature that integrates Snowflake Dynamic Tables into Dagster's asset graph with is_virtual=True, allowing Dagster to track and orchestrate downstream dependencies without managing the actual materialization.
A new component that allows users to define Dagster assets directly from SQL files using simple YAML configuration, enabling teams to work natively in SQL without requiring Python for Snowflake pipelines.
Integration requiring zero additional SDKs that reads Snowflake's native system tables directly for agent monitoring without data leaving the customer account.
Four-layer Trust Framework that monitors agent outputs, behavior, performance metrics like latency and token utilization, and the data context feeding agents in production.
Node/TypeScript implementation → Rust core with language bindings (Node, Python, WASM)
Complete rewrite of LiteParse from TypeScript to Rust to enable cross-platform deployment across multiple languages and runtimes including browser and edge environments, while eliminating the hard Node.js dependency.
Complete rewrite of LiteParse in Rust enabling it to run natively across Rust, Node, Python, and WASM packages. The new version provides 5-100x speedup for small documents and 3x speedup for larger documents, while maintaining LLM-free PDF and document text extraction with layout preservation.
Siloed context across pipeline stack (schema at ingest, joins in transform, lineage in orchestrator, runtime state in warehouse) → Unified context model where business context and data model are produced together from a single Python process and exposed to agents for reasoning
dltHub Transformation closes the context gap by consolidating schema knowledge, join structure, lineage, and runtime state into one agent-readable model produced end-to-end.
Traditional SQL-based transformation layer with fixed connectors and models that break when schemas change → Pythonic, schema-aware transformation layer (@dlt.hub.transformation decorator) running wherever the developer or agent is
Shift from decade-old transformation tools built for human-written pipelines to a new architecture designed for agent-written pipelines at scale, addressing the jump in agent-written pipelines from 5% to 91% in one year.
dltHub Transformation toolkit is designed to integrate with AI code editors and run as slash commands within them to enable AI-assisted data transformation and modeling.
A step-by-step workflow tool consisting of four slash commands that guide users from raw data through taxonomy, ontology, and canonical data model generation to final transformation code, designed to be used within AI code editors.
A new transformation module entering public preview as part of dltHub Pro that turns raw data into clean business tables using agentic workflows. It includes a Python decorator (@dlt.hub.transformation) and a toolkit with slash commands (/annotate-sources, /create-ontology, /generate-cdm, /create-transformation) that work with Claude, Codex, or Cursor to automatically model data.
Coalesce is showcasing its platform integration on Databricks at the Data + AI Summit 2026, demonstrating AI-assisted pipeline scaffolding, governance, and quality controls built into the Lakehouse workflow.
Local document parser running 100x faster with expanded language support (TypeScript, Rust, Python, Node) and Edge support. Zero Python dependencies, processes documents locally for AI agent iteration.
UI-first product development approach → Agent-first product development approach with MCP tools as primary interface
Monte Carlo restructured its development methodology to build for AI agents first and humans second, designing MCP tools and skills before creating human user interfaces. This architectural shift enforces cleaner interfaces, structured inputs/outputs, and eliminates human-in-the-loop assumptions.
New agent-first capabilities including freshness monitoring, volume anomaly detection, schema change tracking, lineage tracing, and incident history access specifically designed for AI agents to verify data reliability before taking action.
Monte Carlo built MCP (Model Context Protocol) tools and skills prioritizing agent-first design, enabling AI agents to check data table health and access institutional memory of data behavior, incidents, and resolution patterns.
An ontology-driven transformations toolkit that reverse-engineers SQL into draft ontologies, consolidates tables into canonical concepts, generates clean transformation layers, and powers Chat-BI capabilities for business users.
A new feature that runs ingestion, transformation, lineage, and verification inside the same execution context using a Python decorator (@dlt.hub.transformation), enabling LLMs to reason about business data with full context and metadata continuity end-to-end.
Static data warehouse-centric Customer 360 systems → Real-time, event-driven AI-native Customer 360 architecture with RAG and guardrailed generation
Traditional warehouse-centric Customer 360 architectures designed for periodic reporting are being replaced by AI-native real-time architectures that combine continuous event streams, unified customer profiles, RAG-based retrieval, and governed AI generation.
Batch-based ETL with nightly updates and data warehouse architectures → Real-time event-driven streaming architecture with continuous event ingestion and live customer profile stores
Organizations are shifting from periodic batch updates (ETL once daily or every few hours) to continuous event-driven architectures that update customer profiles in real-time using streaming platforms like Kafka and stream processing with Flink.
AI-powered agents that automate business processes by combining real-time customer data streams with generative AI and guardrailed response generation.
Real-time, context-aware AI capability enabling AI-powered Customer 360 architectures that combine streaming data with RAG and generative AI for personalized customer experiences.
Estuary enables controlled JSON unnesting during the ingestion phase with field selection modes (Required only, Depth 1, Depth 2, Unlimited depth) that allow teams to define how deeply nested JSON fields are materialized into columns while preserving one row per original record.
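A depth-limited unnesting of this kind can be sketched as follows. This is an illustration under stated assumptions, not Estuary's implementation: `flatten` is a hypothetical helper, "Depth 1" is assumed to mean top-level fields only, deeper objects are assumed to be kept as JSON strings, and the "Required only" mode is not modeled. Arrays are left as single values so that each input record still yields exactly one row.

```python
import json
from typing import Any, Dict, Optional


def flatten(record: Dict[str, Any], max_depth: Optional[int]) -> Dict[str, Any]:
    """Flatten nested objects into columns down to max_depth levels.

    max_depth=1 keeps only top-level fields as columns; None means unlimited.
    Objects below the limit are serialized to JSON strings.
    """
    row: Dict[str, Any] = {}

    def walk(obj: Dict[str, Any], prefix: str, depth: int) -> None:
        for key, value in obj.items():
            name = f"{prefix}{key}"
            can_descend = max_depth is None or depth < max_depth
            if isinstance(value, dict) and can_descend:
                walk(value, f"{name}.", depth + 1)
            elif isinstance(value, dict):
                row[name] = json.dumps(value, sort_keys=True)
            else:
                row[name] = value

    walk(record, "", 1)
    return row
```

For a record such as `{"id": 1, "user": {"name": "a", "geo": {"city": "x"}}}`, depth 2 yields the columns `id`, `user.name`, and `user.geo`, while unlimited depth yields `id`, `user.name`, and `user.geo.city`.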
Feature that records every CDC event as it happens, creating a full timeline for audit trails that can be replayed for audits, investigations, and compliance checks when paired with Delta Updates on materialization.
Three deployment models offered: Public cloud SaaS, Private deployment, and BYOC (Bring Your Own Cloud) for organizations requiring stricter compliance isolation and data residency controls.
Compliance-ready CDC pipelines with built-in governance controls including RBAC with least-privilege access, prefix-based permission management, TLS 1.2+ encryption in transit, KMS-based encryption at rest, History Mode for audit trails, version-controlled YAML pipeline definitions, and auditable backfill/replay operations.
LlamaParse and LiteParse integrated with Google's new sandboxed Agents API to provide document processing capabilities for autonomous agents in the Google ecosystem.
Demo agent that ingests SEC filings and answers questions with exact citations highlighted on original PDF pages, built with ~600 lines of Next.js and no vector database.
Native HEIC file support added, allowing LlamaParse to parse Apple's default HEIC image format directly without conversion to JPEG first, ideal for enterprise file systems with iPhone photos.
Lexical search (grep-based) for agentic retrieval → Hybrid semantic and lexical search with document parsing layer
The article discusses a shift from relying solely on grep-style lexical search to a layered approach that parses unstructured documents into text using LlamaParse or LiteParse, indexes them for semantic search via embeddings, and preserves grep for specific exact-match use cases.
Strategic integration enabling Coalesce Catalog to ingest and govern Omni workbooks, queries, and dashboards alongside warehouse data, connecting the data transformation layer with the BI/analytics consumption layer for end-to-end lineage and governance.
New connector that catalogs Omni AI analytics platform assets (workbooks, queries, dashboards) directly in Coalesce Catalog, enabling centralized discovery, ownership visibility, and full-stack lineage connecting BI content to underlying data transformations and tables.
Evolution from basic database-to-LLM integration to a robust, multi-stage event-driven architecture with governance layers, PII masking, schema enforcement, and tokenization before vector storage.
Batch RAG processing with manual document uploads → Real-time streaming RAG with Change Data Capture (CDC) and event-driven pipelines
Shift from batch RAG systems with periodic manual re-indexing to streaming-based RAG architecture using CDC and Kafka for real-time policy updates and continuous compliance enforcement.
Observability layer that creates immutable audit logs tracking every AI response to its exact source document, enabling forensic review and compliance verification.
Event-driven streaming RAG architecture using Change Data Capture (CDC) and streaming platforms for real-time policy updates and automated document governance without manual re-indexing.
Confluent Stream Governance for Regulated RAG/GenAI
Stream Governance feature enabling reliable, discoverable, and secure data streams with PII filtering, field-level encryption, and RBAC controls for compliant RAG architectures in regulated sectors.
Batch-processed static knowledge bases and point-to-point ETL → Real-time event streaming architecture with Change Data Capture (CDC) and Apache Flink stream processing
Shift from nightly batch jobs and static document uploads to continuous event-driven ingestion with real-time embedding updates and context synchronization for enterprise RAG systems.
Production-grade AI architecture that connects Large Language Models to continuous, real-time streams of proprietary corporate data using event streaming for document ingestion, embedding updates, and context synchronization.
Stateless or Limited Local Agent State → Shared Streaming State Layer with Continuous State Propagation
Architectural shift from hidden state inside individual agents/services to persistent shared state materialized and propagated through streams, enabling consistent multi-agent coordination.
Batch and Scheduled Decision Cycles → Real-Time Sub-Second Autonomous Decision Making
Evolution from batch pipelines, scheduled workflows, and API-based orchestration to event-triggered agents operating continuously on real-time streams with millisecond decision latency.
Reactive Event-Driven Systems → Agentic Event-Driven Systems
Shift from static, predefined reactive workflows to autonomous agentic systems with continuous decisioning, AI-driven reasoning, closed-loop feedback, and runtime adaptability without human intervention.
Open-source local MCP server, managed MCP server, and Agent Skills that give AI coding assistants direct access to the streaming platform with tools to act and domain knowledge to build.
AI-powered agents that automate business processes by continuously sensing events, reasoning over shared state, and taking autonomous actions in real-time through event-driven closed-loop systems.
Rigid templates and manually engineered parsing rules → Configuration-driven, machine learning-based parsing with adaptive generalization
Transition from fixed template systems requiring constant maintenance to configurable extraction logic that generalizes across evolving document ecosystems without requiring constant reconfiguration.
Isolated text extraction pipelines → Intelligent document workflows with reasoning capabilities
Shift from independent extraction steps toward coordinated systems capable of reasoning across multiple document types, applying validation logic, and managing uncertainty through confidence-based workflows.
Traditional OCR systems and character recognition → Layout-aware parsing with structured extraction and cross-document validation
Evolution from standalone text recognition toward integrated document understanding systems that preserve structural relationships, validate data across multiple documents, and support decision-ready outputs for insurance workflows.
Single extraction model applied across entire document package → Classify-extract-validate loop with adaptive model routing
Architectural change from applying one extraction model uniformly to implementing intelligent document classification, per-document-type model selection, and multi-stage validation loops.
Template-based OCR → Agentic OCR with layout-aware computer vision
Shift from static template-matching extraction to agentic reasoning-based document processing that classifies documents, adapts model selection per document type, and validates extracted values against adjacent document data.
Coalesce integrates with Claude's skill system to enable data engineers to build reproducible AI-powered workflows on top of Coalesce Transform, Catalog, and Quality MCPs.
Claude tool that generates initial skill structures from natural language descriptions of data engineering workflows, providing best-practice templates for rapid skill development.
A Claude skill that automatically triages data quality issues by scoring severity, downstream impact, and ownership status, then proposes Linear tickets with actions (create, skip, acknowledge, or flag for tuning).
Reusable recipes (SKILL.md files) that enable Claude to perform data engineering tasks consistently, including weekly data quality reports, issue triage, and root cause analysis using Coalesce MCPs.
Manual transformation code written by data engineers → AI-generated transformations from ontology specifications
Architectural shift toward agentic code generation where Python transformations, SQL queries, and data validation rules are automatically generated from structured ontology definitions rather than hand-written by engineers.
Point-to-point ETL migrations (HubSpot → Attio with bespoke transformations) → Canonical Data Model (CDM) architecture with standardized entity definitions and bounded mapping work
Shift from direct source-to-destination migrations to using a canonical data model as a system-neutral common language. This enables reusable ETL stages (HubSpot → CDM → Attio) and reduces rework for future migrations to different destinations.
dltHub provides migration tooling to extract data from HubSpot CRM and transform it into other systems like Attio, handling complex business logic and compliance requirements.
dltHub enables migration from HubSpot to Attio CRM using agentic transformations. The Attio API schema is used as input for transformation generation, and the platform serves as the destination for the migration workflow.
Free course teaching AI-native data engineering workflows for data migrations and transformations, documenting the end-to-end methodology used in the HubSpot-to-Attio migration case study.
REST API extraction toolkit that automatically scaffolds authentication, pagination, schema inference, and incremental loading. Integrates with MCP server (10,000+ configs) to pull API context and reuse existing pipeline configurations.
AI workbench for building ontologies, generating transformations, and managing data migrations with agentic transformations. Includes a transformation toolkit that generates Python code from ontology definitions, handles schema mapping and GDPR filtering, and generates mock data for testing.
Dagster migrated its entire Python monorepo from Pyright to Astral's new type checker 'ty', reducing OSS type-checking CI time from ~15 minutes to 1-2 minutes while improving bug detection capabilities.
Dagster adopted Astral's new Python type checker 'ty' for performance improvements in their CI pipeline, achieving 10x faster type checking and discovering real runtime bugs that Pyright missed.
Standard uniform OCR pipeline processing entire passport image identically → Agentic OCR with layout-aware zone segmentation, model routing per document element, and checksum validation
Shift from flat character extraction to zone-aware document processing that segments passports into MRZ, VIZ, photo, and hologram regions, routing each to appropriate models with checksum validation and cross-zone reconciliation.
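For the MRZ zone, checksum validation most plausibly refers to the check digit defined in ICAO Doc 9303 (weights 7, 3, 1 repeating, result modulo 10); that is an assumption here, since the entry does not name the scheme. The sketch below implements that standard check digit; the function names are illustrative.

```python
def mrz_check_digit(field: str) -> int:
    """Compute the ICAO Doc 9303 check digit for an MRZ field.

    Digits map to their value, A-Z map to 10-35, and the filler '<' maps to 0.
    """
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if ch in "0123456789":
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10
        elif ch == "<":
            value = 0
        else:
            raise ValueError(f"invalid MRZ character: {ch!r}")
        total += value * weights[i % 3]
    return total % 10


def mrz_field_is_valid(field: str, check_char: str) -> bool:
    """Return True when check_char matches the computed check digit."""
    return check_char in "0123456789" and mrz_check_digit(field) == int(check_char)
```

An OCR pipeline can use a failed check as a signal to re-read the zone or lower the field's confidence rather than accept a misread character.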
Standard OCR (pixel-to-text conversion) → Agentic OCR (layout-aware computer vision with field-level validation)
Shift from traditional left-to-right text-stream OCR to layout-aware extraction that identifies discrete bounded fields, understands document structure before extraction, validates against coding formats, and provides field-level confidence scores for medical claims documents.
LlamaParse shifts from rigid template-matching and character-level OCR to agentic parsing that interprets document structure, tables, charts, and layouts according to content type.
Dispersed, specialist-dependent knowledge management → Centralized, queryable knowledge base via unified data lakehouse
Fivetran unified institutional knowledge from multiple sources (Zendesk, Slab, Jira, GitHub, Google Drive, Gong, Salesforce, public docs) into a single data lakehouse, transformed with dbt, and made queryable via AI.
Manual, ticket-by-ticket support model → AI-augmented, efficiency-first support engine
Fivetran shifted from a purely human-operated support model to an AI-augmented system where humans act as architects, governors, and escalation experts while AI handles ticket drafting, summarization, and analysis.
A custom Zendesk plugin that embeds AI directly into the support ticket workflow, featuring Ask AI for conversational answers, Respond with AI for draft responses, Generate Summary for ticket handovers, and Find Similar Tickets for surfacing historical resolutions.
MAR (Monthly Active Rows) billing model → Volume-based (per GB) billing model
Estuary moves away from Fivetran's MAR pricing approach, adopting a simpler volume-based model that charges per GB of data moved rather than counting row changes and activities.
Fivetran 2026 pricing update: inserts, updates, and deletes all count toward paid MAR (previously only inserts and updates counted); includes minimum per-connection fees; multiple updates within the same month in history mode now count toward MAR.
Fivetran 2025 pricing update changed from account-wide volume discounts to per-connector MAR calculation, removing shared usage discounts and complicating forecasting for multi-connector setups.
Estuary introduces volume-based pricing model charging $0.50 per GB of data moved (both ingestion and materialization) plus connector fees ($100 for first 6, $50 for additional).
Estuary offers volume-based pricing at $0.50 per GB of data moved for ingestion and materialization, plus $100 per connector for the first six and $50 for additional connectors, with a free tier including 10 GB/month and up to 2 connector instances.
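The pricing figures above can be turned into a rough estimate. This sketch assumes the connector fees are charged per month and ignores the free tier and any other discounts; `estuary_monthly_cost` is a hypothetical helper, not an official calculator.

```python
def estuary_monthly_cost(gb_moved: float, connectors: int) -> float:
    """Estimate a monthly bill from the listed rates.

    Assumptions: $0.50 per GB moved, $100 for each of the first six
    connectors, $50 for each additional one, all billed monthly.
    """
    data_cost = gb_moved * 0.50
    first_six = min(connectors, 6) * 100
    additional = max(connectors - 6, 0) * 50
    return data_cost + first_six + additional
```

Under these assumptions, 1,000 GB moved through 8 connectors comes to 500 + 600 + 100 = 1,200 dollars.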
Proprietary cloud data warehouse storage with vendor-locked formats → Open Data Infrastructure with open file formats (Apache Iceberg, Delta Lake) in managed data lakes with commodity cloud storage
Shift from centralized data within proprietary cloud data warehouse storage to Open Data Infrastructure that separates storage from compute, enabling data to be stored in open formats within managed data lakes on commodity cloud storage, while the cloud data warehouse (CDW) becomes a compute engine rather than the storage layer.
Laptop-local data processing with manual pipeline uploads → Laptop-local development with managed cloud deployment via one-command deployment
Architectural shift enabling developers to build pipelines on local DuckDB instances and seamlessly deploy to production cloud data warehouses (Redshift, Snowflake) with integrated observability.
Siloed data engineering tools (separate ingestion, transformation, orchestration platforms) → Unified LLM-native data engineering platform with shared context layer
Platform architecture consolidates ingestion, transformation, and deployment into a unified platform with an agent-readable context layer that flows across all workflows, replacing fragmented tool stacks.
Traditional SaaS ETL platforms → AI-native dltHub Pro with agentic pipelines and managed dlt runtime
Shift from third-party SaaS ETL tools to custom-owned dlt pipelines orchestrated by AI agents and deployed on dltHub's managed infrastructure, enabling smaller teams to own end-to-end data stacks.
Integration enabling dltHub pipelines to deliver data directly into Snowflake for financial institutions to transform raw data into governed analytics and AI-ready datasets.
dltHub Pro subscription tier launched at $119 USD per month, including $50 USD in monthly credits for managed infrastructure runtime, with usage billed at $1 USD/hour beyond included credits.
Offering for mid-size companies that extends Pro with a richer context layer including an AI-native data catalog, ontologies, lineage, LLM wikis, multi-team collaboration, and operational agents for validation and pipeline health monitoring.
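As a rough worked example of the Pro tier figures, the sketch below assumes the included credits are consumed at the same $1 per hour rate as overage, which the entry does not state; `dlthub_pro_monthly_cost` is a hypothetical helper.

```python
def dlthub_pro_monthly_cost(runtime_hours: float) -> float:
    """Estimate the monthly Pro bill in USD for a given managed runtime usage.

    Assumptions: $119 base fee, $50 of included credits drawn down at
    $1 per runtime hour, and $1 per hour once the credits are used up.
    """
    base_fee = 119.0
    included_credits = 50.0
    hourly_rate = 1.0
    usage = runtime_hours * hourly_rate
    return base_fee + max(usage - included_credits, 0.0)
```

Under that assumption, 30 runtime hours stay within the credits for a bill of $119, while 80 hours add $30 of overage for a total of $149.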
Transformation toolkit module enabling data scientists and analysts to validate, transform, and perform semantic modeling on dlt pipelines with ontology-based skills.
Claude/Codex/Cursor-native data engineering platform that deploys, monitors, and scales dlt pipelines. Includes AI Workbench, secrets management, local DuckDB workspace, OTEL telemetry, build agents for pipeline building and exploration, and managed runtime with observability and scheduling.
Advanced citation matching system with layered search strategies (LiteParse searchItems, whitespace-flexible regex, currency/symbol stripping, alphanumeric matching) that locates cited text on pages and renders visual highlight overlays.
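A minimal sketch of the layered fallback idea, trying the strictest match first and loosening step by step. It is not LiteParse's implementation: the `searchItems` layer is omitted because its API is not shown here, currency/symbol stripping is folded into the alphanumeric layer, and all function names are hypothetical.

```python
import re
from typing import Optional, Tuple


def _whitespace_flexible(cited: str, page: str) -> Optional[Tuple[int, int]]:
    """Match the cited text while letting any whitespace run match any other."""
    tokens = cited.split()
    if not tokens:
        return None
    pattern = r"\s+".join(re.escape(token) for token in tokens)
    match = re.search(pattern, page)
    return match.span() if match else None


def _alphanumeric(cited: str, page: str) -> Optional[Tuple[int, int]]:
    """Compare only letters and digits, then map the hit back to page offsets."""
    needle = "".join(ch.lower() for ch in cited if ch.isalnum())
    if not needle:
        return None
    # Remember the original page index of every kept character.
    kept = [(i, c) for i, ch in enumerate(page) if ch.isalnum() for c in ch.lower()]
    haystack = "".join(c for _, c in kept)
    pos = haystack.find(needle)
    if pos < 0:
        return None
    return kept[pos][0], kept[pos + len(needle) - 1][0] + 1


def locate_citation(cited: str, page: str) -> Optional[Tuple[str, int, int]]:
    """Try strategies from strictest to loosest; return (strategy, start, end)."""
    pos = page.find(cited) if cited else -1
    if pos >= 0:
        return "exact", pos, pos + len(cited)
    for name, strategy in (("whitespace", _whitespace_flexible),
                           ("alphanumeric", _alphanumeric)):
        span = strategy(cited, page)
        if span:
            return name, span[0], span[1]
    return None
```

The returned offsets index into the original page text, which is what a highlight overlay would need; mapping them to page coordinates is outside this sketch.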
Feature enabling direct integration with SEC's EDGAR database to fetch and parse recent corporate filings by ticker symbol, with automatic HTML-to-PDF conversion support.
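For orientation, EDGAR identifies companies by Central Index Key (CIK) rather than ticker, so a ticker lookup has to resolve to a CIK first. The sketch below only builds the public submissions URL for a known CIK; how LiteParse resolves tickers or fetches filings is not described in this entry, and `submissions_url` is a hypothetical helper.

```python
def submissions_url(cik: int) -> str:
    """Build the EDGAR submissions URL for a company's Central Index Key."""
    if cik <= 0:
        raise ValueError("cik must be a positive integer")
    # EDGAR zero-pads the CIK to ten digits in this path.
    return f"https://data.sec.gov/submissions/CIK{cik:010d}.json"
```

Any real request to this endpoint should also send a descriptive `User-Agent` header, as the SEC asks of automated clients.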
AI agent demo built with LiteParse that ingests SEC filings, searches across documents, and answers financial questions with precise citations and visual highlighting of source text on original PDF pages.
Traditional reactive batch-driven fleet management systems → Real-time event-driven agentic architecture with autonomous agents and closed-loop feedback
Shift from reactive, manual, batch-processed fleet systems to proactive, autonomous, real-time agentic systems that continuously optimize operations using streaming data, ML inference, and decentralized agent coordination.
Confluent's AI developer tools (MCP Server & Agent Skills)
Open-source local MCP server, managed MCP server, and Agent Skills that provide AI coding assistants direct access to the streaming platform with tools to act on data and domain knowledge for building agentic systems.
AI agents that automate business processes with autonomous decision-making capabilities for fleet management, routing optimization, maintenance prediction, and dispatch operations through real-time event-driven architectures.
Scans projects, extracts schemas from data models, tags PII fields, and generates Terraform to register schemas in Schema Registry for proper governance.
Domain-specific AI skill modules that package Confluent expertise for platforms like Claude Code, Cursor, and Windsurf. Includes four GA skills: Schema Registry, Kafka Streams, Python Kafka Client, and CDC to Tableflow.
A read-only MCP server hosted directly in Confluent Cloud with zero configuration, providing tools for environment/cluster discovery, telemetry metrics, and connector troubleshooting across global and regional tiers.
An open source Model Context Protocol server that gives AI agents direct access to Confluent Cloud and local Kafka clusters, with tools to discover topics/schemas/connectors, build resources, manage configurations, and debug issues.
Traditional Apache Kafka brokers with cluster sprawl and independent cluster management → Confluent Private Cloud with Intelligent Replication and broker-native multi-tenancy
Shift from managing multiple underutilized clusters toward consolidated infrastructure with shared physical clusters supporting multiple logical clusters, reducing operational overhead and infrastructure costs.
Virtual Kafka cluster (LKC) running on shared physical clusters with strict namespace isolation, granular quota enforcement, self-service onboarding, fine-grained observability, and dedicated endpoints. Early Access and General Availability planned for later in 2026.
Gateway-based policy enforcement for encryption, governance, and client behavior, including Gateway Field Level Encryption, Gateway Payload Encryption, and Deep Schema Validation capabilities. Early Access opening later in 2026 with General Availability planned thereafter.
Confluent Private Cloud (CPC) with Intelligent Replication
Enhanced broker architecture optimized for total cost of ownership, requiring up to 73% fewer brokers than Apache Kafka while matching latency SLAs, with improved tail latency and predictable performance at peak saturation.
Web search-based and training data-driven documentation retrieval for MCP agents → Native MCP server tools with direct integration to canonical docs.getdbt.com via search_product_docs and get_product_doc_pages
Shifted from inconsistent documentation retrieval methods (web search, training data, HTML rendering) to a native architectural solution that directly accesses canonical Markdown documentation through dedicated MCP server toolsets, ensuring agents have guaranteed access to current, authoritative docs.
Integrated product docs toolset into dbt's Developer agent experience within dbt platform and the Studio IDE, bringing canonical documentation closer to users in their native development environment.
Added a new 'Product Docs' category (the ninth toolset) to the dbt MCP server with two tools: search_product_docs for searching docs.getdbt.com with ranked results, and get_product_doc_pages for fetching full Markdown content of docs pages. Enables developers to access documentation directly within AI tools without context switching.
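A hedged sketch of what calling these two tools could look like at the protocol level. MCP tool invocations are JSON-RPC 2.0 `tools/call` requests; the tool names come from the entry above, but the argument names (`query`, `paths`) and the example page path are assumptions, not the documented schema.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Argument names and the page path below are illustrative assumptions.
search_request = build_tool_call(
    1, "search_product_docs", {"query": "incremental models"}
)
fetch_request = build_tool_call(
    2, "get_product_doc_pages", {"paths": ["/docs/build/incremental-models"]}
)
```

In practice an MCP client library would build and send these messages; the sketch only shows the shape of the request an agent ends up issuing.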
Support partial parsing for function nodes. Add UnparsedMetricV2 for new-style YAML Semantic Layer Metrics. Allow defining function arguments with default values.
A shift from maintaining static allowlists that become stale when schemas change to encoding data access policies as plain-English ontologies that are dynamically applied per column at build time, using LLMs to evaluate ambiguous cases through value inspection.
Integration with the Claude Sonnet 4.6 LLM via the Anthropic API for runtime policy decision-making on column classification in the ontology-driven schema evolution feature.
dltHub AI Workbench - Ontology-driven data modelling toolkit
A toolkit within dltHub Pro that guides modeling decisions using ontologies during the transformation stage of data pipelines, integrated with REST API ingestion, data exploration, and production deployment capabilities.
Ontology-driven schema evolution with LLM propagation
A feature that encodes data access policies in plain-English ontologies and uses LLM runtime evaluation to automatically classify columns as analytics-safe or reject them based on name patterns, data types, cardinality, and value inspection—enabling policies to adapt automatically when schemas change without code modifications.
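The classification flow described above can be sketched as a cheap rule pass followed by a deferred judgment for ambiguous columns. This is an illustration under stated assumptions, not dltHub's implementation: the regex and type set stand in for a plain-English ontology, the 0.1 cardinality threshold is arbitrary, and `judge` is a hypothetical callable that would wrap the LLM call in practice.

```python
import re
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical stand-ins for policy rules; the real ontology format is not shown.
SENSITIVE_NAME = re.compile(r"(email|phone|ssn|password|address)", re.IGNORECASE)
SAFE_TYPES = {"bigint", "double", "bool", "date", "timestamp"}


@dataclass
class Column:
    name: str
    data_type: str
    values: Sequence[object]  # sampled values used for inspection


def classify(column: Column, judge: Callable[[Column], bool]) -> str:
    """Return 'analytics_safe' or 'rejected' for one column."""
    # Name patterns are checked first so sensitive names are always rejected.
    if SENSITIVE_NAME.search(column.name):
        return "rejected"
    if column.data_type in SAFE_TYPES:
        return "analytics_safe"
    # Low-cardinality values look like category labels, so accept them.
    if column.values and len(set(column.values)) / len(column.values) <= 0.1:
        return "analytics_safe"
    # Ambiguous case: defer to the judge, which inspects the sampled values.
    return "analytics_safe" if judge(column) else "rejected"
```

Because the rules are evaluated per column at build time, a newly added column is classified without maintaining an allowlist; only the ambiguous cases incur a judge call.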