DataStax Astra DB gets Change Data Capture
Written by Nikos Vaggalis   
Monday, 25 April 2022

DataStax adds CDC to its Astra DB database-as-a-service platform to deliver database changes in real time via event streams, making real-time data available for use across data lakes, data warehouses and other applications.

CDC is a way to capture changes made in a database and forward them in real time to external systems (such as Kafka) through connectors such as the ones offered by Debezium, the open source distributed platform that turns your existing databases into event streams.

There are many ways to implement CDC, such as row versioning, pub/sub, triggers and log monitoring, with the log-based approach being the most popular and automated. For a painstakingly manual trigger-based approach, make sure to check "Connecting To The Outside World with Perl and Database Events".

The use cases of CDC include real-time analytics, database replication and customized solutions like the one described in Connecting To The Outside World with Perl and Database Events, which uses Ingres as the underlying DBMS and, through trigger-based CDC, transforms SQL data to HL7 messages which it then posts to a web service:

At different points in time, hospital clerks collect the details of the patient's visit and register them to the system through a GUI application/data entry form. This data will then end up as rows in several tables in the database. The Ingres RDBMS will handle the database part, while Perl will handle the application part being in charge of gluing the database to the outside world by extracting and transforming this data to a HL7 message and sending it over to a Web service endpoint.
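As a rough illustration of the trigger-based approach, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Ingres and Perl. The `visits` and `visit_changes` tables and the function names are hypothetical and are not taken from the referenced article. Triggers copy every insert, update and delete into a change table, which the application then drains and serializes (to JSON here, where the original transforms to HL7).

```python
import json
import sqlite3

# Hypothetical schema: a data table plus a change table filled by triggers.
SCHEMA = """
CREATE TABLE visits (id INTEGER PRIMARY KEY, patient TEXT, ward TEXT);
CREATE TABLE visit_changes (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT NOT NULL,
    visit_id INTEGER NOT NULL,
    patient TEXT,
    ward TEXT
);
CREATE TRIGGER visits_insert AFTER INSERT ON visits BEGIN
    INSERT INTO visit_changes (op, visit_id, patient, ward)
    VALUES ('insert', NEW.id, NEW.patient, NEW.ward);
END;
CREATE TRIGGER visits_update AFTER UPDATE ON visits BEGIN
    INSERT INTO visit_changes (op, visit_id, patient, ward)
    VALUES ('update', NEW.id, NEW.patient, NEW.ward);
END;
CREATE TRIGGER visits_delete AFTER DELETE ON visits BEGIN
    INSERT INTO visit_changes (op, visit_id, patient, ward)
    VALUES ('delete', OLD.id, OLD.patient, OLD.ward);
END;
"""


def open_database():
    """Create an in-memory database with the tables and triggers above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def drain_changes(conn, last_seq=0):
    """Return captured changes newer than last_seq as JSON strings."""
    rows = conn.execute(
        "SELECT seq, op, visit_id, patient, ward "
        "FROM visit_changes WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    return [
        json.dumps(
            {"seq": seq, "op": op, "visit_id": visit_id,
             "patient": patient, "ward": ward}
        )
        for seq, op, visit_id, patient, ward in rows
    ]


if __name__ == "__main__":
    conn = open_database()
    conn.execute("INSERT INTO visits (patient, ward) VALUES ('A. Patient', 'ER')")
    conn.execute("UPDATE visits SET ward = 'ICU' WHERE id = 1")
    for event in drain_changes(conn):
        # A real integration would post each event to a web service here.
        print(event)
```

The caller keeps track of the last `seq` it has processed and passes it as `last_seq`, so that each change is forwarded once. This bookkeeping is part of what makes the trigger-based approach manual compared with log-based CDC.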

Astra DB's CDC is powered by the Astra Streaming technology, which is built on the Apache Pulsar distributed pub/sub messaging system.

Using a simple configuration-based approach, you can enable CDC on one or more Astra DB tables and publish the changes to an event topic in Astra Streaming. From there, your real-time applications can subscribe to change events using client libraries in Java, Golang, Python, or Node.js. Additional endpoints support direct subscription via a WebSocket interface or a standard JMS client. With that you cover a wide range of use cases:

  • Data integration: Immediately send updated data throughout your data ecosystem when a piece of data changes in Astra DB.
  • Machine learning: Leverage Astra Streaming’s event persistence capabilities to replay a sequence of changes as inputs into ML models for training and scoring purposes.
  • Real-time applications: Build applications that respond to CDC change events to drive business logic in response to specific changes being detected in your Astra database.
  • Advanced search: Push data from your Astra DB instance into a full text search engine such as Elastic.
  • Notifications: Detect when changes on your Astra database occur and integrate with platforms such as Twilio or Firebase to send SMS or push notifications.
  • Reporting and analytics: Ensure that business stakeholders are using up-to-date data to make critical decisions that can impact your business.
  • Security monitoring: Gain visibility into anomalous behavior that may indicate a security breach with CDC’s consumable stream of event data.
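To make the "real-time applications" and "advanced search" scenarios above more concrete, here is a minimal sketch of routing change events to handlers. The event shape (`table`, `op`, `key`, `row`) is a hypothetical simplification and not the actual Astra DB CDC payload format. The handler names are also invented for illustration, and they only return a description of the action instead of calling a search engine or a notification service.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical event shape: {"table": ..., "op": ..., "key": ..., "row": {...}}.
ChangeEvent = Dict[str, object]
Handler = Callable[[ChangeEvent], Optional[str]]


class ChangeRouter:
    """Routes each change event to the handlers registered for its table."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def on(self, table: str, handler: Handler) -> None:
        self._handlers.setdefault(table, []).append(handler)

    def dispatch(self, event: ChangeEvent) -> List[str]:
        """Run every handler for the event's table; return the actions taken."""
        actions = []
        for handler in self._handlers.get(str(event["table"]), []):
            action = handler(event)
            if action is not None:
                actions.append(action)
        return actions


def index_for_search(event: ChangeEvent) -> Optional[str]:
    # Stand-in for pushing the changed row to a full text search engine.
    return f"index {event['table']}/{event['key']}"


def notify_on_shipping(event: ChangeEvent) -> Optional[str]:
    # Stand-in for sending an SMS or push notification on a specific change.
    row = event.get("row") or {}
    if event["op"] == "update" and row.get("status") == "shipped":
        return f"notify customer for order {event['key']}"
    return None


def build_router() -> ChangeRouter:
    router = ChangeRouter()
    router.on("orders", index_for_search)
    router.on("orders", notify_on_shipping)
    return router


if __name__ == "__main__":
    event = {"table": "orders", "op": "update", "key": 42,
             "row": {"status": "shipped"}}
    for action in build_router().dispatch(event):
        print(action)
```

Keeping the handlers independent of each other means a new use case, such as reporting, can be added by registering one more function without touching the existing ones.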

To get started with CDC quickly, go through the following steps:

  • Create an Astra account, if you don’t have one already.
  • Create an Astra Streaming tenant.
    Astra Streaming is based on Apache Pulsar which uses a concept of tenant as its top level administrative unit. A tenant has multiple namespaces for logical grouping of different applications. In each namespace you can create multiple topics to send and receive data.
  • Create an Astra Streaming namespace.
  • Create an Astra Streaming topic.
  • Download the Pulsar connection information in order to connect to your new Astra Streaming topic. You'll need the following info:
    • Broker Service URL: The Pulsar Binary Protocol URL used for production and consumption of messages.
    • Web Service URL: URL used for administrative operations.
    • Astra web token: The JWT used for authentication in all Astra Streaming operations.

That's the boilerplate procedure. From there on you can use the Apache Pulsar CLI tools to produce and consume messages.
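As an alternative to the CLI tools, the same connection information can be used from the Apache Pulsar Python client (the `pulsar-client` package). The following is a minimal sketch under stated assumptions: the environment variable names, the `cdc-demo` subscription name and the `default`/`demo` namespace and topic names are placeholders, and the payload is returned as raw bytes without any decoding. The `topic_name` helper simply follows Pulsar's tenant/namespace/topic naming scheme.

```python
import os


def topic_name(tenant, namespace, topic, persistent=True):
    """Build a fully qualified Pulsar topic name from its three parts."""
    scheme = "persistent" if persistent else "non-persistent"
    return f"{scheme}://{tenant}/{namespace}/{topic}"


def consume(broker_service_url, token, topic,
            subscription="cdc-demo", max_messages=10):
    """Receive up to max_messages from the topic and return their raw bytes.

    Blocks until max_messages have arrived.
    """
    import pulsar  # provided by the pulsar-client package

    client = pulsar.Client(
        broker_service_url,
        authentication=pulsar.AuthenticationToken(token),
    )
    received = []
    try:
        consumer = client.subscribe(topic, subscription_name=subscription)
        for _ in range(max_messages):
            msg = consumer.receive()
            received.append(msg.data())
            # Acknowledge so the broker does not redeliver the message.
            consumer.acknowledge(msg)
    finally:
        client.close()
    return received


if __name__ == "__main__":
    # Placeholder variable names; fill them in from the downloaded
    # connection information (Broker Service URL and Astra web token).
    url = os.environ["ASTRA_BROKER_SERVICE_URL"]
    token = os.environ["ASTRA_STREAMING_TOKEN"]
    tenant = os.environ["ASTRA_STREAMING_TENANT"]
    for payload in consume(url, token, topic_name(tenant, "default", "demo")):
        print(payload)
```

The Web Service URL is not needed here, since it is used only for administrative operations.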

 

More Information

CDC for Astra DB

Related Articles

Connecting To The Outside World with Perl and Database Events

DataStax Extends Stargate

DataStax Adds gRPC To Stargate For Performant Microservices

 


Last Updated ( Monday, 25 April 2022 )