DataStax today is announcing the beta release of Astra Streaming, a new standalone service of its Astra cloud that will operate independently of, and integrate with, Astra DB (formerly called DataStax Astra). The new service is based on technology that came with the Kesque acquisition back in January, and it supersedes the Luna streaming service. And nope, don’t confuse this with the mobile Android app Astra Streaming Studio that consumers can download from the Google Play store.
As an addition to the Astra portfolio, Astra Streaming will have multi-cloud support and, while offered for free (with upper limits) during the public beta, will eventually be priced on a pay-as-you-go basis once the service enters general release.
The underlying technology is based on Apache Pulsar, a publish/subscribe (PubSub) messaging system that is often compared to the better-known Apache Kafka. And in fact, DataStax promises that the service will be compatible with Kafka via an existing wrapper; while not initially available during the beta, we expect that feature will go live with the production release.
Pulsar, like Kafka, follows in a long line of messaging technologies, dating back to mainframe-era products from Tibco and IBM and continuing in the Internet era with JMS and RabbitMQ. Of the current generation, Kafka is by far the better known. It was developed at LinkedIn, while Pulsar came out of Yahoo; both are top-level Apache projects.
There are a number of similarities between Pulsar and Kafka: both were designed for scale-out, deliver strong durability guarantees, support replication across geographies, have a wide range of operating utilities, and (for now) share a dependence on Apache ZooKeeper for storing metadata.
But there are also important architectural differences between Pulsar and Kafka. Among the most basic is that Pulsar pushes messages to subscribers, while Kafka requires subscribers to pull them down. And architecturally, Kafka is simpler; it combines message broker and message persistence in the same tier, while Pulsar divides them up. This leads to numerous debates and pretty fierce rivalries over which is the superior approach.
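To make the push-versus-pull distinction concrete, here is a minimal in-memory sketch. It does not use the real Pulsar or Kafka client APIs; the class names `PushBroker` and `PullLog` and the `demo` function are hypothetical, invented purely for illustration, and the sketch ignores persistence, partitioning, and acknowledgments.

```python
from typing import Callable, List, Tuple


class PushBroker:
    """Toy broker (hypothetical): delivers each message to subscribers as it arrives."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        # The subscriber registers once; the broker drives delivery afterward.
        self._subscribers.append(callback)

    def publish(self, message: str) -> None:
        # Push model: the broker calls every subscriber for each new message.
        for callback in self._subscribers:
            callback(message)


class PullLog:
    """Toy append-only log (hypothetical): consumers fetch by offset when ready."""

    def __init__(self) -> None:
        self._entries: List[str] = []

    def append(self, message: str) -> None:
        self._entries.append(message)

    def poll(self, offset: int, max_messages: int = 10) -> List[str]:
        # Pull model: the consumer decides when to read and where to start.
        return self._entries[offset:offset + max_messages]


def demo() -> Tuple[List[str], List[str]]:
    messages = ["a", "b", "c"]

    # Push: messages land in the subscriber's list as soon as they are published.
    pushed: List[str] = []
    broker = PushBroker()
    broker.subscribe(pushed.append)
    for message in messages:
        broker.publish(message)

    # Pull: the consumer tracks its own offset and fetches in batches.
    log = PullLog()
    for message in messages:
        log.append(message)
    pulled: List[str] = []
    offset = 0
    while True:
        batch = log.poll(offset, max_messages=2)
        if not batch:
            break
        pulled.extend(batch)
        offset += len(batch)

    return pushed, pulled
```

In both cases the consumer ends up with the same messages in the same order; what differs is who drives delivery and who keeps track of the read position.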
For instance, Pulsar backers claim the three-tier architecture (which also includes ZooKeeper) is more flexible and scalable. Load balancing of message processing is automatic, and the separate persistence layer allows broker work to be redistributed and spread across more nodes without losing data. Kafka backers counter that their approach results in a simpler architecture with half as many servers and is more economical. They are also planning to simplify the architecture further by removing ZooKeeper, but this is still a work in progress.
There are further debates on which PubSub system replicates data more efficiently; stores data only once; supports exactly-once transactions; provides deeper support for message queueing; is simpler to configure; and delivers higher throughput. And there are even more debates over support of multitenancy, tiered storage, and allowable message size. Prior to being acquired by DataStax, Kesque laid out its rationale for choosing Pulsar.
In all, this debate is very reminiscent of the debate over Spark Streaming vs. Flink. Both attacked the same problem from mirror-image approaches, and one emerged much sooner and drew wider (almost universal) industry support. Yet, in spite of Spark’s market head start and wide presence, Flink has thrived as one among many streaming alternatives to Spark’s microbatching. And in spite of Kafka’s ubiquitous presence in the market, Pulsar has drawn support from some household names such as Splunk, whose support came through its Streamlio acquisition.
DataStax’s unveiling of Astra Streaming is not exactly a surprise. The writing was on the wall back in January, when DataStax acquired Kesque, which offered its own Luna Pulsar service. The difference with Astra Streaming is more than a rebranding: while customers had to manage Luna themselves, Astra Streaming will be fully managed by DataStax.
Disclosure: DataStax is a dbInsight client.