In the past few weeks, we saw a few key announcements that point to a new trend in the data infrastructure ecosystem:
- Databricks Strikes $1.3 Billion Deal for Generative AI Startup MosaicML
- Snowflake Summit 2023 Announcements (a combined set of announcements that extends Snowflake's functionality well beyond being a data store)
- SingleStore Notebook: New Features for Analytics, Machine Learning & Data Exploration (disclaimer: I work at SingleStore!)
- Introducing Microsoft Fabric: Data analytics for the era of AI
- Generative AI, Streaming Data, and More Come to MongoDB Atlas
While Databricks has been ahead of the curve in this regard, other data infrastructure companies are catching up and augmenting their services to offer much more than storage. In essence, databases can't be just databases anymore. In this blog post, we'll take a look at what these services are building to augment their core offerings. I recently wrote about database developer experience (DX), and most of these new features are about simplifying existing workflows for developers.
Custom Code Execution
Various database providers are adding support for executing custom code in multiple programming languages. There are, however, different flavors to this. For instance, Snowflake has Snowpark, which allows writing functions in several programming languages. SingleStore has “Code Engine”, which leverages WebAssembly to address a similar use case. And Snowflake has now announced “Snowpark Container Services”, which use a different execution model: the code runs alongside the database rather than within it.
I know from experience at SingleStore that these systems are hard to build, mainly because securely sandboxing user code takes time and effort. However, these features unblock a lot of interesting use cases for developers. For example, they can be used to schedule cron jobs for data cleanup.
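To make the cron-job use case concrete, here is a minimal sketch of the kind of cleanup routine you might register as an in-database function and put on a cron schedule. This is illustrative only, not any vendor's actual API; the `events` table, the `created_at` column, and the retention window are all hypothetical, and `sqlite3` stands in for a real engine.

```python
import sqlite3
from datetime import datetime, timedelta

RETENTION_DAYS = 30  # hypothetical retention window


def purge_stale_rows(conn: sqlite3.Connection) -> int:
    """Delete events older than the retention window; return rows removed."""
    cutoff = (datetime.utcnow() - timedelta(days=RETENTION_DAYS)).isoformat()
    cur = conn.execute("DELETE FROM events WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount
```

With a managed scheduler in the database, a function like this runs next to the data, so there is no separate cron box to provision or credentials to ship around.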
These providers are also making it much easier to spin up collaborative Jupyter Notebooks or Streamlit applications. Databricks has always had Notebooks, and SingleStore recently introduced support for them as well. My prediction is that other providers will announce similar features soon. The power of these integrated Jupyter Notebooks is that they are set up to automagically connect to the data, instead of requiring developers to figure out the connection details.
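To illustrate what gets removed, here is the connection boilerplate a plain notebook typically needs. This is a sketch only: `sqlite3` stands in for a real driver, and the `DATABASE_URL` variable is a hypothetical place you might stash a connection string.

```python
import os
import sqlite3  # stand-in for a real database driver


def connect_manually() -> sqlite3.Connection:
    # In a plain notebook, you dig the DSN out of env vars or a secrets
    # manager yourself before you can run a single query.
    dsn = os.environ.get("DATABASE_URL", ":memory:")
    return sqlite3.connect(dsn)


conn = connect_manually()
print(conn.execute("SELECT 1").fetchone())
```

In an integrated notebook, a pre-wired handle like `conn` is injected into the session for you, so the first cell is a query rather than a setup ritual.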
All Kinds of AI Integrations
And then there’s AI. The acquisition of MosaicML by Databricks says it all. The $1.3B value is likely inflated as it is partly based on Databricks' last private valuation in 2021. As I wrote about recently, these private valuations from a couple of years ago are very different from the public valuations that these companies are getting today. Nevertheless, it is still a hefty price tag, which speaks to the value of making it really easy to train ML models with one’s data without having to plumb many different tools together.
In fact, even AWS Aurora, which most people would consider a no-fuss database provider, has had basic ML integrations since 2019 (with SageMaker and Comprehend).
MongoDB Atlas recently announced an offering for Stream Processing, and they already have a data visualization tool called Atlas Charts. Power BI is an integral part of Microsoft's recent Fabric announcement, potentially indicating a trend where database providers aim to capture a larger market share.
Analytics + Operations Combined
In order to become the single data storage solution for their customers, I predict all database providers will try to support both analytical and operational workloads (the latter with strong transactional guarantees). This is not an easy feat, and not every provider will even attempt it.
SingleStore has natively supported HTAP (hybrid transactional/analytical processing) for several years, and Snowflake recently announced Unistore with the same goal. MongoDB has been iterating toward supporting analytics, Google released AlloyDB, and AWS is trying to do this with an Aurora+Redshift integration. So again, there are many different flavors to this. But one aspect is consistent: database providers are naturally competing for more potential revenue.
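The core promise of HTAP can be sketched in a few lines: the same table serves small transactional writes and an analytical scan over the freshly written data, with no ETL hop in between. This is a toy illustration, with `sqlite3` standing in for an HTAP engine and a hypothetical `orders` schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")

# Operational workload: small transactional writes.
with conn:
    conn.executemany("INSERT INTO orders (amount) VALUES (?)",
                     [(10.0,), (25.5,), (4.5,)])

# Analytical workload: an aggregate over the same, just-written rows.
total, n = conn.execute("SELECT SUM(amount), COUNT(*) FROM orders").fetchone()
print(total, n)  # → 40.0 3
```

The hard part, and the reason HTAP took years to mature, is making both halves fast at once on real data volumes, which usually means maintaining rowstore and columnstore representations behind a single table.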
Bonus: SQL and NoSQL Coming Together?
This item isn't really connected to the other sets of features I've written about so far, but it's still something I find really fascinating.
With MongoDB announcing the general availability of Atlas SQL and SingleStoreDB launching a MongoDB (MQL) API, my prediction is that more databases are going to become multi-model. CosmosDB, for example, already speaks six different languages: it offers six different APIs that developers can use to read and write data. So perhaps in the future, SQL vs. NoSQL won't be such a heated battle as more databases become “multilingual”. Besides, part of the initial fascination with NoSQL was that it allowed for much higher scalability and performance, but over the past decade, SQL databases have more than caught up on both fronts.
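The "multilingual" idea boils down to one logical query having two surface syntaxes. The toy translator below maps a flat, equality-only MongoDB-style filter document onto SQL; it is my own sketch for illustration, not any vendor's actual translation layer, and the `users` table is hypothetical.

```python
import sqlite3


def mql_find_to_sql(collection: str, flt: dict) -> tuple:
    """Translate a flat equality-only MQL filter into SQL (toy example)."""
    clauses = " AND ".join(f"{k} = ?" for k in flt)
    where = f" WHERE {clauses}" if flt else ""
    return f"SELECT * FROM {collection}{where}", list(flt.values())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, city TEXT)")
conn.execute("INSERT INTO users VALUES ('Ada', 'London'), ('Alan', 'Bletchley')")

# The MQL-ish call find("users", {"city": "London"}) becomes plain SQL.
sql, params = mql_find_to_sql("users", {"city": "London"})
print(conn.execute(sql, params).fetchall())  # → [('Ada', 'London')]
```

Real multi-model engines handle vastly more than equality filters, of course, but the principle is the same: one storage and execution layer, several query dialects in front of it.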
Most companies still adopt database providers based on their ability to provide a fast, reliable, and secure data storage and retrieval system with a good enough developer experience. But in order to stand out, companies like Snowflake, MongoDB, SingleStore, Databricks, and others are expanding their surface areas to support many more use cases. This also opens the door for more monetization, of course. The race is on, and it'll certainly be an exciting one to watch (and, in my case, be a part of).
Feel free to reach out on Twitter!