A couple of weeks ago, I was in an internal meeting where our CTO (Adam Prout) was giving a talk to our summer interns. The talk was about the current state of database companies, and how providers are competing on different facets of their products. As the talk was nearing its end, I wrote down this quote (slightly edited):
“You will not find a broader set of Computer Science problems inside one piece of software than by working on a cloud database, especially general-purpose databases that attempt to solve a lot of different use cases. You get to work on all kinds of things from memory management, scheduling algorithms, low-level optimizations like SIMD and efficient operations on compressed data, query optimization, etc. And then there’s the whole cloud-native set of challenges. There’s cloud computing and figuring out how to best use things like blob stores/S3, and security and all the rest of it.”
This quote stuck with me. Having worked at SingleStore for many years now, I didn't find it news. However, I had never heard it put so succinctly. Databases are hard to build, and when people think of the interesting challenges behind building databases, they typically think of:
- Query processing and query optimization (parallel join algorithms, vectorized execution, query compilation, operations on compressed data, …)
- Storage models and compression
- All kinds of distributed systems challenges such as replication, fault detection, consistency, etc. And don’t forget concurrency control!
- CPU/Memory and other resource management (some systems like SingleStoreDB even utilize machine learning to do this well)
- Security of the core database system
- et cetera, et cetera, et cetera…
This list is far from complete; it really goes on and on. Some of these areas are so deep that engineers spend decades focused on just one of them.
Now, what absolutely fascinates me is how building a cloud-native database adds another huge layer of extremely interesting problems to this. Cloud-native databases operate in a new landscape and have to make use of the various cloud computing services in order to differentiate themselves.
For instance, blob stores such as S3 have enabled cloud database providers to offer flexible, effectively unlimited storage (SingleStoreDB even coined the term “bottomless storage” for this). Another example is figuring out the right tradeoffs between using local SSDs and block-storage services (AWS EBS and others). Furthermore, in the cloud, databases are expected to scale up and down much more flexibly. Doing this well is really hard, because data often has to be re-partitioned, and re-partitioning is very resource-intensive.
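To get a feel for why re-partitioning is so expensive, here is a toy sketch. It assumes a naive, hypothetical scheme where each row goes to partition `key % n`; this is not how SingleStoreDB (or any particular system) partitions data. It simply counts how many keys land on a different partition when the partition count changes.

```python
# Toy illustration only: assumes a hypothetical "key % n" partitioning scheme.

def partition_of(key: int, num_partitions: int) -> int:
    """Assign a key to a partition using naive modulo hashing."""
    return key % num_partitions


def fraction_moved(num_keys: int, old_n: int, new_n: int) -> float:
    """Fraction of keys 0..num_keys-1 whose partition changes from old_n to new_n partitions."""
    moved = sum(
        1
        for key in range(num_keys)
        if partition_of(key, old_n) != partition_of(key, new_n)
    )
    return moved / num_keys


if __name__ == "__main__":
    # Growing from 4 to 5 partitions moves 80% of the keys under this scheme.
    print(fraction_moved(100_000, 4, 5))
    # Doubling from 4 to 8 partitions still moves half of them.
    print(fraction_moved(100_000, 4, 8))
```

Under this naive scheme, adding a single partition relocates most of the data, which means copying it over the network and rebuilding indexes while the database keeps serving queries. Real systems use smarter placement to reduce this, but some data movement on resize is hard to avoid.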
And I haven’t even mentioned security yet. If you’re hosting your customers’ data, you have an entire set of security and compliance challenges to deal with. These are so broad and complicated on their own that cloud database companies have to hire large teams of experts in this field to stay on top of everything.
To make this even more interesting, if your database service runs on multiple cloud providers (Azure, AWS, GCP or others), everything becomes more complicated. This is because the providers have different offerings, with different performance, cost, security and reliability characteristics. And while Kubernetes helps a lot here, it doesn’t solve many of these issues on its own.
Finally, there’s developer experience (DX). DX is not unique to cloud-native databases, but it gets more emphasis there, because operating in the cloud allows database providers to do many more interesting things in this area (I wrote about this recently).
When I joined SingleStore seven years ago, the product was only sold as a self-managed offering, but now SingleStoreDB Cloud is the face of the business. Being part of the transition from building a database that customers run themselves to building a cloud service has been tremendously educational.
Feel free to reach out on Twitter!