Comparing open source Cloud Native DBaaS solutions
I have not been at Percona since April 1, so I want to use this unique time in my life to compare various solutions for running and managing databases in Kubernetes as an outsider, with no bias.
I will not look into Operators that focus on a single technology, but will touch on solutions that make it possible to use various database engines.
I am leaving out Klutch.io: they are just starting and are at a very early stage, so it would be unfair to compare this solution against more mature competitors.
I’m going to look at solutions through these dimensions:
Open source and community - licenses, community management, GitHub metrics, contributors
Onboarding - how easy it is to discover the solution, install it, follow quickstart guides and documentation in general
Extensibility and integrations - database engines, integrating with other cloud native and/or observability tools
Day-2 operations - the management and maintenance capabilities that the solution provides
TL;DR
Here is the summary table of how our solutions compare to each other. 1 is the lowest, 5 is the highest score.
Open source and community
KubeBlocks
Release cycles are very short (at least for beta minor versions), and the project is actively developed. Approximately 90% of code contributions come from the ApeCloud team, and 100% of the maintainers are from there as well. The README is well written, with various links. Even though the product is open source, it does not have enough community traction, which would show up in the number of external contributions and maintainers. You can talk to developers through the community Slack (https://kubeblocks.slack.com/).
Score: 4/5
KubeDB
GitHub: kubedb
License: Apache 2.0 (but open core model)
Stars: max 731
Contributors: 36
Open issues / Pull requests: 0 / 0
There are 107 repositories in the kubedb org, with no clarity about which repo people should send commits to or raise issues in. The website leads only to the org on GitHub. I just picked the repository with the largest number of GitHub stars (kubedb/cli). The READMEs contain few details. It is also unclear how to talk to anyone from the community - I was not able to find any forums or community Slack channels.
Cozystack
Cozystack is in the CNCF Sandbox, so their GitHub repo is well maintained and follows the guidelines. At the same time, I was surprised to see a poorly written CONTRIBUTING.md file with no details on how to build the app, test it and so on. It is good to see quite a list of maintainers who come not only from Ænix (the Cozystack parent company). They also have biweekly community meetings and a public roadmap (both are standard for CNCF projects).
Percona Everest
The README is well written, but the CONTRIBUTING.md is a bit weird and seems to be from a different project (Everest API). There are active conversations in the community forum. 99% of the code is contributed by the Percona team, but it is good to see that questions in GitHub issues are answered in a timely manner (though some issues have been sitting there for more than a year).
Score: 3/5
Onboarding
KubeBlocks
Tag line: “Run Any Database on Kubernetes”
From the main page you either go to documentation or click the mysterious Try KubeBlock Online.
I would say that KubeBlocks took onboarding to a new level by providing an interactive tutorial with a playground. You can try the product right in the browser and get a grasp of how it works and the value it provides. At the same time, I was surprised by how complex the installation of KubeBlocks is. There is no quickstart, and multiple cumbersome steps stand between opening the documentation and getting to the aha moment. The documentation is well written and provides a lot of details and examples.
KubeBlocks has a concept of plugins (we will talk about it later on), so it is a good thing that they have additional docs for Developers who want to create their own plugins.
The command line tool kbcli provides a way to install and manage the application itself and the databases. They even thought about autocompletion for it, which earns additional points.
Score: 5/5 (even though installation is a bit complex, having a playground makes up for it).
KubeDB
Tag line: “Run Production-Grade Databases on Kubernetes”
The main site guides us to the documentation after clicking on “Try now free”.
Installing KubeDB requires a free 30-day trial license. You can obtain the license file after registration, but this is a huge onboarding red flag. I tried to get the key, but did not get any email from them. There is also a kubectl plugin that is humbly called “dba” :) . It is not very well documented, and it is unclear what capabilities it provides.
In a nutshell, KubeDB is an operator that allows you to manage multiple database technologies. I was actually under the impression that KubeDB provides a Web UI to manage databases, but it seems it does not (or it is very well hidden somewhere).
After installation of KubeDB, it is just a matter of applying various YAML manifests with helm or kubectl to get the databases up and running. Regular operator flow. Documentation is well written and provides insights about different use cases and situations.
Score: 2/5
Cozystack
Tag line: “Cozystack: Free PaaS platform and framework for building clouds”
From the main site, the user gets to the docs after clicking the “Get started” button.
A huge obstacle is this line in the docs:
“Cozystack installs on top of a Kubernetes cluster. Talos is the only Kubernetes distribution that is fully supported by Cozystack.”
As a user I want to try out the tool in the environment that I’m used to. Limiting k8s choices to Talos only (which is far from being the most popular flavor) is weird. At the same time we need to remember that Cozystack is a Platform-as-a-Service that must provide various networking, storage and compute capabilities. This might of course limit the choices of the underlying infra.
I installed Talos eventually after a few attempts using talos-bootstrap. Installation of Cozystack after that is just a few kubectl commands.
The documentation provides the bare minimum of detail needed to install Cozystack and deploy various apps. Take the bundles documentation, for example: I understand what a bundle is, but it is not really clear when I should use one bundle or another. What are the benefits and limitations of each?
On the main website, one of the listed benefits of Cozystack is being “API first”. The docs have a separate section about the API, but it is the Kubernetes API with the Aggregation Layer. At least the Terraform example suggests that you should use the Kubernetes API to create Cozystack resources. It makes sense, but it looks like a marketing move.
Score: 3/5
Percona Everest
Tag line: “Finally, a Cloud-Native Platform That Puts You in Control”
To be frank, the tag line is weird. The sentence below clarifies what Everest really is: “Easily provision and manage multiple databases on-premises and in the cloud with a first-of-its-kind, open-source platform.”
On the main web page, clicking “install now” scrolls the page down to other choices.
In the documentation we can finally start exploring Everest. The documentation is nicely structured, with buttons and suggestions on where to go next. The quickstart guide suggests deploying the app with Helm; everestctl is also available.
This line in the docs triggered me a bit:
“We recommend setting up Percona Everest on the Amazon Elastic Kubernetes Service (EKS) or Google Kubernetes Engine (GKE).”
Knowing the internals of Everest, it is unclear to me why EKS or GKE is recommended, as Everest can run anywhere. I think this deserves an explanation or should be removed.
Everest comes with a nice Web UI that simplifies the deployment and management of databases. So after installation it is a click-click-click experience to get the first database up and running.
I looked through the API documentation of Everest. I love it and hate it. I love that it is designed the way you would expect API docs to be - with examples for requests and responses, dynamic fields, and examples in multiple programming languages. I hate it because it is unclear how to use it at first glance. For example, to create a MySQL database cluster, which endpoint should I use, which params should I pass to the API, and so on? It is just not clear enough.
Score: 3.5/5
Extensibility and integrations
This section is the most important one, especially for open source products. Contributors are more likely to create new extensions than to send pull requests into the core of the application. Integrations increase the likelihood that the product will be used in a complex environment. For example, if there is an existing observability tool, can the application send metrics to it?
KubeBlocks
KubeBlocks operates with a notion of Addons that are described right in the introduction.
Every database technology is an Addon. Right now they support more than 30 technologies. See them in apecloud/kubeblocks-addons. They have also documented how somebody can add a new Addon. I strongly believe that the Addons concept and the documentation on how to create them are the reason why KubeBlocks has the largest number of contributors.
The How behind Addons is Helm. You can easily convert a database that has a helm chart into an Addon. The usual problem with Helm is day-2 operations. We will see how they address it later in the blog.
For observability KubeBlocks chose to integrate with the Prometheus and Grafana stack. It is the easiest path in the Cloud Native world that helps to get early adopters. Each Addon you create can easily integrate with Prometheus if you follow the rules.
Score: 5/5
KubeDB
KubeDB is a black box. They do support all modern and exotic databases (like MSSQL), but how they do it is a mystery; it seems the secret sauce is not shared. I can only guess, but:
Their Operator has a bunch of CRDs, one per database technology.
Since they added a lot of technologies quickly, they might rely on available helm charts.
They have CRDs not only for databases, but also for proxies. For example, ProxySQL and pgpool.
As for observability - they also rely on Prometheus. It is also possible to integrate with any other monitoring solution that supports metric scraping. They even have a nice article on how to work with Datadog.
For backups they rely on their proprietary solution - Stash. It seems to be used for every database technology they support. Adding your own backup solution might be possible, but it is up to the user to figure it out.
Score: 2/5
I just can’t score it high enough as everything is hidden and it is unclear how to add new technology or alter the existing one. Not very open source.
Cozystack
Cozystack has a notion of Components and Managed Apps. Components are core stack features, like Kubernetes, FluxCD, Grafana, VictoriaMetrics and more. Managed Apps are where databases and other technologies live. For databases they rely on Operators: CloudNativePG for PostgreSQL, Altinity Operator for ClickHouse, MariaDB Operator for MySQL. I’m not really sure how to draw the line between Components and Managed Apps, as it is not well explained.
Looking at GitHub, they create a helm chart per Managed App, which underneath deploys the operator and its components. postgres/templates/db.yaml is a helm template that is used to deploy PostgreSQL databases using CloudNativePG. You might notice that it also deploys an additional resource, WorkloadMonitor, that enables monitoring for the application. You will be able to see it in the Cozystack interface and in the built-in Grafana dashboards.
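To make the role of such a template more concrete, here is a minimal Python sketch of the kind of CloudNativePG Cluster manifest a chart like this could render. It is not Cozystack's actual template: the helper name, the cluster name, the instance count and the storage size are assumptions for illustration, and WorkloadMonitor is left out because its schema is not described here.

```python
import json


def cnpg_cluster_manifest(name: str, instances: int, storage_size: str) -> dict:
    """Build a minimal CloudNativePG Cluster manifest as a plain dict.

    Hypothetical helper: only the fields a bare-bones Cluster needs are set.
    """
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            "instances": instances,
            "storage": {"size": storage_size},
        },
    }


manifest = cnpg_cluster_manifest("demo-postgres", instances=2, storage_size="10Gi")
# kubectl accepts JSON as well as YAML, so this output could be piped to
# `kubectl apply -f -` on a cluster where the CloudNativePG operator is installed.
print(json.dumps(manifest, indent=2))
```

A Helm template does the same job declaratively: chart values fill in the name, the instance count and the storage size.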
The GitHub repository is well structured, and it seems quite easy to add an application or change its behavior. The only thing missing for me is well-thought-out documentation on how to do that.
Score: 4/5
Percona Everest
Right now Everest supports 3 database technologies - MySQL, PostgreSQL and MongoDB. Implementation relies on Percona Operators for these technologies.
Looking deeper into the internals of the product, here is how it works:
There is a component called everest-operator. It is the main operator that translates Custom Resources so that other database operators can provision their resources.
Everest UI is written in React and MUI and translates all user commands through API into everest-operator Custom Resources.
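The translation step can be pictured with a small Python sketch. This is a conceptual illustration only, not Everest's code: the input shape and the function name are hypothetical, and the engine identifiers, API versions and kinds in the mapping are written from memory of the Percona Operators' CRDs, so treat them as assumptions to verify against the operators' documentation.

```python
# Engine type -> (apiVersion, kind) of the resource owned by the database operator.
# Assumption: identifiers and versions are recalled from the Percona Operators'
# CRDs and may differ between operator releases.
ENGINE_TARGETS = {
    "pxc": ("pxc.percona.com/v1", "PerconaXtraDBCluster"),
    "psmdb": ("psmdb.percona.com/v1", "PerconaServerMongoDB"),
    "postgresql": ("pgv2.percona.com/v2", "PerconaPGCluster"),
}


def translate(generic: dict) -> dict:
    """Turn a generic, engine-agnostic request into an engine-specific skeleton.

    The input shape (name / engine) is hypothetical and far simpler than a
    real Everest custom resource.
    """
    engine = generic["engine"]
    if engine not in ENGINE_TARGETS:
        raise ValueError(f"unsupported engine: {engine}")
    api_version, kind = ENGINE_TARGETS[engine]
    return {
        "apiVersion": api_version,
        "kind": kind,
        "metadata": {"name": generic["name"]},
        # Left empty on purpose: each database operator expects its own spec
        # layout, which this sketch does not try to reproduce.
        "spec": {},
    }


request = {"name": "orders-db", "engine": "postgresql"}
print(translate(request)["kind"])
```

The point of the sketch is the shape of the flow: one generic resource in, one engine-specific resource out, with the mapping table as the place where a new technology would have to be registered.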
Adding a new database is going to require some skill. You first need to figure out how to add a new technology into everest-operator. Then if you need Web UI, you need to add the capabilities into it through React. With some experience with golang and frontend engineering, you can do that.
For monitoring Everest currently integrates with Percona Monitoring and Management (PMM). Actually it relies on the integration that Operators provide (as each Percona operator integrates with PMM). Right now it is not really possible to integrate with Prometheus or any other monitoring tool.
Score: 2/5
There is room for improvement here. Right now adding new technology or integrating with other tools is cumbersome.
Day-2 operations
Deploying databases is easy. The biggest challenge that businesses face is maintenance and management. This is what day-2 operations are about. We covered some of the management features in the previous section, but let’s look a bit deeper into the topic now.
KubeBlocks
For various management operations KubeBlocks introduces new Custom Resources. Some examples:
BackupRepo - creates the repository for backups, supports S3-compatible storage, Azure, etc.
BackupPolicy - defines the way to backup the cluster. You can choose from various built-in solutions. For example, for MySQL it is either Percona XtraBackup or volume snapshot.
OpsRequest - performs various actions on the Cluster or database instance: scaling, upgrades, various jobs. For some reason OpsRequest is not covered in the reference architecture, but it is a crucial piece.
Database monitoring relies on Prometheus Operator custom resources. The user creates well-defined PodMonitor resources, and Prometheus starts scraping the metrics.
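As a rough illustration of what such a resource looks like, the Python sketch below builds a PodMonitor manifest as a plain dict and prints it as JSON. The apiVersion, kind and field names follow the Prometheus Operator's PodMonitor resource; the helper name, the label and the port name are hypothetical and would have to match whatever the database pods actually expose.

```python
import json


def pod_monitor(name: str, match_labels: dict, port: str, path: str = "/metrics") -> dict:
    """Build a Prometheus Operator PodMonitor manifest as a plain dict."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PodMonitor",
        "metadata": {"name": name},
        "spec": {
            # Select the database pods by label.
            "selector": {"matchLabels": match_labels},
            # Scrape the named container port on every selected pod.
            "podMetricsEndpoints": [{"port": port, "path": path}],
        },
    }


monitor = pod_monitor(
    name="mysql-metrics",
    match_labels={"app.kubernetes.io/instance": "my-mysql"},  # hypothetical label
    port="http-metrics",  # hypothetical port name of the exporter container
)
print(json.dumps(monitor, indent=2))
```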
User management is also available, but not for all database technologies for now.
I was a bit surprised that KubeBlocks does not have a proper API. So to integrate with it or perform even a simple task, you must have a good understanding of how KubeBlocks' Custom Resources work.
I have not found any auto-pilot capabilities, such as automated volume expansion triggered by thresholds, or vertical scaling for a serverless-like experience.
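To show what threshold-triggered volume expansion could mean in practice, here is a minimal Python sketch of the decision logic. It is a hypothetical illustration, not a feature of any of the products reviewed here; the function name, the 80% threshold, the 1.5x growth factor and the size cap are all assumptions.

```python
import math


def next_volume_size_gib(
    capacity_gib: int,
    used_gib: float,
    threshold: float = 0.8,
    growth_factor: float = 1.5,
    max_gib: int = 1024,
) -> int:
    """Return the volume size to request, or the current size if no growth is needed."""
    if capacity_gib <= 0:
        raise ValueError("capacity_gib must be positive")
    if used_gib / capacity_gib < threshold:
        return capacity_gib
    # Grow by the factor and round up to a whole GiB.
    grown = math.ceil(capacity_gib * growth_factor)
    # Respect the cap, but never suggest shrinking an already larger volume.
    return min(grown, max(max_gib, capacity_gib))


print(next_volume_size_gib(capacity_gib=100, used_gib=50))
print(next_volume_size_gib(capacity_gib=100, used_gib=85))
```

A controller would run a check like this periodically against usage metrics and patch the storage request when the returned size differs from the current one.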
Score: 4.5/5
KubeDB
KubeDB follows a similar concept of Custom Resources for management operations. The main resource is OpsRequest (example for PostgreSQL). Creating an OpsRequest resource triggers scaling - both Vertical and Horizontal - and upgrades.
Monitoring is done in the same way as in KubeBlocks - integration with Prometheus Operator.
On their website they briefly mention that you can manage database users through KubeVault - another product by AppsCode. But I was not able to find any instructions on how to do it.
For backups they use Stash. The configuration of backups is not that straightforward, as you first need to understand all the custom resources of Stash itself. Only then can you add the specific annotation that enables backups for the cluster.
No direct API and no auto-pilot capabilities, similarly to KubeBlocks.
Score: 4/5
Cozystack
Cozystack relies on Operators, but for some reason I was not able to find how the Operators’ management capabilities are exposed through the UI or API. It is possible to manage databases and execute various tasks through kubectl using the Operators’ regular Custom Resources. At the same time, for backups they created their own scripts (for example, packages/apps/postgres/templates/backup-script.yaml).
It is good to remember that Cozystack is not really a DBaaS, but allows somebody to leverage their platform capabilities to build it. It makes it hard to compare with other solutions as the task is a bit different.
Score: 3/5
Percona Everest
Similarly to Cozystack, Everest relies on the Operators’ capabilities to enable day-2 operations. At the same time, there is a gap between what the Operators can do today and what Everest allows users to do through the everest-operator or through the Web UI/API. But rest assured, basic day-2 operations are already supported:
Backups and restores
Manual vertical and horizontal scaling
Volume expansion
Database upgrades are currently tied to Operator upgrades. So to upgrade from one minor PostgreSQL version to another, you would need to upgrade the Operator. The reason is rooted in the Percona Operators release approach, where each new version of the Operator supports some new database version. This could be greatly improved.
Monitoring integration with PMM is also quite a weak part of Everest. PMM is an awesome tool, but I would love to see integration with other tools, like the famous Prometheus Operator. Even though Percona Operators support sidecars to enable various exporters, this feature is not yet exposed in Everest itself.
It is good to see that Everest comes with an API that developers can use to develop their own integrations and applications.
Autopilot capabilities and user management are not yet available in Everest.
Score: 3/5
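One simple way to combine the per-section scores is an unweighted average, sketched below in Python. The averaging itself is an assumption for illustration, since no weighting is defined above, and the sketch covers only onboarding, extensibility and day-2 operations, the three dimensions for which all four solutions have a score in the sections above.

```python
# Scores copied from the Onboarding, Extensibility and integrations,
# and Day-2 operations sections, in that order.
SCORES = {
    "KubeBlocks": [5, 5, 4.5],
    "KubeDB": [2, 2, 4],
    "Cozystack": [3, 4, 3],
    "Percona Everest": [3.5, 2, 3],
}

# Unweighted mean per solution (an illustrative choice, not the article's method).
averages = {name: sum(values) / len(values) for name, values in SCORES.items()}

# Highest average first.
ranking = sorted(averages, key=averages.get, reverse=True)

for name in ranking:
    print(f"{name}: {averages[name]:.2f}")
```

Weighting the dimensions differently, for example valuing day-2 operations more than onboarding, would change the order of the lower three entries.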
Conclusion
I really enjoyed looking at these solutions. The enjoyment comes from the fact that open source Database as a Service solutions are maturing. Not only are they production grade and able to compete with various cloud offerings, they also allow businesses to avoid vendor lock-in and run their databases anywhere. This is important in a world of uncertainty and sovereignty requirements coming from governments.
Peeking into the future, I would expect these solutions to become more community driven, flexible and mature. As for being community driven, none of the projects has a strong community motion behind it. Cozystack is in the CNCF Sandbox, but it does not appear to have significant code contributions and traction. KubeBlocks' Addons structure certainly pushes community traction forward.
Some of the offerings are already thinking about the future that AI holds. For example, KubeBlocks provides Xinference and MilvusDB - both components of a modern LLM stack.