Automating K8s Cluster Backup and Upgrades with Zero Downtime

Posted on November 7, 2022 by Mahesh Wable


Replicating a Kubernetes (k8s) cluster across platforms can be a waking nightmare for businesses. Backing up and restoring files and microservices has traditionally been a time-consuming, manual task loaded with complexity, and for containerized applications, traditional backup solutions are not viable at all.

Traditional Challenges When Performing In-Place k8s Upgrades in an On-Premise Setup

In-place upgrades in an on-premise setup are always challenging because hardware is limited and you risk stability issues. Changes to core Kubernetes resources such as Deployment, Ingress, Service, or autoscaling configuration can lead to disaster if their definitions are not compatible with the upgraded k8s version. To avoid this, an on-premise Kubernetes upgrade requires a parallel setup running the upgraded version, where all services and functionality are verified first before hardware is gradually added to it.

Making workloads available on the new setup, with configuration compatible with the upgraded k8s version, is the challenging part. We solved it by developing automation that sits on top of Velero, a well-known open-source tool for k8s cluster migration and disaster recovery.

Why Traditional Backup Solutions Don’t Work with Containerized Applications

Conventional solutions for backup and rollback are designed to preserve the data of a single system, whether a virtual or a physical machine. Consequently, traditional backup for containerized apps is infeasible or inadequate for companies that generate vast amounts of data and require a powerful data protection solution.

Containerized applications require a different approach because they run in many pods across multiple servers, and their configuration and application data are spread over a large number of Kubernetes objects. To achieve quick recovery, a Kubernetes backup solution must therefore capture data and application settings at a granular level.

Furthermore, a system outage can be costly for any business. Restoring services to full operation typically takes considerable time, effort, and expertise, so a capable Kubernetes migration and backup tool delivers a wide range of business gains.

Here is what we were focused on at PubMatic:

  • Migration of more than 150 applications running on Kubernetes clusters from the older version k8s 1.1X to the upgraded version k8s 1.2X
  • More than 10 clusters with more than 1000 pods in production serving live traffic
  • 10 data centers with more than 150 namespaces across all clusters

A viable Kubernetes backup solution must provide:

  • Little to no cluster downtime due to scheduled and unscheduled disruptions
  • Quick restoration of data, in case of data loss, downtime, or cluster disaster
  • Rollback and backup configurations for entire clusters as a unit
  • Robust data protection strategies to guarantee data privacy and regulatory compliance

With those requirements in mind, we settled on Velero, a popular open-source Kubernetes backup tool. It offers an end-to-end solution for backup, restore, and migration, and its plugins support the major cloud object stores as well as S3-compatible storage.

Velero — An Open-Source Tool

The market offers a myriad of Kubernetes cluster migration solutions with different capabilities and merits. However, given our requirement for process automation, Velero emerged as the best fit for our specific use cases.

What specific capabilities does Velero bring to the table for the Development and QA teams and the company as a whole?

  • Rolling deployment of applications on the newer (version 1.2X) Kubernetes cluster
  • Automation of the release cycle for migration of applications
  • Safe backup and restoration of Kubernetes cluster resources to avoid unplanned disruption risks
  • Handling cluster disaster recovery, since cluster downtime carries economic costs
  • Enabling support for frequent k8s upgrades
  • Ability to roll back a deployment to a previous configuration or version

If a namespace in the Kubernetes cluster is compromised or a deployment fails, a robust rollback strategy makes it possible to revert to a previously deployed configuration version. Moreover, a tool like Velero offers the stability and scalability needed to avoid rework when adopting new technologies later.
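A rollback boils down to picking the right backup and restoring from it. The Python sketch below is one way to automate that choice, assuming the backups are available as objects shaped like Velero's Backup resources (for example, the `items` of `kubectl get backups.velero.io -n velero -o json`). It uses the `spec.includedNamespaces`, `status.phase`, and `status.completionTimestamp` fields; the sample data and the `payments` namespace are hypothetical, and the function names are illustrative only.

```python
from datetime import datetime
from typing import List, Optional


def parse_k8s_time(ts: str) -> datetime:
    # Kubernetes timestamps look like "2022-11-07T09:00:05Z"; older Pythons'
    # fromisoformat() rejects the trailing "Z", so map it to "+00:00".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def covers_namespace(backup: dict, namespace: str) -> bool:
    # An empty includedNamespaces list (or "*") means "all namespaces".
    included = backup.get("spec", {}).get("includedNamespaces") or ["*"]
    return "*" in included or namespace in included


def pick_rollback_backup(backups: List[dict], namespace: str) -> Optional[str]:
    """Name of the newest Completed backup that covers `namespace`, or None."""
    usable = [
        b for b in backups
        if b.get("status", {}).get("phase") == "Completed"
        and b.get("status", {}).get("completionTimestamp")
        and covers_namespace(b, namespace)
    ]
    if not usable:
        return None
    newest = max(usable, key=lambda b: parse_k8s_time(b["status"]["completionTimestamp"]))
    return newest["metadata"]["name"]


def rollback_command(backup_name: str, namespace: str) -> List[str]:
    # Restore only the affected namespace from the chosen backup.
    # Pass the list to subprocess.run() to actually execute it.
    return ["velero", "restore", "create", "--from-backup", backup_name,
            "--include-namespaces", namespace]


backups = [  # hypothetical sample data shaped like Velero Backup objects
    {"metadata": {"name": "payments-0900"},
     "spec": {"includedNamespaces": ["payments"]},
     "status": {"phase": "Completed", "completionTimestamp": "2022-11-07T09:00:05Z"}},
    {"metadata": {"name": "payments-0930"},
     "spec": {"includedNamespaces": ["payments"]},
     "status": {"phase": "PartiallyFailed", "completionTimestamp": "2022-11-07T09:30:04Z"}},
]
print(rollback_command(pick_rollback_backup(backups, "payments"), "payments"))
```

Skipping anything not in the `Completed` phase is a deliberate choice here: a `PartiallyFailed` backup may be missing resources, so the sketch falls back to the newest backup known to be whole.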

Getting Started with Velero

Below are a few Velero features that helped us migrate applications from the legacy k8s cluster to the upgraded k8s cluster:

  1. Back up Kubernetes namespaces [ velero backup create <backup_name> --include-namespaces <namespace_name> ]
  2. Restore namespaces from a backup [ velero restore create --from-backup <backup_name> ]
  3. Automate backups with a schedule so a recent backup is always available [ velero create schedule <schedule_name> --schedule="*/30 * * * *" --include-namespaces <namespace_name> --ttl 48h0m0s ]
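The three commands above can be driven from automation instead of typed by hand. The Python sketch below only builds the argument lists (passing one to `subprocess.run` would execute it); the backup, schedule, and namespace names are hypothetical. It also works out the retention the schedule implies: one backup every 30 minutes, each kept for 48 hours, leaves roughly 48 × 60 / 30 = 96 backups per schedule at any time, and slightly more can linger until Velero's periodic garbage collection removes expired ones.

```python
import shlex
from typing import List


def backup_cmd(backup_name: str, namespace: str) -> List[str]:
    # One-off backup of a single namespace.
    return ["velero", "backup", "create", backup_name,
            "--include-namespaces", namespace]


def restore_cmd(backup_name: str) -> List[str]:
    # Restore everything contained in an existing backup.
    return ["velero", "restore", "create", "--from-backup", backup_name]


def schedule_cmd(schedule_name: str, namespace: str,
                 cron: str = "*/30 * * * *", ttl: str = "48h0m0s") -> List[str]:
    # Recurring backup: every 30 minutes by default, each kept for 48 hours.
    return ["velero", "create", "schedule", schedule_name,
            f"--schedule={cron}", "--include-namespaces", namespace,
            f"--ttl={ttl}"]


def retained_backups(interval_minutes: int, ttl_hours: int) -> int:
    # Approximate steady-state number of backups kept by one schedule.
    return (ttl_hours * 60) // interval_minutes


if __name__ == "__main__":
    for cmd in (backup_cmd("payments-backup", "payments"),
                restore_cmd("payments-backup"),
                schedule_cmd("payments-every-30m", "payments")):
        # shlex.join (Python 3.8+) quotes the cron expression for a shell.
        print(shlex.join(cmd))
    print(retained_backups(30, 48))
```

Building argument lists rather than shell strings keeps the cron expression's spaces and asterisks from being split or globbed by a shell.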

Steps for migrating data from k8s 1.1X to k8s 1.2X using Velero and MinIO

MinIO, an S3-compatible object store, is used here to store the backup data.
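Because MinIO speaks the S3 API, Velero reaches it through its AWS object-store plugin with path-style URLs, as in Velero's MinIO quick start. The sketch below assumes that setup and generates the credentials file plus one `velero install` command per cluster, so both installations point at the same bucket. The endpoint, bucket, keys, and kube-context names are hypothetical, the plugin tag is left as a placeholder to match your Velero release, and the MinIO endpoint must be reachable from both clusters.

```python
import shlex
from typing import List

# Hypothetical values: adjust to your own MinIO deployment.
MINIO_URL = "http://minio.example.internal:9000"  # must be reachable from both clusters
BUCKET = "velero"
AWS_PLUGIN = "velero/velero-plugin-for-aws:<version>"  # placeholder tag
CREDENTIALS_FILE = "./credentials-velero"


def credentials_file(access_key: str, secret_key: str) -> str:
    # Velero's AWS plugin reads an AWS-style credentials file.
    return ("[default]\n"
            f"aws_access_key_id = {access_key}\n"
            f"aws_secret_access_key = {secret_key}\n")


def install_cmd(kube_context: str) -> List[str]:
    # No shell is involved when this list goes to subprocess.run(),
    # so "true" needs no surrounding quotes here.
    location = f"region=minio,s3ForcePathStyle=true,s3Url={MINIO_URL}"
    return ["velero", "install",
            "--kubecontext", kube_context,
            "--provider", "aws",
            "--plugins", AWS_PLUGIN,
            "--bucket", BUCKET,
            "--secret-file", CREDENTIALS_FILE,
            "--use-volume-snapshots=false",
            "--backup-location-config", location]


if __name__ == "__main__":
    print(credentials_file("minio-access-key", "minio-secret-key"))
    for ctx in ("cluster-a-1-1x", "cluster-b-1-2x"):  # hypothetical contexts
        print(shlex.join(install_cmd(ctx)))
```

Velero's cluster-migration documentation also recommends setting the backup storage location's `accessMode` to `ReadOnly` on the target cluster while restoring, so that cluster cannot modify or delete the shared backups.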

  • Deploy Velero on both clusters, pointing both at the same MinIO storage.
    • Data migration requires two Kubernetes clusters: a source to migrate from (Kubernetes cluster A) and a target to migrate to (Kubernetes cluster B).
  • Create a backup of all namespaces on the k8s 1.1X cluster.
  • Using automation, apply the resource changes required by the upgraded k8s version to the backup files taken from the 1.1X cluster.
    • This step applies only when migrating applications to an upgraded cluster, here k8s 1.2X. If both clusters run the same Kubernetes version, the backup files can be restored as is, without any changes.
  • Restore the backup on the upgraded k8s 1.2X cluster.
  • Shift all production traffic to the new cluster.

By following the above procedure, we migrated applications in more than 150 namespaces within a few minutes.
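The post does not publish the automation that updates backup files for the upgraded version. As a minimal sketch, assuming each backed-up resource has already been extracted from the Velero backup tarball (Velero stores resources there as JSON) and parsed into a dict, the function below handles two well-known cases. Workload kinds under `extensions/v1beta1`, `apps/v1beta1`, or `apps/v1beta2` (removed in Kubernetes 1.16) move to `apps/v1`, and `Ingress` under `extensions/v1beta1` or `networking.k8s.io/v1beta1` (removed in 1.22) moves to `networking.k8s.io/v1`, whose backend schema differs. The rule set is illustrative, not PubMatic's actual list; which rewrites a backup needs depends on the API versions it actually contains.

```python
import copy

# Illustrative rule set covering API versions removed in Kubernetes 1.16 and 1.22.
WORKLOAD_KINDS = {"Deployment", "DaemonSet", "ReplicaSet", "StatefulSet"}
OLD_WORKLOAD_APIS = {"extensions/v1beta1", "apps/v1beta1", "apps/v1beta2"}
OLD_INGRESS_APIS = {"extensions/v1beta1", "networking.k8s.io/v1beta1"}


def _v1_backend(old: dict) -> dict:
    # v1beta1: {serviceName, servicePort}; v1: {service: {name, port: {number|name}}}
    port = old["servicePort"]
    port_ref = {"number": port} if isinstance(port, int) else {"name": port}
    return {"service": {"name": old["serviceName"], "port": port_ref}}


def upgrade_manifest(obj: dict) -> dict:
    """Return a copy of `obj` rewritten to an API version newer clusters serve."""
    obj = copy.deepcopy(obj)  # never mutate the original backup data
    kind, api = obj.get("kind"), obj.get("apiVersion")

    if kind in WORKLOAD_KINDS and api in OLD_WORKLOAD_APIS:
        obj["apiVersion"] = "apps/v1"
        spec = obj.setdefault("spec", {})
        # apps/v1 requires an explicit selector; the old APIs defaulted it
        # from the pod template labels.
        if "selector" not in spec:
            labels = spec.get("template", {}).get("metadata", {}).get("labels", {})
            spec["selector"] = {"matchLabels": dict(labels)}

    elif kind == "Ingress" and api in OLD_INGRESS_APIS:
        obj["apiVersion"] = "networking.k8s.io/v1"
        spec = obj.setdefault("spec", {})
        if "backend" in spec:  # renamed to defaultBackend in v1
            spec["defaultBackend"] = _v1_backend(spec.pop("backend"))
        for rule in spec.get("rules", []):
            for path in rule.get("http", {}).get("paths", []):
                path["backend"] = _v1_backend(path["backend"])
                # pathType is mandatory in v1; ImplementationSpecific was
                # the implicit behaviour of the old API.
                path.setdefault("pathType", "ImplementationSpecific")

    return obj  # resources that need no change come back unchanged
```

Working on parsed objects keeps the transformation testable in isolation; unpacking and repacking the backup tarball around it is a separate step.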

We are planning to use this framework for continuous Kubernetes upgrades on our platform.

Organizational Benefits of Process Automation for Application Migration from One Cluster to Another

Leveraging Velero, a robust and capable Kubernetes backup solution, enables our business to:

  • Harness enhanced business agility
  • Benefit from lower operational costs
  • Empower teams with the tools to innovate
  • Reduce the risk of unplanned disruptions to applications and data
  • Preserve data integrity
  • Protect business-critical workloads and associated containers