AKS Cost Optimization: Top 7 Strategies to Cut Your AKS Cluster Costs

Azure Kubernetes Service (AKS) is one of the most popular managed container services. It offers the benefits of managed Kubernetes, but its costs can be tricky to forecast and manage. Learn how to understand and better control your AKS costs.

An overview of AKS pricing

Pay-as-you-go in AKS

Like other cloud services, Microsoft Azure’s container service works on a pay-as-you-go basis by default. This means that you pay only for the resources you use, such as:

  • VMs;
  • associated storage;
  • networking resources.
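To show how these components add up, the Python sketch below estimates a monthly pay-as-you-go bill. Every rate in it is a made-up placeholder rather than an actual Azure price. The function name `monthly_payg_cost` is hypothetical. The sketch also assumes a 730-hour month and flat per-GB storage and egress rates; real pricing varies by region, VM series, and tier.

```python
# Minimal sketch of a monthly pay-as-you-go estimate for an AKS cluster.
# All rates passed in are hypothetical placeholders, not real Azure prices.
HOURS_PER_MONTH = 730  # common approximation: 8,760 hours / 12 months


def monthly_payg_cost(
    node_count: int,
    vm_hourly_rate: float,
    disk_gb: float,
    disk_rate_per_gb_month: float,
    egress_gb: float,
    egress_rate_per_gb: float,
) -> dict:
    """Return the estimated monthly cost per component and in total."""
    compute = node_count * vm_hourly_rate * HOURS_PER_MONTH  # VMs billed per hour
    storage = disk_gb * disk_rate_per_gb_month                # associated storage
    networking = egress_gb * egress_rate_per_gb               # outbound data transfer
    return {
        "compute": compute,
        "storage": storage,
        "networking": networking,
        "total": compute + storage + networking,
    }


if __name__ == "__main__":
    # Three nodes at a placeholder $0.20/hour, 300 GB of disks, 100 GB of egress.
    estimate = monthly_payg_cost(3, 0.20, 300, 0.05, 100, 0.08)
    for component, cost in estimate.items():
        print(f"{component:>10}: ${cost:,.2f}")
```

With these placeholder inputs, compute accounts for about $438 of a roughly $461 total. That is why the VM-focused strategies later in this article tend to matter most.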

This model can be perfect for workloads with changing demands, allowing you to use different services without long-term commitments.

However, it’s also the most expensive pricing option Azure offers, and prices vary by region.

That’s why the Azure pricing calculator can be particularly useful for estimating your cluster costs.

Free Azure account and credit

Azure offers a free account that comes with a $200 credit to use within the first 30 days. With this budget, you can access Azure services, both free and premium.

Once you exhaust it, you will be placed on a pay-as-you-go plan and charged for what you use.

No cluster management fee, but SLA costs extra

There is no cluster management fee in AKS. However, if you wish to get a financially backed service level agreement (SLA), you will need to pay $0.10/hour per cluster.

Azure’s SLA guarantees 99.95% uptime for the Kubernetes API server on clusters that use Azure Availability Zones and 99.9% on clusters that don’t.
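The arithmetic behind these figures is easy to sketch. The Python snippet below converts the $0.10/hour fee into a monthly cost and turns each uptime percentage into the downtime it allows. It assumes a 730-hour month, which is a common billing approximation and not something stated above, and the function names are illustrative.

```python
# Sketch: what the paid SLA costs per month and how much API server downtime
# each uptime guarantee tolerates, assuming a 730-hour month.
HOURS_PER_MONTH = 730


def monthly_sla_fee(clusters: int, hourly_fee: float = 0.10) -> float:
    """Monthly cost of the financially backed SLA for a number of clusters."""
    return clusters * hourly_fee * HOURS_PER_MONTH


def allowed_downtime_minutes(uptime_pct: float, hours: float = HOURS_PER_MONTH) -> float:
    """Minutes of downtime per period that still satisfy the uptime percentage."""
    return (1 - uptime_pct / 100) * hours * 60


if __name__ == "__main__":
    print(f"SLA fee for 5 clusters: ${monthly_sla_fee(5):.2f}/month")
    for pct in (99.95, 99.9):
        print(f"{pct}% uptime allows ~{allowed_downtime_minutes(pct):.1f} min/month of downtime")
```

Under that assumption, the SLA for one cluster costs about $73 a month. The two guarantees allow roughly 22 minutes (99.95%) and 44 minutes (99.9%) of API server downtime per month.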

Cost saving options in AKS

AKS offers a few more economical options besides the default pay-as-you-go model.

Reserved VM instances

This option is suitable for steady-state usage. A one-year commitment gives you more pricing predictability, monthly payment options, and prioritized compute capacity. According to Azure, a 1-year reservation can save you up to 48% on Linux DSv2 virtual machines compared to pay-as-you-go.

Reservations are best for instances that run continuously, and longer commitments bring higher discounts. Azure claims that a 3-year reservation can save up to 65% on Linux DSv2 virtual machines compared to pay-as-you-go.

Limitations of reserved VM instances:

Potential savings related to reserved instances may seem impressive at first, but they come at a price.

In the cloud world, a one-year commitment is an eternity, not to mention a three-year one. Forecasting your usage for the entire period is a tall order. Your requirements will most likely change, so you may end up committing to even more capacity or getting stuck paying for capacity you don’t use.
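Break-even utilization is a quick way to reason about this risk. A reservation is billed whether or not the VM runs, so it only beats pay-as-you-go when the VM would otherwise run for more than `1 - discount` of the term. The Python sketch below applies that rule. It normalizes the full-term pay-as-you-go price to 1.0 and plugs in Azure's best-case "up to" discounts quoted above, so actual break-even points for your VM sizes and regions will differ.

```python
# Sketch: when does a reservation beat pay-as-you-go?
# Costs are normalized so that running a VM for the whole term at
# pay-as-you-go rates costs 1.0.


def breakeven_utilization(discount: float) -> float:
    """Fraction of the term a VM must run for a reservation to pay off."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


def cheaper_option(utilization: float, discount: float) -> str:
    """Compare normalized costs for a VM that runs `utilization` of the term."""
    payg_cost = utilization        # you pay only for the hours you use
    reserved_cost = 1 - discount   # billed for the whole term, used or not
    return "reserved" if reserved_cost < payg_cost else "pay-as-you-go"


if __name__ == "__main__":
    for term, discount in (("1-year", 0.48), ("3-year", 0.65)):
        print(f"{term}: reservation pays off above "
              f"{breakeven_utilization(discount):.0%} utilization")
```

With those best-case discounts, a 1-year reservation pays off only if the VM runs more than about 52% of the time, and a 3-year one only above about 35%. Both figures assume your needs stay the same for the whole term.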

Spot VMs

Another source of significant savings lies in Spot Virtual Machines (Spot VMs). They let you use Azure’s unused compute capacity for up to 90% less than on-demand prices.

They are ideal for workloads that can tolerate temporary disruptions like batch processing or machine learning training.

Limitations of spot VMs:

Spot VMs can bring tangible savings fast, but they aren’t a good fit for every workload type. There’s no guarantee of how long they will stay available, because Azure can evict them at any time with 30 seconds’ notice.

You need to have a solid plan to handle potential interruptions and, ideally, automate the process.
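On an individual Azure VM, eviction notices arrive through the Scheduled Events endpoint of the Azure Instance Metadata Service as events whose `EventType` is `Preempt`. The Python sketch below polls that endpoint and hands any eviction events to a callback. The `drain` callback is hypothetical; it could, for example, cordon and drain the Kubernetes node. The other function names are also illustrative. In practice you would run this in a loop or rely on an existing tool instead of this minimal version.

```python
import json
import urllib.request

# Azure Instance Metadata Service (IMDS) Scheduled Events endpoint.
# It is reachable only from inside an Azure VM.
SCHEDULED_EVENTS_URL = (
    "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
)


def fetch_scheduled_events(timeout: float = 2.0) -> dict:
    """Fetch the current Scheduled Events document from IMDS."""
    request = urllib.request.Request(SCHEDULED_EVENTS_URL, headers={"Metadata": "true"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def preempt_events(document: dict) -> list:
    """Return only the Spot eviction ('Preempt') events from a document."""
    return [e for e in document.get("Events", []) if e.get("EventType") == "Preempt"]


def poll_once(drain) -> list:
    """Check for evictions once; `drain` is a hypothetical caller-supplied callback."""
    evictions = preempt_events(fetch_scheduled_events())
    if evictions:
        drain(evictions)  # e.g. cordon/drain the node so pods reschedule elsewhere
    return evictions


if __name__ == "__main__":
    sample = {
        "DocumentIncarnation": 1,
        "Events": [
            {"EventId": "a1", "EventType": "Preempt", "Resources": ["node-1"]},
            {"EventId": "b2", "EventType": "Reboot", "Resources": ["node-2"]},
        ],
    }
    print([e["EventId"] for e in preempt_events(sample)])  # ['a1']
```

The parsing is kept separate from the network call so the eviction logic can be tested outside Azure.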

Top 7 strategies to halve your AKS cluster costs

1. Follow cost optimization design principles

A cost-effective workload meets its business goals and ROI targets while staying within budget. AKS cost optimization principles outline critical design decisions that help you assess and improve applications deployed on Azure.

These principles cover choosing the right resources, setting up budgets and constraints, dynamically allocating and deallocating resources, optimizing workloads, and monitoring and managing costs. Read more about them in the cost optimization pillar of the Azure Well-Architected Framework.

2. Rightsize your VMs

Selecting the right VMs can decrease your AKS costs significantly because you pay only for the capacity your workloads actually need.

However, the process can be challenging, as it requires establishing minimum requirements and then picking the right instance type and capacity.
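The core of that selection step can be sketched in Python. Given a workload's measured CPU and memory needs plus some safety headroom, the code picks the cheapest instance type that satisfies both. The catalog entries, prices, the 20% headroom, and the `cheapest_fit` name are all hypothetical. A real tool would also weigh factors such as GPU needs, network throughput, and availability.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: float
    memory_gib: float
    hourly_price: float  # placeholder price, not a real Azure rate


def cheapest_fit(
    catalog: List[InstanceType],
    cpu_needed: float,
    memory_needed_gib: float,
    headroom: float = 0.2,
) -> Optional[InstanceType]:
    """Cheapest instance covering the requirements plus headroom, or None."""
    cpu_target = cpu_needed * (1 + headroom)
    memory_target = memory_needed_gib * (1 + headroom)
    candidates = [
        t for t in catalog
        if t.vcpus >= cpu_target and t.memory_gib >= memory_target
    ]
    return min(candidates, key=lambda t: t.hourly_price, default=None)


if __name__ == "__main__":
    catalog = [
        InstanceType("type-a", 2, 8, 0.10),
        InstanceType("type-b", 4, 16, 0.20),
        InstanceType("type-c", 8, 32, 0.40),
        InstanceType("type-d", 4, 32, 0.25),
    ]
    # Workload needs 3 vCPUs and 10 GiB; with 20% headroom: 3.6 vCPUs, 12 GiB.
    print(cheapest_fit(catalog, cpu_needed=3, memory_needed_gib=10))
```

Here `type-b` wins. `type-d` also fits but costs more, and `type-a` is too small once headroom is added.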

Rightsizing involves a lot of work, but you can automate it. Platforms like CAST AI can pick the instance types and sizes that meet your application’s requirements at the lowest cost.

3. Take advantage of autoscaling

Autoscaling is one of the most effective AKS cost optimization methods. The tighter your Kubernetes scaling mechanisms are, the lower your waste and the cost of running the workload.

The Kubernetes ecosystem offers several autoscaling mechanisms, such as the Horizontal Pod Autoscaler, the Vertical Pod Autoscaler, and the Cluster Autoscaler. They help ensure enough capacity to meet demand without overprovisioning, but setting them up and configuring them can be time-consuming. That’s why it’s a good idea to use advanced autoscalers that take over a large part of this manual work.
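To see what tight scaling means in practice, the Python sketch below implements the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. It assumes the default 0.1 tolerance and simple min/max clamping. It also deliberately leaves out behaviors the real controller has, such as stabilization windows and the handling of unready pods.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA replica calculation for a single metric."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: skip scaling to avoid flapping.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Respect the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 replicas averaging 90% CPU against a 60% target -> scale out.
    print(desired_replicas(4, 90, 60))
    # 4 replicas averaging 30% CPU against a 60% target -> scale in.
    print(desired_replicas(4, 30, 60))
```

Scaling in from 4 to 2 replicas is where the savings come from: the freed node capacity can then be removed by a node-level autoscaler.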

4. Use preset AKS cluster configuration

Each workload has different needs. For example, a production environment requires higher-spec VM SKUs with redundancy across Azure availability zones, while a Dev/Test cluster can run with nonessential features turned off.

Azure offers preset configurations for different environments and highlights their impact on cost, so check them out. Or go a step further and get specific configuration recommendations for your cluster that can improve performance while reducing AKS costs by 50% or more.

5. Set resource requests and limits

Defining pod requests and limits is another great way to reduce your AKS cluster cost.

AKS integrates with Azure Policy to provide centralized enforcement of built-in policies. For example, you can use a built-in policy to ensure that CPU and memory resource limits are defined on your cluster’s containers.
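A policy like this checks each container's `resources.requests` and `resources.limits` fields. The Python sketch below runs a similar check on a pod spec represented as a dictionary that mirrors the Kubernetes manifest fields. It illustrates the idea only and is not how Azure Policy is implemented; the function name is hypothetical.

```python
def containers_missing_resources(pod_spec: dict) -> list:
    """Return (container name, problem) pairs for missing CPU/memory settings."""
    problems = []
    for container in pod_spec.get("containers", []):
        resources = container.get("resources", {})
        for section in ("requests", "limits"):
            values = resources.get(section, {})
            for resource in ("cpu", "memory"):
                if resource not in values:
                    problems.append((container["name"], f"missing {section}.{resource}"))
    return problems


if __name__ == "__main__":
    pod_spec = {
        "containers": [
            {
                "name": "web",
                "resources": {
                    "requests": {"cpu": "250m", "memory": "256Mi"},
                    "limits": {"cpu": "500m", "memory": "512Mi"},
                },
            },
            {
                "name": "sidecar",
                "resources": {"requests": {"cpu": "50m", "memory": "64Mi"}},
            },
        ]
    }
    for name, problem in containers_missing_resources(pod_spec):
        print(f"{name}: {problem}")
```

Requests matter for cost because the Kubernetes scheduler places pods on nodes based on them. Limits cap how much a container can consume.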

You can push your AKS cluster savings even further with an automation solution that continuously bin-packs pods onto the minimum number of nodes. Once a node becomes empty, CAST AI’s mechanism deletes it from the cluster.
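Bin-packing is a classic optimization problem. A common heuristic for it is first-fit decreasing: sort pods from largest to smallest, place each one on the first node with enough free capacity, and open a new node only when none fits. The Python sketch below applies this heuristic to CPU (millicores) and memory (MiB) requests. The node size and pod values are made up, and the code is a generic illustration rather than CAST AI's actual algorithm.

```python
def first_fit_decreasing(pods, node_cpu_m, node_mem_mib):
    """Pack pods onto as few identical nodes as the heuristic finds.

    pods: list of (name, cpu_millicores, memory_mib) requests.
    Returns a list of nodes, each a list of pod names.
    """
    nodes = []  # each entry: [free_cpu, free_mem, pod_names]
    # Place the largest pods first; they are the hardest to fit later.
    for name, cpu, mem in sorted(pods, key=lambda p: (p[1], p[2]), reverse=True):
        if cpu > node_cpu_m or mem > node_mem_mib:
            raise ValueError(f"pod {name} does not fit on an empty node")
        for node in nodes:
            if node[0] >= cpu and node[1] >= mem:
                node[0] -= cpu
                node[1] -= mem
                node[2].append(name)
                break
        else:
            # No existing node has room: open a new one.
            nodes.append([node_cpu_m - cpu, node_mem_mib - mem, [name]])
    return [node[2] for node in nodes]


if __name__ == "__main__":
    pods = [
        ("a", 2000, 4096),
        ("b", 1500, 2048),
        ("c", 1000, 8192),
        ("d", 500, 1024),
        ("e", 2500, 4096),
    ]
    # Nodes with 4 vCPUs (4000m) and 16 GiB (16384 MiB).
    print(first_fit_decreasing(pods, 4000, 16384))
```

Here the five pods fit on two nodes. If a third node were running, it would sit empty and could be removed.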

6. Stop clusters that don’t need to be running

Not all clusters need to be running all the time. For instance, you could easily turn off a Dev/Test environment when it’s not in use.

AKS lets you stop a cluster so that unnecessary charges don’t pile up. Stopping it shuts down its node pools, which saves compute costs while keeping the cluster’s objects and state for when you start it again.
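With the Azure CLI, you do this with `az aks stop` and `az aks start`. A scheduled job could decide which command to run based on a working-hours window, as in the Python sketch below. The schedule, cluster name, and resource group are hypothetical. The function only builds the CLI arguments (for example, to pass to `subprocess.run`), and the caller is responsible for determining whether the cluster is currently running.

```python
from datetime import datetime, time
from typing import List, Optional

# Hypothetical schedule for a Dev/Test cluster: weekdays, 08:00 to 19:00.
WORKDAYS = {0, 1, 2, 3, 4}  # Monday=0 ... Friday=4
START, END = time(8, 0), time(19, 0)


def should_run(now: datetime) -> bool:
    """True if the cluster should be running at `now` under the schedule."""
    return now.weekday() in WORKDAYS and START <= now.time() < END


def az_command(cluster: str, resource_group: str, running_now: bool,
               now: datetime) -> Optional[List[str]]:
    """Return the `az aks` start/stop arguments needed, or None if no change."""
    want_running = should_run(now)
    if want_running == running_now:
        return None
    action = "start" if want_running else "stop"
    return ["az", "aks", action, "--name", cluster, "--resource-group", resource_group]


if __name__ == "__main__":
    # Monday evening, cluster still running -> it should be stopped.
    print(az_command("dev-cluster", "rg-dev", True, datetime(2024, 1, 8, 20, 0)))
```

Returning `None` when the cluster is already in the desired state keeps the job idempotent, so it can safely run every few minutes.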

And if you don’t want to keep doing it manually (because why would you?), check out CAST AI’s cluster scheduler. It will automatically turn your cluster off and on as required.

7. Automate Spot VMs

Spot VMs enable you to tap into unutilized capacity in Azure at a much lower cost than on-demand pricing. This solution is only suitable for workloads that can handle potential interruptions, but the juice is still worth the squeeze.

You can reap the pricing benefits of Spot VMs and still use them safely, but to do so, you will need an automation solution. It should help you identify spot-friendly workloads, pick the right VMs, set your maximum price, and automatically move workloads to on-demand instances in case of interruptions.
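At its simplest, the placement part of such automation is a policy. It sends only interruption-tolerant workloads to spot capacity and falls back to on-demand capacity when spot capacity is unavailable or has been evicted. The Python sketch below encodes that policy. The `Workload` fields and the `place` function are hypothetical, and a real solution would also need to choose VM types, manage the maximum price, and actually reschedule pods.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Workload:
    name: str
    interruption_tolerant: bool  # e.g. stateless or checkpointed batch work


def place(workloads: List[Workload], spot_available: bool) -> Dict[str, str]:
    """Map each workload to 'spot' or 'on-demand' capacity."""
    placement = {}
    for w in workloads:
        use_spot = w.interruption_tolerant and spot_available
        placement[w.name] = "spot" if use_spot else "on-demand"
    return placement


if __name__ == "__main__":
    workloads = [Workload("payments-api", False), Workload("ml-training", True)]
    print(place(workloads, spot_available=True))   # normal operation
    print(place(workloads, spot_available=False))  # after an eviction: fall back
```

After an eviction, re-running the policy with `spot_available=False` moves the training job to on-demand capacity. When spot capacity returns, the job can move back.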

Automate AKS cost analysis and optimization

These strategies will certainly have a positive impact on your next AKS bill. But if you want significant, long-lasting results, you need to move beyond manual cloud cost management and embrace automation.

Teams using automated AKS cost optimization solutions reduce their expenses while improving performance and unlocking new opportunities. For example, one fintech company used automation to improve its cluster scalability, cut costs, and save engineering time.

Connect your cluster to the CAST AI platform and run a free AKS cost analysis to see custom recommendations you can apply on your own or automatically.

CAST AI

Cut your cloud bill in half. AI-driven cloud optimization for Kubernetes. Instantly cut your cloud bill, prevent downtime, and 10X the power of DevOps.