The Road to Azure Cost Governance

Chapter 3: Monitoring Costs

Azure Advisor is a great place to look for ways to enhance your Azure environment. It analyzes your subscriptions, and makes recommendations for costs, performance, reliability, and security. It has the additional benefit that recommendations can often be implemented there, with a click of a button. 

Next, there’s a look at Dimension Analysis, where dimension means a resource, like CPU or memory. The aim is to identify the most expensive resources that can be improved relatively easily. Various meter categories and subcategories are discussed, with a view to finding expensive resources.

There’s a useful section on defining budgets (i.e. the forecast spend for a given resource), with a worked example. Coupled with this, you can create an alert that fires (e.g. sends an email) when actual or forecast spending approaches or reaches a given threshold.
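To make the budget-and-alert idea concrete, here is a minimal sketch of the threshold check in plain Python. It does not call any Azure API; the `Budget` class, the `triggered_alerts` function, and the 80%/100% thresholds are assumptions chosen for illustration, not taken from the book.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Hypothetical budget record: a name, an amount, and alert thresholds."""
    name: str
    amount: float                    # budgeted spend for the period
    thresholds: tuple = (0.8, 1.0)   # assumed fractions of the budget that trigger alerts


def triggered_alerts(budget, actual, forecast):
    """Return (kind, threshold) pairs for every threshold that has been reached.

    An "actual" alert takes precedence over a "forecast" alert at the same threshold.
    """
    alerts = []
    for threshold in budget.thresholds:
        limit = budget.amount * threshold
        if actual >= limit:
            alerts.append(("actual", threshold))
        elif forecast >= limit:
            alerts.append(("forecast", threshold))
    return alerts


budget = Budget(name="dev-vms", amount=1000.0)
print(triggered_alerts(budget, actual=850.0, forecast=1100.0))
# [('actual', 0.8), ('forecast', 1.0)]
```

In a real setup, each returned pair would feed whatever notification channel you use, such as the email mentioned above.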

Azure Cost Management Power BI provides a useful dashboard. It has various reports, including charges and purchases, and the top 5 spending meter category services. It also provides details of Azure Hybrid Benefit (e.g. savings on licenses), and savings from reserving servers etc.

Next, we look at using automation scripts for cost control. There are various pre-existing runbooks that you can use (e.g. starting and stopping VMs), and links to these are provided. 

The chapter ends with a look at creating a custom cost management tool. A useful diagram of the approach is provided, with a step-by-step logical overview; for developers, it’s much like reading pseudocode.

This chapter provides a useful look at how costs can be monitored using Azure budgets and alerts, and how further cost savings can be identified using the Azure Cost Management Power BI tool. The use of automated scripts and a potential design for a cost control system are also outlined.

Chapter 4: Planning for Cost Savings – Right-Sizing

Until now, the book has focused on understanding cloud bills, identifying your resources and spending, and how to monitor costs. We now move to the crux of the book, how to save money.

With on-premises systems there is often a deliberate overprovision of resources. This is typically because the hardware has a 3-5 year life expectancy and must cater for future growth. This approach needs to change in the cloud, which is much more dynamic: you can provision resources as and when required.

VMs are usually the costliest aspect of Azure spending. Looking at IaaS, the authors suggest that when servers are initially moved to the cloud, there is often a 1-to-1 migration, which typically results in only 30%-40% of the VMs’ resources being used. In Azure Advisor, you can select Costs to see recommendations to right-size or ‘shutdown underutilized VMs’. These are VMs that typically have 5% or less CPU usage (this 5% limit is configurable). This allows you to identify VMs that can be shut down or consolidated (after migrating their applications).
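The underutilization check described above can be sketched in a few lines of Python. This is not how Azure Advisor is implemented; the function name, the sample data, and the use of a simple average over CPU samples are assumptions for illustration only.

```python
from statistics import mean

# The 5% figure comes from the review, which notes that the limit is configurable.
CPU_THRESHOLD_PERCENT = 5.0


def underutilized_vms(cpu_samples, threshold=CPU_THRESHOLD_PERCENT):
    """Return the names of VMs whose average CPU usage is at or below the threshold.

    cpu_samples maps a VM name to a list of CPU usage percentages (hypothetical data).
    VMs with no samples are skipped rather than flagged.
    """
    return sorted(
        name
        for name, samples in cpu_samples.items()
        if samples and mean(samples) <= threshold
    )


samples = {
    "vm-web": [35.0, 42.0, 38.0],
    "vm-batch": [2.0, 4.0, 3.0],
    "vm-old": [0.5, 1.0],
}
print(underutilized_vms(samples))
# ['vm-batch', 'vm-old']
```

The flagged VMs are candidates for review only; as noted above, applications need to be migrated before anything is shut down or consolidated.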

After VMs, disks are usually the second most costly aspect of spending, especially for IaaS. You pay for the disk size, no matter how much of it you use. You need to be careful about how much you request: if you request 1,025GB, you will be allocated and pay for 2TB (2,048GB), since that is the next size after 1,024GB. Sometimes disks are provisioned in the expectation of use, but are never used. Similarly, after a cleanup of VMs, a disk may have been forgotten and may still exist. Helpfully, if you go to the Disks page in Azure Portal, you can sort the disks by owner. Any that do not have an owner are likely to be orphaned and, after due diligence, can be removed.
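Both disk points can be illustrated with a short Python sketch: rounding a requested size up to the next billed size, and listing disks with no owner. The tier list below is an illustrative assumption (sizes that double), as are the function names and the dictionary layout; check the current Azure pricing pages for the real sizes.

```python
from bisect import bisect_left

# Assumed, illustrative disk sizes in GB; not an authoritative list of Azure tiers.
DISK_TIERS_GB = [128, 256, 512, 1024, 2048, 4096]


def billed_size_gb(requested_gb, tiers=DISK_TIERS_GB):
    """Return the smallest tier that can hold the requested size."""
    if requested_gb <= 0:
        raise ValueError("requested size must be positive")
    index = bisect_left(tiers, requested_gb)
    if index == len(tiers):
        raise ValueError("requested size exceeds the largest tier in the list")
    return tiers[index]


def orphaned_disks(disks):
    """Return the names of disks that have no owner (hypothetical 'owner' field)."""
    return [disk["name"] for disk in disks if not disk.get("owner")]


print(billed_size_gb(1024))  # 1024
print(billed_size_gb(1025))  # 2048

disks = [
    {"name": "disk-app", "owner": "vm-web"},
    {"name": "disk-tmp", "owner": None},
]
print(orphaned_disks(disks))  # ['disk-tmp']
```

The one extra gigabyte in the second call doubles the billed size, which is the trap the authors warn about.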

Next, there’s a section on enforcing on/off policies. It is possible to enable/disable services, databases, and VMs, either on-demand or scheduled. Again, this emphasizes the point of only using resources when they are needed, thus reducing costs. It’s noted this may be particularly applicable to non-production systems, where a startup delay is more acceptable.

The chapter ends with some real-life examples of cost control. This should give you some ideas for your own systems.

This was a very useful chapter, identifying the most common cost savings. I do wonder if some of the quick wins identified here (e.g. orphaned disks) should have been included at the start of the book, just as a taster of what lies ahead.

Chapter 5: Planning for Cost Savings – Cleanup

Cleaning up cloud resources can offer great savings. Often resources are created temporarily during complex projects, and these can be left behind after the project has completed – and they will continue to be billed for.

First there’s a brief look at the resources that are either always free or are free on the initial tiers within Azure. Where possible you should take advantage of these (e.g. SQL Server 2019 developer edition). 

The Azure Resource Graph has an explorer-like interface that allows you to issue resource usage queries across resources, policies, and configs. There’s a useful link to another tool, Azure Governance Visualizer, which builds a visual hierarchy and provides an aggregated consumption report. Although only discussed briefly, both tools look very useful for identifying resources and their costs.

Next, there’s a look at unassociated services. These are often left over after a migration or Proof of Concept (PoC) projects. These are still billed and should be removed. Common examples include: unattached disks, unused storage accounts, static IP addresses, logs, and snapshots/backups. 

The authors suggest adding tags to resources that indicate when each resource can be removed, and checking these tags regularly. A useful diagram shows how this can work.
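A regular check of such tags might look like the following Python sketch. The tag name `remove-after`, the ISO date format, and the resource dictionaries are all hypothetical choices for illustration; the book’s own tagging scheme may differ.

```python
from datetime import date


def due_for_removal(resources, today, tag="remove-after"):
    """Return the names of resources whose removal tag date is today or earlier.

    Each resource is a dict with a 'name' and an optional 'tags' dict.
    The tag value is assumed to be an ISO date string such as '2022-06-30'.
    Resources without the tag are ignored.
    """
    due = []
    for resource in resources:
        value = resource.get("tags", {}).get(tag)
        if value is None:
            continue
        if date.fromisoformat(value) <= today:
            due.append(resource["name"])
    return due


resources = [
    {"name": "poc-storage", "tags": {"remove-after": "2022-03-31"}},
    {"name": "migration-vm", "tags": {"remove-after": "2022-12-31"}},
    {"name": "prod-db", "tags": {"env": "prod"}},
]
print(due_for_removal(resources, today=date(2022, 5, 24)))
# ['poc-storage']
```

Run on a schedule, a check like this turns the tag into a reminder rather than relying on someone remembering the resource exists.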

Migration projects are often a source of abandoned resources. Typically, there are parallel environments, dumps, and duplication. Again, the authors suggest a tagging strategy to identify migration resources, and checking them regularly to see when they can be removed. If you’re unsure whether a resource is needed, it is possible to pause or stop a workflow; people usually complain if they need the resource.

The chapter ends with a miscellany of topics. Cost spikes should be investigated: they may be due to increased workloads, bugs, network problems causing multiple retries, or even a DDoS attack. Another reason to reclaim resources relates to security: the more resources available, the bigger the footprint for any potential attack. Unused subscriptions can harbor unused resources for months or even years. Subscriptions can be rationalized, and resources moved to other subscriptions, which should simplify future analysis.

This chapter provided a useful insight into common causes of resources that are no longer needed. The briefly discussed tools (Azure Resource Graph and Azure Governance Visualizer), should prove very useful in identifying your resources and their costs.

Chapter 6: Planning for Cost Savings – Reservations

Reserving resources often offers a quick and easy way to save money. With a commitment of up to 3 years, savings can be as high as 72%. However, due to the changing and flexible nature of the cloud, some customers don’t like committing to long time periods, although some changes are allowed within your reservations contract. Reservations can be purchased via the Azure Portal, with the full amount paid upfront. It is possible to reserve VMs, services, and disks.

The authors provide a useful list of services discounted with reservations. It’s noted that companies tend to be cautious with their first reservations, but afterwards, tend to be much more adventurous. 

A useful workflow diagram of the VM reservations process is provided, and the 15 steps are discussed (e.g. define reservations owner, change unused reservations). Azure Advisor is often a useful first stopping point, since it can check your resources and offer recommendations for cost savings with reservations. A simple calculation can determine whether a reservation is worthwhile for you, i.e. monthly cost of the reserved resource (on a 1-year or 3-year term) / PAYG hourly cost = minimum number of hours per month the resource needs to run for the reserved price to be better.
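The break-even calculation is easy to express in Python. The function names and the prices used below are made up for illustration and are not real Azure rates.

```python
def break_even_hours(reserved_monthly_cost, payg_hourly_rate):
    """Minimum hours per month a resource must run for the reservation to pay off."""
    if payg_hourly_rate <= 0:
        raise ValueError("PAYG hourly rate must be positive")
    return reserved_monthly_cost / payg_hourly_rate


def reservation_is_cheaper(reserved_monthly_cost, payg_hourly_rate, expected_hours):
    """True if the expected monthly running hours exceed the break-even point."""
    return expected_hours > break_even_hours(reserved_monthly_cost, payg_hourly_rate)


# Hypothetical prices: 73.00 per month reserved versus 0.125 per hour pay-as-you-go.
print(break_even_hours(73.0, 0.125))              # 584.0
print(reservation_is_cheaper(73.0, 0.125, 730))   # True: runs all month
print(reservation_is_cheaper(73.0, 0.125, 200))   # False: runs part-time
```

With these example numbers, a VM that runs around the clock (roughly 730 hours a month) clears the 584-hour break-even point, while one that runs only during office hours does not.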

The utilization of reservations should be investigated regularly, and this can be seen via the charge type of ‘underusedreservations’. If the reservations aren’t being used fully, the original workflow decisions can be revisited, and changes made if necessary. Since the cloud changes often (e.g. better/faster VMs), reservations should be revisited with a view to changing to the better resources. Note that when a change is made to a reservation, the reservation’s time-period is often reset.

The chapter ends with a useful section on changing and canceling reservations. Some changes are allowed within a reservation (e.g. swapping to similar VMs). Links are provided to the exchange and refund policies.

This chapter provides what is probably the fastest and easiest way to make cost savings. Something well worth investigating further.


Last Updated ( Tuesday, 24 May 2022 )