A ransomware attack causes an average of 23 days of business disruption, and the recovery effort itself averages six days. In addition, the average ransomware payment is up 43%, to $220,000. In an age of always-on SLA expectations, longer recovery times mean not only lost revenue but also significant reputational damage and higher average ransomware payments.
Attack prevention is obviously the best approach, but the attack surface keeps changing and the attacks keep getting more sophisticated. Assuming a cyberattack has already happened, why does business application recovery take so long? Is it simply a matter of recovering all the affected servers, databases and filesystems? Or does it involve the bigger puzzle of reconstructing all the configurations, dependencies and the overall state of complex business systems? How easy is it to map the dependencies between them? How easy is it to safeguard all of those resources and recover them to a point in time, ideally before the attack started?
There are more questions than answers. Fortunately, there is a much better way to confidently and consistently recover your entire environment, with all of its cloud resources, dependencies and configurations, rapidly and without an elaborate technical recovery plan or lengthy cloud infrastructure code.
Automate the discovery of your cloud environments.
The first secret to recovering your cloud environment is actually no secret at all: unless you know what you have in your environment, you cannot protect the resources that are critical. Some of my company's customers create a single cloud account and run all of their resources in it, much as they treated their data center resources.
Others have hundreds of cloud accounts, each with several VPCs segmented to differentiate production from development and test environments. The programmability of cloud environments makes it much easier to create cloud resources fast and, most importantly, modify them quickly to adapt to business requirements.
Based on my company’s active discovery of customers’ cloud environments, we’ve found that a typical mid-size enterprise uses about 10,000 cloud resources across its application environments. These dynamic environments make it difficult to discover and maintain consistent configuration items of every cloud resource and to confidently say you have good control over your resources.
Automating the discovery of cloud resources is the only practical way to keep up with the changes in your environments.
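For illustration, here is a minimal discovery sketch, assuming an AWS environment and the boto3 SDK; the region names are placeholders. It uses the Resource Groups Tagging API to inventory taggable resources and their tags in each region.

# Minimal discovery sketch (assumes AWS and boto3; region names are placeholders).
# The Resource Groups Tagging API returns the ARN and tags of most taggable
# resources in a region, which is enough to build a basic inventory.
import boto3


def discover_resources(region: str) -> list[dict]:
    """Return every taggable resource ARN (and its tags) in one region."""
    client = boto3.client("resourcegroupstaggingapi", region_name=region)
    inventory = []
    paginator = client.get_paginator("get_resources")
    for page in paginator.paginate():
        for mapping in page["ResourceTagMappingList"]:
            inventory.append(
                {
                    "arn": mapping["ResourceARN"],
                    "tags": {t["Key"]: t["Value"] for t in mapping.get("Tags", [])},
                }
            )
    return inventory


if __name__ == "__main__":
    # Run the same discovery on a schedule and diff the results against the
    # previous run to catch resources that appear, change or disappear.
    for region in ("us-east-1", "us-west-2"):
        resources = discover_resources(region)
        print(f"{region}: {len(resources)} taggable resources")

Running a job like this on a schedule, rather than once, is what turns an inventory into ongoing discovery.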
Automate dependency mapping.
The second secret is to understand how your resources are connected to each other. Without knowing the dependencies, all you can do at recovery time is hope and pray. Cloud environments are so dynamic that you should automate dependency mapping entirely rather than try to piece the dependencies together manually after an outage.
Infrastructure-as-code helps if all of your teams religiously follow immutability and tight governance models. But as soon as someone on your development, SRE, cloud operations or network teams modifies a configuration outside of the IaC model, you are out of luck for recovery. Automated dependency mapping that captures your application blueprint is essential for rapid recoveries.
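As a simple example of what automated mapping can look like, the sketch below (an assumption: AWS, boto3 and EC2-based workloads) walks instances and records the network and storage resources each one depends on.

# Minimal dependency-mapping sketch (assumes AWS and boto3). It walks EC2
# instances and records the network and storage resources each one depends on,
# producing a small adjacency map that recovery automation can consume.
import boto3
from collections import defaultdict


def map_ec2_dependencies(region: str) -> dict[str, list[str]]:
    ec2 = boto3.client("ec2", region_name=region)
    graph: dict[str, list[str]] = defaultdict(list)
    paginator = ec2.get_paginator("describe_instances")
    for page in paginator.paginate():
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                node = instance["InstanceId"]
                # Network dependencies: VPC, subnet and security groups.
                graph[node].append(instance.get("VpcId", "no-vpc"))
                graph[node].append(instance.get("SubnetId", "no-subnet"))
                graph[node] += [sg["GroupId"] for sg in instance["SecurityGroups"]]
                # Storage dependencies: attached EBS volumes.
                graph[node] += [
                    m["Ebs"]["VolumeId"]
                    for m in instance.get("BlockDeviceMappings", [])
                    if "Ebs" in m
                ]
    return dict(graph)

A real blueprint would extend the same idea across load balancers, databases, DNS and IAM, but even this small graph shows how a dependency map can be rebuilt from the platform APIs instead of from tribal knowledge.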
Protect all of the cloud resources with policies.
The third secret is to protect all of the resources, configurations and dependencies, not just virtual machines, databases or filesystems. With this approach, you avoid the risk of leaving some crucial resource unprotected. Applying protection policies based on your organization's requirements is the starting point of the recovery plan. Some applications may need only a daily backup; for a component of a distributed application system that changes often, consider a policy with a much shorter recovery point objective.
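One way to express such a policy, sketched here under the assumption of AWS Backup and boto3 (the plan name, vault name, IAM role and tag values are placeholders), is a tag-driven backup plan so that newly discovered resources are picked up automatically.

# Hedged sketch of a tag-driven protection policy using AWS Backup (boto3).
# The plan name, vault name, IAM role and tag values below are placeholders.
import boto3

backup = boto3.client("backup")

# Daily rule for slow-changing components; a second rule with a shorter
# schedule could cover components that need a tighter recovery point.
plan = backup.create_backup_plan(
    BackupPlan={
        "BackupPlanName": "daily-app-protection",
        "Rules": [
            {
                "RuleName": "daily",
                "TargetBackupVaultName": "Default",
                "ScheduleExpression": "cron(0 5 * * ? *)",  # once a day
                "Lifecycle": {"DeleteAfterDays": 35},
            }
        ],
    }
)

# Attach the policy to everything carrying a protection tag, so the plan
# follows the resources rather than a hand-maintained list.
backup.create_backup_selection(
    BackupPlanId=plan["BackupPlanId"],
    BackupSelection={
        "SelectionName": "tagged-resources",
        "IamRoleArn": "arn:aws:iam::123456789012:role/BackupRole",  # placeholder
        "ListOfTags": [
            {
                "ConditionType": "STRINGEQUALS",
                "ConditionKey": "backup-policy",
                "ConditionValue": "daily",
            }
        ],
    },
)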
What about application data?
Along with discovering application cloud resources for protection, you must also protect all of the application data in sync with those environment resources for a successful system recovery after a cyberattack. Cloud platforms offer some of the best ways to make copies of data and move them to different regions or even continents.
You can also protect your data in more than two regions. One obvious choice is the production region itself, with point-in-time copies; another is a failover region on the same continent; and a third is a region on another continent for a third level of data resiliency. However, keep in mind that every copy you maintain costs cloud dollars, so base your backup and replication policies on the number of data copies you actually need.
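To make the second-region copy concrete, here is a minimal sketch assuming AWS, boto3 and EBS snapshots; the region names and snapshot IDs are placeholders. The same pattern can be repeated toward a third region on another continent, with the corresponding extra cost.

# Minimal sketch of a second-region data copy (assumes AWS, boto3 and EBS
# snapshots; region names and snapshot IDs are placeholders).
import boto3

SOURCE_REGION = "us-east-1"    # production region (point-in-time copies live here)
FAILOVER_REGION = "us-west-2"  # failover region on the same continent


def copy_snapshot_to_failover_region(snapshot_id: str) -> str:
    # The copy is issued from a client in the destination region.
    ec2 = boto3.client("ec2", region_name=FAILOVER_REGION)
    response = ec2.copy_snapshot(
        SourceRegion=SOURCE_REGION,
        SourceSnapshotId=snapshot_id,
        Description=f"DR copy of {snapshot_id} from {SOURCE_REGION}",
    )
    return response["SnapshotId"]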
Write infrastructure code to automate recoveries.
Perhaps the most important secret to guaranteeing recovery of all of your resources is to write infrastructure code using the cloud platform's native language. You could also use a cloud-neutral IaC tool to describe the steps, including dependencies and configurations. Try using a copy of your production deployment code, with the modifications needed to run it in the recovery region.
However, you have to understand the implications of using third-party IaC as opposed to the cloud platform's native infrastructure language. These include hosting and running the code with enough compute power at the time of recovery in another region, adapting the code to make sure all of the cloud resources are supported, maintaining the version of the IaC server and making sure all of the code is backward compatible.
The third approach is to manually document the steps for your data replication tool, capture the important sub-tasks and hand them over to a disaster recovery specialist in your organization along with some level of automation code.
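For the cloud-native approach, the sketch below shows the general shape, assuming AWS, boto3 and a CloudFormation copy of the production template; the stack name, template URL and parameters are placeholders, and real templates will need region-specific edits such as AMI IDs and availability zones.

# Hedged sketch of the cloud-native approach: replay a copy of the production
# CloudFormation template in the recovery region (names and URLs are placeholders).
import boto3

RECOVERY_REGION = "us-west-2"

cfn = boto3.client("cloudformation", region_name=RECOVERY_REGION)
cfn.create_stack(
    StackName="app-recovery",
    TemplateURL="https://s3.amazonaws.com/example-bucket/app-template.yaml",
    Parameters=[
        {"ParameterKey": "Environment", "ParameterValue": "recovery"},
    ],
    Capabilities=["CAPABILITY_NAMED_IAM"],
)

# Block until the stack is up before starting data restores on top of it.
cfn.get_waiter("stack_create_complete").wait(StackName="app-recovery")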
Gain recovery confidence with frequent tests.
Cloud resource discovery, protection and writing infrastructure code for recovery are important steps, but without the discipline to test your process and your code, there is no guarantee of recovering your critical business applications after a ransomware attack.
Make sure you test your environment recoveries often. This is particularly important when your applications run on shared public cloud infrastructure. A good frequency is monthly. Testing this often will help you understand your organization-specific complexities with respect to process changes, as well as any cloud infrastructure changes.
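Parts of each drill can themselves be automated. As one small example, sketched under the assumption of AWS Backup and boto3 (the vault name is a placeholder), a pre-drill check can confirm that fresh recovery points exist before anyone attempts a restore test.

# Hedged sketch of one automated check for a monthly recovery drill (assumes
# AWS Backup and boto3; the vault name is a placeholder). It verifies that
# recent, completed recovery points exist before a restore test is attempted.
import boto3
from datetime import datetime, timedelta, timezone

backup = boto3.client("backup")

cutoff = datetime.now(timezone.utc) - timedelta(days=1)
points = backup.list_recovery_points_by_backup_vault(BackupVaultName="Default")

fresh = [
    p for p in points["RecoveryPoints"]
    if p["CreationDate"] >= cutoff and p["Status"] == "COMPLETED"
]
if not fresh:
    raise RuntimeError("No completed recovery points in the last 24 hours")
print(f"{len(fresh)} recovery points are fresh enough to drive a restore test")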