Kolaparthi's Tech Blog: Chaos Engineering

Sunday, October 25, 2020

Software Resilience Testing

Software resilience testing is a method of software testing that focuses on ensuring that applications will perform well in real-life or chaotic conditions.

In other words, it tests an application’s resiliency, or ability to withstand stressful or challenging factors.

Resilience testing is one part of non-functional software testing, which also includes compliance, endurance, load, and recovery testing.

Because failures cannot be avoided entirely, resilience testing checks that software can continue performing its core functions and avoid data loss even when under stress.

In today’s world, system downtime is costly. A user who cannot access an application even once may never come back to it. Resiliency, which in simple terms is the ability of a system to gracefully handle and recover from failures, thus becomes critical.

Testing resiliency verifies that the system can absorb the impact of a problem while continuing to provide an acceptable level of service to the business.

This approach was popularized by Netflix and is described in the Principles of Chaos Engineering.

To build your test strategies for resilient systems, you should:

1) Conduct a failure mode analysis by reviewing the design of the system. In simple terms, this means identifying all the components and the internal and external interfaces, then identifying the potential failures at each point. Once the failure points are identified, validate that each one has a fallback or recovery path.

2) Validate data resiliency, i.e. that there is a mechanism for data to remain available to applications if the system that originally hosted the data fails. Verify that the data backup process is either documented or automated.

If automated, validate that the automated script backs up data correctly, maintaining integrity and schema.

3) From an infrastructure standpoint, configure and test health probes for load balancing and traffic management. These let traffic be routed away from unhealthy instances, so the system is not tied to a single deployment region when latency issues arise.

4) From an application standpoint, conduct fault injection tests for every application in your system. Scenarios include shutting down interfacing systems, deleting certificates, consuming system resources, and deleting data sources.

5) Conduct critical tests in production with well-planned canary deployments.

Validate that there is an automated rollback mechanism for code in production in case of failure.
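
As an illustration of the fault injection idea in step 4, here is a minimal Python sketch. It is not taken from any particular tool: the names FlakyDependency, DependencyDown, get_price and live_lookup are hypothetical, and the "interfacing system" is just a local function. The wrapper makes a dependency fail on a chosen fraction of calls, and the test checks that the core function falls back to cached data instead of crashing.

```python
import random


class DependencyDown(Exception):
    """Raised when the injected fault simulates an unavailable dependency."""


class FlakyDependency:
    """Wraps a callable and makes a configurable fraction of calls fail."""

    def __init__(self, func, failure_rate, seed=None):
        self._func = func
        self._failure_rate = failure_rate
        self._rng = random.Random(seed)

    def __call__(self, *args, **kwargs):
        # random() returns a value in [0.0, 1.0), so a rate of 1.0 always
        # fails and a rate of 0.0 never fails.
        if self._rng.random() < self._failure_rate:
            raise DependencyDown("injected fault")
        return self._func(*args, **kwargs)


def get_price(fetch, item, cached_prices):
    """Core function under test: use cached data if the dependency is down."""
    try:
        return fetch(item), "live"
    except DependencyDown:
        return cached_prices[item], "cached"


def live_lookup(item):
    """Stand-in for a call to an interfacing system (hypothetical data)."""
    return {"book": 12.5}[item]


# failure_rate=1.0 simulates the interfacing system being shut down.
always_down = FlakyDependency(live_lookup, failure_rate=1.0)
never_down = FlakyDependency(live_lookup, failure_rate=0.0)

print(get_price(always_down, "book", {"book": 11.0}))
print(get_price(never_down, "book", {"book": 11.0}))
```

In a real system the same pattern would wrap a network client or a data source, and intermediate failure rates could be used to exercise partial outages.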

Saturday, October 24, 2020

Install and Run Gremlin on Windows

Follow the steps below to install Gremlin on Windows.

1) Sign up for a Gremlin account at https://app.gremlin.com/signup
2) Download the Gremlin installer, gremlin_installer.msi.
3) Run the installer by double-clicking the downloaded file.
4) By default, Windows blocks the installer and shows a "Windows protected your PC" dialog box. Proceed with the installation by clicking More info.
5) This displays another button at the bottom, Run anyway. Click that button to continue.
6) Once the installation is done, the Gremlin config file can be found at the following location:
C:\ProgramData\Gremlin\Agent\config.yaml
7) Sign in to your Gremlin account.
8) Go to "Team Settings" and copy the "TeamID" and "SecretKey".
9) Open a command prompt and run "gremlin init".
10) You will be prompted to enter the following values:

Please input your Team ID:

Please input your Team Secret:

Once these values are provided, Gremlin will be initialized.
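
As a quick sanity check after the steps above, a small Python sketch like the following can confirm that the config file is where this post says it is and that the gremlin command can be found. The function name check_gremlin_install is hypothetical, and the check assumes the installer puts the gremlin executable on the PATH, which may not hold on every machine.

```python
import shutil
from pathlib import Path

# Config location as given in this post.
GREMLIN_CONFIG = Path(r"C:\ProgramData\Gremlin\Agent\config.yaml")


def check_gremlin_install(config_path=GREMLIN_CONFIG, executable="gremlin"):
    """Report whether the config file exists and the CLI is on the PATH.

    Assumption: the installer adds the gremlin executable to the PATH.
    """
    return {
        "config_present": Path(config_path).is_file(),
        "cli_on_path": shutil.which(executable) is not None,
    }


if __name__ == "__main__":
    for name, ok in check_gremlin_install().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

If either entry is reported as missing, re-run the installer or open a new command prompt before running "gremlin init" again.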
