What the simulated workload looks like in a spike test
Check out an example of the workload below:
- first, apply a low load to the system to make sure that everything works and the system is warmed up
- then cause all hell to break loose with a sudden, steep increase in load
- keep the load stable for some time so you can check whether the system processes it successfully and meets your requirements
- it’s a good idea to check the system’s behavior after the workload decreases as well: lower the load and keep it at that level for some time
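The workload described above can be written down as a list of stages. Below is a minimal sketch in Python; the `Stage` class, the `users_at` helper and all durations and user counts are hypothetical values chosen only to illustrate the shape, not recommendations. Each stage ramps linearly from the previous target to its own target over its duration, so a very short stage produces the steep spike.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    duration_s: int    # how long this stage lasts, in seconds
    target_users: int  # concurrent users to reach by the end of the stage


# Hypothetical numbers, only to illustrate the shape of a spike test.
SPIKE_PROFILE = [
    Stage(60, 50),     # ramp up to a low load
    Stage(300, 50),    # hold the low load: warm-up and sanity check
    Stage(10, 1000),   # sudden, steep increase: the spike itself
    Stage(300, 1000),  # hold the spike to check processing and requirements
    Stage(10, 50),     # drop the load back down
    Stage(300, 50),    # hold the low load to observe recovery
]


def users_at(t: float, stages: list[Stage], start_users: int = 0) -> int:
    """Target user count at time t (seconds), interpolated linearly within each stage."""
    elapsed = 0.0
    previous = start_users
    for stage in stages:
        if t < elapsed + stage.duration_s:
            fraction = (t - elapsed) / stage.duration_s
            return round(previous + (stage.target_users - previous) * fraction)
        elapsed += stage.duration_s
        previous = stage.target_users
    # Past the end of the profile: stay at the final stage's target.
    return previous
```

For example, `users_at(365, SPIKE_PROFILE)` falls halfway through the 10-second spike stage and returns 525. Load-testing tools commonly accept a similar description; k6’s `stages` option, for instance, is a list of `duration`/`target` pairs.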
Examples of real-life systems where spike tests are applicable
There is no better way to show the need for something than some practical examples, so here are a few:
- Opening a ticket sale for a high-profile music gig or sports event – everyone rushes to get a ticket as soon as possible.
- News websites – they get a massive spike in visits once something (usually bad) happens in the world.
- Celebrities sharing stories on social media about products they use or places they visit. This could bring millions of people to the site in a short time span.
- Operations that can happen in many systems, such as a mass update or a data migration through an API.
What should you look at and check during the spike test
Well... as always, we should check whether the system fulfills the requirements and processes the workload correctly, but what else?
- If the system fails, it is very important to have data on how it fails. How does it react to the failure? Does the system go down completely, or is, e.g., the number of sessions limited on the server side so that additional users are prevented from accessing the server?
- What happens once the load goes down: do response times go down, and are resources correctly deallocated? Does the system recover from the failure, or is it still throwing errors even with almost no incoming load?
- You may come to the conclusion that your system needs to be auto-scalable, automatically adding more instances to handle such a workload.
- If your system is already capable of auto-scaling, are the additional nodes added in a timely manner? Simply put: are they added fast enough to avoid a failure?
- Are your instances torn down correctly and successfully when the load decreases, without disrupting the functioning of the system?
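Some of these questions can be answered directly from the raw results. As a minimal sketch, assuming you have exported per-request samples (timestamp, latency, success flag) from your load tool, the hypothetical `recovered` function below compares a window after the spike against the warm-up baseline: the 95th-percentile latency must return to within a tolerance of the baseline, and the error rate must fall below a threshold. The 20% latency tolerance and 1% error threshold are arbitrary placeholders; use the values from your own requirements.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp_s: float  # when the request was sent, seconds from test start
    latency_ms: float   # response time of the request
    ok: bool            # whether the request succeeded


def window(samples: list[Sample], start: float, end: float) -> list[Sample]:
    """Samples whose timestamp falls in [start, end)."""
    return [s for s in samples if start <= s.timestamp_s < end]


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def error_rate(samples: list[Sample]) -> float:
    return sum(not s.ok for s in samples) / len(samples)


def recovered(samples: list[Sample],
              baseline: tuple[float, float],
              after: tuple[float, float],
              latency_tolerance: float = 1.2,  # placeholder: p95 may be 20% above baseline
              max_error_rate: float = 0.01) -> bool:  # placeholder: at most 1% failed requests
    """True if the post-spike window is back to baseline latency and a low error rate."""
    base = window(samples, *baseline)
    post = window(samples, *after)
    if not base or not post:
        raise ValueError("both windows must contain samples")
    base_p95 = percentile([s.latency_ms for s in base], 95)
    post_p95 = percentile([s.latency_ms for s in post], 95)
    return post_p95 <= latency_tolerance * base_p95 and error_rate(post) <= max_error_rate
```

Checking a window right after the load drops, and again a few minutes later, tells you not only whether the system recovers but roughly how long it takes to do so.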
Some other remarks about spike tests
- Quite often a spike test scenario is not the main one a customer will discuss with you, as it’s not the usual workload they expect. Ask specifically about such scenarios and be prepared to explain why it would be worthwhile to run such a test.
- Remember to plan the test diligently to avoid simulating an unrealistically steep spike in load. You don’t want to waste time and money investigating the results of a scenario that totally crushes your system but will never happen in real life.
- It might be a good idea to run such a test after a normal load test (with a longer ramp-up) has already been run. That way, the basic performance issues will already be found and fixed, so you can focus specifically on the issues related to a sudden spike in workload.