One question that keeps online shop owners awake at night is: will my website withstand the Black Friday traffic? As this is one of the most important days of the year, even a few minutes of downtime can translate into thousands of dollars in losses.
This is why we’ve put together a hands-on article that covers the most common Black Friday problems eCommerce websites face and how you can avoid them. Below, you’ll find a technical checklist that will help you prepare your infrastructure.
- Perform an infrastructure discovery analysis
- Identify bottlenecks using stress tests
- Write personalized stress tests
- Tune your system to make sure your infrastructure will not fail
- Optimize resources
- Optimize configurations
- Optimize applications
- Scale each component of your infrastructure
- Draft a pre-mortem
Black Friday technical checklist
1. Perform an infrastructure discovery analysis
The first step is to do an infrastructure discovery analysis – map out your servers, their current size, components, projects, and applications. Additionally, you can also look at the associated costs to get an idea of how much money you’ll need to spend to prepare your technical infrastructure for Black Friday.
Once you’ve drawn a diagram, it’s time to identify possible bottlenecks, meaning the components that might fail due to the sudden increase in traffic.
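The discovery step can also be captured as data rather than only as a diagram. The snippet below is a minimal sketch: the component names, sizes, and monthly costs are made-up placeholders, not real prices or a recommended setup.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    kind: str             # e.g. "web", "database", "cache"
    vcpus: int
    ram_gb: int
    monthly_cost: float   # placeholder figure, not a real price
    depends_on: tuple = ()


# Hypothetical inventory; replace with your own servers and costs.
INVENTORY = [
    Component("web-1", "web", 4, 8, 120.0, ("db-1", "cache-1")),
    Component("db-1", "database", 8, 32, 400.0),
    Component("cache-1", "cache", 2, 4, 60.0),
]


def total_monthly_cost(components):
    """Sum the monthly cost of every component in the inventory."""
    return sum(c.monthly_cost for c in components)


def dependents_of(components, target):
    """Names of components that depend directly on `target`."""
    return [c.name for c in components if target in c.depends_on]
```

A component that many others depend on (here, `db-1`) is a natural first candidate when you look for bottlenecks.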
2. Identify bottlenecks using stress tests
A stress test is an analysis conducted under unfavorable scenarios that helps you assess whether your IT infrastructure has sufficient mechanisms in place to withstand the impact of adverse developments. Simply put, it’s a test aimed at every point of your production infrastructure.
Ok, so why should you perform it? Because you need to know how much your infrastructure can handle and how much you need to scale it to accommodate the Black Friday traffic.
Note: you’ll never want to stress test on your live infrastructure. Instead, you need to recreate your current infrastructure in the cloud and apply your tests on that.
The points your stress tests should cover are:
✅ Operating system (limits, IOPS, resources)
✅ Cloud resources (disk sizes, database, web servers, PHP-FPM)
✅ Configurations (Nginx, Apache, Memcached, PHP, MySQL, stand-by processes)
✅ Anything else you might be using
3. Write personalized stress tests
To make sure you’re replicating Black Friday conditions as accurately as possible, your tests need to be tailored to your own setup. To perform a stress test, you need to:
- write tests specially designed for your applications, servers, and projects,
- configure a test infrastructure in the cloud,
- apply those tests on your test infrastructure,
- and draw your conclusions.
Keep in mind that you should try to mimic your users’ behavior as best as possible, otherwise your stress test will not be accurate.
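As an illustration of what a personalized test might look like, here is a minimal sketch using only the Python standard library. The journeys, their weights, and the `send_request` callable are hypothetical: in practice you would derive the journeys from your own analytics and plug in a function that sends real HTTP requests to your test infrastructure (never to production).

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical user journeys and their share of traffic.
JOURNEYS = {
    "browse_only": ["/", "/category", "/product"],
    "browse_and_buy": ["/", "/product", "/cart", "/checkout"],
    "direct_checkout": ["/login", "/cart", "/checkout"],
}
WEIGHTS = {"browse_only": 0.6, "browse_and_buy": 0.3, "direct_checkout": 0.1}


def pick_journey(rng):
    """Choose a journey name at random, respecting the weights."""
    names = list(JOURNEYS)
    return rng.choices(names, weights=[WEIGHTS[n] for n in names], k=1)[0]


def run_user(send_request, rng):
    """Play one simulated user; return (path, ok, seconds) per request."""
    results = []
    for path in JOURNEYS[pick_journey(rng)]:
        start = time.perf_counter()
        try:
            ok = bool(send_request(path))
        except Exception:
            ok = False  # any exception counts as a failed request
        results.append((path, ok, time.perf_counter() - start))
    return results


def run_stress_test(send_request, users, concurrency, seed=0):
    """Run `users` simulated users, at most `concurrency` at a time."""
    # One random generator per user so threads do not share state.
    rngs = [random.Random(seed + i) for i in range(users)]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        per_user = list(pool.map(lambda r: run_user(send_request, r), rngs))
    flat = [result for user in per_user for result in user]
    errors = sum(1 for _, ok, _ in flat if not ok)
    return {"requests": len(flat), "errors": errors}
```

Dedicated load-testing tools offer far more than this sketch; its only purpose is to show how user behavior can be encoded as weighted journeys.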
4. Tune your system to make sure your infrastructure will not fail
If you’ve done everything correctly, you should see that, under certain conditions, various components of your infrastructure will start to fail. This is when tuning comes into play.
The first component you should look at and tune is the operating system:
- does it have any limitations?
- are you close to your IOPS limit? (if yes, then you should increase it either by changing the disk or the machine)
- do you have sufficient resources the OS can use? (though, sometimes, your OS can crash even if it has more than enough resources)
Once you figure this out, you can start looking at your cloud components.
One thing you should keep in mind, though, is that you need to perform a new stress test after every change. If your infrastructure passes the stress test, you can keep the changes. If not, you should revert to the last working instance and start tuning again.
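The change, test, keep-or-revert cycle described above can be sketched as a small loop. This is only an outline: `passes_stress_test` is a hypothetical callable standing in for a full stress test run against your test infrastructure, and the configuration is modeled as a plain dictionary.

```python
import copy


def tune(config, candidate_changes, passes_stress_test):
    """Apply changes one at a time; keep a change only if the test passes."""
    current = copy.deepcopy(config)
    kept = []
    for key, value in candidate_changes:
        trial = copy.deepcopy(current)
        trial[key] = value
        if passes_stress_test(trial):
            current = trial   # keep the change
            kept.append(key)
        # Otherwise `current` stays at the last working configuration.
    return current, kept
```

Testing one change at a time is slower than applying several at once, but it tells you exactly which change caused a failure.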
5. Optimize resources
A few optimizations you can make at the resource level are:
- if your database is close to or reaches the IOPS limit, you should either change the disk or the machine (each cloud provider will have its own limitations);
- for your web servers, check whether you’re reaching your RAM limit and, if so, increase it.
Don’t use autoscaling during stress tests! You want to figure out just how much your infrastructure can handle before you need to scale. On Black Friday itself, relying solely on autoscaling might not be reliable enough.
6. Optimize configurations
Once you’ve optimized your resource usage, you need to move on to optimizing your infrastructure’s configuration. This includes all the technologies you use, like Nginx, Apache, Memcached, PHP, etc.
First is the configuration of your web server, PHP, and MySQL. Take a look at your web servers – more specifically, at the number of PHP-FPM workers, stand-by processes, idle workers, and MySQL connections. Then, repeat these steps for your database server.
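A common rule of thumb for sizing PHP-FPM workers (the `pm.max_children` setting) is to divide the RAM you can dedicate to PHP by the average memory one worker uses, and then to check that all workers together stay below MySQL’s `max_connections`. The sketch below assumes that rule of thumb; all numbers and the 80% headroom factor are illustrative assumptions, so measure your own values during stress tests.

```python
def max_children(total_ram_mb, reserved_mb, avg_worker_mb):
    """Estimate how many PHP-FPM workers fit in RAM (rule of thumb)."""
    usable = total_ram_mb - reserved_mb  # RAM left after OS and other services
    if usable <= 0 or avg_worker_mb <= 0:
        raise ValueError("no usable RAM or invalid worker size")
    return usable // avg_worker_mb


def connections_fit(web_servers, children_per_server,
                    mysql_max_connections, headroom=0.8):
    """Check that all workers together stay under the MySQL limit.

    Assumes each worker may hold one database connection and keeps
    a share of the limit free (headroom is an assumed safety margin).
    """
    needed = web_servers * children_per_server
    return needed <= mysql_max_connections * headroom
```

For example, a server with 8192 MB of RAM, 2048 MB reserved, and workers averaging 64 MB gives 96 workers; four such servers fit under a `max_connections` of 500 with this headroom, while six do not.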
7. Optimize applications
Then comes the configuration of your application. You need to take a look at your MySQL queries and see if any of them are lagging. If so, you should enable the slow query log, find out what causes the delay, and eliminate the cause.
Stress test key takeaways
Once you’ve finished optimizing your infrastructure at the resources, configuration, and application levels, you should perform another stress test to see if all changes are compatible with one another. Ideally, your infrastructure should pass this test. Otherwise, you’ll need to take each component one by one and repeat the process.
While performing this “roundup” stress test, keep one thing in mind: if a web server can handle 100 users per second, this doesn’t mean that it can manage a sudden jump from 0 to 100 users. It’s often precisely because of this that e-commerce websites crash: even though they’re designed to handle hundreds of thousands of users at the same time, they’re not prepared to manage spikes in traffic.
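The difference between sustained load and a spike can be made concrete with two load profiles, each a list of target users per second. This is a minimal sketch with hypothetical helper names; a real test tool would consume such a profile to decide how many simulated users to start each second.

```python
def ramp_profile(peak_users, ramp_seconds, hold_seconds):
    """Users per second rising linearly to the peak, then holding."""
    ramp = [round(peak_users * (t + 1) / ramp_seconds)
            for t in range(ramp_seconds)]
    return ramp + [peak_users] * hold_seconds


def spike_profile(peak_users, idle_seconds, hold_seconds):
    """No traffic at all, then the full peak from one second to the next."""
    return [0] * idle_seconds + [peak_users] * hold_seconds


def largest_jump(profile):
    """Biggest second-to-second increase in the profile."""
    steps = [b - a for a, b in zip([0] + profile, profile)]
    return max(steps)
```

Both profiles reach the same peak, but the largest one-second jump differs by a factor of ten in this example (10 users for the ramp, 100 for the spike), which is why both shapes are worth testing.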
8. Scale each component of your infrastructure
You’ve mapped out your infrastructure, identified possible bottlenecks, and come up with solutions for each of them. Now it’s time to take each component one by one and see how you can scale it in case it needs to handle a bigger load than you previously thought.
For your database, you can consider a master-slave or a master-master architecture.
In a master-slave architecture, all writes go to the master, and you need to decide whether your application can read from a slave (this scales reads only). In a master-master architecture, on the other hand, you scale both reads and writes.
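To illustrate what “reading from a slave” means for the application, here is a minimal sketch of read/write splitting. The class and server names are hypothetical, and the SELECT check is deliberately naive: it ignores replication lag, transactions, and statements such as `SELECT ... FOR UPDATE`, all of which a real setup has to handle.

```python
import itertools


class ReadWriteRouter:
    """Send reads to slaves in round-robin order, everything else to the master."""

    def __init__(self, master, slaves):
        self.master = master
        # Cycle through the slaves endlessly; None if there are no slaves.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._slaves is not None:
            return next(self._slaves)
        return self.master
```

With one master and two slaves, consecutive SELECT statements alternate between the slaves, while INSERT and UPDATE statements always go to the master.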
Scaling your web server is fairly easy: you just need to enable autoscaling. Keep two things in mind: set a minimum number of instances, and configure your autoscaling for rapid scaling (meaning, instead of adding one instance at a time, add two or more).
You also need to test how fast new machines are added to your infrastructure. The longer it takes to add new servers that can share the load, the higher the chances your website will crash. Depending on this speed, you’ll determine how many new servers you’ll need (scale-up number).
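One way to reason about the scale-up number is simple arithmetic: while new machines boot, traffic keeps growing, so each scaling step should add enough capacity to cover the growth expected during one boot period. The sketch below assumes linear traffic growth and a fixed per-instance capacity, both of which are simplifications; the parameter names are illustrative.

```python
import math


def scale_up_step(growth_rps_per_sec, boot_seconds, capacity_rps_per_instance):
    """Instances to add per scaling step, assuming linear traffic growth."""
    # Extra requests per second that arrive while new instances are booting.
    extra_demand = growth_rps_per_sec * boot_seconds
    return max(1, math.ceil(extra_demand / capacity_rps_per_instance))
```

For example, if traffic grows by 5 requests per second every second, a new server takes 120 seconds to join, and each server handles 200 requests per second, then 600 extra requests per second appear during boot and each step should add 3 instances.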
9. Draft a pre-mortem
Even after following the previous steps, your IT infrastructure is not 100% foolproof. So take a step back, look at the whole picture, and try to identify what could go wrong – which component is most likely to fail?
Brainstorm with your team and try to come up with solutions. If you don’t have solutions available (for example, you can’t prepare a backup plan for when your cloud provider is experiencing an outage), the least you can do is prepare your other departments (like marketing or support) to respond in case the worst-case scenario becomes a reality.
Tips & tricks for efficient stress tests
Tip 1: Choose the right stress test tool for you. A tool that can’t emulate the scenario you expect on Black Friday is a tool you can’t rely on.
Tip 2: Always personalize your stress tests. Different websites have different user behaviors – maybe your users browse for a long time before they add products to their cart and check out, in which case you need to optimize your website for high traffic.
Or maybe they add all the products they want to their cart the night before, and on Black Friday they just log in and pay – in which case you need to optimize your database.
Tip 3: Use an advanced monitoring system capable of assessing each component’s performance in real-time, and log aggregation to catch all errors (so you can fix them later).
Tip 4: Write down your results after each stress test, associate them with different configurations, and see what issues are solved and what you still need to work on.
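A lightweight way to follow Tip 4 is to store each run together with the configuration it used. The sketch below is one possible shape for such a log; the field names, the 1% error threshold, and the example values are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class StressRun:
    label: str
    config: dict
    max_rps: float      # highest requests per second sustained in the run
    error_rate: float   # fraction of failed requests, 0.0 to 1.0


def best_run(runs, max_error_rate=0.01):
    """Highest-throughput run whose error rate is acceptable, or None."""
    acceptable = [r for r in runs if r.error_rate <= max_error_rate]
    return max(acceptable, key=lambda r: r.max_rps) if acceptable else None


def config_diff(a, b):
    """Settings that differ between two configurations, as (old, new) pairs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

Comparing the configuration of the best run with the baseline shows which settings actually made the difference.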
Costs associated with scaling your infrastructure for Black Friday
As we’ve mentioned at the beginning of the article, performing an infrastructure discovery analysis helps you determine how much you’re spending on your current infrastructure. After figuring out how much you need to scale it, you’ll be able to estimate your Black Friday scaling budget.
Don’t forget about security
You’ve spent so much time and money preparing for this day that it would be a waste if everything went down the drain because of a DDoS attack. Make sure that you also install and configure a Web Application Firewall (WAF), and include it in your stress tests at the application level, as well as rate limiting for your web server.
For a more efficient configuration, you can install these two tools on an external component (like Cloudflare), instead of directly on your web server.
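To show what rate limiting does conceptually, here is a minimal token bucket sketch, one common approach to the problem. It is not how any particular product implements rate limiting, and in production you would rely on your web server’s or provider’s built-in feature rather than application code like this.

```python
import time


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate` per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity   # start with a full bucket
        self.clock = clock       # injectable clock, useful for testing
        self.last = clock()

    def allow(self):
        """Return True if the request may pass, False if it should be rejected."""
        now = self.clock()
        elapsed = now - self.last
        # Refill according to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In practice you would keep one bucket per client (for example, per IP address), so that a single aggressive client gets rejected without affecting everyone else.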
If you want to make sure nothing takes you by surprise, draft a pre-mortem – try to think of every possible scenario and come up with a solution. Granted, some issues are completely out of your control, but at least you’ll know about them beforehand.
Follow the technical checklist above to prepare your website for Black Friday.
In case you’d like a specialized team to help you with your infrastructure, check out how Bunnyshell helps eCommerce websites to prepare for Black Friday.