1. Business case
Over the last few years, our clients have asked us to provide them with scalable and secure open source solutions. These are growing companies that cannot afford to bend or break under heavy traffic, heavy transaction loads or sudden spikes.
In this blog post we will discuss the scalability aspects of an open source solution, in particular a WordPress-based solution.
The business case is for a client implementing a B2B marketplace solution in the niche of scientific, lab and optical equipment. Their clients are buyers and sellers around the country, ranging from universities, government agencies and corporations to manufacturers of medical and optical devices. As an example of the products on the marketplace, think of different kinds of microscopes, although the catalog covers much more than that.
Here we are talking about a catalog of about 15,000 SKUs and growing. The company owns a warehouse stocked with equipment, but they are just starting in the online business. They want a solution that can scale up to the increased traffic they expect within 12 to 24 months after the initial release. That increase is almost guaranteed, based on existing pre-signed contracts and agreements with different institutions.
They want something super solid and scalable, designed that way from the beginning. They do not want to re-design the core architecture or fix many things on the fly.
In this article we will not discuss why we chose to address this business case with a WordPress and WooCommerce solution. Pre-determined business decisions drove that technological path.
There were also many other business-related requirements for this solution in terms of usability, catalog presentation, upsell and cross-sell capabilities, implementation of the latest B2C techniques in the B2B world, social media integration, integration with a CRM and mobile capabilities, but we will not discuss any of these in this article.
Here we are going to focus on the scalability aspects of this platform.
The solution we are discussing here requires the system to handle large amounts of traffic and large numbers of simultaneous users performing simultaneous transactions.
The system must support up to 2 million users and up to 1 million concurrent transactions at peak time.
The payload of the average record has to be in line with the detailed technical requirements, i.e. e-commerce stores with long average records and a lot of detail on their products have to be supported as well.
The system has to behave gracefully under traffic peaks, without crashes or significant slowdowns and, more generally, without negatively impacting the customers' shopping experience.
The system also has to be able to scale itself back down as traffic goes down, in order to optimize the cost of hardware resources.
The system also has to offer consistency and reliability features, especially in terms of keeping the data in sync when multiple concurrent users make changes at the same time.
High Availability (HA) will be a consequence of designing this type of system, as a solution that scales horizontally will, almost by definition, offer HA capabilities.
There are other requirements as well which we will not discuss in detail here, essentially:
a) the ability to automatically re-process failed transactions
b) stateful failover at the HTTP/HTTPS session level
c) keeping the complexity of the deployments and the time required to administer the solution under control
3. High level architecture
We recommend the following architecture. It is shown with two MySQL instances and two levels of PHP APIs; however, it can be scaled up to multiple MySQL instances and multiple levels of PHP APIs.
This architecture strictly depicts application-level scalability and does not cover infrastructure-related details (e.g. WAN/LAN aggregation, Linux clusters etc.).
Fig. 1 Example of scaled architecture
4. Architecture details
Any scalable solution must implement a few very specific mechanisms.
They generally are:
a) Cache and Reverse Proxy
A page caching component is an important piece of software that improves page load performance. There are many caching solutions available on the market, including Varnish (https://varnish-cache.org/), Nginx (https://www.nginx.com/) or Fastly (https://www.fastly.com/).
However, there are situations when caching alone will not be enough to support large amounts of data exchange while also allowing basic database admin / web page admin operations at the same time. In these scenarios we recommend adding a reverse proxy component to the solution.
The functionality of the reverse proxy is similar to that of a caching machine, except that its performance is much higher due to its internal architecture. Imagine a request coming through. The request is passed to WP and a copy of the generated page is stored in memory. The next time a request for the same page comes through, it is served directly from memory.
A reverse proxy delivers much more speed than a web server (such as Apache with PHP code) because web servers generally have to perform costly operations (such as executing scripts and instructions to build the page, connecting to the database etc.), whereas all the reverse proxy has to do is serve the stored copy from a cached memory zone.
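To make the idea concrete, here is a minimal Python sketch of the caching behaviour described above. It is not how Varnish or Nginx are implemented. The `PageCache` class, the `render_page` function (a stand-in for WP building a page) and the 60-second lifetime are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Toy in-memory page cache sitting in front of a slow origin."""

    def __init__(self, origin: Callable[[str], str], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._origin = origin      # function that builds a page (hypothetical)
        self._ttl = ttl_seconds    # how long a stored copy stays valid
        self._clock = clock        # injectable clock, useful in tests
        self._store: Dict[str, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> str:
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            # Fresh copy in memory: serve it without touching the origin.
            self.hits += 1
            return entry[1]
        # No copy or expired copy: ask the origin and remember the result.
        self.misses += 1
        body = self._origin(path)
        self._store[path] = (now + self._ttl, body)
        return body


def render_page(path: str) -> str:
    """Hypothetical stand-in for WordPress rendering a page."""
    return f"<html><body>Rendered {path}</body></html>"


if __name__ == "__main__":
    cache = PageCache(render_page, ttl_seconds=60)
    cache.get("/shop/microscopes")   # first request goes to the origin
    cache.get("/shop/microscopes")   # second request is served from memory
    print(cache.hits, cache.misses)
```

A real reverse proxy also has to decide what must never be cached (carts, check-out pages, logged-in sessions), which this sketch leaves out.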
b) Database indexing
WP’s built-in content search is known as a weak spot in the architecture. It runs slowly as the number of posts goes up, and this happens even without concurrent load. It also does not have any built-in mechanism to drill down into results.
We recommend using a dedicated search index such as Apache Solr ( http://lucene.apache.org/solr/ ) or Elasticsearch.
Dedicated search indexes expand the functionality and performance of your e-commerce solution beyond basic content search. Developers will be able to handle large amounts of data and implement complex/costly queries without compromising performance.
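Solr and Elasticsearch are both built on Lucene, whose core data structure is an inverted index: a map from each term to the documents that contain it. The Python sketch below shows only that core idea. The `TinySearchIndex` class and the sample product names are made up for illustration, and real engines add analysis, ranking and faceting on top.

```python
from collections import defaultdict
from typing import Dict, List, Set


class TinySearchIndex:
    """Toy inverted index: term -> set of document ids."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        # Index every lower-cased, whitespace-separated term.
        for term in text.lower().split():
            self._postings[term].add(doc_id)

    def search(self, query: str) -> List[int]:
        # Return ids of documents containing ALL query terms.
        terms = query.lower().split()
        if not terms:
            return []
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return sorted(result)


if __name__ == "__main__":
    index = TinySearchIndex()
    index.add(1, "Inverted fluorescence microscope")
    index.add(2, "Stereo microscope with LED ring")
    index.add(3, "Optical power meter")
    print(index.search("stereo microscope"))
```

Looking up a term is a dictionary access rather than a scan over every post, which is why this approach keeps working as the catalog grows.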
c) Persistent objects caching
When transactions on your e-commerce solution generate many database queries, your resources can come under pressure. If your database is overloaded, the whole system will start exhibiting performance issues, including page load slowdowns or even crashes.
The idea is to “protect” the database from being queried intensively, especially for repeated / similar requests. To do this, chunks of data from the database are stored in memory and served directly from there. The software component in charge of this type of functionality was initially supposed to be part of the WordPress distribution. However, WP itself proved inefficient at keeping database data in memory and preventing unnecessary queries.
The open source community designed more advanced persistent object caching components. They are currently available to WP through plug-ins. One of them is Redis. We recommend deploying Redis (https://redis.io/) or Memcached (https://memcached.org/) with pretty much any kind of e-commerce solution that you think will need to scale at some point.
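The pattern behind persistent object caching is often called cache-aside: look in memory first and query the database only on a miss. Below is a minimal Python sketch of that pattern. A plain dictionary stands in for Redis or Memcached, and `ObjectCache`, `FakeProductDb` and the sample SKU are hypothetical names used only for this example.

```python
from typing import Callable, Dict


class ObjectCache:
    """Cache-aside helper: check memory first, fall back to the database."""

    def __init__(self, load_from_db: Callable[[str], dict]) -> None:
        self._load_from_db = load_from_db
        self._memory: Dict[str, dict] = {}  # stand-in for Redis or Memcached

    def get(self, key: str) -> dict:
        if key in self._memory:
            return self._memory[key]       # served from memory, no query
        value = self._load_from_db(key)    # miss: one database query
        self._memory[key] = value
        return value

    def invalidate(self, key: str) -> None:
        # Call this whenever the underlying row changes.
        self._memory.pop(key, None)


class FakeProductDb:
    """Hypothetical database that counts how often it is queried."""

    def __init__(self) -> None:
        self.rows = {"sku-1001": {"name": "Stereo microscope", "stock": 4}}
        self.query_count = 0

    def fetch(self, sku: str) -> dict:
        self.query_count += 1
        return dict(self.rows[sku])


if __name__ == "__main__":
    db = FakeProductDb()
    cache = ObjectCache(db.fetch)
    for _ in range(1000):
        cache.get("sku-1001")
    print(db.query_count)
```

The hard part in practice is invalidation: a cached object must be dropped or refreshed when the row behind it changes, otherwise shoppers see stale stock or prices.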
d) Horizontal scalability
In this section we will also discuss the notions of horizontal / hardware-based scalability and high availability (HA).
Of course, one way of scaling up any LAMP solution (WooCommerce/WP included) is to add hardware to it. In the diagram above, we showed an example with two database servers and two PHP servers. The requests coming from the Cloud are distributed round-robin by a standard L4-L7 load balancer such as an F5 box (http://f5.com). At the application layer, the load balancer sends consecutive sessions “inside” to multiple machines running identical e-commerce code, configurations, web pages and databases.
Real world architectures are more complex than this (they usually involve pairs of load balancers and more nodes in the cluster) but that’s the idea: you want to distribute the peaks of traffic across multiple physical machines to “ease up” the load.
This type of architecture also delivers HA (High Availability): if a PHP application or a database goes down or becomes unavailable, the other one in the cluster automatically picks up the load.
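The round-robin and failover behaviour described above can be sketched in a few lines of Python. This is a simplified model and not how an F5 appliance works internally. The `RoundRobinBalancer` class and the node names `php-1` and `php-2` are assumptions made for the example.

```python
from typing import Dict, List


class RoundRobinBalancer:
    """Toy round-robin balancer that skips nodes marked as down."""

    def __init__(self, nodes: List[str]) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        self._nodes = list(nodes)
        self._healthy: Dict[str, bool] = {node: True for node in nodes}
        self._next = 0

    def mark_down(self, node: str) -> None:
        self._healthy[node] = False

    def mark_up(self, node: str) -> None:
        self._healthy[node] = True

    def pick(self) -> str:
        # Try each node at most once, starting where we left off.
        for _ in range(len(self._nodes)):
            node = self._nodes[self._next]
            self._next = (self._next + 1) % len(self._nodes)
            if self._healthy[node]:
                return node
        raise RuntimeError("no healthy nodes available")


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["php-1", "php-2"])
    print([balancer.pick() for _ in range(4)])   # alternates between nodes
    balancer.mark_down("php-1")
    print([balancer.pick() for _ in range(2)])   # only the surviving node
```

Real load balancers find out that a node is down through periodic health checks; here `mark_down` is called by hand to keep the sketch short.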
Load balanced environments present a number of challenges and opportunities, three of which are preserving data consistency, totally eliminating the single point of failure and implementing stateful data failover.
In this article we will only discuss consistency. Consistency requires that changes to the configuration and/or code of one node be propagated identically to the other nodes, without impacting the functionality of the e-commerce solution and with zero downtime during code deployment.
As a DevOps Engineer / Release Manager, I often had to deploy changes like these. A standard deployment procedure goes like this:
– you take node #1 out of the cluster
– you deploy the new code and/or configuration changes on node #1
– using a short script, you put node #1 back in the cluster while taking node #2 out of the cluster at the same time
– you deploy the new code on node #2
– you put node #2 back in the cluster
If the architecture is more complex there are obviously extra steps.
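The procedure above can be modelled as a loop that takes one node out at a time. The Python sketch below is only a model of the sequence, not a deployment tool. The `rolling_deploy` function, the node names and the `deploy` callback are hypothetical, and a real script would call your load balancer's and your deployment system's own interfaces instead.

```python
from typing import Callable, List, Set


def rolling_deploy(nodes: List[str], in_rotation: Set[str],
                   deploy: Callable[[str], None]) -> List[str]:
    """Deploy to one node at a time, always leaving another node serving traffic."""
    log: List[str] = []
    for node in nodes:
        # Refuse to drain the last node that is still serving traffic.
        if not (in_rotation - {node}):
            raise RuntimeError(f"cannot drain {node}: no other node in rotation")
        in_rotation.discard(node)
        log.append(f"drain {node}")
        deploy(node)                # push new code and/or configuration
        log.append(f"deploy {node}")
        in_rotation.add(node)
        log.append(f"restore {node}")
    return log


if __name__ == "__main__":
    versions = {"node-1": "v1", "node-2": "v1"}
    rotation = {"node-1", "node-2"}

    def deploy_v2(node: str) -> None:
        versions[node] = "v2"       # stand-in for the real deployment step

    for step in rolling_deploy(["node-1", "node-2"], rotation, deploy_v2):
        print(step)
```

With two nodes this restores node #1 just before draining node #2, which is the swap the short script performs in the third step of the list above. During the swap the cluster briefly serves both old and new code, so the change itself has to tolerate that.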
Some programmers and sysadmins shy away from scalable architectures, as they usually involve extra layers of configuration and manual procedures, and they can make debugging harder when a code-related issue arises. But the reality is that many of these configurations are nowadays supported by cloud hosting companies, so you do not have to do them on your own. Another aspect: if your client has a real business, the traffic will be intense, performance issues or downtime will not be an option, and you will need to deploy a scalable solution anyway.
5. Hosting plans
Most of the existing cloud hosting companies provide scalable infrastructure and services within their own hosting plans.
For the sake of this article, let’s discuss four major hosting providers and offer some details on their scalable plans.
Amazon offers an Elasticsearch service that covers most of the functionality described above. The service is easy to deploy and manage, easy to scale up and down, secure and highly available. The service is cost effective for the range of applications and use cases it is offered for, and it also offers an initial free tier.
Under the category “Business Hosting”, GoDaddy offers a scalable stand-alone solution for high-traffic e-commerce sites, including sites with WooCommerce and WP. While this plan runs on a VPS (Virtual Private Server) and offers a generous amount of memory and multi-core CPUs, we currently have no confirmation that this architecture is elastic or horizontally scalable.
Microsoft offers a scalable e-commerce solution called “Web Apps”. As the picture on the link above shows, the solution implements most of the mechanisms discussed above, including Redis, CDN and queued storage.
Under the name Google Cloud Managed Services, Google offers a scalable and fully manageable plan. Its open source architecture is very similar to the one we described above. On top of that, their architecture also offers built-in co-location and distributed network mechanisms, together with tools for DevOps and tools to maintain security and data consistency.
6. Smoke testing
Most of the smoke testing I performed in the past consisted of writing test scripts to simulate heavy traffic and see how the system behaves under stress.
Test different kinds of scenarios:
– transitions from slow to peak
– transitions from peak to slow
– modifying database structure under traffic
– different users committing changes at the same time and under traffic
– simulate bulk check-outs with payments
– take down one node of the cluster and see how the performance of the system is affected
– simulate different scenarios with weighted loads on the load balancer “legs”, i.e. instead of running pure 50%-50% traffic ratios, run 60%-40%, 70%-30% or even 90%-10%. You can even configure pure failover/HA scenarios with 100%-0% to see if the system behaves as expected! (Do not do that under traffic peaks.)
In terms of the number of simultaneous transactions, go up to 1 million, but also test with hundreds of thousands and tens of thousands of transactions to see how the system behaves.
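When planning the weighted scenarios above, it helps to know how many requests each “leg” should receive for a given ratio, so you can compare that against what the nodes actually report. Here is a small Python helper for that arithmetic. The function name `split_requests` and the leg names are ours and purely illustrative.

```python
from typing import Dict


def split_requests(total: int, weights: Dict[str, int]) -> Dict[str, int]:
    """Split `total` requests across legs in proportion to integer weights."""
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must add up to a positive number")
    # Integer share for every leg, rounded down.
    shares = {leg: total * weight // weight_sum for leg, weight in weights.items()}
    # Hand the few requests lost to rounding to the heaviest legs first.
    leftover = total - sum(shares.values())
    for leg in sorted(weights, key=lambda name: weights[name], reverse=True)[:leftover]:
        shares[leg] += 1
    return shares


if __name__ == "__main__":
    for ratio in ({"leg-a": 50, "leg-b": 50}, {"leg-a": 60, "leg-b": 40},
                  {"leg-a": 90, "leg-b": 10}, {"leg-a": 100, "leg-b": 0}):
        print(ratio, split_requests(100_000, ratio))
```

If the counts measured on the nodes drift far from these expected numbers, the load balancer weights or the session persistence settings are worth a second look.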
7. Final considerations
Key elements to consider when designing a scalable solution are:
– designing the solution with scalability in mind from day #1
– choosing the right hosting plan
– assigning the right staff to admin the solution
– working with the latest version of all the components involved in the solution
– cost (always calculate a monthly TCO for this solution)
Questions? Have a scalable solution that you need to put in place?
Give us a call at 305-910-1572 or visit http://wittywebnow.com!
Make it a great day!
I also want to thank Ecessa (formerly Astrocom Corp @ https://www.ecessa.com/) for the opportunity to show my expertise and to work with L4-L7 load balancers and other intelligent traffic appliances.