SaaS Deep Dive

Inside a scalable, efficient multi-tenant architecture

SaaS customers will not wait for you when you can't scale. They'll leave for a competitor, and it's hard to win them back.

Dilemma

Availability

Multiple tenants on the same application can go down at the same time, so we need to be highly available.

Cost efficiency

We can overprovision a bit, but money isn't infinite. The business doesn't want to spend money, but it wants all the features.

Predictability

We can't always predict how customers will behave.

Deployment models

Do we have "premium" customers with their own stack or do all customers share underlying infrastructure?

Tier experiences

The business wants a different experience based on the type of customer.

Designing for scale

The first thing to do is broaden your view of scale. Scale is usually taken to mean bigger instances or more instances, but what makes it SaaS is that you own a control plane, and that needs to scale too. As your business grows, the supporting applications have to scale along with your core application.

On one side, we have a variety of workload profiles, such as "shared", "specialised" or "dedicated". On the other side, we need scaling options that fit each profile.

The simplest view of scale is to give each customer the entire stack and scale each one individually. In reality, a customer does not always need the entire stack. Scale starts with understanding workload profiles, so start by capturing some usage metrics to see what each customer is actually doing.
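As a minimal sketch of what capturing usage metrics per customer could look like, the Python below totals request counts per tenant and per service, then lists the services a tenant never touched. The `UsageEvent` fields, the service names and both function names are hypothetical and chosen only for illustration; a real system would read these numbers from its own metrics pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    tenant_id: str  # hypothetical tenant identifier
    service: str    # hypothetical service name, e.g. "orders"
    requests: int   # requests observed in one sampling window


def summarize_usage(events):
    """Return {tenant_id: {service: total_requests}} for the given events."""
    totals = defaultdict(lambda: defaultdict(int))
    for event in events:
        totals[event.tenant_id][event.service] += event.requests
    # Convert to plain dicts so the result is easy to print and compare.
    return {tenant: dict(services) for tenant, services in totals.items()}


def unused_services(summary, all_services):
    """Return {tenant_id: sorted list of services the tenant never called}."""
    return {
        tenant: sorted(set(all_services) - set(services))
        for tenant, services in summary.items()
    }
```

A tenant with a long list of unused services is a hint that it does not need the entire stack scaled on its behalf.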

Compute influences scaling strategies. Starting a new EC2 instance takes some time, which results in a slower user experience while capacity catches up. You can solve this by overprovisioning a bit; if you run all your tenants on a single pool, the extra cost is not that bad. Scaling on Lambda is easier: you don't have to think about a scaling policy, because Lambda handles it for you. For scaling containers, AWS has a reference architecture.
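The overprovisioning idea can be sketched as a small calculation: take the current load, pad it with a headroom percentage, and round up to whole instances. The function name, the 20 percent default and the notion of a fixed capacity per instance are assumptions made for illustration, not values from any AWS scaling policy.

```python
def desired_instances(current_load, capacity_per_instance, headroom_percent=20):
    """Return how many instances cover current_load plus a headroom buffer.

    current_load and capacity_per_instance use the same unit (for example
    requests per second). Integer arithmetic avoids float rounding surprises.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    if current_load < 0 or headroom_percent < 0:
        raise ValueError("current_load and headroom_percent must not be negative")

    padded_load = current_load * (100 + headroom_percent)
    capacity_scaled = capacity_per_instance * 100
    # Ceiling division: round up so the buffer is never cut short.
    instances = -(-padded_load // capacity_scaled)
    # Keep at least one instance warm so the first request never waits for a start.
    return max(1, instances)
```

With all tenants in one pool, the headroom is paid once for the whole pool rather than once per tenant, which is why the extra cost stays modest.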

Combining horizontal scaling and sharding gives you tenant pods. Each pod contains a collection of tenants, which optimizes usage and predictability. The downside is more complexity: what happens when you need to move a tenant to another pod?
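A rough sketch of tenant pods, assuming each tenant has a known load estimate and each pod a fixed capacity in the same unit. New tenants go to the first pod with room (a simple first-fit rule picked for brevity, not a recommendation), and `move_tenant` shows only the bookkeeping side of a move. The `Pod` class, the names and the capacity of 100 are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    capacity: int                                # total load this pod can carry
    tenants: dict = field(default_factory=dict)  # tenant_id -> estimated load

    @property
    def used(self):
        return sum(self.tenants.values())

    def has_room_for(self, load):
        return self.used + load <= self.capacity


def place_tenant(pods, tenant_id, load, pod_capacity=100):
    """Put the tenant in the first pod with room, or open a new pod."""
    for pod in pods:
        if pod.has_room_for(load):
            pod.tenants[tenant_id] = load
            return pod
    if load > pod_capacity:
        raise ValueError("tenant load exceeds the capacity of a new pod")
    pod = Pod(name=f"pod-{len(pods) + 1}", capacity=pod_capacity)
    pod.tenants[tenant_id] = load
    pods.append(pod)
    return pod


def move_tenant(source, target, tenant_id):
    """Reassign a tenant to another pod; data migration is out of scope here."""
    load = source.tenants[tenant_id]
    if not target.has_room_for(load):
        raise ValueError("target pod has no room for this tenant")
    target.tenants[tenant_id] = source.tenants.pop(tenant_id)
```

The bookkeeping is the easy part. The real complexity of a move sits in what this sketch leaves out: migrating data, draining in-flight requests and updating routing.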

You can use a general instance type for all workloads, or you can choose the right instance type for each workload. This also introduces complexity, so ask whether that complexity is worth it. Tools like Karpenter help you achieve this.
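To make the trade-off concrete, here is a deliberately simple sketch that picks an instance category from the ratio of memory to vCPUs a workload needs. The function name and the thresholds of 2 and 4 GiB per vCPU are illustrative assumptions. This is not how Karpenter selects instances; it only shows the kind of decision that such tooling takes off your hands.

```python
def pick_instance_category(vcpus_needed, memory_gib_needed):
    """Return a coarse instance category for a workload's resource shape."""
    if vcpus_needed <= 0 or memory_gib_needed <= 0:
        raise ValueError("resource requirements must be positive")

    memory_per_vcpu = memory_gib_needed / vcpus_needed
    if memory_per_vcpu <= 2:   # assumed threshold: CPU-heavy workload
        return "compute-optimized"
    if memory_per_vcpu <= 4:   # assumed threshold: balanced workload
        return "general-purpose"
    return "memory-optimized"  # everything above the balanced range
```

Even this toy version needs thresholds that someone has to maintain, which is the complexity you are weighing against the savings.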

Links

https://karpenter.sh/