A dead comment says:
> Of course, this assumes independent events. World Cup, super bowls, etc break these assumptions.
Yes, this is very true. The model here assumes Poisson arrivals and exponentially distributed service times (the "M/M" part), which are poor approximations of real-world traffic: real traffic tends to be non-stationary and non-ergodic, with substantial seasonality. However, that seasonality is typically low-frequency (e.g. daily cycles), so these stronger assumptions are quite defensible over short time periods.
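To make the "low-frequency seasonality" point concrete, here is a minimal sketch that generates non-stationary arrivals with a hypothetical 24-hour sinusoidal rate (`daily_rate`, with made-up `base` and `amplitude` values) using the standard thinning method for a non-homogeneous Poisson process. Over a 5-minute window the rate barely changes, which is why a constant-rate Poisson model can be a reasonable local approximation even when the day as a whole is far from stationary.

```python
import math
import random


def daily_rate(t: float, base: float = 1.0, amplitude: float = 0.5) -> float:
    """Hypothetical arrival rate (requests/s) with a 24-hour sinusoidal cycle."""
    return base * (1.0 + amplitude * math.sin(2.0 * math.pi * t / 86_400.0))


def nonstationary_arrivals(rate, rate_max: float, horizon: float, seed: int = 0) -> list[float]:
    """Arrival times of a non-homogeneous Poisson process via thinning.

    Candidates come from a homogeneous Poisson process at `rate_max`; each one
    is kept with probability rate(t) / rate_max. Requires rate(t) <= rate_max.
    """
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate_max)
        if t >= horizon:
            return arrivals
        if rng.random() < rate(t) / rate_max:
            arrivals.append(t)


if __name__ == "__main__":
    day = nonstationary_arrivals(daily_rate, rate_max=1.5, horizon=86_400.0)
    window = 300.0  # 5-minute window: the modelled rate is nearly constant inside it
    for start in (0.0, 21_600.0, 64_800.0):  # midnight, 06:00 (peak), 18:00 (trough)
        n = sum(start <= a < start + window for a in day)
        print(f"t={start:>7.0f}s  observed {n / window:.2f}/s  model {daily_rate(start):.2f}/s")
```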
A better approach is to simulate with real traffic patterns, or with more sophisticated parametric models, and get better answers (e.g. https://stability-sim.systems/). The good news is that this kind of simulation is cheaper than ever before.
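As a rough illustration of trace-driven simulation (a minimal sketch, not the simulator linked above), the function below models a single shared FIFO queue feeding several identical workers. It takes arrival timestamps and service times as plain lists, so a recorded trace can be dropped in place of the synthetic Poisson/exponential inputs produced by the hypothetical helper `poisson_exponential`.

```python
import heapq
import random


def simulate_fcfs(arrivals: list[float], services: list[float], servers: int) -> list[float]:
    """Response times for one shared FIFO queue feeding `servers` identical workers.

    `arrivals` must be sorted. Each request starts on whichever worker frees up
    first, which is first-come-first-served multi-server behaviour.
    """
    free_at = [0.0] * servers  # min-heap: time at which each worker becomes idle
    response = []
    for arrive, service in zip(arrivals, services):
        start = max(arrive, free_at[0])
        heapq.heapreplace(free_at, start + service)
        response.append(start + service - arrive)
    return response


def percentile(values: list[float], q: float) -> float:
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


def poisson_exponential(n: int, arrival_rate: float, service_rate: float, seed: int = 0):
    """Synthetic M/M inputs; replace with real timestamps and durations if you have them."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    for _ in range(n):
        t += rng.expovariate(arrival_rate)
        arrivals.append(t)
    services = [rng.expovariate(service_rate) for _ in range(n)]
    return arrivals, services


if __name__ == "__main__":
    arrivals, services = poisson_exponential(200_000, arrival_rate=0.8, service_rate=1.0)
    r = simulate_fcfs(arrivals, services, servers=1)
    print(f"mean={sum(r) / len(r):.2f}  p99={percentile(r, 0.99):.2f}")
```

With Poisson/exponential inputs and one server this should land close to the textbook M/M/1 mean response time of 1/(mu - lambda), which is a useful sanity check before feeding it real traces.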
What's conspicuously missing is the plot of performance when you do have a well-tuned queue in front of the service. Yes, having a queue becomes less important the more backend servers you have, but here, even with 10 servers, the plot shows your latency remains more than 25% worse than it would be with a queue. Also missing is discussion of how the variance in processing times affects you when you rely on load balancing alone.
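On the variance point, the classic single-server result is the Pollaczek-Khinchine formula for an M/G/1 queue: mean queueing delay grows with the squared coefficient of variation of service time. The sketch below only covers one server with Poisson arrivals (an assumption; multi-server cases need approximations or simulation), but it shows the shape of the effect: at 80% utilization, deterministic service (cv = 0) halves the wait relative to exponential service (cv = 1), and cv = 3 multiplies it by five.

```python
def mg1_mean_wait(arrival_rate: float, mean_service: float, service_cv: float) -> float:
    """Mean time spent queueing (excluding service) in an M/G/1 queue.

    Pollaczek-Khinchine: Wq = (1 + cv^2) / 2 * rho / (1 - rho) * E[S],
    where cv is the coefficient of variation (std / mean) of service time.
    """
    rho = arrival_rate * mean_service
    if rho >= 1.0:
        raise ValueError("unstable: utilization must be below 1")
    return (1.0 + service_cv ** 2) / 2.0 * rho / (1.0 - rho) * mean_service


if __name__ == "__main__":
    for cv in (0.0, 1.0, 3.0):
        print(f"cv={cv}: mean wait {mg1_mean_wait(0.8, 1.0, cv):.2f}")
```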
> What's conspicuously missing is the plot of performance when you do have a well tuned queue in front of the service.
As in, between the load balancer and the service? There's already an infinite queue in the load balancer. You can try that out on https://stability-sim.systems/ to see the effect, but the short version is that (in this model) it makes things worse.
If you're saying that the queue in the load balancer should be limited in size to reduce tail latency, then I agree.
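A minimal sketch of the bounded-queue idea, assuming the load balancer simply rejects a request when a hypothetical `max_waiting` requests are already queued (not yet in service). Admitted requests then wait behind at most `max_waiting` others, which caps tail latency at the cost of some rejected requests. The parameters in the example (4 servers at 95% utilization, a limit of 8) are made up for illustration.

```python
import heapq
import random
from collections import deque


def simulate_bounded(arrivals, services, servers: int, max_waiting: int):
    """FIFO multi-server queue that rejects arrivals when `max_waiting` requests
    are already waiting. Returns (response_times_of_admitted, rejected_count)."""
    free_at = [0.0] * servers  # min-heap of worker idle times
    waiting = deque()          # start times of admitted requests not yet in service
    response, rejected = [], 0
    for arrive, service in zip(arrivals, services):
        while waiting and waiting[0] <= arrive:  # those have started by now
            waiting.popleft()
        if len(waiting) >= max_waiting:
            rejected += 1
            continue
        start = max(arrive, free_at[0])
        heapq.heapreplace(free_at, start + service)
        if start > arrive:
            waiting.append(start)  # FIFO start times are non-decreasing
        response.append(start + service - arrive)
    return response, rejected


def percentile(values, q):
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


def poisson_exponential(n, arrival_rate, service_rate, seed=0):
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    for _ in range(n):
        t += rng.expovariate(arrival_rate)
        arrivals.append(t)
    return arrivals, [rng.expovariate(service_rate) for _ in range(n)]


if __name__ == "__main__":
    arrivals, services = poisson_exponential(100_000, arrival_rate=3.8, service_rate=1.0)
    for limit in (10**9, 8):
        r, rej = simulate_bounded(arrivals, services, servers=4, max_waiting=limit)
        print(f"limit={limit}: p99={percentile(r, 0.99):.2f}  rejected={rej}")
```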
Why would anyone think that it would get linearly worse? What's the (wrong) assumption there?
One explanation would be that more load could mean higher (absolute) variance in queue length, and therefore higher latency especially at higher percentiles. It doesn't work out that way (for reasons that Erlang actually writes about in one of his original works), but it's not an entirely unreasonable intuition.
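One way to see the effect Erlang described, assuming the comparison scales the number of servers along with the load: hold utilization fixed at 80% and compute the M/M/c mean response time from the Erlang C formula. Latency falls as the system grows, rather than rising with load. The Erlang C value is computed here via the usual numerically stable Erlang B recursion.

```python
def erlang_c(servers: int, offered_load: float) -> float:
    """Probability that an arrival must wait in M/M/c (offered_load = lambda / mu)."""
    if offered_load >= servers:
        raise ValueError("unstable: offered load must be below the server count")
    b = 1.0  # Erlang B with zero servers
    for k in range(1, servers + 1):
        b = offered_load * b / (k + offered_load * b)  # Erlang B recursion
    return servers * b / (servers - offered_load * (1.0 - b))  # convert B to C


def mean_response(servers: int, arrival_rate: float, service_rate: float) -> float:
    """Mean time in system: expected queueing delay plus mean service time."""
    a = arrival_rate / service_rate
    wait = erlang_c(servers, a) / (servers * service_rate - arrival_rate)
    return wait + 1.0 / service_rate


if __name__ == "__main__":
    for c in (1, 2, 4, 8, 16, 32):
        # Same 80% utilization at every size: load grows in step with servers.
        print(c, round(mean_response(c, 0.8 * c, 1.0), 3))
```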
I think the author made it up just to have something more to show on the graph.
It was a poll on Twitter, do you really expect good responses?
Of course, this assumes independent events. The World Cup, the Super Bowl, etc. break these assumptions.
Still, queuing theory is so cool.
I saw a seemingly inconsequential article on Hacker News and assumed it was probably the kind of article that describes a profound idea under a naive title. It turns out to be quite confusing, because it puts too much dramatic weight on a mundane intuition. That kind of writing belongs in literature, not in technical writing.