Thinking of scaling from the beginning - Why is it so important?
Whenever a new web app is released, everyone wishes for a great influx of users as their brainchild finally takes the big step onto the internet. This means that the product will be exposed to the users that marketing has been able to convince to try it.
What does this really mean for the product?
Let's consider the following scenario.
Startup Dreaming is nearing the release of its new fitness app, which is sure to revolutionize the fitness world with its popping visualizations and its ability to precisely and quickly optimize training schedules with the help of machine learning.
Since the dev team is busy implementing the last few features in the time remaining until release, they haven't yet set up the servers needed. When time runs out, some more time is "created" so the product can be ready for Release Day. Servers are set up, model training is done, services are deployed, databases are configured, static assets are uploaded, some load tests are run to verify that the platform can handle Marketing's estimates, and everyone takes a final good look at the product and a deep, nervous breath.
Then Release Day cometh.
CEO, popping the cork of the champagne - Woohoo, fantastic work everyone! We beat our most optimistic estimates by a hundredfold! Let's party!
The dev team is fighting tooth and nail, hammering their keyboards, frantically trying to get new servers up to distribute the load from the unanticipated number of users: getting the hardware, provisioning the servers, downloading the code and then finally running the scripts needed to deploy.
After hours spent adding new servers to the load balancers, the response times slowly get better, the average load on each server drops and the team starts to catch its breath. The developers start going home to finally get some sleep after all the sleep deprivation they've endured these last couple of months.
The following day the traffic has dropped considerably so the servers are running at roughly one-fifth of their capacity. But if the dev team tears down the servers then they will have to add them again later when the load (hopefully) increases again.
This scenario is probably not that common, and this kind of growth usually does not happen overnight but rather over a couple of weeks or months.
So how should you anticipate the load on your system more precisely, so that devops or ops (or whoever is in charge of production systems) don't need to add and remove servers from the platform all the time?
Answer: You don't.
From the start, the platform should be able to provision new servers and/or add additional hardware when load increases, and then do the inverse whenever the load decreases. This used to take a lot of time to set up, but for the past few years it has been fairly easy to do.
The actual solution: Docker, Kubernetes and AWS.
If you want a cost-efficient platform that can handle peaks and lows in traffic automatically, then package your application as Docker containers, which are then scheduled and deployed on an AWS-hosted Kubernetes cluster.
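To make "handle peaks and lows automatically" a bit more concrete, here is a minimal Python sketch of the proportional rule that the Kubernetes documentation describes for its Horizontal Pod Autoscaler: desired replicas = ceil(current replicas * current metric / target metric). This is only an illustration of the idea, not how Kubernetes is implemented. The function name, the parameter names and the numbers in the example are assumptions made up for this sketch, and the real autoscaler does more (for example, it smooths out rapid changes).

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Return how many replicas a proportional autoscaler would ask for.

    All names here are hypothetical. The metric could be, for example,
    average CPU utilization in percent across the running replicas.
    """
    ratio = current_metric / target_metric

    # Close enough to the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    # Scale proportionally to how far the metric is from the target,
    # rounding up so we never under-provision.
    wanted = math.ceil(current_replicas * current_metric / target_metric)

    # Never go outside the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


# Release Day (made-up numbers): 4 replicas running far above a 50% target.
print(desired_replicas(4, 250, 50, max_replicas=50))

# The day after: 20 replicas idling far below the same target.
print(desired_replicas(20, 10, 50, max_replicas=50))
```

The same rule covers both halves of the story above: it asks for more replicas when the load is over the target and for fewer when the servers sit mostly idle, so nobody has to add or remove them by hand.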
How we do this will be detailed in an upcoming article!