Using Fargate will hide a lot of complexity, especially around provisioning the underlying EC2 instances.
Let's start by heading over to ECS.
Navigating to http://18.220.91.58/?uid=20 returns the model's prediction response.

A Load Balancer will give us a static URL and route incoming HTTP(S) requests to ECS tasks managed by an ECS service.
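As a quick sanity check, here is a minimal Python sketch for querying the endpoint. It uses the direct IP from the example above; once the load balancer is in place, you would swap in its DNS name instead.

```python
import requests

# Query the model-serving endpoint directly (IP from the example above).
# After the ALB is set up, replace the host with the load balancer's DNS name.
response = requests.get("http://18.220.91.58/", params={"uid": 20})
print(response.status_code)
print(response.text)  # the model's prediction for this uid
```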
There are several types of load balancers on AWS (see the AWS documentation for a comparison). We will use an Application Load Balancer (ALB).
Using the ‘Get Started’ workflow on ECS is the easiest way to set this up to work with the cluster.
The load balancer uses a VPC (virtual private cloud, an AWS service), a security group (which controls who can access our container) and a target group (which determines where requests are routed).
Aside: VPC essentially isolates your computing environment from the external world.
Amazon Virtual Private Cloud (Amazon VPC) lets you provision a logically isolated section of the AWS Cloud where you can launch AWS resources in a virtual network that you define. You have complete control over your virtual networking environment, including selection of your own IP address range, creation of subnets, and configuration of route tables and network gateways. You can use both IPv4 and IPv6 in your VPC for secure and easy access to resources and applications.
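If you would rather script these pieces than click through the console, a rough boto3 sketch of the three components wired together might look like the following. All names, IDs and the region below are placeholders, not values from this walkthrough.

```python
import boto3

elbv2 = boto3.client("elbv2", region_name="us-east-2")

# Target group: where the ALB sends requests. TargetType "ip" is what
# Fargate tasks need, since each task gets its own network interface.
tg = elbv2.create_target_group(
    Name="model-targets",
    Protocol="HTTP",
    Port=80,
    VpcId="vpc-0123456789abcdef0",
    TargetType="ip",
    HealthCheckPath="/",
)["TargetGroups"][0]

# The ALB itself, placed in (at least two) subnets of the VPC and
# guarded by a security group that allows inbound HTTP.
alb = elbv2.create_load_balancer(
    Name="model-alb",
    Subnets=["subnet-aaaa1111", "subnet-bbbb2222"],
    SecurityGroups=["sg-0123456789abcdef0"],
    Scheme="internet-facing",
    Type="application",
)["LoadBalancers"][0]

# Listener: forward incoming HTTP requests on port 80 to the target group.
elbv2.create_listener(
    LoadBalancerArn=alb["LoadBalancerArn"],
    Protocol="HTTP",
    Port=80,
    DefaultActions=[{"Type": "forward", "TargetGroupArn": tg["TargetGroupArn"]}],
)

print(alb["DNSName"])  # the static URL clients will use
```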
We can also add the load balancer separately while creating the service (we will skip the details here).
You can access the public static URL by navigating to the load balancer page.
First click on the load balancer on the service page shown above.
Then click on the load balancer link to the top right of the page.
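Alternatively, the DNS name can be fetched programmatically; here is a small boto3 sketch (the load balancer name is a placeholder for your own):

```python
import boto3

# Look up the ALB's public DNS name instead of clicking through the console.
elbv2 = boto3.client("elbv2", region_name="us-east-2")
lbs = elbv2.describe_load_balancers(Names=["model-alb"])
print(lbs["LoadBalancers"][0]["DNSName"])
```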
Since there are a lot of steps involved (as well as quite a few moving parts), it's good to revisit our original goal.
Our goal was to set up model prediction/deployment in such a way that it scales and is resilient to failures.
The ECS cluster is scalable (we can add more tasks and services easily).
Further, the ECS service manages these tasks so that even if the underlying EC2 instances running the containers fail (for any reason), the tasks can be restarted on other machines to keep everything running smoothly.
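Scaling out is then a matter of raising the service's desired task count, either from the console or, as a rough sketch, via boto3 (cluster and service names below are placeholders):

```python
import boto3

# Scale the service out to four parallel model-serving tasks.
ecs = boto3.client("ecs", region_name="us-east-2")
ecs.update_service(
    cluster="model-cluster",
    service="model-service",
    desiredCount=4,
)
```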
Finally, the load balancer maps a static external URL to the internal container(s). So if there are multiple model prediction containers, the load balancer will use an algorithm (such as round robin) to distribute the incoming requests.
While this takes a lot more work to set up than the serverless solution, you get more fine-grained control over, and visibility into, the components supporting your scalable model deployment.