A real-time bidding ecosystem like ours is made up of many different services. A typical bidder has the bidding service itself, a budgeting service to track spend, a campaign management service to change targeting settings, an algorithm service to optimize the way it bids, an audience personalization service to get the best match between campaigns and audiences, and more.
At Eyeview, our real-time bidding pipeline alone is built from eight different services deployed on Amazon Web Services’ (AWS) Elastic Compute Cloud (EC2), from our custom load-balancing solution through our real-time user datastore and everything in between, and each of them needs to communicate with the others. Things weren’t always this complicated, but the business came to need a (micro) service-oriented architecture that lets us innovate quickly, parallelize workloads and become more efficient. The fact that AWS lets us launch the right EC2 node for each use case pushed us toward service separation as well. As a result, we needed service discovery from day one.
Now, one might argue that server-side load balancing (such as AWS ELB) is a better way to go than client-side discovery. Although this is a popular answer, a load balancer does not cover every scenario, such as custom management of the cluster or the seed nodes required by a database client, and the extra hop might also increase your latency.
Service discovery is not a huge challenge, as quite a few solutions already exist in the market. But to be honest, back in 2010, solutions like ZooKeeper were not seamless to implement; our team also had only a handful of developers, so we didn’t have the capacity to maintain one. We decided to go simple. How simple? EC2 simple.
When you search for a service discovery solution, you start from a basic query, such as “Find all the nodes that host a service named X”. Could we do it without a service discovery solution? Yes, and that is where the EC2 API comes in.
The EC2 API by itself is just a way to query your EC2 instances, not a service discovery solution. However, it provides a simple set of queries that get you very similar functionality. Using the DescribeInstances API call, you can list all of your EC2 instances. To get only the instances of the service you want rather than all of them, you add filters to the call.
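As a rough illustration of such a filtered query, here is a minimal Python sketch. It assumes `ec2_client` is a boto3 EC2 client (for example, one created with `boto3.client("ec2")`); the helper name `find_instances` and the `RUNNING_ONLY` filter are hypothetical and only there for the example.

```python
def find_instances(ec2_client, filters):
    """Return every instance matching the given DescribeInstances filters.

    ec2_client is assumed to be a boto3 EC2 client.
    filters is a list of {"Name": ..., "Values": [...]} dicts.
    """
    instances = []
    # DescribeInstances is paginated; the paginator follows NextToken for us.
    paginator = ec2_client.get_paginator("describe_instances")
    for page in paginator.paginate(Filters=filters):
        # Results come grouped by reservation, so flatten them.
        for reservation in page["Reservations"]:
            instances.extend(reservation["Instances"])
    return instances


# Example filter: only instances that are currently running.
RUNNING_ONLY = [{"Name": "instance-state-name", "Values": ["running"]}]
```

Calling `find_instances(ec2_client, RUNNING_ONLY)` would return one dictionary per running instance, with fields such as `InstanceId` and `PrivateIpAddress`.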
Step 1: Making It Work
One very useful decision was to use Security Groups, which are normally used to define security access to EC2 instances, as a way to name our services. This was an intuitive decision for us as every service we launched also had its own security access patterns.
In order to find our bidder service, we used the following AWS CLI command:
And from Java:
This setup was great for us as a lean startup because it required no extra infrastructure and no maintenance, which was a huge win.
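For readers who want to try the security-group approach, the lookup can be sketched in Python roughly as follows. This is not our original code: it assumes a boto3 EC2 client, assumes the `instance.group-name` filter matches the way your security groups are set up, and uses hypothetical helper names (`service_filters`, `discover`).

```python
def service_filters(service_name):
    """Build DescribeInstances filters for one service.

    Assumes the service name equals the security group name and that
    only running instances are of interest.
    """
    return [
        {"Name": "instance.group-name", "Values": [service_name]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]


def discover(ec2_client, service_name):
    """Return the private IP addresses of the nodes hosting service_name."""
    addresses = []
    paginator = ec2_client.get_paginator("describe_instances")
    for page in paginator.paginate(Filters=service_filters(service_name)):
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                # Skip instances that report no private address.
                if "PrivateIpAddress" in instance:
                    addresses.append(instance["PrivateIpAddress"])
    return addresses
```

With a security group named `bidder`, `discover(ec2_client, "bidder")` would return the addresses of the running bidder nodes.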
Step 2: Adding Features
Once the basics were in place, we wanted a bit more control, namely:
- Having availability zone (AZ) isolation
- Making sure we get only healthy nodes
AZ isolation is an AWS best practice that makes sure you are “staying vertical”: a request coming into a specific availability zone does not cross into another one. This improves reliability (think of an AZ going down) as well as cost efficiency (AWS charges for cross-AZ traffic). To get this, we added an “AZ Isolation” piece to our query.
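One way to express that extra condition, sketched in Python under the assumption that the query uses DescribeInstances filters as before, is to append an `availability-zone` filter. The helper name `with_az_isolation` and the zone `us-east-1a` are illustrative only.

```python
def with_az_isolation(filters, availability_zone):
    """Return a copy of filters restricted to a single availability zone."""
    return list(filters) + [
        {"Name": "availability-zone", "Values": [availability_zone]}
    ]


# Hypothetical base query: running nodes in the "bidder" security group.
base = [
    {"Name": "instance.group-name", "Values": ["bidder"]},
    {"Name": "instance-state-name", "Values": ["running"]},
]
isolated = with_az_isolation(base, "us-east-1a")
```

On EC2, a node can read its own zone from the instance metadata service under `placement/availability-zone` and pass that value in, so that it only discovers peers in the same zone.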
Getting only “healthy” nodes was more cumbersome, because we needed to define what a healthy node is and to identify it at query time.
For that purpose, we decided to use the EC2 Tags feature. We have a system that periodically pings the nodes for health (every service can define its own health checks). Once that system decides the health status of a node, it marks the instance using the createTags EC2 API method, setting a tag named “eyeview-healthy” to a value of 0 (for unhealthy) or 1 (for healthy). From then on, we just need to add another filter to our query.
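The tagging and the extra filter can be sketched in Python as follows. This is a simplified illustration, not our health-check system: it assumes a boto3 EC2 client, and the helper names `mark_health` and `healthy_only` are hypothetical. Only the tag name and its 0/1 values come from the description above.

```python
HEALTH_TAG = "eyeview-healthy"  # tag name described in the text


def mark_health(ec2_client, instance_id, healthy):
    """Record a health-check result on the instance as a tag ("1" or "0")."""
    ec2_client.create_tags(
        Resources=[instance_id],
        # EC2 tag values are strings, so the flag is stored as "1" or "0".
        Tags=[{"Key": HEALTH_TAG, "Value": "1" if healthy else "0"}],
    )


def healthy_only(filters):
    """Return a copy of filters that also requires the health tag to be "1"."""
    return list(filters) + [{"Name": "tag:" + HEALTH_TAG, "Values": ["1"]}]
```

A health checker would call `mark_health` after each probe, and discovery clients would wrap their existing filters with `healthy_only` before querying.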
Step 3: Hitting a Wall
EC2 API-based discovery was a success. This setup survived quite a lot of use and abuse over the past few years. It helped us deploy new clusters, auto-scale existing clusters and aid discovery across the system, and it let us simply glance at Elasticfox to identify a service’s nodes.
In the past year, when our number of EC2 instances grew to three digits, we started experiencing issues with this setup because it depends heavily on the EC2 API being available at all times. However, the EC2 API has a request rate limit, which AWS does not disclose. At different times we were throttled by the API, which disrupted our production workloads and left new instances undiscovered. We put considerable effort into reducing the number of calls to the API and opened numerous support tickets with AWS to help with the issue.
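One common way to cut the number of calls, shown here purely as an illustrative Python sketch and not as a description of what we built, is to cache discovery results for a short time and fall back to the last known result when a lookup fails. The class name `CachedDiscovery`, the 60-second default and the `lookup` callable are all assumptions made for the example.

```python
import time


class CachedDiscovery:
    """Cache discovery results to cut down on EC2 API calls."""

    def __init__(self, lookup, ttl_seconds=60.0, clock=time.monotonic):
        self._lookup = lookup    # callable: service_name -> list of nodes
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}         # service_name -> (fetched_at, nodes)

    def get(self, service_name):
        now = self._clock()
        cached = self._cache.get(service_name)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]     # fresh enough: no API call
        try:
            nodes = self._lookup(service_name)
        except Exception:
            # Lookup failed (for example, throttled): serve the last known
            # result if there is one, otherwise re-raise.
            if cached is not None:
                return cached[1]
            raise
        self._cache[service_name] = (now, nodes)
        return nodes
```

The trade-off is staleness: new or removed nodes are noticed only after the cache entry expires, so the time-to-live has to be balanced against how quickly a cluster changes.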
Is it right for your business?
Service-oriented architecture is not a new concept, and it has been adopted across all industries. Choosing the right service-discovery mechanism is usually not a complex decision, but it should be one that does not impose a lot of overhead and maintenance on your team.
If you have fewer than 1,000 EC2 nodes running and you are not using containers (as Security Groups are at the node level), I would definitely recommend using EC2 API-based service discovery. It is easy, adds no overhead and is very practical at that stage.