Building a cloud-native application requires us to look at multiple aspects holistically. A working application is not necessarily a well-built one. In the past, the focus was more on functionality than on the critical aspects surrounding the application. Giving those aspects enough importance can quickly become our differentiator in building a resilient application.
Scalability and security are two such critical aspects of building a software application, and they deserve their due respect from the design phase itself.
A state-of-the-art application was built to administer vaccinations across nations, and it was bug-free. The primary purpose was met! The superset of features that the system provides caters to the needs of varying audiences across the globe through configurable parameters. All aspects of the system, including registrations, notifications, analytics and reporting, were built using the latest technologies, such as HTML5, Angular, Node.js, Power BI, and SQL and NoSQL databases, in a microservices architecture hosted on AWS. This involved many AWS services, including S3, ECS with Fargate, ECR, Cognito, ALB, Redis cache, SNS, Pinpoint and API Gateway, among others.
During the initial phases, multiple discussions were held internally with architects who have deep architectural knowledge, and the outcomes were validated with AWS against the Well-Architected Framework. This resulted in a healthy critique of the services chosen, including those that were purposefully omitted. In general, architecting a system is more of an art, as one can choose cloud-native services, open-source systems, or a combination, including those available on AWS Marketplace or elsewhere. Hence, a solid understanding of not only the existing services but also their alternatives is needed to arrive at a robust best-of-breed approach that makes the system scalable and secure while keeping costs in check.
Scalability calls for a multi-pronged approach, starting with the selection of ECS with auto-scaling enabled. ECS was chosen over running the application directly on EC2, and within ECS, the Fargate launch type was chosen over the EC2 launch type. The serverless approach was part of the cloud strategy and is a step towards building a scalable architecture. Appropriate limits were set up in ECS so that the number of running tasks would scale in and out automatically based on pre-set parameters, such as CPU utilization. The Node.js code was developed as microservices and packaged into containers, with ECS as the orchestrator of these services.
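To make the CPU-based scaling rule concrete, here is a minimal sketch of how a desired task count could be derived from current utilization. It is a simplification in the spirit of a target-tracking policy, not the exact algorithm ECS uses; the function name, the 60% target and the task limits are assumptions chosen purely for illustration.

```python
import math


def desired_task_count(current_tasks: int, cpu_percent: float,
                       target_percent: float, min_tasks: int,
                       max_tasks: int) -> int:
    """Return a task count that would bring average CPU near the target."""
    if current_tasks < 1 or target_percent <= 0:
        raise ValueError("current_tasks must be >= 1 and target_percent > 0")
    # Proportional rule: load above the target scales out, load below scales in.
    proposed = math.ceil(current_tasks * cpu_percent / target_percent)
    # Clamp to the configured limits (hypothetical values in the demo below).
    return max(min_tasks, min(max_tasks, proposed))


if __name__ == "__main__":
    # Demand surge: 4 tasks running at 90% CPU against a 60% target.
    print(desired_task_count(4, 90.0, 60.0, min_tasks=2, max_tasks=10))
    # Slump in demand: the same 4 tasks running at 15% CPU.
    print(desired_task_count(4, 15.0, 60.0, min_tasks=2, max_tasks=10))
```

The lower limit keeps a baseline of capacity available during a slump, while the upper limit caps cost during a surge.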
Every aspect of the system was examined in detail to ensure that it warms up and scales to the needs of customers when demand surges. It is equally important that the system scales in when demand slumps, to reduce the overall TCO. To ensure optimal performance at this scale, CloudFront was used as the CDN and Redis was used to cache the results of database reads. Over time, we observed that more than 60% of reads were served from Redis rather than from the database.
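The caching idea can be sketched with the common cache-aside pattern: look in the cache first, fall back to the database on a miss, and store the result for next time. The example below is only an illustration under stated assumptions. An in-memory dictionary stands in for Redis, and the class name, the TTL and the `load_from_db` callback are hypothetical rather than taken from the actual system.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Cache-aside helper: check the cache first, fall back to the database."""

    def __init__(self, load_from_db: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load_from_db = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        # In-memory stand-in for Redis: key -> (expiry time, value).
        self._store: Dict[str, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: read from the database and repopulate.
        self.misses += 1
        value = self._load_from_db(key)
        self._store[key] = (now + self._ttl, value)
        return value

    def hit_ratio(self) -> float:
        """Fraction of reads served from the cache so far."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Tracking hits and misses in this way is one simple means of arriving at a figure like the share of reads served from the cache.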
Security is equally of paramount importance. Any weak link will be exploited in no time. Security should be tackled from all angles, starting at the account level with the principle of least privilege in mind, all the way to defending against DDoS attacks. A Web Application Firewall (WAF) was deployed to protect the system from common web exploits. At the network level, the entire application was hosted in private subnets, with only one public subnet containing a bastion host.
Network ACLs (NACLs) and security groups were created to allow only the expected traffic on specific ports and protocols. API Gateway was configured with appropriate rate limits to manage the API calls reaching the containers that host the application code. This gives us flexibility in restricting traffic, while the load balancer distributes the traffic over HTTPS for secure data transfer. AWS Shield was used to protect against DDoS attacks. GuardDuty was used for threat detection, to protect accounts and sensitive data, while Amazon Inspector helped with vulnerability scanning and compliance.
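Rate limits of this kind are commonly implemented with a token bucket, which allows short bursts while enforcing a steady average rate. The sketch below shows the general technique only; it is not API Gateway's internal implementation, and the class name and the rate and burst values are assumptions for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow up to `burst` requests at once, refilled at `rate_per_second`."""

    def __init__(self, rate_per_second: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate_per_second
        self._capacity = float(burst)
        self._tokens = float(burst)  # start with a full bucket
        self._clock = clock
        self._last = clock()

    def allow(self) -> bool:
        """Return True if the request may proceed, False if it is throttled."""
        now = self._clock()
        # Refill in proportion to elapsed time, never above the burst capacity.
        elapsed = now - self._last
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate_per_second=5.0, burst=10)  # hypothetical limits
    accepted = sum(1 for _ in range(25) if bucket.allow())
    print(f"accepted {accepted} of 25 back-to-back requests")
```

A throttled request would typically be answered with an HTTP 429 so that well-behaved clients can back off and retry.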
KMS helped us manage the keys that the application uses to connect to various services securely. All HTTP requests were redirected to HTTPS to ensure that data in transit is always encrypted. Encryption at the OS and database levels was turned on to protect data at rest. In addition, many application-level fixes were made to strengthen the security of the overall application.
A secure application also calls for solid instrumentation. Proper logging and monitoring should be part of the overall design. Logs were made accessible only to those authorized to view them, and they cover the network, application and service levels. VPC Flow Logs were enabled to monitor network activity, while application-level logs were refined so that they do not store sensitive information, with access controlled through IAM roles. Similarly, service-level logs were enabled to show the health of the various services and how they behave under different circumstances, giving complete insight into the functioning of the application. CloudWatch was closely integrated, and alerts were created across the system to notify the support team by email, using Lambda functions, so that the team can work on remedies proactively before resources reach their actual limits.
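One way to keep sensitive information out of application logs is to redact each event before it is written. The sketch below illustrates the idea; the field names in `SENSITIVE_KEYS`, the email pattern and the `redact` function are all hypothetical and would need to match whatever the real application actually logs.

```python
import re
from typing import Any, Dict

# Hypothetical field names; adjust to the fields the application really emits.
SENSITIVE_KEYS = {"password", "token", "email", "phone"}
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(event: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a log event with sensitive values masked."""
    cleaned: Dict[str, Any] = {}
    for key, value in event.items():
        if key.lower() in SENSITIVE_KEYS:
            # Mask the whole value when the field itself is sensitive.
            cleaned[key] = "[REDACTED]"
        elif isinstance(value, dict):
            # Recurse into nested objects.
            cleaned[key] = redact(value)
        elif isinstance(value, str):
            # Mask email addresses embedded in free-text messages.
            cleaned[key] = EMAIL_PATTERN.sub("[REDACTED]", value)
        else:
            cleaned[key] = value
    return cleaned


if __name__ == "__main__":
    print(redact({"user": {"email": "a@b.com", "id": 7},
                  "message": "mail sent to jane@example.org",
                  "status": 200}))
```

Redacting at the point of logging complements the IAM-based access control: even someone authorized to read the logs never sees the masked values.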
There were some learnings along the way, including the need to collaborate with the AWS team at times, for instance when soft limits on certain services had to be increased. Similarly, based on the needs of the application, we had to choose a combination of native and third-party services. It is sometimes worth resorting to third-party services, even though they can be somewhat more expensive, to meet certain goals of the overall system. Automation wherever possible, certainly in the areas of DevOps and Infrastructure as Code (IaC), is non-negotiable. We all run into situations where multiple features have to be released in a short time and different environments have to be spun up for various stakeholders, even if only for short periods. Early prioritization of DevOps, IaC, monitoring and instrumentation gave us the flexibility to try out new features and to integrate third-party services quickly.
A laser focus on these aspects from the design phase itself, on top of developing a functionally sound product, resulted in a highly scalable and secure system. In fact, independent professional testing agencies later certified the application as both secure and scalable for the entire foreseeable customer base for which it was intended, vindicating the overall design and implementation of the system!