An Introduction to A-SAF, Money Forward's AWS Guardrail System

Introduction

Hello, I’m Mengyuan Wan from the Money Forward CISO Office. I have been responsible for the planning, development, and operation of the AWS security control system “A-SAF (AWS-Security Alert Forwarding system).” Now that the system’s operation has stabilized, I would like to introduce A-SAF and share some insights on security control in cloud environments, including AWS.

Motivation for Creating A-SAF

When it comes to cloud security, terms like CSPM (Cloud Security Posture Management), CWPP (Cloud Workload Protection Platform), and CNAPP (Cloud Native Application Protection Platform) might come to mind. The A-SAF system we are introducing today is not a replacement for these tools but rather a bridge between security personnel and product teams.

Unlike CSPM, traditional security products such as EDR (Endpoint Detection and Response), WAF (Web Application Firewall), SIEM (Security Information and Event Management), and IDS/IPS (Intrusion Detection System / Intrusion Prevention System) are typically managed by a security team, including internal and external SOCs (Security Operations Centers). That team monitors alerts centrally and coordinates with infrastructure or product teams after triage. However, from the perspective of shift-left and product security, issues like container security, source code security, supply chain security, cloud security, and credential management cannot realistically be managed centrally by security teams alone.

When a company is small, with only a few AWS accounts and products, the security team can manage CSPM detections centrally. However, as the company grows, managing alerts centrally becomes extremely challenging, especially for a company like ours with over 400 AWS accounts. Security teams in a traditional SOC are well-versed in the details of alerts, but they often lack the context to judge whether a flagged product or cloud setting is intentional, which only the product team can determine.

We developed A-SAF to address these challenges. The system lets product teams actively triage the numerous vulnerabilities detected during the development cycle, while the security team focuses on review and support.

How A-SAF Works

The following diagram provides an overview of A-SAF. While the system also includes a feature for weekly digest notifications, we will omit that here.

[Figure] Overall Architecture

Alerts from GuardDuty, Config, IAM Access Analyzer, and other sources aggregated in Security Hub are forwarded to Jira using Lambda. Simultaneously, notifications are sent to the Slack channels of the responsible account owners. The key point here is the collaboration between security and product teams. When product team members receive notifications, they resolve the Jira tickets.

[Figure] Workflow for Closing Alert Tickets

[Figure] Workflow for Ignoring/Suppressing Alert Tickets

Reasons for Component Selection

A-SAF is composed of four main components: alert sources, processing, notification, and ticketing. The reasons for selecting each component are as follows:

Alert Sources

Security Hub was chosen as the alert source because it is AWS’s native CSPM service, which can easily aggregate alerts from other AWS security services and standardize the detection results into a unified data format (ASFF).
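To illustrate why a unified format helps, here is a minimal Python sketch that pulls a few common ASFF fields out of a finding. The field names (Id, AwsAccountId, Severity.Label, Title, Resources) are part of ASFF; the AlertSummary class, the summarize_finding function, and the sample values are hypothetical and are not taken from A-SAF itself.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class AlertSummary:
    """Hypothetical internal shape; not part of ASFF or A-SAF."""
    finding_id: str
    account_id: str
    severity: str
    title: str
    resource_ids: Tuple[str, ...]


def summarize_finding(finding: Dict[str, Any]) -> AlertSummary:
    """Extract the same fields from any ASFF finding, whatever its source."""
    return AlertSummary(
        finding_id=finding["Id"],
        account_id=finding["AwsAccountId"],
        # "UNKNOWN" is a placeholder chosen for this sketch, not an ASFF value.
        severity=finding.get("Severity", {}).get("Label", "UNKNOWN"),
        title=finding.get("Title", ""),
        resource_ids=tuple(r["Id"] for r in finding.get("Resources", [])),
    )


# Illustrative finding with made-up values.
example_finding = {
    "Id": "example-finding-id",
    "AwsAccountId": "123456789012",
    "Severity": {"Label": "HIGH"},
    "Title": "S3 bucket allows public read access",
    "Resources": [{"Type": "AwsS3Bucket", "Id": "arn:aws:s3:::example-bucket"}],
}
summary = summarize_finding(example_finding)
print(summary.account_id, summary.severity, summary.resource_ids)
```

Because every source service reports in the same shape, a single function like this can serve GuardDuty, Config, and IAM Access Analyzer findings alike.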

Processing

The processing component corresponds to the Controller in MVC (Model/View/Controller). Alert messages flowing from the CSPM are processed using Amazon SQS and AWS Lambda.
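As a rough sketch of what such a controller could look like, the Python below reads findings from an SQS-triggered Lambda event and hands each one to a ticketing callback and a notification callback. It assumes that each SQS message body is an EventBridge event carrying Security Hub findings under detail.findings; how A-SAF actually wires Security Hub to SQS is not described here. The function names, the callbacks, and the ticket key are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any, Callable, Dict, List

Finding = Dict[str, Any]


def extract_findings(sqs_event: Dict[str, Any]) -> List[Finding]:
    """Collect ASFF findings from every SQS record in one Lambda invocation."""
    findings: List[Finding] = []
    for record in sqs_event.get("Records", []):
        # Assumption: the message body is an EventBridge event whose
        # detail.findings list holds the Security Hub findings.
        body = json.loads(record["body"])
        findings.extend(body.get("detail", {}).get("findings", []))
    return findings


def process(
    sqs_event: Dict[str, Any],
    create_ticket: Callable[[Finding], str],
    notify: Callable[[Finding, str], None],
) -> int:
    """Create one ticket per finding, then notify; return the number handled."""
    findings = extract_findings(sqs_event)
    for finding in findings:
        ticket_key = create_ticket(finding)  # e.g. a wrapper around a Jira client
        notify(finding, ticket_key)          # e.g. a wrapper around a Slack bot
    return len(findings)


# Local demonstration with stub callbacks instead of real Jira/Slack clients.
sent: List[str] = []
demo_event = {
    "Records": [
        {"body": json.dumps({"detail": {"findings": [
            {"Id": "finding-1", "AwsAccountId": "123456789012", "Title": "Example"}
        ]}})}
    ]
}
handled = process(
    demo_event,
    create_ticket=lambda f: "SEC-1",  # hypothetical ticket key
    notify=lambda f, key: sent.append(f"{key}: {f['Title']}"),
)
print(handled, sent)
```

Passing the ticketing and notification steps in as callbacks keeps the controller independent of Jira and Slack, which fits the idea that each component can be swapped.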

Notification

Since our company uses Slack as its internal communication tool, notifications are sent via Slack Bot.

Ticketing

Although Security Hub includes features like security scores, dashboards, and basic ticket management, it is not a specialized ticket management system. Therefore, we chose Jira for ticketing. The benefits of Jira include:

  1. Workflow-based control of collaborative tasks
  2. Serving as the Model (DB) in MVC
  3. Serving as the View in MVC

Alternative Components

We are not particular about which products or technologies to use and prefer to avoid vendor lock-in, so we have considered several alternatives. For example:

  • Instead of Security Hub, open-source or paid third-party CSPM tools can be used as the alert source.
  • For processing, Google Cloud’s Cloud Pub/Sub and Cloud Functions can be used instead of SQS and Lambda (in fact, our SAF system for Google Cloud environments uses these).
  • The notification system does not have to be Slack; Teams or other tools can also be supported.
  • For ticketing, Notion or Asana can be used.

Results

Our company has over 400 AWS accounts, and we initially had more than 12,000 alerts. Over six months, we reduced that number by over 40%. Progress was particularly smooth for the AWS accounts managed by one organization within the company, so let's use its graphs as examples.

Visualization

There are over 30 AWS accounts managed by this organization.

[Figure] Trend of Resolved Alerts

The graph starts at 556 alerts, but that figure undercounts the real starting point: by design, A-SAF automatically deletes a Jira ticket when the corresponding AWS resource is removed, so those tickets drop out of the statistics. In reality, we reduced the number of alerts from 728 to 132, a reduction of more than 81% over approximately six months. Beyond reducing alerts, A-SAF also contributed to the removal of unnecessary resources and to cost savings.
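The reduction figure follows directly from the before and after counts. A small Python check, with a hypothetical helper name, makes the arithmetic explicit:

```python
def reduction_rate(before: int, after: int) -> float:
    """Return the fraction of alerts eliminated between two counts."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


rate = reduction_rate(728, 132)  # 596 of 728 alerts were eliminated
print(f"{rate:.1%}")  # prints "81.9%"
```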

[Figure] Accounts with the Most Remaining Alerts

This graph helps identify which accounts are lagging in progress. Using this graph to apply pressure on the responsible parties can be highly effective.

[Figure] Number of Alerts by AWS Config Rule

This graph shows which rules generate the most alerts. Not all rules carry the same priority. For example, a high detection count for rules related to IAM Users or Access Keys suggests a proliferation of IAM Users, which requires careful attention.
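A per-rule breakdown like this can be produced with a simple tally. The sketch below assumes the open alerts have already been exported as a list of records with a "rule" key; that record shape and the sample data are hypothetical, while the rule names are AWS Config managed rule identifiers.

```python
from collections import Counter
from typing import Dict, List, Tuple


def count_by_rule(alerts: List[Dict[str, str]]) -> List[Tuple[str, int]]:
    """Return (rule, count) pairs sorted from most to fewest alerts."""
    return Counter(alert["rule"] for alert in alerts).most_common()


# Made-up sample data for illustration only.
open_alerts = [
    {"account_id": "111111111111", "rule": "access-keys-rotated"},
    {"account_id": "111111111111", "rule": "access-keys-rotated"},
    {"account_id": "222222222222", "rule": "access-keys-rotated"},
    {"account_id": "222222222222", "rule": "iam-user-unused-credentials-check"},
    {"account_id": "333333333333", "rule": "iam-user-unused-credentials-check"},
    {"account_id": "333333333333", "rule": "s3-bucket-public-read-prohibited"},
]
for rule, count in count_by_rule(open_alerts):
    print(f"{count:3d}  {rule}")
```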

Slack Notifications

Let’s also take a look at the notifications sent to Slack.

[Figure] Weekly digests are sent to each product team’s channel

[Figure] Example of GuardDuty alerts sent to each channel

As shown above, we utilize both weekly summaries and real-time notifications to keep each product team informed.

Workflow-based Review

As mentioned earlier, alerts are best handled through collaboration between security and product teams. For non-urgent alerts, product teams take the initial action and actively resolve the alerts. However, because a product team's judgment can sometimes be mistaken, the Jira workflow prevents product teams from closing alerts on their own. Product teams can only move a ticket to resolved; the security team then reviews each resolved ticket and closes it if there are no issues. Let's look at a real example.

[Figure] Detection of Public S3 Buckets

[Figure] Product Team’s Claim and Security Team’s Correction

In this detection, the product team claimed that restricting access to Cloudflare’s IP range was sufficient and resolved the alert. However, the security team reviewed it and found that Cloudflare’s IP range was too broad and shared among multiple customers, making it risky. The ticket was reopened, and the product team was instructed to use other methods for restriction. This system of checks and balances reduces the burden on security teams and creates a scalable alert response framework.
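The review rule described in this section can be modeled as a small permission table over ticket transitions. The Python below is only a sketch of the idea: in A-SAF the restriction is enforced by the Jira workflow itself, and the status names, role names, and function here are hypothetical.

```python
from enum import Enum


class Role(Enum):
    PRODUCT = "product"
    SECURITY = "security"


class Status(Enum):
    OPEN = "open"
    RESOLVED = "resolved"
    CLOSED = "closed"


# Which roles may perform each (current, target) transition.
ALLOWED = {
    (Status.OPEN, Status.RESOLVED): {Role.PRODUCT, Role.SECURITY},
    (Status.RESOLVED, Status.CLOSED): {Role.SECURITY},  # close after review
    (Status.RESOLVED, Status.OPEN): {Role.SECURITY},    # reopen after review
}


def transition(current: Status, target: Status, role: Role) -> Status:
    """Return the new status, or raise if the role may not make this move."""
    if role not in ALLOWED.get((current, target), set()):
        raise PermissionError(
            f"{role.value} cannot move a ticket from {current.value} to {target.value}"
        )
    return target


# The product team resolves, then the security team reopens after review.
status = transition(Status.OPEN, Status.RESOLVED, Role.PRODUCT)
status = transition(status, Status.OPEN, Role.SECURITY)
print(status.value)  # prints "open"
```

In this model a product team can never reach the closed status on its own, which mirrors the check described above.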

Pull Request-based Review

For intentional settings, product teams can submit pull requests on GitHub to ignore alerts.

[Figure] PR to Ignore Alert for Not Blocking Public Access at Account Level

[Figure] Security Team’s Review Result

The security team advised, “If you are not using public S3 buckets, you should block public access at the account level,” leading to the withdrawal of the PR. This way, inappropriate ignore settings are corrected through PR reviews by the security team.
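The format of the ignore settings is not shown in this post. As a purely illustrative sketch, assume each reviewed entry names an account, a rule, a resource pattern, and a reason; the processing step could then look up a matching entry before creating a ticket. Every name and value below is hypothetical.

```python
from __future__ import annotations

from fnmatch import fnmatchcase
from typing import Dict, List, Optional

# Hypothetical ignore entries, merged only after security team review of the PR.
IGNORE_RULES: List[Dict[str, str]] = [
    {
        "account_id": "123456789012",
        "rule": "s3-bucket-public-read-prohibited",
        "resource": "arn:aws:s3:::public-assets-*",
        "reason": "Static assets are intentionally public",
    },
]


def find_ignore_reason(
    alert: Dict[str, str], rules: List[Dict[str, str]]
) -> Optional[str]:
    """Return the documented reason if an ignore entry matches, else None."""
    for rule in rules:
        if (
            alert["account_id"] == rule["account_id"]
            and alert["rule"] == rule["rule"]
            and fnmatchcase(alert["resource"], rule["resource"])
        ):
            return rule["reason"]
    return None


alert = {
    "account_id": "123456789012",
    "rule": "s3-bucket-public-read-prohibited",
    "resource": "arn:aws:s3:::public-assets-prod",
}
print(find_ignore_reason(alert, IGNORE_RULES))
```

Keeping the reason next to each entry means every suppression carries its justification, which gives reviewers something concrete to accept or reject.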

Conclusion

Unlike traditional alert handling procedures, our system allows product teams to take the lead in investigations, with the security team focusing on review and support. This creates a scalable alert response framework. While it requires effort to gain the product team’s understanding, we were fortunate that our employees have a high level of security awareness, and there was little resistance to the new system. We encourage you to try implementing a similar system in your organization.


We’re hiring

We are hiring security specialists! https://hrmos.co/pages/moneyforward/jobs?category=1862162235189092352
