We’re frequently asked how we operate our Security Operations Center (SOC) at Microsoft, particularly as organizations integrate cloud into their enterprise estate. This is the first in a three-part blog series designed to share our approach and experience, so you can use what we learned to improve your SOC.
In Part 1: Organization, we start with the critical organizational aspects (organizational purpose, culture, and metrics). In Part 2: People, we cover how we manage our most valuable resource: human talent. Finally, Part 3: Technology covers the technology that enables these people to accomplish their mission.
Microsoft has multiple security operations teams, each with specialized knowledge to protect the different technical environments at Microsoft. We use a "fusion center" model with a shared operating floor, which we call our Cyber Defense Operations Center (CDOC), to increase collaboration and facilitate rapid communication among these teams. Each team manages to the specific needs of its environment.
In this three-part series, we focus on the operation of our corporate IT SOC team, as it most closely reflects the challenges and approaches of our customers: many users and endpoints, email attack vectors, and a hybrid of on-premises and cloud assets. In addition, we include a few lessons learned from the other SOCs and from our Detection and Response Team (DART), which helps our customers respond to major incidents.
This SOC operates with three tiers of analysts plus automation as seen in Figure 1 below. (We’ll provide more details in Part 2: People.)
Figure 1. SOC analyst tiers plus automation.
The tooling in the SOC (Figure 2) is a mixture of centralized breadth capabilities and specialized tools to enable high-quality alerts and an end-to-end investigation and remediation experience. (Part 3: Technology will provide more details.)
Figure 2. SOC tooling.
Like all things in security, our SOC has evolved considerably over the years to its current state and will continue to evolve. We recently noticed that our SOC had sustained a 100+ percent growth in incidents handled over the past three years with a nearly flat staffing level. While we don’t know if we can expect this astounding trend to continue in the future, it validates that we are on the right track and should share our learnings.
The first element we cover is the value of the SOC in the context of the overall mission and risk of the organization. Like the traditional incarnations of crime and espionage, we don’t expect there will be a straightforward “solution” to cyberattacks. A SOC is often a crucial risk mitigation investment for an enterprise as it is core to limiting how much time and access attackers have in the organization. This ultimately increases the attacker’s cost and decreases the benefit, which damages their return on investment (ROI) and motivation for attacking your organization. Everything in the SOC should be oriented toward limiting the time and access attackers can gain to the organization’s assets in an attack to mitigate business risk.
At Microsoft, our SOCs bear not just the responsibility of reducing risk to our employees and investors, but also the weight of the trust that millions of customers accessing our cloud services and products put in us.
We’ve learned that the SOC has four primary functional integration points with the business:
If you take one thing away from this post, it’s that the SOC culture is just as important as the individuals you hire and the tools you use. Culture guides countless decisions each day by establishing what the right answer looks and feels like in ambiguous situations, which are plentiful in a SOC.
Our cultural elements are very much focused on people, teamwork, and continuous learning and include these learnings:
The final organizational element is how we measure success, a critical element to get right. Metrics translate culture into clear measurable objectives and have a powerful influence on shaping people’s behavior. We’ve learned that it’s critical to consider both what you measure, as well as the way that you focus on and enforce those metrics. We measure several indicators of success in the SOC, but we always recognize that the SOC’s job is to manage significant variables that are out of our direct control (attacks, attackers, etc.). We view deviations primarily as a learning opportunity for process or tool improvement rather than a failing on the part of the SOC to meet a goal.
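One way to put this view of deviations into practice is sketched below. It is a minimal illustration and not a description of the SOC's actual tooling: the function name, the trailing-window rule, the threshold, and the example metric (weekly average minutes to acknowledge an alert) are all assumptions made for the example. The idea is simply to trend a metric against its own recent history and surface unusual points for review, rather than compare every value to a fixed target.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Sequence


@dataclass
class Deviation:
    index: int       # position of the unusual data point in the series
    value: float     # observed value at that position
    baseline: float  # mean of the trailing window it was compared against


def flag_deviations(values: Sequence[float],
                    window: int = 4,
                    threshold: float = 2.0) -> List[Deviation]:
    """Return points that differ from the mean of the preceding `window`
    points by more than `threshold` population standard deviations.

    Hypothetical helper: the window size and threshold are illustrative
    defaults, not recommended values.
    """
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        baseline = mean(history)
        limit = threshold * pstdev(history)
        if abs(values[i] - baseline) > limit:
            flagged.append(Deviation(i, values[i], baseline))
    return flagged


# Made-up example data: weekly average minutes to acknowledge an alert.
weekly_minutes = [30, 32, 29, 31, 30, 55, 31]

for d in flag_deviations(weekly_minutes):
    # Each flagged week is a prompt to review process or tooling,
    # not a verdict on the analysts.
    print(f"week {d.index}: {d.value} min (recent baseline {d.baseline:.1f} min)")
```

Comparing against a trailing baseline instead of a hard target fits the point above: attack volume is outside the SOC's control, so a spike is treated as a question to investigate.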
These are the metrics we track, trend, and report on:
Our biggest recommendation for the SOC organization is to define the culture you want to inculcate. This will shape your team and attract the talent you want. In the coming weeks, we’ll share our philosophy on managing people, career paths, skills, and readiness, and what tools we use to enable our people to accomplish their mission. In the meantime, head over to CISO series to learn more.
The post Lessons learned from the Microsoft SOC—Part 1: Organization appeared first on Microsoft Security.