The following is a paper written in November 2020 by Michael Conlin, Chief Business Analytics Officer, Department of Defense, and Chip Block, Chief Solutions Architect of Evolver.
We are currently living within a network-driven cybersecurity model. This model was designed to secure communications at a level that protects the underlying data, by hardening the perimeter and looking for intruders. The model originated in a simpler time. Most of the enterprise's data and applications were isolated behind a perimeter. The perimeter could be identified and defended. Few users, and even fewer applications, 'poked their noses' above the enterprise firewalls. Intruders were relatively easy to spot and interdict.
Today, however, connected systems are exploding and there is no network perimeter. IoT devices, cloud computing and other advances have created a continuously changing computing environment. Mobile users and teleworking from home are now the default, with many employees performing enterprise work on personal devices. Internet and social media sites are generating large volumes of data on consumers and citizens from outside the perimeter. Open data is now widely available and consumed. The edge of the enterprise has become so porous there isn’t a perimeter to speak of anymore.
There are two types of CISOs:
1. CISOs who have been hacked and know it;
2. CISOs who have been hacked but don’t know it.
Looking for intruders has become equally problematic. Cyber professionals feel compelled to constantly collect and share log data in near real time. The common method is to collect log data from devices, applications, users and other sources, and to aggregate massive files into an analysis system such as Splunk. From this aggregation, analytic methods are applied to detect anomalous behavior, which is an indicator of a threat. In other words, our current approach to protecting our data is to create a massive amount of additional data that must also be managed and protected. In pursuing this model, we move large amounts of data from sensors to analysis systems to data aggregation locations to other analysis tools, all for the sake of finding the anomalous activity that indicates a threat. The cost in compute, storage and bandwidth is enormous. The sheer volume of data degrades the signal-to-noise ratio to the point that the signal, the traces of the intruder, is washed out. Additionally, the result is a fragile architecture that can be defeated with fairly simple disruptions to elements within the environment.
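To make that pattern concrete, here is a minimal, illustrative Python sketch of the aggregate-then-analyze approach. The log formats, event-type keys and threshold are hypothetical stand-ins for what a real SIEM such as Splunk does at far larger scale; the point is that every routine log line is hauled to the center just to surface a handful of outliers.

```python
from collections import Counter
from typing import Dict, Iterable, List

def aggregate_logs(sources: Iterable[List[str]]) -> List[str]:
    """Centralized model: every raw log line from every source is copied to one place."""
    aggregate: List[str] = []
    for lines in sources:
        aggregate.extend(lines)          # full copies cross the LAN/WAN
    return aggregate

def flag_anomalies(aggregate: List[str],
                   baseline: Dict[str, int],
                   factor: float = 3.0) -> Dict[str, int]:
    """Flag event types whose observed frequency far exceeds the expected baseline."""
    observed = Counter(line.split(" ", 1)[0] for line in aggregate)
    return {event: n for event, n in observed.items()
            if n > factor * baseline.get(event, 1)}

# Illustrative use: thousands of routine entries are shipped in to surface one outlier.
logs_a = ["LOGIN ok user=alice"] * 1000
logs_b = ["LOGIN ok user=bob"] * 1000 + ["PRIV_ESC attempt user=mallory"] * 50
print(flag_anomalies(aggregate_logs([logs_a, logs_b]),
                     baseline={"LOGIN": 2000, "PRIV_ESC": 1}))
```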
1 B → 1 KB → 1 MB
Every 1 byte of data payload (written to storage or read from storage) generates 1 KB of traffic on the data center LAN and 1 MB of traffic on the WAN.
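Taken at face value, that rule of thumb compounds quickly. A back-of-the-envelope calculation, assuming the 1 B → 1 KB → 1 MB ratio above holds linearly:

```python
# Back-of-the-envelope traffic amplification, assuming the 1 B -> 1 KB -> 1 MB
# rule above holds linearly (using 1 KB = 1,000 B and 1 MB = 1,000,000 B for simplicity).
payload_bytes = 1_000_000_000            # 1 GB of log payload written or read
lan_bytes = payload_bytes * 1_000        # ~1 TB of data-center LAN traffic
wan_bytes = payload_bytes * 1_000_000    # ~1 PB of WAN traffic
print(f"{payload_bytes:,} B payload -> {lan_bytes:,} B LAN -> {wan_bytes:,} B WAN")
```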
Originally, attackers used a limited number of attack patterns. Today, attackers have developed a wide range of attack vectors, utilizing everything from script-kiddie toolkits to sophisticated Machine Learning algorithms. These innovations don't just make their actions difficult to detect; in many cases they make them undetectable within common log files. Further, almost all attack vectors now include deletion or modification of logs by the attackers.
There are many efforts underway to make the current model perform better. They include improving log collection tools such as Splunk to store data more efficiently, and building richer dashboards with tools such as Elasticsearch to analyze the large volumes of data. Artificial Intelligence and Machine Learning projects are focused on detecting anomalous behavior within these large data sets. Though these efforts yield incremental gains, they have not delivered a significant improvement in performance. The old cybersecurity model doesn't work anymore; we need a new one.
Here’s what the new cybersecurity model looks like
Wait, first tell me why I should care!
I know what you’re thinking. You’re thinking, “First I want to know why this new model is worth considering.” So let’s explore the sources of value:
- Increased protection of highest value mission assets
- Improved efficiency in detecting and responding to cyber attacks
- Reduction of system costs and bandwidth needed to support cybersecurity operations
- Adaptive cybersecurity that can adjust security based on mission needs and status
What needs to change is the approach: away from network-based cybersecurity and toward a data-aware cybersecurity methodology. A data-aware approach makes protection an inherent element of data creation and management itself, rather than a property of the network infrastructure. Value is gained by increasing security while reducing the mission and financial burden of sharing large amounts of data that contain minuscule amounts of relevant information. Additionally, the result is a more resilient, less fragile architecture than what is currently being fielded.
OK, I’m interested. How does it work?
In order to dramatically improve cybersecurity performance, we need a new cybersecurity model. Here are some elements of that new model:
- Build cybersecurity protection capabilities directly into the data. In other words, data becomes self-aware: it can travel only approved paths, be viewed only by approved users, and destroy itself after viewing based on currency and retention needs.
- Combine data containerization, based on mission, with micro-encryption at the data level in order to enable tighter control at the data source (see the sketch of this idea following this list).
- Apply a zero-trust attitude based on the reality that no network, data center, compute core or chip can be assumed to be a safe place.
- Automate all controls and configuration. Use fit-for-purpose tools, scripts and digital recipes to manage 99.966% of all management activity on IT resources (compute, storage, networking), regardless of whether they are hardware, software or services. Make it a firing offense for someone to access an engineering console without permission; then don't grant permission.
- Move anomaly detection capabilities closer to the data source and make them as independent of the network as possible. Filter and discard as much log data as possible, based on the rule that the value of the data is inversely proportional to its predictability. Use local detection capabilities to check log entries against authorized controls and drop all the entries that are benign. Only exchange anomalies and differential log data between systems (also sketched after this list).
- Use Artificial Intelligence to:
- characterize, tag, store and deprecate data elements and sources
- visualize data usage and paths in near real time
- develop common data models
- analyze usage patterns
- detect constantly changing attack patterns
- Implement evergreen management of all compute, storage and network devices, such that they routinely perform a complete software rebuild from the firmware up. Attack patterns are constantly changing, so the data-aware environment must be designed to change constantly as well. It's harder to hit a moving target.
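As a concrete illustration of the containerization-plus-micro-encryption bullet above, here is a minimal sketch, assuming Python and the Fernet API from the `cryptography` package. The mission names, key store and authorization check are hypothetical; key management, rotation and policy enforcement are the hard parts and are omitted.

```python
from cryptography.fernet import Fernet  # pip install cryptography

# One key per mission container, so a data element is unreadable outside its container.
mission_keys = {"logistics": Fernet.generate_key(), "recruiting": Fernet.generate_key()}

def seal(element: bytes, mission: str) -> bytes:
    """Encrypt a single data element under its mission container's key."""
    return Fernet(mission_keys[mission]).encrypt(element)

def open_element(token: bytes, mission: str, requester_missions: set) -> bytes:
    """Release the element only to a requester authorized for that mission."""
    if mission not in requester_missions:
        raise PermissionError("requester is not cleared for this mission container")
    return Fernet(mission_keys[mission]).decrypt(token)

token = seal(b"convoy departs 0600", "logistics")
print(open_element(token, "logistics", requester_missions={"logistics"}))
```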
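Likewise, as an illustration of moving detection to the data source and exchanging only differentials, here is a minimal sketch in the same spirit. The 'authorized' patterns and log formats are invented for the example; the idea is that predictable, policy-conforming entries are dropped locally and only the residue ever crosses the network.

```python
import re
from typing import Iterable, Iterator

# Patterns the local policy has authorized as routine; anything else is worth forwarding.
BENIGN_PATTERNS = [
    re.compile(r"^LOGIN ok user=\w+$"),
    re.compile(r"^HEALTHCHECK ok\b"),
]

def forward_only_anomalies(log_lines: Iterable[str]) -> Iterator[str]:
    """Drop entries that match authorized, predictable behavior; yield the rest.

    Only this differential leaves the device, instead of the full log."""
    for line in log_lines:
        if not any(p.match(line) for p in BENIGN_PATTERNS):
            yield line

sample = ["LOGIN ok user=alice", "HEALTHCHECK ok cpu=12%",
          "LOGIN ok user=bob", "CONFIG changed by unknown-process"]
print(list(forward_only_anomalies(sample)))   # -> ['CONFIG changed by unknown-process']
```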
There is ongoing research, and there are commercial product offerings, in each of these areas. What has not been applied is an overall data-aware approach and a strategy for how these areas can be combined for security purposes. Achieving a full data-aware environment is a major undertaking that can take years. A number of changes to the baseline, however, can be made incrementally and deliver major improvements along the path toward a data-aware environment.
Sounds great, but what’s it going to cost me?
Let's start with your baseline costs (Table 1). The natural tendency in measuring baseline spend is to look at current operations purely from a financial perspective. In fact, the baseline cost of the current operational structure should be measured in three ways: a) mission impact; b) effort, energy and attention; and c) financial spend.
The mission impact includes:
- Time to detect a cyber attack is currently slowed by the requirement to share massive log files, compute anomalies, communicate findings and respond
- Other mission areas are impacted by the heavy use of bandwidth and computing required for anomaly identification and pattern recognition
The effort, energy and attention category is all about people:
- Highly skilled data engineers to curate, standardize and integrate very large, inconsistent and often dirty data sets
- Scarce, expensive cybersecurity professionals to review and analyze results for patterns
- Time and energy of senior executives all across the organization, consumed when breaches and compromises occur
The financial baseline spend includes:
- Large, highly expensive log generation, collection and analysis tools
- Network costs for the transfer of the large data sets
- Storage and compute costs for the large datasets
- Undifferentiated spend across the network: the same investment protects critical mission data and the seating chart for upcoming social events
Though moving to a data-aware cyber strategy yields major financial savings, the greatest gains are in mission accomplishment and performance. Key elements of this are concentrating the highest spend on the highest-value assets and a cybersecurity model that scales.
How can I be confident this is a real opportunity for improvement?
One of the most significant disruptions in digital modernization is the value shift to Software-Defined Everything (SDx). It ranges from software-defined businesses (Uber, Airbnb) to software-defined networks to software-defined data centers to software-defined infrastructure, etc. This shift has significantly accelerated the clockspeed of all commercial IT offerings, capabilities and functions – the ability to anticipate and adapt to change.
Meanwhile, software is increasingly componentized into smaller and smaller slices of functionality, a dynamic known as micro-segmentation. Micro-segmentation refers to the increasingly granular control of IT assets (application functionality, data, compute capacity, storage capacity, network capacity) as well as of workload visibility, management and security controls. The result is an increasing degree of self-awareness, and mutual awareness, across all classes of IT assets.
“SDx” is any physical item or function that can be automated or performed by software. SDx includes networking, compute, storage, security, data center, perimeter, WAN, management fabrics, sensors, and so on.
Clockspeed is the ability to anticipate and adapt to change.