Software Composition Analysis (SCA) is dead

Oct 05, 2024

Hey there, fellow code warriors! It's time we had a frank talk about the elephant in the room: Software Composition Analysis (SCA) is dead. Yeah, you read that right. While the big players are busy slapping new paint on old ideas, some of us are wondering if we're the only ones who see through the smoke and mirrors.

The Current State of SCA and SAST

Software Composition Analysis (SCA) and Static Application Security Testing (SAST) have been treated as separate entities for years. However, as our tools evolve, this distinction is becoming increasingly artificial. Let's break it down:

Reachability

Both SCA and SAST rely on the same kind of static analysis: SCA vendors call it reachability, while SAST tools describe it as call graph analysis. Here's a technical example of how this might look in practice:

import ast
import networkx as nx
from typing import List

class ReachabilityAnalyzer:
    def __init__(self, source_code: str):
        self.ast_tree = ast.parse(source_code)
        self.call_graph = nx.DiGraph()
        self.build_call_graph()

    def build_call_graph(self):
        for node in ast.walk(self.ast_tree):
            if isinstance(node, ast.FunctionDef):
                self.call_graph.add_node(node.name)
                for child_node in ast.walk(node):
                    if isinstance(child_node, ast.Call) and isinstance(child_node.func, ast.Name):
                        self.call_graph.add_edge(node.name, child_node.func.id)

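    def analyze_reachability(self, entry_point: str, target_function: str) -> bool:
        # nx.has_path raises NodeNotFound for unknown names, so check membership first.
        if entry_point not in self.call_graph or target_function not in self.call_graph:
            return False
        return nx.has_path(self.call_graph, entry_point, target_function)

    def get_all_paths(self, entry_point: str, target_function: str) -> List[List[str]]:
        # Same guard as above: unknown functions simply have no paths.
        if entry_point not in self.call_graph or target_function not in self.call_graph:
            return []
        return list(nx.all_simple_paths(self.call_graph, entry_point, target_function))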

# Usage
code = """
def main():
    foo()
    bar()

def foo():
    baz()

def bar():
    qux()

def baz():
    pass

def qux():
    baz()
"""

analyzer = ReachabilityAnalyzer(code)
print(f"Is 'baz' reachable from 'main'? {analyzer.analyze_reachability('main', 'baz')}")
print(f"All paths from 'main' to 'baz': {analyzer.get_all_paths('main', 'baz')}")

This example demonstrates how reachability analysis can be performed, regardless of whether we're dealing with first-party or third-party code, using the AST to create a call graph.

In modern development, the distinction between first-party and third-party code is increasingly blurred. Consider this example of a typical application configuration:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  APP_CONFIG: |
    {
      "first_party_components": [
        {"name": "core-business-logic", "version": "1.0.0", "path": "/app/core"},
        {"name": "custom-auth-module", "version": "2.3.1", "path": "/app/auth"}
      ],
      "third_party_dependencies": [
        {"name": "left-pad", "version": "1.3.0", "source": "npm"},
        {"name": "is-odd", "version": "3.0.1", "source": "npm"}
      ],
      "cloud_native_services": [
        {"name": "AWS-Lambda", "version": "latest", "functions": [
          {"name": "data-processor", "runtime": "nodejs14.x", "handler": "index.handler"}
        ]}
      ],
      "container_images": [
        {"name": "app-frontend", "tag": "v2.1.3", "base": "node:14-alpine"},
        {"name": "app-backend", "tag": "v1.9.2", "base": "python:3.9-slim"}
      ]
    }

The boundaries between what's "ours" and what's "theirs" are not always clear-cut. The config is a simplified variation of a real service that manages deployments in a multi-tenant SaaS product.
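To make the blurring concrete, here's a small sketch that flattens a config like the one above into a single inventory, so a scanner sees every component the same way no matter which bucket it was filed under. It assumes the JSON payload has already been pulled out of the ConfigMap; flatten_inventory is a made-up helper and the config is trimmed to one entry per category.

```python
import json

# Trimmed copy of the APP_CONFIG payload, one entry per category.
APP_CONFIG = """
{
  "first_party_components": [
    {"name": "core-business-logic", "version": "1.0.0", "path": "/app/core"}
  ],
  "third_party_dependencies": [
    {"name": "left-pad", "version": "1.3.0", "source": "npm"}
  ],
  "cloud_native_services": [
    {"name": "AWS-Lambda", "version": "latest", "functions": []}
  ],
  "container_images": [
    {"name": "app-frontend", "tag": "v2.1.3", "base": "node:14-alpine"}
  ]
}
"""


def flatten_inventory(config_json: str) -> list:
    """Return one flat list of components, keeping the category only as a label."""
    config = json.loads(config_json)
    inventory = []
    for category, items in config.items():
        for item in items:
            inventory.append({
                "category": category,
                "name": item["name"],
                # Container images carry a "tag" rather than a "version".
                "version": item.get("version") or item.get("tag"),
            })
    return inventory


inventory = flatten_inventory(APP_CONFIG)
for component in inventory:
    print(component)
```

Once everything is in one list, "first-party" versus "third-party" is just a label on a row, not a reason to reach for a different scanner.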

What even is "ours" anyway?

Given these overlaps, it might make sense to consider a unified approach to code analysis. SAST already marks up its detections with CVEs, so it could just as easily report SCA findings the same way, using its call graph to provide reachability just as it does for its own detections.

This unified approach could potentially offer several benefits:

Efficiency: One tool instead of multiple scanners.
Comprehensive Security: A holistic view of application security, including IaC, secrets, and containers (Semgrep does this well already).
Cost-Effectiveness: Potentially lower costs for tooling, due to lower maintenance needs.
Simplified Workflow: Less context-switching between different tools and reports.
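
Here's roughly what "one tool" could mean in code: a single call graph, a single rule list, and both SAST-style and SCA-style findings annotated with reachability in the same pass. This is a sketch under my own assumptions. The call graph, the rule format, and the EXAMPLE-0001 advisory are invented for illustration, and a real engine would do far more than a breadth-first search.

```python
from collections import deque


def find_path(call_graph, entry, target):
    """Breadth-first search; returns the shortest call chain or None."""
    queue = deque([[entry]])
    seen = {entry}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for callee in call_graph.get(path[-1], ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None


def unified_findings(call_graph, entry, rules):
    """Annotate every rule, SAST or SCA, with the same reachability fields."""
    findings = []
    for rule in rules:
        chain = find_path(call_graph, entry, rule["symbol"])
        findings.append({
            "id": rule["id"],
            "kind": rule["kind"],
            "isReachable": chain is not None,
            "callChain": chain or [],
        })
    return findings


# Hypothetical call graph spanning first-party code and one dependency.
call_graph = {
    "main": ["query_db", "render"],
    "query_db": ["outdated_and_vulnerable.execute"],
}

# Hypothetical rules: one first-party detection, two dependency advisories.
rules = [
    {"id": "sql-injection", "kind": "SAST", "symbol": "query_db"},
    {"id": "CVE-2023-12345", "kind": "SCA",
     "symbol": "outdated_and_vulnerable.execute"},
    {"id": "EXAMPLE-0001", "kind": "SCA",
     "symbol": "outdated_and_vulnerable.legacy_export"},
]

findings = unified_findings(call_graph, "main", rules)
for finding in findings:
    print(finding)
```

The point is that the "kind" field is the only thing that differs between the two families of findings; the reachability logic doesn't care who wrote the function.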

The Need for Standardisation

While we're at it, let's talk about standardisation. The adoption of a common format like SARIF (Static Analysis Results Interchange Format) could greatly improve interoperability between tools. Here's an example of what that might look like:

Attachment: sarif-reachability-example.json (5 KB)

Reachability Information: In both results, the "properties" object includes a "reachability" section:

"reachability": {
  "isReachable": true,
  "entryPoints": ["main"],
  "callChain": ["main", "query_db"]
}

This indicates whether the vulnerability is reachable, from which entry points, and through what call chain.
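
If you want to emit something like this yourself, the sketch below builds a minimal SARIF 2.1.0 log in Python and tucks the reachability object into the result's "properties" bag, which is where SARIF lets tools put custom data. The make_result and make_log helpers, the tool name, and the file path are assumptions for illustration; only the reachability field names come from the snippet above.

```python
import json


def make_result(rule_id, message, uri, line, call_chain):
    """Build one SARIF result with reachability stored in its property bag."""
    return {
        "ruleId": rule_id,
        "level": "error",
        "message": {"text": message},
        "locations": [{
            "physicalLocation": {
                "artifactLocation": {"uri": uri},
                "region": {"startLine": line},
            }
        }],
        "properties": {
            "reachability": {
                "isReachable": bool(call_chain),
                "entryPoints": call_chain[:1],
                "callChain": call_chain,
            }
        },
    }


def make_log(tool_name, results):
    """Wrap results in the minimal SARIF envelope: version, runs, tool driver."""
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": tool_name}},
            "results": results,
        }],
    }


# Hypothetical finding; the path and line number are placeholders.
result = make_result(
    "sql-injection",
    "User input reaches a SQL query",
    "app/db.py",
    40,
    ["main", "query_db"],
)
log = make_log("unified-analyzer", [result])
print(json.dumps(log, indent=2))
```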

The "codeFlows" section in the SQL Injection result provides a step-by-step path of how the vulnerability can be reached, showing the progression from user input to the vulnerable SQL query executed by dependency code. SCA alone would merely have shown the CVE without much context; this is much better.
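
For anyone who hasn't looked inside a SARIF file, a code flow is just nested lists: codeFlows contain threadFlows, and each thread flow holds an ordered list of locations. Here's a hedged sketch of building one from a list of steps. The build_code_flow helper and the three steps (paths, line numbers, messages) are invented placeholders, not the contents of the attached report.

```python
def build_code_flow(steps):
    """Build a SARIF codeFlow from (uri, line, text) steps, ordered source to sink."""
    return {
        "threadFlows": [{
            "locations": [
                {
                    "location": {
                        "physicalLocation": {
                            "artifactLocation": {"uri": uri},
                            "region": {"startLine": line},
                        },
                        "message": {"text": text},
                    }
                }
                for uri, line, text in steps
            ]
        }]
    }


# Hypothetical path from user input, through our code, into the dependency.
steps = [
    ("app/main.py", 12, "User input read in main"),
    ("app/db.py", 40, "Input concatenated into SQL in query_db"),
    ("site-packages/outdated_and_vulnerable/core.py", 88,
     "Query executed by dependency code"),
]

sarif_result = {
    "ruleId": "sql-injection",
    "message": {"text": "User input reaches a SQL query"},
    "codeFlows": [build_code_flow(steps)],
}

for step in sarif_result["codeFlows"][0]["threadFlows"][0]["locations"]:
    print(step["location"]["message"]["text"])
```

Notice the last step lives in the dependency's source tree. Nothing in the format cares that the flow crossed from first-party into third-party code.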

The SARIF report carries the dependency details too, just as an SCA tool would. The vulnerable dependency result includes additional information specific to the dependency:

"dependencyDetails": {
  "name": "outdated-and-vulnerable",
  "version": "1.0.0",
  "ecosystem": "PyPI",
  "vulnerabilityId": "CVE-2023-12345"
}

This SARIF report demonstrates how a unified security analyzer can provide detailed information about both SAST and SCA findings, including reachability analysis, in a standardized format. This allows for comprehensive vulnerability reporting that can be easily consumed and processed by various workflows.
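
On the consuming side, a workflow can be as simple as the sketch below: walk the results, keep the ones marked reachable, and use the presence of "dependencyDetails" to tell dependency findings from first-party ones. The reachable_findings helper and the inline sample log are illustrative assumptions; the real report would be loaded from the JSON file instead.

```python
def reachable_findings(sarif_log):
    """Yield a short summary for every result whose reachability is true."""
    for run in sarif_log.get("runs", []):
        for result in run.get("results", []):
            props = result.get("properties", {})
            reach = props.get("reachability", {})
            if not reach.get("isReachable"):
                continue
            # Results that carry dependencyDetails came from a dependency.
            kind = "SCA" if "dependencyDetails" in props else "SAST"
            yield {
                "ruleId": result.get("ruleId"),
                "kind": kind,
                "callChain": reach.get("callChain", []),
            }


# Hypothetical log with one reachable SAST result, one reachable SCA result,
# and one unreachable SCA result that should be filtered out.
sample_log = {
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "unified-analyzer"}},
        "results": [
            {"ruleId": "sql-injection", "properties": {"reachability": {
                "isReachable": True, "callChain": ["main", "query_db"]}}},
            {"ruleId": "CVE-2023-12345", "properties": {
                "reachability": {"isReachable": True,
                                 "callChain": ["main", "query_db"]},
                "dependencyDetails": {"name": "outdated-and-vulnerable",
                                      "version": "1.0.0"}}},
            {"ruleId": "EXAMPLE-0001", "properties": {
                "reachability": {"isReachable": False, "callChain": []},
                "dependencyDetails": {"name": "outdated-and-vulnerable",
                                      "version": "1.0.0"}}},
        ],
    }],
}

triage = list(reachable_findings(sample_log))
for item in triage:
    print(item)
```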

Look, I'm not saying I have all the answers. I'm just a developer trying to make sense of our ever-evolving landscape. But it seems to me that as the lines between first-party and third-party code continue to blur, and as our tools become increasingly complex, we might need to rethink our approach to code analysis.

I've not mentioned DAST, Secrets, IaC, etc. specifically, but these are basically SAST flavours and follow the same primitives described here - there's no reason they can't be interchanged to fit this idea.

The idea of unifying tools might sound radical, but hey, remember when we thought JavaScript was just for making annoying pop-ups? Times change, and maybe our tools should too.

What do you think? Am I onto something here, or have I just had one too many cups of coffee? Let me know your thoughts, and in the meantime, may your code be bug-free and your dependencies up-to-date.

Bits of Cyber
