Performance
SAPL is an attribute-based access control (ABAC) policy engine with streaming authorization support. This page documents SAPL’s performance characteristics across deployment modes (embedded, HTTP, RSocket), runtimes (JVM, native image), and policy counts (1 to 10,000). All numbers are from automated, reproducible benchmarks on controlled hardware.
SAPL ships as a single native binary (sapl) and as a JVM application (sapl-node.jar). Both provide the same functionality. Choose based on your deployment model.
JVM Runtime
Running on the JVM with HotSpot C2 JIT unlocks peak throughput for extreme-scale deployments. The JIT compiler optimizes the hot evaluation path at runtime, achieving roughly 2x the throughput of the native binary for sustained workloads.
- 2M RSocket decisions/sec (8 P-cores, 9905 policies)
- 35 μs p50 server latency at typical load
- 15M embedded in-process decisions/sec (single thread)
- 179-374 ns embedded p50 per-decision latency (RBAC to 9905 policies)
Choose JVM when you need maximum throughput per node, when embedding the PDP directly into a Spring application, or when the application already runs on the JVM.
Native Binary
The native binary is a self-contained executable compiled ahead-of-time with GraalVM: no JVM, no runtime dependencies, no classpath. Install via deb, rpm, or by copying the binary.
Ideal for sidecar containers, CLI tooling, and environments where a JVM is not available or not desired.
- 900K RSocket decisions/sec (8 P-cores, 9905 policies)
- 45 μs p50 server latency at typical load
For context: at 900K decisions/sec, a single SAPL instance can authorize every request for hundreds of concurrent applications without becoming a bottleneck. The native binary is the recommended deployment for most use cases.
Policy Scaling
- 1.6x throughput degradation from 38 to 9905 policies in the hospital scenario
The SMTDD index exploits structural overlap in policy sets. When policies share common attribute checks (e.g. role, department, resource type), the index collapses them into multi-way HashMap lookups. This is the common case for real-world deployments where policies are organized around a finite set of attributes.
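To make the lookup collapse concrete, here is a toy contrast (in Python, with invented names and data; not SAPL's actual data structures) between a linear scan over all policies and the single hash lookup that shared equality checks on role, department, and resource type reduce to:

```python
# Illustrative only: 300 departments' worth of "doctor may read records"
# policies, each differing only in the department attribute.
policies = [
    {"role": "doctor", "department": f"dept-{d}", "resource": "record", "effect": "permit"}
    for d in range(300)
]

def naive_decide(request):
    # NAIVE strategy: O(n) -- test every policy's conditions in turn.
    for p in policies:
        if all(request[k] == p[k] for k in ("role", "department", "resource")):
            return p["effect"]
    return "deny"

# Index build step: because every policy checks equality on the same
# attribute triple, the checks collapse into one multi-way map keyed by
# (role, department, resource) -- O(1) per decision thereafter.
index = {(p["role"], p["department"], p["resource"]): p["effect"] for p in policies}

def indexed_decide(request):
    key = (request["role"], request["department"], request["resource"])
    return index.get(key, "deny")

request = {"role": "doctor", "department": "dept-299", "resource": "record"}
assert naive_decide(request) == indexed_decide(request) == "permit"
```

The per-decision cost of the indexed path is independent of how many departments (and hence policies) exist, which is why the hospital scaling curves stay nearly flat.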
The near-flat scaling shown here is not guaranteed for all policy structures. Policies with entirely disjoint predicates or complex non-equality conditions will fall back to standard binary decision nodes with less favorable scaling. The AUTO index mode selects the best strategy for the given policy set automatically.
Test Environment
- Clock: All P-cores pinned to 4.0 GHz (constant frequency, no turbo/throttle noise)
- JVM: OpenJDK 25.0.2 (HotSpot C2)
- Native: GraalVM native-image (same JDK base)
- OS: NixOS Linux 6.18.19
- Pinning: Server on P-cores (taskset), client on E-cores
- Thermal: Cool-down between runs to prevent frequency scaling
Benchmark Scenarios
All scenarios use realistic policy structures. The hospital scenarios model a healthcare access control system where policies guard patient records by role (doctor, nurse, admin), resource type (record, lab result), department, and action. The number of departments scales linearly, producing 33N+5 policies for N departments. This provides a controlled policy scaling curve from 38 to 9905 policies. Hospital-300 with 9905 policies is a deliberate stress test; typical production deployments use far fewer policies.
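The 33N+5 relationship can be checked directly against the policy counts in the scenario table below:

```python
def hospital_policy_count(departments: int) -> int:
    # Each department contributes 33 policies; 5 policies are
    # department-independent (the "+5" term).
    return 33 * departments + 5

# Matches the hospital-1 through hospital-300 rows of the scenario table.
counts = [hospital_policy_count(n) for n in (1, 5, 50, 100, 300)]
assert counts == [38, 170, 1655, 3305, 9905]
```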
The github, gdrive, and tinytodo scenarios are SAPL equivalents of the Cedar OOPSLA 2024 benchmark suite (Cutler et al., 2024, Section 5, Figure 14). In these scenarios, the scaling factor N refers to the number of entities per type (users, teams, repos, etc.), not the number of policies. The policies are a small fixed set; the entity graph grows with N. Cedar uses templates to encode per-entity permissions; SAPL uses compile-time constant folding to achieve equivalent performance with static policies.
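As a rough illustration of the constant-folding idea (a language-agnostic Python sketch with invented node names, not SAPL's compiler internals): an equality whose operands are both compile-time constants reduces to a boolean literal before any request is evaluated, while conditions on request attributes stay symbolic.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lit:
    value: object  # compile-time constant

@dataclass(frozen=True)
class Var:
    name: str  # request-dependent attribute, e.g. subject id

@dataclass(frozen=True)
class Eq:
    left: object
    right: object

def fold(node):
    """Replace constant sub-expressions with literals at 'compile' time."""
    if isinstance(node, Eq):
        l, r = fold(node.left), fold(node.right)
        if isinstance(l, Lit) and isinstance(r, Lit):
            return Lit(l.value == r.value)  # folded away: no per-request work
        return Eq(l, r)
    return node

# A condition over known constants disappears into a literal...
assert fold(Eq(Lit("alice"), Lit("alice"))) == Lit(True)
# ...while one depending on the request subject remains to be evaluated.
assert fold(Eq(Var("subject.id"), Lit("alice"))) == Eq(Var("subject.id"), Lit("alice"))
```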
| Scenario | Scale | Policies | Description |
|---|---|---|---|
| baseline | - | 1 | Single unconditional deny policy |
| rbac | - | 1 | Role-based access with IN-operator permission lookup |
| hospital-N | N departments | 33N+5 | Hospital ABAC: role, resource type, department, and action guards |
| hospital-1 | 1 dept | 38 | |
| hospital-5 | 5 depts | 170 | |
| hospital-50 | 50 depts | 1655 | |
| hospital-100 | 100 depts | 3305 | |
| hospital-300 | 300 depts | 9905 | |
| Cedar OOPSLA equivalents | N = entities/type | ||
| github-N | N entities/type | 8 | GitHub repository permissions (Cedar Fig. 14b) |
| gdrive-N | N entities/type | 5 | Google Drive file sharing (Cedar Fig. 14a) |
| tinytodo-N | N entities/type | 4 | Todo app with team sharing (Cedar Fig. 14c) |
Embedded Throughput & Latency
In-process policy evaluation with no network overhead. The PDP is embedded directly in the application JVM. This represents the pure evaluation cost.
JVM - Throughput by Scenario and Threads
Native - Throughput by Scenario and Threads
JVM - Per-Decision Latency, single thread (JMH SampleTime)
Native - Per-Decision Latency, single thread
Index Strategy Comparison (single thread, embedded)
Three indexing strategies: NAIVE (linear scan), CANONICAL (predicate-based), and SMTDD (semantic multi-terminal decision diagram). SMTDD collapses equality predicates into HashMap lookups for near-constant cost regardless of policy count.
Hospital Scaling: Throughput vs Policy Count (JVM)
Hospital Scaling: Throughput vs Policy Count (Native)
Server Throughput
Server deployment over HTTP/JSON and RSocket/protobuf. RSocket provides significantly higher throughput due to binary framing, connection multiplexing, and zero-copy payload handling.
Server Throughput: HTTP vs RSocket, JVM vs Native (best config per scenario)
Server Latency at Load
RSocket latency at controlled load fractions. The load generator sends requests at 1%, 10%, 50%, and 90% of measured saturation throughput and records per-request service time. At typical load (1-10% of capacity), p50 latency is 35-37 μs regardless of policy count.
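The rate-limiting arithmetic is straightforward; this sketch (illustrative, not the benchmark's actual load generator) derives the inter-request interval for a given fraction of measured saturation:

```python
def inter_request_interval_us(saturation_rps: float, load_fraction: float) -> float:
    """Microseconds between requests needed to run at a given fraction
    of the measured saturation throughput."""
    return 1_000_000 / (saturation_rps * load_fraction)

# At e.g. 900_000 decisions/sec saturation, 10% load means one request
# roughly every 11.1 microseconds.
assert round(inter_request_interval_us(900_000, 0.10), 1) == 11.1
```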
Hospital-300 (9905 policies) - JVM
Hospital-300 (9905 policies) - Native
GitHub-10 (Cedar equivalent) - JVM
GitHub-10 (Cedar equivalent) - Native
JVM vs Native Image
GraalVM native image provides sub-second startup but loses HotSpot C2 JIT optimizations for sustained throughput. JVM is approximately 2x faster for embedded evaluation across all scenarios.
Embedded Throughput: JVM vs Native (1 thread)
Methodology
Measurement
- Embedded: JMH (JVM) and custom timing loops (native) with convergence-based fork count.
- Server throughput: wrk2 with constant-rate saturation (HTTP) and reactive load generator (RSocket).
- Server latency: Rate-limited reactive load generator (Flux.interval) at percentages of measured saturation. Per-request service time (send-to-response).
CPU Isolation
- All P-cores frequency-locked to 4.0 GHz to eliminate turbo boost variance and thermal throttling
- Server pinned to P-cores via taskset (1, 4, or 8 P-cores)
- Client pinned to E-cores to prevent contention
- Thermal cool-down between runs to prevent frequency scaling artifacts
Statistics
- 95% confidence intervals via t-distribution
- Coefficient of variation (CoV) for convergence detection
- Latency percentiles: p50, p90, p99, p99.9, max
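The confidence-interval and convergence computations can be sketched as follows (illustrative Python with made-up sample values; the critical value is passed in rather than looked up, to keep the sketch dependency-free):

```python
import math
from statistics import mean, stdev

def summarize(samples, t_crit):
    """Mean, 95% CI half-width via the t-distribution, and coefficient
    of variation (CoV). t_crit is the two-sided critical value for
    len(samples) - 1 degrees of freedom, e.g. 2.776 for n = 5."""
    m, s = mean(samples), stdev(samples)
    half_width = t_crit * s / math.sqrt(len(samples))
    cov = s / m
    return m, half_width, cov

# Hypothetical per-fork throughput samples (decisions/sec):
throughput = [14.8e6, 15.1e6, 15.0e6, 14.9e6, 15.2e6]
m, hw, cov = summarize(throughput, t_crit=2.776)
assert cov < 0.05  # low relative spread -- treat the run as converged
```

A run is considered converged once the CoV falls below a configured threshold, which is what drives the "convergence-based fork count" mentioned above.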
Reproducing These Results
All benchmark code, scenario generators, runner scripts, and analysis tools are open source in the sapl-policy-engine repository:
- sapl-benchmark/ - JMH benchmark harness and scenario generators
- sapl-benchmark/scripts/ - Runner scripts for all benchmark types
- sapl-benchmark/scripts/lib/bench.py - Statistics, convergence, and data aggregation
- sapl-benchmark/scripts/lib/profiles/ - Quality (quick/full) and experiment profiles
./sapl-benchmark/scripts/build.sh
./sapl-benchmark/scripts/run-all.sh ./results