Performance
SAPL is an attribute-based access control (ABAC) policy engine with streaming authorization support. This page documents SAPL’s performance characteristics across deployment modes (embedded, HTTP, RSocket), runtimes (JVM, native image), and policy counts (1 to 10,000). All numbers are from automated, reproducible benchmarks on controlled hardware.
SAPL ships as a single native binary (sapl) and as a JVM application (sapl-node.jar). Both provide the same functionality. Choose based on your deployment model.
JVM Runtime
Running on the JVM with HotSpot C2 JIT unlocks peak throughput for extreme-scale deployments. The JIT compiler optimizes the hot evaluation path at runtime, achieving roughly 2x the throughput of the native binary for sustained workloads.
- 2M RSocket decisions/sec (8 P-cores, 9905 policies)
- 35 μs p50 server latency at typical load
- 15M embedded in-process decisions/sec (single thread)
- 179-374 ns embedded p50 per-decision latency (RBAC to 9905 policies)
Choose JVM when you need maximum throughput per node, when embedding the PDP directly into a Spring application, or when the application already runs on the JVM.
Native Binary
The native binary is a self-contained executable compiled ahead-of-time with GraalVM: no JVM, no runtime dependencies, no classpath. Install via deb, rpm, or by copying the binary.
Ideal for sidecar containers, CLI tooling, and environments where a JVM is not available or not desired.
- 900K RSocket decisions/sec (8 P-cores, 9905 policies)
- 45 μs p50 server latency at typical load
For context: at 900K decisions/sec, a single SAPL instance can authorize every request for hundreds of concurrent applications without becoming a bottleneck. The native binary is the recommended deployment for most use cases.
Policy Scaling
- 1.6x throughput degradation from 38 to 9905 policies in the hospital scenario
The SMTDD index exploits structural overlap in policy sets. When policies share common attribute checks (e.g. role, department, resource type), the index collapses them into multi-way HashMap lookups. This is the common case for real-world deployments where policies are organized around a finite set of attributes.
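To make the lookup collapse concrete, here is a toy contrast (in Python, with invented names and data; not SAPL's actual data structures) between a linear scan over all policies and the single hash lookup that shared equality checks on role, department, and resource type reduce to:

```python
# Illustrative only: 300 departments' worth of "doctor may read records"
# policies, each differing only in the department attribute.
policies = [
    {"role": "doctor", "department": f"dept-{d}", "resource": "record", "effect": "permit"}
    for d in range(300)
]

def naive_decide(request):
    # NAIVE strategy: O(n) -- test every policy's conditions in turn.
    for p in policies:
        if all(request[k] == p[k] for k in ("role", "department", "resource")):
            return p["effect"]
    return "deny"

# Index build step: because every policy checks equality on the same
# attribute triple, the checks collapse into one multi-way map keyed by
# (role, department, resource) -- O(1) per decision thereafter.
index = {(p["role"], p["department"], p["resource"]): p["effect"] for p in policies}

def indexed_decide(request):
    key = (request["role"], request["department"], request["resource"])
    return index.get(key, "deny")

request = {"role": "doctor", "department": "dept-299", "resource": "record"}
assert naive_decide(request) == indexed_decide(request) == "permit"
```

The per-decision cost of the indexed path is independent of how many departments (and hence policies) exist, which is why the hospital scaling curves stay nearly flat.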
The near-flat scaling shown here is not guaranteed for all policy structures. Policies with entirely disjoint predicates or complex non-equality conditions will fall back to standard binary decision nodes with less favorable scaling. The AUTO index mode selects the best strategy for the given policy set automatically.
Test Environment
- Clock: All P-cores pinned to 4.0 GHz (constant frequency, no turbo/throttle noise)
- JVM: OpenJDK 25.0.2 (HotSpot C2)
- Native: GraalVM native-image (same JDK base)
- OS: NixOS Linux 6.18.19
- Pinning: Server on P-cores (taskset), client on E-cores
- Thermal: Cool-down between runs to prevent frequency scaling
Benchmark Scenarios
All scenarios use realistic policy structures. The hospital scenarios model a healthcare access control system where policies guard patient records by role (doctor, nurse, admin), resource type (record, lab result), department, and action. The number of departments scales linearly, producing 33N+5 policies for N departments. This provides a controlled policy scaling curve from 38 to 9905 policies. Hospital-300 with 9905 policies is a deliberate stress test; typical production deployments use far fewer policies.
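The 33N+5 relationship can be checked directly against the policy counts in the scenario table below:

```python
def hospital_policy_count(departments: int) -> int:
    # Each department contributes 33 policies; 5 policies are
    # department-independent (the "+5" term).
    return 33 * departments + 5

# Matches the hospital-1 through hospital-300 rows of the scenario table.
counts = [hospital_policy_count(n) for n in (1, 5, 50, 100, 300)]
assert counts == [38, 170, 1655, 3305, 9905]
```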
The github, gdrive, and tinytodo scenarios are SAPL equivalents of the Cedar OOPSLA 2024 benchmark suite (Cutler et al., 2024, Section 5, Figure 14). In these scenarios, the scaling factor N refers to the number of entities per type (users, teams, repos, etc.), not the number of policies. The policies are a small fixed set; the entity graph grows with N. Cedar uses templates to encode per-entity permissions; SAPL uses compile-time constant folding to achieve equivalent performance with static policies.
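As a rough illustration of the constant-folding idea (a language-agnostic Python sketch with invented node names, not SAPL's compiler internals): an equality whose operands are both compile-time constants reduces to a boolean literal before any request is evaluated, while conditions on request attributes stay symbolic.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lit:
    value: object  # compile-time constant

@dataclass(frozen=True)
class Var:
    name: str  # request-dependent attribute, e.g. subject id

@dataclass(frozen=True)
class Eq:
    left: object
    right: object

def fold(node):
    """Replace constant sub-expressions with literals at 'compile' time."""
    if isinstance(node, Eq):
        l, r = fold(node.left), fold(node.right)
        if isinstance(l, Lit) and isinstance(r, Lit):
            return Lit(l.value == r.value)  # folded away: no per-request work
        return Eq(l, r)
    return node

# A condition over known constants disappears into a literal...
assert fold(Eq(Lit("alice"), Lit("alice"))) == Lit(True)
# ...while one depending on the request subject remains to be evaluated.
assert fold(Eq(Var("subject.id"), Lit("alice"))) == Eq(Var("subject.id"), Lit("alice"))
```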
| Scenario | Scale | Policies | Description |
|---|---|---|---|
| baseline | - | 1 | Single unconditional deny policy |
| rbac | - | 1 | Role-based access with IN-operator permission lookup |
| hospital-N | N departments | 33N+5 | Hospital ABAC: role, resource type, department, and action guards |
| hospital-1 | 1 dept | 38 | |
| hospital-5 | 5 depts | 170 | |
| hospital-50 | 50 depts | 1655 | |
| hospital-100 | 100 depts | 3305 | |
| hospital-300 | 300 depts | 9905 | |
| Cedar OOPSLA equivalents | N = entities/type | ||
| github-N | N entities/type | 8 | GitHub repository permissions (Cedar Fig. 14b) |
| gdrive-N | N entities/type | 5 | Google Drive file sharing (Cedar Fig. 14a) |
| tinytodo-N | N entities/type | 4 | Todo app with team sharing (Cedar Fig. 14c) |
Embedded Throughput & Latency
In-process policy evaluation with no network overhead. The PDP is embedded directly in the application JVM. This represents the pure evaluation cost.
JVM - Throughput by Scenario and Threads
Native - Throughput by Scenario and Threads
JVM - Per-Decision Latency, single thread (JMH SampleTime)
Native - Per-Decision Latency, single thread
Index Strategy Comparison (single thread, embedded)
Three indexing strategies: NAIVE (linear scan), CANONICAL (predicate-based), and SMTDD (semantic multi-terminal decision diagram). SMTDD collapses equality predicates into HashMap lookups for near-constant cost regardless of policy count.
Hospital Scaling: Throughput vs Policy Count (JVM)
Hospital Scaling: Throughput vs Policy Count (Native)
Server Throughput
Server deployment over HTTP/JSON and RSocket/protobuf. RSocket provides significantly higher throughput due to binary framing, connection multiplexing, and zero-copy payload handling.
Server Throughput: HTTP vs RSocket, JVM vs Native (best config per scenario)
Server Latency at Load
RSocket latency at controlled load fractions. The load generator sends requests at 1%, 10%, 50%, and 90% of measured saturation throughput and records per-request service time. At typical load (1-10% of capacity), p50 latency is 35-37 μs regardless of policy count.
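The rate-limiting arithmetic is straightforward; this sketch (illustrative, not the benchmark's actual load generator) derives the inter-request interval for a given fraction of measured saturation:

```python
def inter_request_interval_us(saturation_rps: float, load_fraction: float) -> float:
    """Microseconds between requests needed to run at a given fraction
    of the measured saturation throughput."""
    return 1_000_000 / (saturation_rps * load_fraction)

# At e.g. 900_000 decisions/sec saturation, 10% load means one request
# roughly every 11.1 microseconds.
assert round(inter_request_interval_us(900_000, 0.10), 1) == 11.1
```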
Hospital-300 (9905 policies) - JVM
Hospital-300 (9905 policies) - Native
GitHub-10 (Cedar equivalent) - JVM
GitHub-10 (Cedar equivalent) - Native
JVM vs Native Image
GraalVM native image provides sub-second startup but loses HotSpot C2 JIT optimizations for sustained throughput. JVM is approximately 2x faster for embedded evaluation across all scenarios.
Embedded Throughput: JVM vs Native (1 thread)
Methodology
Measurement
- Embedded: JMH (JVM) and custom timing loops (native) with convergence-based fork count.
- Server throughput: wrk2 with constant-rate saturation (HTTP) and reactive load generator (RSocket).
- Server latency: Rate-limited reactive load generator (Flux.interval) at percentages of measured saturation. Per-request service time (send-to-response).
CPU Isolation
- All P-cores frequency-locked to 4.0 GHz to eliminate turbo boost variance and thermal throttling
- Server pinned to P-cores via taskset (1, 4, or 8 P-cores)
- Client pinned to E-cores to prevent contention
- Thermal cool-down between runs to prevent frequency scaling artifacts
Statistics
- 95% confidence intervals via t-distribution
- Coefficient of variation (CoV) for convergence detection
- Latency percentiles: p50, p90, p99, p99.9, max
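The confidence-interval and convergence computations can be sketched as follows (illustrative Python with made-up sample values; the critical value is passed in rather than looked up, to keep the sketch dependency-free):

```python
import math
from statistics import mean, stdev

def summarize(samples, t_crit):
    """Mean, 95% CI half-width via the t-distribution, and coefficient
    of variation (CoV). t_crit is the two-sided critical value for
    len(samples) - 1 degrees of freedom, e.g. 2.776 for n = 5."""
    m, s = mean(samples), stdev(samples)
    half_width = t_crit * s / math.sqrt(len(samples))
    cov = s / m
    return m, half_width, cov

# Hypothetical per-fork throughput samples (decisions/sec):
throughput = [14.8e6, 15.1e6, 15.0e6, 14.9e6, 15.2e6]
m, hw, cov = summarize(throughput, t_crit=2.776)
assert cov < 0.05  # low relative spread -- treat the run as converged
```

A run is considered converged once the CoV falls below a configured threshold, which is what drives the "convergence-based fork count" mentioned above.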
Reproducing These Results
All benchmark code, scenario generators, runner scripts, and analysis tools are open source in the sapl-policy-engine repository:
- sapl-benchmark/ - JMH benchmark harness and scenario generators
- sapl-benchmark/scripts/ - Runner scripts for all benchmark types
- sapl-benchmark/scripts/lib/bench.py - Statistics, convergence, and data aggregation
- sapl-benchmark/scripts/lib/profiles/ - Quality (quick/full) and experiment profiles
./sapl-benchmark/scripts/build.sh
./sapl-benchmark/scripts/run-all.sh ./results