If you manage Kubernetes clusters, CI/CD pipelines, or microservices at scale, you know how tricky governance can get. Open Policy Agent offers a unified way to enforce rules consistently across your systems.
OPA is a CNCF-graduated, general-purpose policy engine that provides a unified way to express and enforce policies as code across an entire technology stack. Many organizations first encounter OPA through Kubernetes, but its design is intentionally broader. It can govern decisions in microservices, CI and CD pipelines, infrastructure-as-code workflows, API gateways, and custom applications.
OPA was created to address a long-standing problem in distributed systems. Every platform, service, and tool had its own method for defining and enforcing rules. That fragmentation made governance difficult to scale. OPA introduces a consistent, declarative model that allows teams to centralize policy logic while keeping enforcement local to each system. The result is a more predictable and auditable approach to operational and security controls.
OPA evaluates structured data, typically represented as JSON. This input model allows OPA to reason about configuration files, API requests, infrastructure plans, and other machine-readable inputs in a consistent way across different systems.
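For example, a caller checking a Kubernetes Deployment might hand OPA a JSON input like the following (a hypothetical shape that mirrors the admission-style examples later in this article):

```json
{
  "request": {
    "kind": {"kind": "Deployment"},
    "object": {
      "spec": {
        "template": {
          "spec": {
            "containers": [
              {"image": "myregistry.example.com/myapp:1.0"}
            ]
          }
        }
      }
    }
  }
}
```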
OPA separates policy decision making from policy enforcement. This separation is the foundation of its flexibility.
A system such as Kubernetes, a CI pipeline, or a microservice sends OPA a structured request that describes the action it wants to take. The question is always the same: is this allowed? OPA evaluates the request using policies written in Rego, its purpose-built declarative language. OPA then returns a decision. The system that made the request is responsible for enforcing that decision.
In common authorization terminology, OPA acts as the Policy Decision Point while the calling system acts as the Policy Enforcement Point. This design allows OPA to focus entirely on evaluating policies while leaving enforcement to the systems that already control access or execution.
OPA can return more than a simple allow or deny. Policies can produce arbitrary structured JSON as decision output. That output can include metadata, warnings, risk scores, required approval lists, remediation suggestions, or other guidance. Returning structured decision data enables richer automation in pipelines and services and allows calling systems to implement graded responses such as soft failures, advisory messages, or automated remediation steps based on the decision payload.
Below is a minimal Rego policy that demonstrates returning structured decision data suitable for observe mode.
package example.image_policy

# Allow by default; in observe mode the decision is advisory
default allow = true

# High risk when any container image comes from an untrusted repo
high_risk {
    input.request.kind.kind == "Deployment"
    some i
    container := input.request.object.spec.template.spec.containers[i]
    startswith(container.image, "untrusted/")
}

# One warning per untrusted image
warnings := [msg |
    some i
    container := input.request.object.spec.template.spec.containers[i]
    startswith(container.image, "untrusted/")
    msg := sprintf("Untrusted image found: %v", [container.image])
]

risk = 90 { high_risk }
risk = 10 { not high_risk }

default matched_rules = []
matched_rules = ["example.image_policy/high_risk"] { high_risk }

# Structured decision output
decision := {
    "allow": allow,
    "warnings": warnings,
    "risk": risk,
    "matched_rules": matched_rules
}
Add unit tests next to the policy and run opa test in CI. A simple test file might look like this.
package example.image_policy_test

import data.example.image_policy

test_allow_trusted_image {
    test_input := {
        "request": {
            "kind": {"kind": "Deployment"},
            "object": {
                "spec": {
                    "template": {
                        "spec": {
                            "containers": [
                                {"image": "myregistry.example.com/myapp:1.0"}
                            ]
                        }
                    }
                }
            }
        }
    }
    not image_policy.high_risk with input as test_input
}

test_detect_untrusted_image {
    test_input := {
        "request": {
            "kind": {"kind": "Deployment"},
            "object": {
                "spec": {
                    "template": {
                        "spec": {
                            "containers": [
                                {"image": "untrusted/myapp:latest"}
                            ]
                        }
                    }
                }
            }
        }
    }
    image_policy.high_risk with input as test_input
}
Run tests locally with:
opa test .
OPA was originally developed by Styra. The company was founded by engineers with deep experience in virtualization and cloud infrastructure who recognized the need for a unified policy layer in modern distributed systems. OPA entered the Cloud Native Computing Foundation in 2018 and graduated in 2021. Graduation indicates strong community adoption, production-grade maturity, and broad ecosystem integration.
Today OPA is used by organizations such as Netflix, Capital One, Pinterest, and Atlassian. These companies rely on OPA to enforce security controls, validate infrastructure changes, and govern complex multi-tenant environments. OPA's growth reflects a broader industry shift toward policy-as-code and automated governance. Adoption accelerated alongside the rise of Kubernetes and cloud-native architectures, where organizations needed consistent governance across increasingly distributed infrastructure and services.
OPA is intentionally general-purpose. Its design allows it to operate anywhere a system needs to make an authorization or compliance decision.
In Kubernetes admission control, OPA is widely used with Gatekeeper, a Kubernetes admission controller that evaluates incoming resource definitions. Gatekeeper translates Kubernetes admission requests into structured input that OPA evaluates using Rego policies. Gatekeeper implements Constraint Templates that compile to Rego modules and provide a Kubernetes-native way to author policies while still leveraging Rego for evaluation. Teams use this pattern to enforce allowed container registries, required labels, and restrictions on privilege escalation so that workloads entering a cluster meet organizational standards.
Below are a ConstraintTemplate and a Constraint that demonstrate a safe rollout pattern using enforcementAction: warn.
# ConstraintTemplate
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8sallowedrepos
spec:
  crd:
    spec:
      names:
        kind: K8sAllowedRepos
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sallowedrepos

        violation[{"msg": msg}] {
          input.review.object.kind == "Deployment"
          some i
          img := input.review.object.spec.template.spec.containers[i].image
          not startswith(img, "myregistry.example.com/")
          msg := sprintf("container image %v is not allowed", [img])
        }
# Constraint using warn mode
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: disallow-untrusted-images
spec:
  enforcementAction: warn
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]
A small test Deployment that uses an untrusted image helps verify warn mode behavior.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-untrusted
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-untrusted
  template:
    metadata:
      labels:
        app: test-untrusted
    spec:
      containers:
        - name: app
          image: untrusted/myapp:latest
Apply the ConstraintTemplate and Constraint, then apply the test Deployment. In warn mode the admission request is not denied; instead, a warning is returned to the client and Gatekeeper's audit process records the violation in the Constraint's status.
In CI and CD governance, pipelines query OPA to determine whether a deployment is allowed. OPA can evaluate whether a configuration violates security rules or whether a change meets compliance requirements. Integrations commonly use Conftest, the OPA CLI, or custom CI steps to evaluate infrastructure plans, container metadata, vulnerability scan results, and artifact provenance. Because OPA can return structured decision data, pipelines can use that output to guide automated behavior, for example by requiring manual approval when OPA returns a high risk score or by triggering automated remediation when OPA returns suggested fixes. This shifts governance earlier in the delivery process and reduces the need for manual approvals.
For infrastructure as code validation, OPA can validate Terraform plans, Kubernetes manifests, and CloudFormation templates before they are applied. Evaluating the planned infrastructure state rather than raw configuration files allows policies to reason about the actual resources that will be created after variables and modules are resolved. This reduces false positives and provides a more accurate assessment of compliance. For large organizations, OPA supports policy distribution mechanisms such as bundles to package policies and data for distribution to OPA instances, enabling consistent policy rollout at scale.
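As a sketch, a policy over the JSON produced by terraform show -json tfplan might flag security groups that open SSH to the internet. The resource_changes fields follow Terraform's plan format; the specific rule and package name are illustrative:

```rego
package terraform.plan

# Flag security groups that would be created with SSH open to the world
deny[msg] {
    rc := input.resource_changes[_]
    rc.type == "aws_security_group"
    rc.change.actions[_] == "create"
    rule := rc.change.after.ingress[_]
    rule.from_port <= 22
    rule.to_port >= 22
    rule.cidr_blocks[_] == "0.0.0.0/0"
    msg := sprintf("%v opens port 22 to 0.0.0.0/0", [rc.address])
}
```

Running this against the plan JSON with conftest or opa eval surfaces the violations before terraform apply is ever reached.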
In API authorization, OPA can act as a centralized decision engine for microservices. Services send authorization requests to OPA, which evaluates them using Rego policies. OPA can run as a sidecar, a centralized service, or an embedded library depending on architecture and latency requirements. Embedding OPA or running it as a local sidecar reduces network hops for low latency use cases, while a centralized OPA service with caching and bundles can provide consistent decisions across many services.
One of my main concerns about introducing something like OPA into an existing environment is breaking functional services. It is possible and common to introduce OPA in observe mode rather than enforcement mode. Observe mode means treating OPA decisions as advisory rather than mandatory. OPA itself only returns structured decision data. The calling system decides whether to enforce, warn, or ignore those decisions. Running policies in observe mode lets teams collect real-world signals, tune rules, and build confidence without disrupting existing workflows.
A pragmatic rollout begins with audit collection. Run policy evaluations without blocking changes and collect violations centrally to track trends over time. Use advisory outputs that include severity, warnings, and remediation guidance so that callers can log and surface these payloads to owners. Move next to a warn stage where violations are visible and actionable but do not prevent operations. Finally, adopt scoped enforcement where rules are enforced only for specific namespaces, teams, or nonproduction environments while the rest of the fleet remains in observe mode. Progress rules from audit to warn to deny in measured steps based on metrics and stakeholder readiness.
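The staged progression can be made concrete in caller code. The sketch below maps a rollout stage and an OPA decision payload to the action a calling system should take; the function and stage names are illustrative, not an OPA API:

```python
# Illustrative only: map a rollout stage plus an OPA decision payload
# to the action a calling system should take.
def action_for(stage, decision):
    """Return 'log', 'warn', or 'block' for a given rollout stage."""
    if decision.get("allow", True):
        return "log"    # compliant: just record the evaluation
    if stage == "audit":
        return "log"    # violations are collected, never blocked
    if stage == "warn":
        return "warn"   # surface to owners, still allow the operation
    return "block"      # deny stage: enforce the decision

# The same violating decision produces a stricter action at each stage.
for stage in ("audit", "warn", "deny"):
    print(stage, "->", action_for(stage, {"allow": False, "risk": 90}))
```

Because the mapping lives in the caller, a policy can be promoted from audit to warn to deny without changing a single line of Rego.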
Operationally, baseline collection and triage are essential. Identify the systems and pipelines that will call OPA and run policies in audit mode for a defined period to collect violations and measure frequency. Classify violations by severity and owner, fix common infrastructure issues, and add allowlists where appropriate. Define risk thresholds that determine when a violation should escalate from advisory to warn or deny. Use staged bundle rollout or a GitOps workflow to distribute policy changes and avoid configuration drift. Decision logging is critical; capture input, output, and evaluation traces to support debugging and audits. Expect false positives and plan to reduce noise quickly through allowlists and data-driven rules.
Below is a simple bundle layout and a minimal static server to serve bundles during testing.
policy-bundle/
├─ policies/
│  ├─ example.rego
├─ data/
│  ├─ allowlists.json
├─ .manifest
Example .manifest content (OPA expects the manifest file to be named .manifest, with a leading dot):
{
"revision": "2026-03-01T12:00:00Z"
}
Build the bundle into a gzipped tarball with opa build, then serve it with a simple Python HTTP server for testing:
# Build policy-bundle/ into bundle.tar.gz, then serve it
opa build -b policy-bundle -o bundle.tar.gz
python -m http.server 8080
Configure OPA to fetch bundles from http://bundle-server:8080 in OPA's config file. Use staged bundle rollout to reduce risk when updating policies in production.
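A minimal OPA configuration for fetching this bundle might look like the following (the service and bundle names are illustrative; the keys are from OPA's configuration format):

```yaml
# config.yaml: fetch the bundle tarball from the test server
services:
  - name: bundle-server
    url: http://bundle-server:8080

bundles:
  example:
    service: bundle-server
    resource: bundle.tar.gz
    polling:
      min_delay_seconds: 30
      max_delay_seconds: 60
```

Start OPA with opa run --server --config-file config.yaml and it will poll the server for bundle updates within the configured delay window.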
Below is a Python client example that queries OPA, logs the structured decision, and treats it as advisory, showing how callers can implement graded responses.
import requests
import logging

OPA_URL = "http://localhost:8181/v1/data/example/image_policy/decision"

logging.basicConfig(level=logging.INFO)

def evaluate(input_payload):
    resp = requests.post(OPA_URL, json={"input": input_payload}, timeout=5)
    resp.raise_for_status()
    return resp.json().get("result", {})

def handle_decision(decision):
    allow = decision.get("allow", True)
    risk = decision.get("risk", 0)
    warnings = decision.get("warnings", [])
    logging.info("OPA decision allow=%s risk=%s warnings=%s", allow, risk, warnings)
    # Observe mode: do not block. Log and escalate based on risk threshold.
    if risk >= 70:
        logging.warning("High risk detected. Create ticket or require manual approval.")
    else:
        logging.info("Advisory only. Proceeding with operation.")

if __name__ == "__main__":
    # Example input shaped like a Kubernetes admission request
    sample_input = {
        "request": {
            "kind": {"kind": "Deployment"},
            "object": {
                "spec": {
                    "template": {
                        "spec": {
                            "containers": [
                                {"image": "untrusted/myapp:latest"}
                            ]
                        }
                    }
                }
            }
        }
    }
    decision = evaluate(sample_input)
    handle_decision(decision)
OPA plays a foundational role in autonomous pipelines. Autonomous pipelines rely on automated decision making to evaluate risk, enforce compliance, and block unsafe changes. OPA provides the policy engine that makes these decisions predictable and explainable.
Because OPA policies are written as code, they can evolve alongside the systems they govern. Pipelines can adapt rules without modifying application logic. OPA also provides clear decision outputs that can be logged, audited, and analyzed. This transparency is essential for organizations that need to demonstrate compliance or understand why a pipeline made a particular decision.
OPA's ability to evaluate structured data makes it well suited for complex governance scenarios. It can analyze deployment metadata, security scan results, infrastructure plans, or runtime telemetry. This flexibility allows autonomous pipelines to make informed decisions based on a wide range of signals. Structured decision outputs enable graded automation, such as identifying required approvals, assigning risk levels, or suggesting remediation steps. Including policy trace or matched rule identifiers in the decision payload improves explainability and speeds triage.
OPA is not only a policy engine. It is a mechanism for building trust in automated systems. It ensures that autonomy does not come at the cost of safety or accountability.
Policy distribution and lifecycle management matter in production. OPA supports bundles for distributing policies and data to many OPA instances. Bundles can be hosted on object stores or served via HTTP. Using bundles reduces configuration drift and enables staged rollouts of policy changes.
Testing and observability are critical. Rego policies should be unit tested and integrated into CI pipelines. OPA provides decision logging that records input, output, and evaluation traces. These logs are useful for debugging, audits, and for feeding analytics systems that measure policy effectiveness.
Below is a decision logging configuration snippet for OPA that writes each decision, including its input and result, to the console.
services:
  - name: bundle-server
    url: http://bundle-server:8080

decision_logs:
  console: true
# For production, configure a remote decision log service instead of console
Performance and latency tradeoffs influence deployment patterns. For high throughput or low latency use cases, run OPA as a local sidecar or embed it as a library. For centralized governance and easier policy management, run OPA as a shared service with caching and bundles. Hybrid approaches are common.
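For the sidecar pattern, a pod spec fragment might look like the following sketch (the image tag and config path are illustrative; pin a specific OPA version in practice):

```yaml
# Pod spec fragment: the app queries OPA on localhost, avoiding network hops
containers:
  - name: app
    image: myregistry.example.com/myapp:1.0
  - name: opa
    image: openpolicyagent/opa:latest
    args:
      - "run"
      - "--server"
      - "--addr=localhost:8181"
      - "--config-file=/config/opa.yaml"
    volumeMounts:
      - name: opa-config
        mountPath: /config
```

The application then authorizes requests with a local HTTP call to localhost:8181, while bundles keep the sidecar's policies in sync with the rest of the fleet.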
Authoring patterns matter. Keep policies modular and composable. Use data documents to separate policy logic from policy data. Prefer small, focused rules that are easy to test and reason about. Use policy templates or libraries for common checks such as image registry validation, label enforcement, or network policy constraints.
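Separating logic from data might look like the following sketch: the allowed registry prefixes live in a data document (for example, a file loaded under data.allowlists), so the list can change without editing the rule:

```rego
package example.registry

# data.allowlists.registries comes from a data document, e.g.:
#   {"registries": ["myregistry.example.com/", "gcr.io/my-team/"]}
allowed(image) {
    prefix := data.allowlists.registries[_]
    startswith(image, prefix)
}

deny[msg] {
    container := input.spec.template.spec.containers[_]
    not allowed(container.image)
    msg := sprintf("image %v is not from an allowed registry", [container.image])
}
```

Updating the allowlist is then a data change that can be reviewed and shipped in a bundle, with no policy logic touched.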
Treat policy rollout as a product. Assign owners, define SLAs for remediation, and publish regular reports. Track evaluation counts, false positive rate, mean time to remediate, and business impact. Use the observe phase to build stakeholder trust and to create a clear path from advisory to enforcement that aligns with organizational risk tolerance.
Below are concise CI examples to integrate policy checks without failing builds during the audit phase.
Run Conftest and write a JSON report without failing the job.
# CI step: evaluate manifest and write JSON report
conftest test deployment.yaml --output json > policy-report.json || true
A Python wrapper can parse the Conftest report and fail the build only on high severity. Save the wrapper as ci/policy_wrapper.py.
import json
import sys

THRESHOLD = 70

with open("policy-report.json") as f:
    report = json.load(f)

# Conftest --output json emits a list of per-file results, each containing
# "failures" and "warnings" lists. A "risk" field in violation metadata is
# a convention our policies follow, not a Conftest built-in.
max_risk = 0
for result in report:
    for violation in (result.get("failures") or []) + (result.get("warnings") or []):
        risk = violation.get("metadata", {}).get("risk", 10)
        max_risk = max(max_risk, risk)

print(f"Max risk found: {max_risk}")
if max_risk >= THRESHOLD:
    print("Failing build due to high risk policy violation.")
    sys.exit(1)
print("Advisory only. Build will not fail.")
sys.exit(0)
A minimal GitHub Actions workflow that runs Conftest and the Python wrapper can be added to the repository. Place this under .github/workflows/policy-check.yml.
name: Policy Check

on:
  pull_request:
    paths:
      - 'deploy/**'
      - '.github/workflows/**'

jobs:
  policy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Install Conftest
        run: |
          # Release asset names embed the version, so pin one explicitly
          CONFTEST_VERSION=0.45.0
          curl -sSL -o /tmp/conftest.tar.gz \
            "https://github.com/open-policy-agent/conftest/releases/download/v${CONFTEST_VERSION}/conftest_${CONFTEST_VERSION}_Linux_x86_64.tar.gz"
          sudo tar -C /usr/local/bin -xzf /tmp/conftest.tar.gz

      - name: Run Conftest
        run: |
          conftest test deploy/deployment.yaml --output json > policy-report.json || true

      - name: Run policy wrapper
        run: |
          python3 ci/policy_wrapper.py
Open Policy Agent provides a flexible, consistent, and auditable way to implement policy as code across diverse systems. Its separation of decision making and enforcement, expressive Rego language, and support for structured decision outputs make it a strong fit for modern governance needs. Introducing OPA in observe mode is a pragmatic and low risk path to adoption. With proper testing, logging, staged rollouts, and governance, OPA can become the policy backbone of autonomous pipelines and platform engineering efforts.