Unstable operators can break your monitoring. Discover how setting resource limits early ensures stability, scalability, and predictable performance.
When deploying Dynatrace Operators in Kubernetes, many teams rush to get the setup running. It works… until it doesn’t. Suddenly, pods start crashing, memory spikes go unexplained, or the operator itself becomes unstable.
The culprit? Missing or misconfigured resource limits.
Operators aren’t just another workload. They manage the deployment and lifecycle of critical components like OneAgents and ActiveGates. If the operators themselves are unstable, the entire monitoring layer is at risk. That’s why defining CPU and memory requests/limits during the very first phase of deployment is not optional—it’s essential.
Here are the most common issues I’ve seen in the field:
Unbounded Resource Usage
Without explicit limits, operator pods may consume far more CPU or memory than expected, starving business-critical workloads.
Pod Evictions and Restarts
Low requests cause Kubernetes to evict or restart operators when the cluster is under pressure, leading to monitoring downtime.
Scaling Uncertainty
An unstable operator introduces delays in rolling out agents, handling updates, or processing logs—impacting observability end-to-end.
Right-Sizing Confusion
Teams often don’t know whether to size the operator for small, mid, or large environments. Trial-and-error wastes time and creates instability.
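One way to guard against the eviction problem above is to give the operator pod the Guaranteed QoS class: when requests equal limits for every resource in every container, Kubernetes evicts the pod last under node pressure. A minimal sketch of the idea (the container name, image, and values here are illustrative, not Dynatrace-recommended figures):

```yaml
# Illustrative container spec: setting requests equal to limits for
# every resource places the pod in the Guaranteed QoS class, so it is
# evicted last when the node comes under memory pressure.
containers:
  - name: dynatrace-operator        # illustrative name
    image: example/operator:latest  # hypothetical; keep your actual image
    resources:
      requests:
        cpu: "1000m"
        memory: "1Gi"
      limits:
        cpu: "1000m"    # equal to the request
        memory: "1Gi"   # equal to the request
```

The trade-off is that Guaranteed QoS reserves the full limit up front, so size it against actual cluster capacity rather than defaulting to generous values.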
Think of operators as the control plane of your Dynatrace monitoring in Kubernetes. If the control plane is shaky, the entire system wobbles. Proper resource limits provide:
Stability from Day One – No more operator pods restarting randomly.
Predictable Scheduling – Kubernetes can reliably allocate CPU and memory.
Best Practice Alignment – Dynatrace provides proven groupings (Small, Mid, Large) that serve as safe defaults.
Future Scalability – Limits act as a stable baseline, making it easier to fine-tune as the environment grows.
Here’s a simple YAML snippet to illustrate how limits look in practice:
```yaml
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "1000m"
    memory: "1Gi"
```
These values should be adjusted based on the environment size (Small / Mid / Large), but having them defined ensures the operator never runs unchecked.
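If you deploy the operator through its Helm chart rather than raw manifests, the same values are typically supplied via a values file. A hedged sketch only: the key names (`operator.requests`, `operator.limits`) are an assumption here, so verify them against the chart's own values.yaml before relying on them:

```yaml
# values.yaml override (key names assumed; confirm against the chart)
operator:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 1Gi
```

These overrides would then be passed with `-f values.yaml` on `helm install` or `helm upgrade`, keeping the sizing decision versioned alongside the rest of your deployment configuration.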
To make this process easier, I’ve built a Dynatrace Operator Resource Limits Tool that:
Lets you select the environment size (Small / Mid / Large).
Generates ready-to-use YAML with CPU and memory requests/limits.
Provides a clean GUI with export options for quick deployment.
Includes a built-in README to answer common configuration questions.
Resource limits may seem like a minor detail, but they form the foundation of a stable, scalable, and predictable Dynatrace Operator deployment. Skip them, and you risk troubleshooting outages later. Define them upfront, and you’ll save yourself—and your team—countless hours down the road.