Running 3 Realio validators: what 99.9% uptime actually takes

99.97% uptime (90 days)

3 active validators

~5 s average block signing interval

Most people who delegate to a validator never think about the machine behind it. They see a name in the staking UI, a commission percentage, and an uptime number. The uptime number is the one that matters most — because a validator that misses blocks doesn’t just earn less, it can get slashed, and slashing hits delegators too. This post is about how Teshy keeps three Realio network validators running without that ever becoming your problem.

The hardware baseline

Cosmos-SDK chains have real infrastructure requirements that get heavier as the network grows. Each validator maintains a full copy of chain state, participates in consensus every block (roughly every 5–6 seconds on Realio), and needs to respond fast enough that it doesn’t fall behind the peer set. Running underpowered hardware is one of the most common reasons validators miss blocks.

Our three nodes each run on dedicated-CPU cloud VPS instances — not shared-core burstable types, which can throttle at exactly the wrong moment. Storage is NVMe SSD for every node: spinning disk IO latency is too unpredictable for consensus participation. Each machine runs a recent Ubuntu LTS, with cosmovisor managing the binary so upgrades happen automatically at the governance-specified block height.
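Cosmovisor can only switch binaries cleanly if the new binary is already in place when the upgrade height arrives, unless automatic downloads are enabled with DAEMON_ALLOW_DOWNLOAD_BINARIES. The Python sketch below is a minimal pre-flight check for that. It assumes cosmovisor's standard directory layout under DAEMON_HOME. The function names and the example daemon and upgrade names are hypothetical placeholders, not our production tooling.

```python
import os
from pathlib import Path


def staged_upgrade_binary(daemon_home: str, daemon_name: str, upgrade_name: str) -> Path:
    """Return where cosmovisor expects the binary for a named upgrade.

    Layout: $DAEMON_HOME/cosmovisor/upgrades/<upgrade_name>/bin/<daemon_name>.
    Assumes a plain upgrade name such as "v1.2.0". Cosmovisor URL-escapes
    names that contain special characters, and this sketch does not.
    """
    return Path(daemon_home) / "cosmovisor" / "upgrades" / upgrade_name / "bin" / daemon_name


def upgrade_ready(daemon_home: str, daemon_name: str, upgrade_name: str) -> bool:
    """True if the upgrade binary exists and this user can execute it."""
    binary = staged_upgrade_binary(daemon_home, daemon_name, upgrade_name)
    return binary.is_file() and os.access(binary, os.X_OK)
```

Run it on every node once an upgrade proposal passes, for example `upgrade_ready(os.environ["DAEMON_HOME"], os.environ["DAEMON_NAME"], "v1.2.0")`. Here "v1.2.0" stands in for the name given in the on-chain upgrade plan.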

Why dedicated CPU matters

Burstable VPS instances share physical cores across tenants, and under sustained load they get CPU-throttled. A Cosmos validator that can't process a proposed block and cast its prevote and precommit before the round's consensus timeouts expire will miss that block. We learned this the hard way in early testing and moved every node to dedicated cores.
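On Linux guests, CPU time the hypervisor hands to other tenants shows up in the steal counter in /proc/stat. That makes steal a reasonable signal for the kind of contention described above. The sketch below computes steal as a percentage between two samples of the aggregate cpu line. It is an illustration under our own assumptions: the function names are hypothetical, and whether a given provider's credit-based throttling appears as steal varies by platform.

```python
def cpu_steal_percent(before: str, after: str) -> float:
    """Percentage of CPU time stolen by the hypervisor between two samples.

    Each argument is the aggregate "cpu" line from /proc/stat:
    cpu user nice system idle iowait irq softirq steal guest guest_nice
    """
    def counters(line: str) -> list[int]:
        parts = line.split()
        if parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        # guest and guest_nice are already counted inside user and nice,
        # so only the first eight counters make up total CPU time.
        return [int(x) for x in parts[1:9]]

    a, b = counters(before), counters(after)
    total = sum(b) - sum(a)
    steal = b[7] - a[7]  # index 7 is the steal counter
    return 100.0 * steal / total if total > 0 else 0.0


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line of /proc/stat, which is the aggregate cpu line."""
    with open(path) as f:
        return f.readline()
```

Call `read_cpu_line()` twice a few seconds apart and pass both lines in. A validator that shows steal consistently well above zero is not getting the dedicated cores it needs. What counts as too much is a judgment call.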

Three validator keys, three sets of risks

Running one validator is straightforward. Running three adds a coordination layer that most people underestimate. Each validator has its own private key. If that key signs two conflicting votes at the same height and round (a double-sign), the chain automatically slashes the validator and tombstones it. Tombstoning is unrecoverable: the validator can never be unjailed, and every delegator bonded to it instantly loses the share of stake set by the chain's slash_fraction_double_sign parameter (5% under the Cosmos SDK default).

The double-sign risk is highest during failovers. If you try to bring up a hot standby without first ensuring the primary is fully offline, both can end up signing the same round. Our approach is to keep each validator’s key on exactly one active machine at a time, with the key never replicated to a standby in an “almost ready” state. Failover means manually transferring the key after confirming the primary is dead — not a hot restart of a waiting replica.
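One way to make "confirm the primary is dead" a checklist rather than a feeling is a pre-flight check that runs before the key is started anywhere else. CometBFT's file-based signer records the last height, round and step it signed in priv_validator_state.json. It refuses to sign anything that would conflict with that point or fall behind it. Moving the key without that state file therefore removes the protection. The sketch below is hypothetical, not our actual tooling: the function names, the 300-second threshold and the source of `network_height` (for example, an external RPC) are illustrative assumptions.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SignState:
    """Last height/round/step a key has signed, as recorded by CometBFT."""
    height: int
    round: int
    step: int


def parse_sign_state(raw: str) -> SignState:
    """Parse the contents of a priv_validator_state.json file."""
    doc = json.loads(raw)
    # CometBFT serialises height as a string; round and step are plain integers.
    return SignState(int(doc["height"]), int(doc["round"]), int(doc["step"]))


def failover_allowed(primary_reachable: bool,
                     seconds_since_primary_seen: float,
                     local_state: SignState,
                     network_height: int,
                     min_down_seconds: float = 300.0) -> tuple[bool, str]:
    """Hypothetical gate to run before starting the key on a new machine."""
    if primary_reachable:
        return False, "primary still answers: stop it before moving the key"
    if seconds_since_primary_seen < min_down_seconds:
        return False, "primary has not been down long enough to rule out a hang"
    if local_state.height < network_height:
        # Without the primary's signing state, this machine could sign a height
        # the primary already voted on. Carry the state file over with the key.
        return False, "local signing state is behind the network height"
    return True, "ok to start"
```

The last check matters most. A standby whose state file says height 0 will happily re-sign whatever height the network is on, and that is exactly the double-sign the manual process exists to prevent.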

Sentry node architecture

Validators should not expose their consensus port directly to the internet. The Realio network, like all Tendermint-based chains, supports a sentry node pattern: public-facing full nodes that relay traffic to and from the validator, which sits behind them on a private network. Teshy runs two sentry nodes — one in the same datacenter region as the validator, one in a different region for redundancy. The sentries handle peer connections, mempool gossip, and block propagation. The validator only talks to its sentries.

This setup has two practical effects. First, the validator's IP is never visible to the broader p2p network, making targeted DDoS much harder. Second, if one sentry goes offline, the validator continues signing through the other. The sentries hold no signing key, so they can be restarted freely without any double-sign risk.
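In CometBFT (formerly Tendermint), the sentry pattern mostly comes down to the [p2p] section of each node's config.toml. The sketch below builds and renders those settings in Python. The keys pex, persistent_peers, private_peer_ids, unconditional_peer_ids and addr_book_strict are real config.toml options. The values are our assumption about a typical layout, not a copy of our configs, and any node IDs or addresses passed in are placeholders.

```python
def validator_p2p(sentry_peers: list[str]) -> dict:
    """[p2p] settings for a validator that should talk only to its own sentries."""
    return {
        "pex": False,                                # don't gossip or learn peer addresses
        "persistent_peers": ",".join(sentry_peers),  # always dial our sentries
        "addr_book_strict": False,                   # sentries sit on private IPs
    }


def sentry_p2p(validator_id: str, validator_addr: str, public_peers: list[str]) -> dict:
    """[p2p] settings for a public sentry guarding one validator."""
    return {
        "pex": True,                                 # take part in normal peer discovery
        "persistent_peers": ",".join([validator_addr, *public_peers]),
        "private_peer_ids": validator_id,            # never advertise the validator to other peers
        "unconditional_peer_ids": validator_id,      # keep the validator link even when peer slots are full
        "addr_book_strict": False,                   # allow the validator's private IP
    }


def render_p2p(settings: dict) -> str:
    """Render a settings dict as a TOML [p2p] section."""
    lines = ["[p2p]"]
    for key, value in settings.items():
        if isinstance(value, bool):
            lines.append(f"{key} = {'true' if value else 'false'}")
        else:
            lines.append(f'{key} = "{value}"')
    return "\n".join(lines)
```

The key line is private_peer_ids on the sentries. Without it, the sentries would pass the validator's address on to the rest of the network, which defeats the point of hiding it.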

Monitoring: what we actually alert on

Uptime is ultimately a monitoring problem. You can’t respond to something you don’t know is happening. Our monitoring stack is built on open-source tooling, with Prometheus scraping node metrics every 15 seconds and Grafana rendering dashboards that we can check at a glance. But dashboards are for post-hoc analysis; the real work is alerts.

The alerts we’ve tuned to be reliable (not noisy) are:

Missed pre-commits: if a validator's precommit is missing from 3 consecutive blocks, page immediately. This is the earliest indicator of a consensus problem.
Disk above 80%: Cosmos chains grow continuously. Above 80% is yellow; above 90% is a page.
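Memory pressure: sustained RSS above 90% of available RAM. Go-based Cosmos nodes rely on a garbage collector that, by default, lets the heap grow well past the live data before collecting. A node near its RAM ceiling can therefore be taken down by the kernel's OOM killer with little warning.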
Peer count below 5: if a sentry drops below 5 connected peers, something is wrong with its network connectivity.
Block height lag: if the node’s latest block height is more than 10 blocks behind what our external block explorer reports, we investigate immediately.
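The alert rules above translate almost directly into code. The sketch below expresses them as one Python function over a snapshot of node metrics. In practice, rules like these would live in Prometheus alerting rules. The NodeSample fields, the severity names for the memory, peer and lag alerts, and the four-sample definition of "sustained" are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeSample:
    """One scrape's worth of values for a node (field names are hypothetical)."""
    role: str                   # "validator" or "sentry"
    recent_signed: list[bool]   # oldest first; True if our precommit made it into the block
    disk_used: float            # fraction of the data volume in use, 0..1
    rss_history: list[float]    # recent node RSS / total RAM samples, oldest first
    peers: int
    height: int
    explorer_height: int


def trailing_misses(signed: list[bool]) -> int:
    """Count consecutive missed blocks at the end of the list."""
    count = 0
    for ok in reversed(signed):
        if ok:
            break
        count += 1
    return count


def evaluate(s: NodeSample, sustained_samples: int = 4) -> list[tuple[str, str]]:
    """Return (severity, message) pairs for every rule the sample trips."""
    alerts = []
    if s.role == "validator" and trailing_misses(s.recent_signed) >= 3:
        alerts.append(("page", "3+ consecutive missed precommits"))
    if s.disk_used > 0.90:
        alerts.append(("page", "disk above 90%"))
    elif s.disk_used > 0.80:
        alerts.append(("warn", "disk above 80%"))
    recent_rss = s.rss_history[-sustained_samples:]
    if len(recent_rss) == sustained_samples and all(r > 0.90 for r in recent_rss):
        alerts.append(("alert", "sustained RSS above 90% of RAM"))
    if s.role == "sentry" and s.peers < 5:
        alerts.append(("alert", "sentry below 5 peers"))
    if s.explorer_height - s.height > 10:
        alerts.append(("investigate", "more than 10 blocks behind explorer"))
    return alerts
```

Requiring several consecutive memory samples is what keeps that rule quiet: a single GC spike won't page anyone, but a node that stays near its ceiling for a minute will.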

Missed blocks cost more than missed rewards

On Realio, a validator that misses more than 5% of blocks in the slashing module's sliding window is automatically jailed. Unjailing is not automatic: the operator has to wait out the jail period and then submit an unjail transaction. Until that happens, the validator is out of the active set and its delegators earn nothing, on top of any downtime slash the chain applies. This is why we page on the third consecutive missed block, not after we've already missed fifty.
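The jailing rule is a sliding-window counter. Below is a simplified Python model of the Cosmos SDK x/slashing check, where signed_blocks_window and min_signed_per_window are the real module parameters. The values used in the note after the code are assumptions, not Realio's published parameters. The model also ignores the grace period a newly bonded validator gets before its first full window and the counter reset that follows a jailing.

```python
from collections import deque
from decimal import Decimal


class SigningWindow:
    """Simplified downtime check modelled on Cosmos SDK x/slashing."""

    def __init__(self, signed_blocks_window: int, min_signed_per_window: str):
        self.window = signed_blocks_window
        # The SDK truncates min_signed_per_window * window to an integer.
        min_signed = int(Decimal(min_signed_per_window) * signed_blocks_window)
        self.max_missed = signed_blocks_window - min_signed
        self.blocks: deque[bool] = deque(maxlen=signed_blocks_window)
        self.missed = 0

    def record(self, signed: bool) -> None:
        """Add one block's result, dropping the oldest once the window is full."""
        if len(self.blocks) == self.window and not self.blocks[0]:
            self.missed -= 1  # the oldest block was a miss and is about to fall out
        self.blocks.append(signed)
        if not signed:
            self.missed += 1

    def would_jail(self) -> bool:
        """Jailing triggers once misses in the window exceed the allowance."""
        return self.missed > self.max_missed
```

With an assumed `SigningWindow(10_000, "0.95")`, which corresponds to the 5% figure above, max_missed is 500, so the 501st miss inside the window triggers jailing. Paging after 3 consecutive misses leaves a wide margin before that point.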

Keeping disk manageable with cosmos-pruner

One of the less-discussed operational challenges of running a long-lived Cosmos validator is disk growth. A node that retains every historical state (pruning set to "nothing", as on an archive node) accumulates gigabytes per week on an active chain. Left unchecked, the disk fills up and the node crashes. The standard mitigation is the pruning configuration in app.toml: telling the node to keep only the last N states, plus every 500th state for snapshot purposes.

We use cosmos-pruner, an open-source tool that prunes the node's underlying databases, including the IAVL application state, offline while the node is stopped for maintenance. Combined with app.toml's pruning strategy set to custom, this keeps our disk footprint stable at roughly 120 GB per node, instead of the 600+ GB an unpruned archive node would accumulate over the same period.
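The "last N states plus every 500th" rule is easy to reason about in code. This sketch follows the legacy pruning-keep-recent / pruning-keep-every semantics described above; newer Cosmos SDK releases dropped keep-every. The helper names are hypothetical. The sketch also ignores pruning-interval, which only controls how often deletion runs, so a live node can briefly hold a few extra versions.

```python
def retained(height: int, latest: int, keep_recent: int, keep_every: int) -> bool:
    """Would the state version at `height` survive keep-recent/keep-every pruning?"""
    if height < 1 or height > latest:
        return False
    if latest - height < keep_recent:
        return True  # inside the most recent keep_recent versions
    return keep_every > 0 and height % keep_every == 0  # snapshot-interval version


def retained_count(latest: int, keep_recent: int, keep_every: int) -> int:
    """Number of state versions kept when the chain is at height `latest`."""
    return sum(retained(h, latest, keep_recent, keep_every) for h in range(1, latest + 1))
```

With keep_recent=100 and keep_every=500, a node at height 10,000 keeps 119 state versions instead of 10,000. Version count is not disk size, because IAVL shares unchanged nodes between versions. It still shows why a pruned node grows far more slowly than an archive node.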

What delegators should look for in a validator

If you’re evaluating validators to delegate to — on Realio or any Cosmos chain — uptime percentage is the right starting point but not the whole story. A validator can show 100% uptime because they just started last week. Look at the uptime over 90+ days, and look at whether they’ve ever been jailed. Jailing history is on-chain and transparent.

Commission rate matters, but it is secondary to reliability. A few points of commission are easy to lose. A validator charging 5% that gets jailed for downtime can net you less than one charging 8% with 99.9% uptime, because the jailed period earns nothing and any downtime slash comes out of your principal. Also check the governance participation rate. Validators that never vote on proposals aren't engaged with the network and are more likely to miss upgrade windows, which leads to jailing.
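A back-of-the-envelope model makes the commission-versus-reliability trade-off concrete. It uses a hypothetical 10% staking APR, a 2-day jail and the Cosmos SDK default downtime slash of 1%. Realio's actual slash_fraction_downtime may differ. The model ignores compounding and treats the slash as a one-off loss of principal.

```python
def net_annual_return(stake: float, apr: float, commission: float,
                      jailed_days: float = 0.0, downtime_slash: float = 0.0) -> float:
    """Delegator's one-year gain under a deliberately simple model (no compounding)."""
    active_fraction = 1 - jailed_days / 365          # jailed days earn no rewards
    rewards = stake * apr * (1 - commission) * active_fraction
    slashed = stake * downtime_slash                 # slash hits the principal
    return rewards - slashed
```

`net_annual_return(1000, 0.10, 0.08)` gives about 92 tokens. `net_annual_return(1000, 0.10, 0.05, jailed_days=2, downtime_slash=0.01)` gives about 84.5, so the cheaper validator falls behind after a single jailing. Without the jailing it would return 95, which is why comparing commission alone is misleading. On chains with a much smaller downtime slash, the jail would have to last far longer for the ranking to flip.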

Teshy currently runs a 5% commission across all three validators, with 90-day uptime figures published monthly on this blog and live via our Grafana dashboard linked from the validators page. We’ve never been jailed on any of the three nodes. That’s the number we intend to keep.

Delegate to Teshy

RIO, RST, and DSTRX staking is live. Visit the validators page for validator monikers, addresses, and delegation instructions, or go straight to compound.teshy.com to set up auto-compounding on top of your delegation.