Running a Validator

Methexis supports two validator roles:

  1. Training Validators (compute providers) – run training jobs, co‑sign model updates, and submit them on‑chain.

  2. Data Validators (validation workers) – screen datasets with automated checks and stake‑weighted committee voting.

Both roles earn $MTHX; both can be slashed for misbehavior or chronic downtime.


Hardware & OS Requirements

These are recommended baselines for testnet and mainnet. Larger machines can earn more by contributing a larger compute share. Values may be tuned via governance.

Training Validator (GPU)

  • GPU: NVIDIA 24 GB+ VRAM (e.g., RTX 3090/4090, A5000/A6000). Multiple GPUs supported.

  • CPU: 8+ cores (x86_64).

  • RAM: 32–64 GB.

  • Storage: 1 TB NVMe SSD (model checkpoints + cache).

  • Network: ≥ 100 Mbps up/down, low jitter; public IP or properly forwarded ports.

  • OS: Ubuntu 22.04 LTS or 24.04 LTS.

  • Drivers: NVIDIA driver + CUDA/cuDNN matching the container image.
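Before registering, it is worth confirming that both the driver stack and the container runtime can see the GPU. A quick sanity check (the CUDA image tag here is only an example; match it to the validator image you run):

```shell
# Report GPU model, driver version, and VRAM.
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv

# Confirm the container runtime has GPU access (requires the NVIDIA Container Toolkit).
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
```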

Data Validator (CPU)

  • CPU: 4+ cores.

  • RAM: 16 GB.

  • Storage: 200–500 GB SSD (dataset cache + logs).

  • Network: ≥ 50 Mbps up/down.

  • OS: Ubuntu 22.04/24.04 LTS.

Optional: a co‑located IPFS node with generous pinning space improves latency for both roles.


Software Prerequisites

  • Docker or another OCI‑compatible runtime (recommended for reproducibility).

  • NVIDIA Container Toolkit (for GPU nodes).

  • Git (for pulling configs).

  • Systemd (or another process supervisor) for reliability.

  • NTP/chrony for accurate time sync (prevents consensus issues).
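To verify that time sync is actually healthy (chrony shown; systemd-timesyncd has an equivalent check):

```shell
# Leap status should be "Normal" and the system-time offset small (milliseconds).
chronyc tracking | grep -E 'Leap status|System time'

# On systemd-timesyncd hosts instead:
timedatectl show --property=NTPSynchronized --value   # prints "yes" when synced
```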


Keys & Wallet Safety

  • Create a dedicated validator wallet. Keep your treasury/cold funds separate.

  • Prefer hardware wallet or remote signing.

  • Never paste seed phrases into terminal sessions on shared machines.

  • Back up the validator keystore + configs (encrypted) and store off‑site.


Quick Start (Testnet)

Below are example commands with placeholder names. Replace hostnames and image names with the published values once the repositories are public.

1) Pull the validator image
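For example (registry and tag are placeholders, as noted above):

```shell
docker pull registry.example.com/methexis/validator:testnet-latest
```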

2) Create a working directory
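A simple layout that the examples below assume:

```shell
mkdir -p ~/methexis/keys ~/methexis/data ~/methexis/logs
cd ~/methexis
```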

3) Generate or import validator keys
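A sketch using the mhx CLI; the subcommand names and flags are assumptions until the CLI is published:

```shell
# Generate a fresh validator key (encrypted keystore)...
mhx keys generate --keystore ~/methexis/keys/validator.json

# ...or import an existing backup.
mhx keys import --file /path/to/backup/validator.json
```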

4) Configure the node

Create config.yaml:
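A minimal example; field names are illustrative and the endpoints are placeholders:

```yaml
role: training                  # or: data
network: testnet
eth_rpc: https://rpc.testnet.example.org
ipfs:
  gateway: https://ipfs.example.org
keys:
  keystore: /home/ubuntu/methexis/keys/validator.json
metrics:
  listen: 0.0.0.0:9100          # Prometheus endpoint (see Monitoring & Metrics)
training:
  gpu_ids: [0]
  batch_size: 8
  mixed_precision: true
```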

5) Stake test MTHX (Proposed)
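For example (the amount, flags, and faucet flow are placeholders for the proposed design):

```shell
# Request test MTHX from the testnet faucet first, then bond it:
mhx stake --amount 1000 --role training --keystore ~/methexis/keys/validator.json
```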

6) Start the service

Docker (recommended)
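A sketch (image name is a placeholder; drop --gpus all for data validators):

```shell
docker run -d --name mhx-validator \
  --gpus all \
  --restart unless-stopped \
  -p 9100:9100 \
  -v ~/methexis:/data \
  registry.example.com/methexis/validator:testnet-latest \
  --config /data/config.yaml
```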

systemd (optional)
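A minimal unit that supervises the container above (service and container names are assumptions):

```ini
# /etc/systemd/system/mhx-validator.service
[Unit]
Description=Methexis validator
After=network-online.target docker.service
Requires=docker.service

[Service]
ExecStart=/usr/bin/docker start -a mhx-validator
ExecStop=/usr/bin/docker stop -t 30 mhx-validator
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
```

Enable it with `sudo systemctl enable --now mhx-validator`.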


Operating Modes

Training Validator

  • Subscribes to round events, fetches latest approved datasets & checkpoint, executes training, signs the update, and submits the result on‑chain.

  • Parameters you can tune: batch_size, accumulation_steps, mixed_precision, gpu_ids, max_job_runtime.
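As a config sketch (values are illustrative starting points, not tuned recommendations):

```yaml
training:
  batch_size: 8
  accumulation_steps: 4     # effective batch = batch_size * accumulation_steps
  mixed_precision: true     # reduces VRAM pressure
  gpu_ids: [0, 1]
  max_job_runtime: 3600     # seconds; jobs exceeding this are abandoned
```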

Data Validator

  • Pulls pending datasets, runs license/format/anomaly/duplication checks, participates in stake‑weighted committee votes, and writes results back (Accepted/Rejected).

  • Parameters you can tune: max_filesize, accepted_mime_types, policy_ruleset, vote_quorum.
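As a config sketch (values illustrative):

```yaml
validation:
  max_filesize: 2147483648                      # bytes (2 GiB)
  accepted_mime_types: [text/csv, application/json, text/plain]
  policy_ruleset: default
  vote_quorum: 0.66                             # fraction of committee stake
```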


Monitoring & Metrics

  • Built‑in Prometheus metrics at :9100 by default.

  • Recommended dashboard: CPU/GPU usage, VRAM, I/O, latency to IPFS gateway, round participation, success rate.

  • Logs: JSON to stdout; send to Loki/ELK or journald.

  • Health checks: mhx status (peer count, last round, signatures submitted).
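A matching Prometheus scrape job (job and host names assumed):

```yaml
scrape_configs:
  - job_name: mhx-validator
    static_configs:
      - targets: ["validator-host:9100"]
```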

SLO targets (suggested minimums):

  • Uptime: ≥ 97% over a rolling 30‑day window.

  • Participation: ≥ 90% of eligible rounds.

  • Attestation agreement: ≥ 99% (training results within consensus bounds).

Falling below SLOs risks slashing once parameters are finalized.


Rewards: How You Earn

Let R be the round reward (in MTHX) after any maintenance skim.

  • Training Validators: ~58% of R, allocated by verified compute share.

  • Data Providers: ~35% of R, allocated across accepted datasets.

  • Validation Committees: ~7% of R, split by honest participation.

Example: If your validator contributes 15% of the verified compute for a round and R = 10,000 MTHX, you earn ≈ 0.58 * 0.15 * 10000 = 870 MTHX.
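The same arithmetic as a shell one-liner (awk for floating point):

```shell
# reward = pool_split * compute_share * round_reward
awk -v r=10000 -v share=0.15 -v pool=0.58 \
  'BEGIN { printf "%.0f MTHX\n", pool * share * r }'   # prints: 870 MTHX
```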

(Exact splits and formulas are governed; values above are current targets.)


Slashing & Risk Management

You may be slashed for:

  • Submitting invalid or dishonest computation (training validators).

  • Colluding or voting against evidence in committee decisions (data validators).

  • Chronic downtime or missing required attestations.

  • Double‑signing / equivocating during a round.

Mitigations:

  • Run a sentry architecture (expose public sentry nodes; keep the validator itself behind a firewall).

  • Use remote signer / hardware wallet; isolate keys from the worker.

  • Configure auto‑shutdown if time sync or GPU health checks fail.

  • Keep NTP/chrony active; monitor clock drift.

  • Maintain reliable power/UPS and redundant network links where possible.


Upgrades

  • Container images are versioned (semver).

  • Always drain before upgrading.

  • Read release notes for any config schema changes.
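A drain-then-upgrade sequence might look like this (image tag and CLI flags are assumptions):

```shell
mhx validator drain                 # stop accepting new rounds
mhx status                          # wait until the in-flight round completes
docker pull registry.example.com/methexis/validator:testnet-v1.3.0
docker stop mhx-validator && docker rm mhx-validator
# re-run the `docker run` command from Quick Start with the new tag
```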


Troubleshooting

  • High GPU memory usage: lower batch_size or enable mixed_precision.

  • Slow downloads: switch ipfs.gateway to a closer mirror; run your own IPFS node.

  • Frequent timeouts: check ISP packet loss/jitter; tune max_job_runtime.

  • Attestation mismatch: compare container versions; purge cache; re‑sync to the latest checkpoint.

  • RPC failures: fail over to a secondary eth_rpc endpoint.


Unstaking & Exit

  • Unbonding period (Proposed): 14–28 days.

  • Exit steps: mhx validator drain → wait for round completion → mhx unstake → halt the service after the unbonding period.
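As commands (flags are assumptions for the proposed flow):

```shell
mhx validator drain
mhx status                          # confirm the final round has completed
mhx unstake --amount all
# after the unbonding period elapses:
sudo systemctl disable --now mhx-validator
```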


Security Best Practices (Checklist)
