Running a Validator
Methexis supports two validator roles:
Training Validators (compute providers) – run training jobs, co‑sign model updates, and submit them on‑chain.
Data Validators (validation workers) – screen datasets with automated checks and stake‑weighted committee voting.
Both roles earn $MTHX; both can be slashed for misbehavior or chronic downtime.
Hardware & OS Requirements
These are recommended baselines for testnet/mainnet. You can run bigger machines for higher rewards. Values may be tuned via governance.
Training Validator (GPU)
GPU: NVIDIA 24 GB+ VRAM (e.g., RTX 3090/4090, A5000/A6000). Multiple GPUs supported.
CPU: 8+ cores (x86_64).
RAM: 32–64 GB.
Storage: 1 TB NVMe SSD (model checkpoints + cache).
Network: ≥ 100 Mbps up/down, low jitter; public IP or properly forwarded ports.
OS: Ubuntu 22.04 LTS or 24.04 LTS.
Drivers: NVIDIA driver + CUDA/cuDNN matching the container image.
Data Validator (CPU)
CPU: 4+ cores.
RAM: 16 GB.
Storage: 200–500 GB SSD (dataset cache + logs).
Network: ≥ 50 Mbps up/down.
OS: Ubuntu 22.04/24.04 LTS.
Optional: co‑located IPFS node with generous pinning space improves latency for both roles.
Software Prerequisites
Docker or another OCI‑compatible runtime (recommended for reproducibility).
NVIDIA Container Toolkit (for GPU nodes).
Git (for pulling configs).
Systemd (or another process supervisor) for reliability.
NTP/chrony for accurate time sync (prevents consensus issues).
Keys & Wallet Safety
Create a dedicated validator wallet. Keep your treasury/cold funds separate.
Prefer hardware wallet or remote signing.
Never paste seed phrases into terminal sessions on shared machines.
Back up the validator keystore + configs (encrypted) and store off‑site.
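One way to produce that encrypted off-site backup is with tar and gpg; the /opt/methexis path matches the layout used later in this guide, and the filenames are examples.

```shell
# Bundle the keystore + configs, encrypt symmetrically, remove the plaintext.
tar czf validator-backup.tar.gz -C /opt/methexis config
gpg --symmetric --cipher-algo AES256 validator-backup.tar.gz   # prompts for a passphrase
shred -u validator-backup.tar.gz                               # destroy the unencrypted archive
# Ship validator-backup.tar.gz.gpg off-site; store the passphrase separately.
```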
Quick Start (Testnet)
Below are example commands with placeholder names. Replace hostnames/images when your repos are public.
1) Pull the validator image
# Training validator (GPU)
docker pull ghcr.io/methexisxyz/mhx-training-validator:latest
# Data validator (CPU)
docker pull ghcr.io/methexisxyz/mhx-data-validator:latest
2) Create a working directory
mkdir -p /opt/methexis/{config,data,logs}
cd /opt/methexis
3) Generate or import validator keys
# Example CLI (placeholder)
docker run --rm -it -v "$PWD":/work ghcr.io/methexisxyz/mhx-training-validator:latest \
mhx keys import --keystore /work/config/validator.json
4) Configure the node
Create config.yaml:
# /opt/methexis/config/config.yaml
role: training # training | data
network: testnet
eth_rpc: "https://rpc.testnet.example.org"
staking_contract: "0xTEST_STAKING"
rewards_contract: "0xTEST_REWARDS"
data_registry_contract: "0xTEST_DATA"
validator_key: "/opt/methexis/config/validator.json"
ipfs:
  gateway: "https://ipfs.io"
  pin: true
compute:
  gpus: "all" # or "0,1"
  mixed_precision: true
metrics:
  enable: true
  listen_addr: "0.0.0.0:9100"
5) Stake test MTHX (Proposed)
# illustrative CLI – will be replaced with final tool
mhx stake --rpc $ETH_RPC --amount 100000 --from <validator-address>
6) Start the service
Docker (recommended)
docker run -d --name mhx-validator \
--gpus all \
-v /opt/methexis/config:/app/config \
-v /opt/methexis/data:/app/data \
-v /opt/methexis/logs:/app/logs \
-p 9100:9100 \
ghcr.io/methexisxyz/mhx-training-validator:latest \
mhx validator start --config /app/config/config.yaml
systemd (optional)
# /etc/systemd/system/mhx-validator.service
[Unit]
Description=Methexis Validator
After=network-online.target
[Service]
Restart=always
RestartSec=5
Environment=CONFIG=/opt/methexis/config/config.yaml
ExecStart=/usr/bin/docker run --rm --gpus all \
-v /opt/methexis/config:/app/config \
-v /opt/methexis/data:/app/data \
-v /opt/methexis/logs:/app/logs \
-p 9100:9100 \
ghcr.io/methexisxyz/mhx-training-validator:latest \
mhx validator start --config ${CONFIG}
[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl enable --now mhx-validator
Operating Modes
Training Validator
Subscribes to round events, fetches latest approved datasets & checkpoint, executes training, signs the update, and submits the result on‑chain.
Parameters you can tune: batch_size, accumulation_steps, mixed_precision, gpu_ids, max_job_runtime.
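These knobs would live alongside the compute section of config.yaml shown earlier. Only gpus and mixed_precision appear in the sample config above, so the remaining keys and every value here are illustrative assumptions, not tested recommendations.

```yaml
# config.yaml excerpt – illustrative values
compute:
  gpus: "0,1"              # gpu_ids: which devices this node uses
  batch_size: 8
  accumulation_steps: 4    # effective batch = batch_size * accumulation_steps
  mixed_precision: true    # reduces VRAM use at a small numerical cost
  max_job_runtime: 7200    # seconds before an unfinished job is abandoned
```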
Data Validator
Pulls pending datasets, runs license/format/anomaly/duplication checks, participates in stake‑weighted committee votes, and writes results back (Accepted/Rejected).
Parameters you can tune: max_filesize, accepted_mime_types, policy_ruleset, vote_quorum.
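A sketch of how these might map into config.yaml. None of these keys appear in the sample config above, so the section name, keys, and values are all assumptions.

```yaml
# config.yaml excerpt – hypothetical data-validator section
validation:
  max_filesize: "2GB"
  accepted_mime_types:
    - text/plain
    - application/json
    - image/png
  policy_ruleset: "default"
  vote_quorum: 0.66        # fraction of committee stake needed to finalize
```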
Monitoring & Metrics
Built‑in Prometheus metrics are exposed at :9100 by default.
Recommended dashboard: CPU/GPU usage, VRAM, I/O, latency to the IPFS gateway, round participation, success rate.
Logs: JSON to stdout; ship to Loki/ELK or journald.
Health checks: mhx status (peer count, last round, signatures submitted).
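A quick liveness probe against the metrics port. The /metrics path is the Prometheus convention and the metric names grepped for are assumptions, not the node's documented names.

```shell
# Is the exporter up, and is the node participating in rounds?
curl -sf http://localhost:9100/metrics | grep -E 'mhx_(round|peer|uptime)' | head
# CLI health check (peer count, last round, signatures submitted)
docker exec mhx-validator mhx status
```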
SLO targets (suggested minimums):
Uptime: ≥ 97% over a rolling 30‑day window.
Participation: ≥ 90% of eligible rounds.
Attestation agreement: ≥ 99% (training results within consensus bounds).
Falling below SLOs risks slashing once parameters are finalized.
Rewards: How You Earn
Let R be the round reward (in MTHX) after any maintenance skim.
Training Validators: ~58% of R, split by each validator's share of verified compute.
Data Providers: ~35% allocated across accepted datasets.
Validation Committees: ~7% split by honest participation.
Example: if your validator contributes 15% of the verified compute for a round and R = 10,000 MTHX, you earn ≈ 0.58 × 0.15 × 10,000 = 870 MTHX.
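The example above can be reproduced with a one-liner; awk handles the floating-point math, and 0.58 is the current target split stated above.

```shell
# Estimated training-validator earnings for one round
round_reward=10000     # R, in MTHX
compute_share=0.15     # your fraction of verified compute
training_split=0.58    # target split for training validators
awk -v r="$round_reward" -v s="$compute_share" -v t="$training_split" \
    'BEGIN { printf "%.0f MTHX\n", t * s * r }'   # prints "870 MTHX"
```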
(Exact splits and formulas are governed; values above are current targets.)
Slashing & Risk Management
You may be slashed for:
Submitting invalid or dishonest computation (training validators).
Colluding or voting against evidence in committee decisions (data validators).
Chronic downtime or missing required attestations.
Double‑signing / equivocating during a round.
Mitigations:
Run sentry architecture (expose a public node; keep the validator behind a firewall).
Use remote signer / hardware wallet; isolate keys from the worker.
Configure auto‑shutdown if time sync or GPU health checks fail.
Keep NTP/chrony active; monitor clock drift.
Maintain reliable power/UPS and redundant network links where possible.
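The auto-shutdown mitigation above could be sketched as a cron-run watchdog. The 100 ms threshold and the service name are assumptions; chronyc tracking reports the current offset on its "System time" line.

```shell
# Stop the validator if chrony reports a large clock offset (sketch).
offset=$(chronyc tracking | awk '/System time/ { print $4 }')
awk -v o="$offset" 'BEGIN { exit (o < 0.1) ? 0 : 1 }' \
  || systemctl stop mhx-validator   # offset >= 100 ms: safer to stop than risk equivocating
```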
Upgrades
Container images are versioned (semver). Always drain before upgrading:
mhx validator drain --grace 300 # stop taking new jobs, finish current one
docker pull ghcr.io/...:vX.Y.Z
docker stop mhx-validator && docker rm mhx-validator
# start again with the new image
Read release notes for any config schema changes.
Troubleshooting
High GPU memory usage: lower batch_size or enable mixed_precision.
Slow downloads: switch ipfs.gateway to a closer mirror, or run your own IPFS node.
Frequent timeouts: check ISP packet loss/jitter; tune max_job_runtime.
Attestation mismatch: compare container versions; purge the cache; re‑sync to the latest checkpoint.
RPC failures: fail over to a secondary eth_rpc endpoint.
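When tuning ipfs.gateway, a rough latency comparison helps pick a mirror. The gateway URLs below are common public ones, listed only as examples.

```shell
# Time a trivial request against each candidate gateway.
for gw in https://ipfs.io https://cloudflare-ipfs.com https://dweb.link; do
  t=$(curl -o /dev/null -s -w '%{time_total}s' --max-time 5 "$gw") || t=timeout
  echo "$gw -> $t"
done
```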
Unstaking & Exit
Unbonding period (Proposed): 14–28 days.
Exit steps:
mhx validator drain → wait for round completion → mhx unstake → halt the service after the unbonding period.
Security Best Practices (Checklist)