Two processes. One serves HTTP management endpoints (load models, trigger training, health). The other serves a ZMQ REP socket for the real-time predict loop. Mock both in under 50 lines, hit them from a test client, then swap in the real PLS+Ridge model.
Cold path
:5556- Load models ·
POST /model/load - Start training ·
POST /training/start - Poll progress ·
GET /training/{id} - Health check ·
GET /health
Hot path
:5555- Heartbeat ·
ping / pong - Prediction ·
predict / predict_reply - Model info ·
model_info - Latency · ≤ 20 ms p95 round-trip
Install the stack
Four libraries. One install command.
1pip install fastapi uvicorn[standard] pyzmq msgpack scikit-learn numpy
fastapi— REST framework, decorator-based, auto-generates OpenAPI schemasuvicorn[standard]— ASGI server that runs FastAPIpyzmq— Python ZMQ bindings; we only use REP socketsmsgpack— binary serializer on the ZMQ wire (spec mandates this, not JSON)
Management REST · 25 lines
Minimal FastAPI app covering /health and /model/list.
1# File: management_api.py2# Run with: uvicorn management_api:app --host 0.0.0.0 --port 55563from fastapi import FastAPI45app = FastAPI(title="Jope Inference · Management API")67@app.get("/health")8def health():9 return {10 "status": "ok",11 "model_loaded": True,12 "active_version": "v5",13 "uptime_seconds": 0,14 "server_version": "0.1.0",15 "protocol_version": 1,16 "python_version": "3.11.4",17 }1819@app.get("/model/list")20def list_models():21 return {22 "active": "v5",23 "models": [{"version": "v5", "status": "active",24 "trained_at": "2026-03-15T10:00:00Z"}],25 }
1curl http://localhost:5556/health2curl http://localhost:5556/model/list
Response JSON should match the shapes in the REST reference.
Predict worker · 25 lines
Mock REP socket returning hardcoded concentrations for ping and predict.
1# File: inference_worker.py2# Run with: python inference_worker.py3import time, uuid, zmq, msgpack45ctx = zmq.Context(io_threads=1)6sock = ctx.socket(zmq.REP)7sock.bind("tcp://0.0.0.0:5555")8print("Inference worker ready on :5555")910while True:11 req = msgpack.unpackb(sock.recv(), raw=False)1213 if req["type"] == "ping":14 reply_body = {"server_version": "0.1.0", "protocol_version": 1,15 "model_version": "v5", "uptime_seconds": 0}16 reply_type = "pong"1718 elif req["type"] == "predict":19 # TODO: swap this out with your real PLS+Ridge model20 reply_body = {"concentrations": {"EPA": 5.23, "DHA": 3.19, "DPA": 1.82},21 "confidence": {"EPA": 0.95, "DHA": 0.92, "DPA": 0.88},22 "model_version": "v5", "inference_ms": 8.3}23 reply_type = "predict_reply"2425 sock.send(msgpack.packb({26 "v": 1, "type": reply_type,27 "id": str(uuid.uuid4()),28 "correlation_id": req["id"],29 "ts": time.time(), "body": reply_body, "error": None,30 }, use_bin_type=True))
1# File: test_client.py2import time, uuid, zmq, msgpack34ctx = zmq.Context()5sock = ctx.socket(zmq.REQ)6sock.connect("tcp://localhost:5555")78sock.send(msgpack.packb({9 "v": 1, "type": "predict",10 "id": str(uuid.uuid4()), "ts": time.time(),11 "body": {12 "spectrum": {13 "wavenumbers": [200.0 + i*1.5 for i in range(2048)],14 "intensities": [0.0] * 2048,15 "channel": 1, "integration_ms": 1000, "scan_seq": 1,16 },17 "context": {"batch_id": "DEV-001", "port": "extract-E1", "column_index": 3},18 },19}, use_bin_type=True))2021reply = msgpack.unpackb(sock.recv(), raw=False)22print("Got:", reply["body"])
Expected output: Got: {'concentrations': {'EPA': 5.23, 'DHA': 3.19, 'DPA': 1.82}, ...}
Generate a server stub from the spec
One command, get a FastAPI skeleton with every route and schema pre-wired.
1# One-line server stub from the OpenAPI spec2npx @openapitools/openapi-generator-cli generate \3 -i https://jope-docs.pages.dev/specs/openapi.yaml \4 -g python-fastapi \5 -o ./server-stub
Output under ./server-stub includes models/(Pydantic classes per schema in openapi.yaml) andapis/ (one APIRouter per tag). Open the generated main.py, wire in handlers, done.
Hosted mock for wire-level testing
Same contract as the real server. Useful when either side is still in progress.
The docs site ships a mock Cloudflare Pages Function answering the exact same contract as the production server. Hit it from your C# Console while the Python side is still being built, or from your Python client to sanity-check request shape before wiring up the real inference engine.
GET https://jope-docs.pages.dev/mock-api/healthGET https://jope-docs.pages.dev/mock-api/model/listPOST https://jope-docs.pages.dev/mock-api/model/loadPOST https://jope-docs.pages.dev/mock-api/training/startGET https://jope-docs.pages.dev/mock-api/training/{id}Click Try It on any operation inside the REST reference to hit these endpoints live from your browser.
Implementation notes
Details easy to miss, listed upfront.
- Envelope
vis always1.Client checks version on handshake and refuses mismatched replies. Setv: 1explicitly. - Echo
correlation_idback unchanged.Console matches replies to in-flight requests by this field; drop it and every prediction appears as a timeout. - Pack with
use_bin_type=True.MessagePack has two string modes; C# client expectsbin. Omitting drifts non-ASCII encoding. - Long-lived REP socket, not per-request.ZMQ sockets are designed to be reused. Reconnect-per-message burns the 20ms latency budget on socket setup.
retryable: falsefor deterministic errors.MODEL_NOT_LOADED,INVALID_SPECTRUM,SPECTRUM_OUT_OF_RANGEare not transient — retrying burns cycles.
Go deeper
Full OpenAPI · Scalar
Every endpoint · request / response schemas · Try-It against the mock.
Full AsyncAPI · custom view
Every message · envelope schema · payload examples.
ZMQ protocol narrative
Why two protocols, latency budget breakdown, versioning policy, revision log.