Quick Start · Jope Inference Server

⏱ ~30 minutes end-to-end · 🐍 Python 3.11+ · 📦 FastAPI, pyzmq, msgpack

Two processes. One serves HTTP management endpoints (load models, trigger training, health). The other serves a ZMQ REP socket for the real-time predict loop. Mock both in under 50 lines, hit them from a test client, then swap in the real PLS+Ridge model.

REST · Management · cold path · port 5556

Load models · POST /model/load
Start training · POST /training/start
Poll progress · GET /training/{id}
Health check · GET /health

Stack: FastAPI + uvicorn

ZMQ · Inference · hot path · port 5555

Heartbeat · ping / pong
Prediction · predict / predict_reply
Model info · model_info
Latency · ≤ 20 ms p95 round-trip

Stack: pyzmq + msgpack
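The hot-path target is a p95, not an average, so a small share of slow requests is tolerated. Here is a minimal stdlib sketch for checking a batch of measured round-trip times against that target. It assumes you have already collected per-request latencies in milliseconds. `p95` and `meets_latency_budget` are hypothetical helper names.

```python
import statistics

P95_BUDGET_MS = 20.0  # hot-path target stated above

def p95(samples_ms):
    """95th percentile of round-trip times in ms (needs at least two samples)."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    # "inclusive" interpolates between observed samples instead of extrapolating.
    return statistics.quantiles(samples_ms, n=20, method="inclusive")[-1]

def meets_latency_budget(samples_ms, budget_ms=P95_BUDGET_MS):
    """True if the p95 of the samples is within the budget."""
    return p95(samples_ms) <= budget_ms
```

Feed it timings taken with `time.perf_counter()` around each send/recv pair. With `method="inclusive"` the result matches NumPy's default linear percentile, and a small batch never reports a p95 above its slowest request.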

Install the stack

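Six packages, one install command. The four below make up the server stack. scikit-learn and numpy are for the real PLS+Ridge model, and the mocks don't import them.

pip install fastapi "uvicorn[standard]" pyzmq msgpack scikit-learn numpy

fastapi: REST framework, decorator-based, auto-generates OpenAPI schemas
uvicorn[standard]: ASGI server that runs FastAPI (quoted in the command so shells such as zsh don't treat the brackets as a glob)
pyzmq: Python ZMQ bindings; the worker uses a REP socket and the test client a REQ socket
msgpack: binary serializer on the ZMQ wire (the spec mandates it, not JSON)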
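A small stdlib check can confirm that the packages landed in the interpreter that will run the servers. This is a convenience sketch and not part of the project. `installed_versions` is a hypothetical name. It looks packages up by their pip distribution names, which differ from their import names: pyzmq imports as `zmq` and scikit-learn as `sklearn`.

```python
from importlib import metadata

# pip distribution names from the install command above.
REQUIRED = ["fastapi", "uvicorn", "pyzmq", "msgpack", "scikit-learn", "numpy"]

def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Run it with the same `python` you will use for uvicorn. Any `None` in the result means that package is missing from that environment.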

Management REST · 25 lines

Minimal FastAPI app covering /health and /model/list.

# File: management_api.py
# Run with: uvicorn management_api:app --host 0.0.0.0 --port 5556
from fastapi import FastAPI

app = FastAPI(title="Jope Inference · Management API")

@app.get("/health")
def health():
    return {
        "status": "ok",
        "model_loaded": True,
        "active_version": "v5",
        "uptime_seconds": 0,
        "server_version": "0.1.0",
        "protocol_version": 1,
        "python_version": "3.11.4",
    }

@app.get("/model/list")
def list_models():
    return {
        "active": "v5",
        "models": [{"version": "v5", "status": "active",
                    "trained_at": "2026-03-15T10:00:00Z"}],
    }

Verify

curl http://localhost:5556/health
curl http://localhost:5556/model/list

Response JSON should match the shapes in the REST reference.
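The same checks can be scripted with the standard library. This sketch assumes the mock above is running on :5556. `get_json` and `missing_health_keys` are hypothetical helpers. `HEALTH_KEYS` only lists the fields the mock returns; the authoritative schema is the one in the REST reference.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:5556"

# Fields returned by the mock /health handler above (not the full schema).
HEALTH_KEYS = {"status", "model_loaded", "active_version", "uptime_seconds",
               "server_version", "protocol_version", "python_version"}

def get_json(path, base_url=BASE_URL, timeout=5.0):
    """GET base_url + path and decode the JSON response body."""
    with urlopen(f"{base_url}{path}", timeout=timeout) as resp:
        return json.load(resp)

def missing_health_keys(body):
    """Return the expected /health fields absent from a response body, sorted."""
    return sorted(HEALTH_KEYS - set(body))
```

For example, `missing_health_keys(get_json("/health"))` lists any fields the running server leaves out. `base_url` lets you aim the same helpers at another host, such as the hosted mock at https://jope-docs.pages.dev/mock-api.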

Predict worker · ~35 lines

Mock REP socket that answers ping and predict with hardcoded values.

# File: inference_worker.py
# Run with: python inference_worker.py
import time, uuid, zmq, msgpack

ctx  = zmq.Context(io_threads=1)
sock = ctx.socket(zmq.REP)
sock.bind("tcp://0.0.0.0:5555")
print("Inference worker ready on :5555")

while True:
    req = msgpack.unpackb(sock.recv(), raw=False)
    error = None

    if req["type"] == "ping":
        reply_body  = {"server_version": "0.1.0", "protocol_version": 1,
                       "model_version": "v5", "uptime_seconds": 0}
        reply_type  = "pong"

    elif req["type"] == "predict":
        # TODO: swap this out with your real PLS+Ridge model
        reply_body  = {"concentrations": {"EPA": 5.23, "DHA": 3.19, "DPA": 1.82},
                       "confidence":     {"EPA": 0.95, "DHA": 0.92, "DPA": 0.88},
                       "model_version": "v5", "inference_ms": 8.3}
        reply_type  = "predict_reply"

    else:
        # Not mocked yet (e.g. model_info). A REP socket must answer every request.
        # The "error" type and error fields here are assumptions; check the spec.
        reply_body, reply_type = None, "error"
        error = {"code": "UNSUPPORTED_TYPE", "message": req["type"], "retryable": False}

    sock.send(msgpack.packb({
        "v": 1, "type": reply_type,
        "id": str(uuid.uuid4()),
        "correlation_id": req["id"],
        "ts": time.time(), "body": reply_body, "error": error,
    }, use_bin_type=True))
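If you pull the reply logic out of the socket loop, you can test it without ZMQ. This sketch keeps the same mock assumptions. `handle` and `MOCK_REPLIES` are hypothetical names. The `"error"` reply type and the error fields (`code`, `message`, `retryable`) are assumptions to check against the spec, not its documented shape.

```python
import time
import uuid

PROTOCOL_VERSION = 1  # envelope "v"; the client refuses anything else

# Hardcoded mock replies: request type -> (reply type, reply body).
MOCK_REPLIES = {
    "ping": ("pong", {"server_version": "0.1.0", "protocol_version": 1,
                      "model_version": "v5", "uptime_seconds": 0}),
    "predict": ("predict_reply", {
        "concentrations": {"EPA": 5.23, "DHA": 3.19, "DPA": 1.82},
        "confidence": {"EPA": 0.95, "DHA": 0.92, "DPA": 0.88},
        "model_version": "v5", "inference_ms": 8.3}),
}

def handle(req):
    """Build the reply envelope for one decoded request (no socket involved)."""
    if req["type"] in MOCK_REPLIES:
        reply_type, body = MOCK_REPLIES[req["type"]]
        error = None
    else:
        # Assumed error reply; check the spec's error table for the real shape.
        reply_type, body = "error", None
        error = {"code": "UNSUPPORTED_TYPE", "message": req["type"], "retryable": False}
    return {"v": PROTOCOL_VERSION, "type": reply_type, "id": str(uuid.uuid4()),
            "correlation_id": req["id"], "ts": time.time(),
            "body": body, "error": error}
```

The worker loop then shrinks to `sock.send(msgpack.packb(handle(msgpack.unpackb(sock.recv(), raw=False)), use_bin_type=True))`, and every request type gets a reply.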

Verify · run this client in another terminal

# File: test_client.py
import time, uuid, zmq, msgpack

ctx  = zmq.Context()
sock = ctx.socket(zmq.REQ)
sock.connect("tcp://localhost:5555")

sock.send(msgpack.packb({
    "v": 1, "type": "predict",
    "id": str(uuid.uuid4()), "ts": time.time(),
    "body": {
        "spectrum": {
            "wavenumbers":  [200.0 + i*1.5 for i in range(2048)],
            "intensities":  [0.0] * 2048,
            "channel": 1, "integration_ms": 1000, "scan_seq": 1,
        },
        "context": {"batch_id": "DEV-001", "port": "extract-E1", "column_index": 3},
    },
}, use_bin_type=True))

reply = msgpack.unpackb(sock.recv(), raw=False)
print("Got:", reply["body"])

Expected output: Got: {'concentrations': {'EPA': 5.23, 'DHA': 3.19, 'DPA': 1.82}, ...}
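The test client above blocks forever in `recv()` if the worker is not running. A REQ socket that misses its reply also cannot send again. The sketch below is a cut-down version of the "Lazy Pirate" pattern from the ZeroMQ guide, with no automatic retry. It keeps one long-lived socket and closes and reopens it only after a timeout. `PredictClient` and the 2 s default are assumptions. `LINGER`, `RCVTIMEO` and `SNDTIMEO` are standard ZMQ socket options.

```python
import time
import uuid

import msgpack
import zmq

class PredictClient:
    """REQ client that reuses one socket and rebuilds it only after a timeout."""

    def __init__(self, endpoint="tcp://localhost:5555", timeout_ms=2000):
        self.ctx = zmq.Context.instance()
        self.endpoint = endpoint
        self.timeout_ms = timeout_ms
        self._open()

    def _open(self):
        self.sock = self.ctx.socket(zmq.REQ)
        self.sock.setsockopt(zmq.LINGER, 0)                  # drop unsent messages on close
        self.sock.setsockopt(zmq.RCVTIMEO, self.timeout_ms)  # recv raises zmq.Again on timeout
        self.sock.setsockopt(zmq.SNDTIMEO, self.timeout_ms)
        self.sock.connect(self.endpoint)

    def call(self, msg_type, body):
        """Send one request; return the decoded reply, or None on timeout."""
        req_id = str(uuid.uuid4())
        try:
            self.sock.send(msgpack.packb({"v": 1, "type": msg_type, "id": req_id,
                                          "ts": time.time(), "body": body},
                                         use_bin_type=True))
            reply = msgpack.unpackb(self.sock.recv(), raw=False)
        except zmq.Again:
            # A REQ socket stuck waiting for a reply cannot send again; rebuild it.
            self.sock.close()
            self._open()
            return None
        if reply.get("correlation_id") != req_id:
            raise RuntimeError(f"unexpected correlation_id {reply.get('correlation_id')!r}")
        return reply

    def close(self):
        self.sock.close()
```

Create one `PredictClient` per process and reuse it. A `None` return means the worker did not answer within `timeout_ms`, and the next `call` starts on a fresh socket.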

Generate a server stub from the spec

One command gives you a FastAPI skeleton with every route and schema pre-wired.

# One-line server stub from the OpenAPI spec
npx @openapitools/openapi-generator-cli generate \
  -i https://jope-docs.pages.dev/specs/openapi.yaml \
  -g python-fastapi \
  -o ./server-stub

Output under ./server-stub includes models/ (Pydantic classes, one per schema in openapi.yaml) and apis/ (one APIRouter per tag). Open the generated main.py, wire in your handlers, and you're done.

Hosted mock for wire-level testing

Same contract as the real server. Useful when either side is still in progress.

The docs site ships a mock Cloudflare Pages Function answering the exact same contract as the production server. Hit it from your C# Console while the Python side is still being built, or from your Python client to sanity-check request shape before wiring up the real inference engine.

GET  https://jope-docs.pages.dev/mock-api/health
GET  https://jope-docs.pages.dev/mock-api/model/list
POST https://jope-docs.pages.dev/mock-api/model/load
POST https://jope-docs.pages.dev/mock-api/training/start
GET  https://jope-docs.pages.dev/mock-api/training/{id}

Click Try It on any operation inside the REST reference to hit these endpoints live from your browser.

Implementation notes

Details easy to miss, listed upfront.

Envelope v is always 1. The client checks the version on handshake and refuses mismatched replies, so set v: 1 explicitly.
Echo correlation_id back unchanged. The Console matches replies to in-flight requests by this field; drop it and every prediction appears as a timeout.
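Pack with use_bin_type=True. MessagePack's legacy raw mode does not distinguish text from binary. With use_bin_type=True, Python str is packed as UTF-8 str and bytes as bin, which is what the C# client expects. It is already the default in msgpack-python 1.0+, but setting it explicitly keeps older versions from producing ambiguous strings.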
Long-lived REP socket, not per-request. ZMQ sockets are designed to be reused; reconnecting per message spends the 20 ms latency budget on socket setup.
retryable: false for deterministic errors. MODEL_NOT_LOADED, INVALID_SPECTRUM, and SPECTRUM_OUT_OF_RANGE are not transient; retrying them only burns cycles.
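The version check, the correlation check and the retry rule can be enforced in one place on the client side. This is a minimal sketch. It assumes the reply has already been unpacked into a dict and that a non-null `error` is a dict with a boolean `retryable` flag. `classify_reply` and `ReplyError` are hypothetical names.

```python
PROTOCOL_VERSION = 1

class ReplyError(Exception):
    """Reply envelope violates the protocol (hypothetical helper exception)."""

def classify_reply(reply, request_id):
    """Return "ok", "retry", or "fail" for a decoded reply envelope."""
    if reply.get("v") != PROTOCOL_VERSION:
        raise ReplyError(f"protocol version mismatch: {reply.get('v')!r}")
    if reply.get("correlation_id") != request_id:
        raise ReplyError("correlation_id does not match the request id")
    error = reply.get("error")
    if error is None:
        return "ok"
    # Assumed error shape: deterministic errors carry retryable: false.
    return "retry" if error.get("retryable") else "fail"
```

Only `"retry"` should trigger a resend. For `"fail"`, surface the error code to the operator instead of looping.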

Go deeper

REST Reference
