Quick Start · Python

Run the server in 30 minutes

Mock first, swap in the real model later. Code samples are copy-paste-ready.

⏱ ~30 minutes end-to-end | 🐍 Python 3.11+ | 📦 FastAPI · pyzmq · msgpack

Two processes. One serves HTTP management endpoints (load models, trigger training, health). The other serves a ZMQ REP socket for the real-time predict loop. Mock both in about 60 lines, hit them from a test client, then swap in the real PLS+Ridge model.

REST · Management
Cold path · :5556
  • Load models · POST /model/load
  • Start training · POST /training/start
  • Poll progress · GET /training/{id}
  • Health check · GET /health
Stack: FastAPI + uvicorn

ZMQ · Inference
Hot path · :5555
  • Heartbeat · ping / pong
  • Prediction · predict / predict_reply
  • Model info · model_info
  • Latency · ≤ 20 ms p95 round-trip
Stack: pyzmq + msgpack
1. Install the stack

Four server libraries, plus scikit-learn and numpy for the model. One install command.

pip install fastapi "uvicorn[standard]" pyzmq msgpack scikit-learn numpy
  • fastapi — REST framework, decorator-based, auto-generates OpenAPI schemas
  • uvicorn[standard] — ASGI server that runs FastAPI
  • pyzmq — Python ZMQ bindings; we only use REP sockets
  • msgpack — binary serializer on the ZMQ wire (spec mandates this, not JSON)
2. Management REST · 25 lines

Minimal FastAPI app covering /health and /model/list.

# File: management_api.py
# Run with: uvicorn management_api:app --host 0.0.0.0 --port 5556
from fastapi import FastAPI

app = FastAPI(title="Jope Inference · Management API")

@app.get("/health")
def health():
    return {
        "status": "ok",
        "model_loaded": True,
        "active_version": "v5",
        "uptime_seconds": 0,
        "server_version": "0.1.0",
        "protocol_version": 1,
        "python_version": "3.11.4",
    }

@app.get("/model/list")
def list_models():
    return {
        "active": "v5",
        "models": [{"version": "v5", "status": "active",
                    "trained_at": "2026-03-15T10:00:00Z"}],
    }
Verify
curl http://localhost:5556/health
curl http://localhost:5556/model/list

Response JSON should match the shapes in the REST reference.
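The shape check can also be automated. The sketch below uses only the standard library and compares a parsed /health payload against the fields the mock returns. `EXPECTED_HEALTH_FIELDS` and `check_health_shape` are illustrative names, and the field list is copied from the mock handler above rather than from the REST reference, so adjust it if the reference differs.

```python
# Illustrative helper, not part of the Jope spec: field names and types are
# copied from the mock /health handler in management_api.py.
EXPECTED_HEALTH_FIELDS = {
    "status": str,
    "model_loaded": bool,
    "active_version": str,
    "uptime_seconds": (int, float),
    "server_version": str,
    "protocol_version": int,
    "python_version": str,
}


def check_health_shape(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape matches."""
    problems = []
    for field, expected_type in EXPECTED_HEALTH_FIELDS.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            actual = type(payload[field]).__name__
            problems.append(f"wrong type for {field}: {actual}")
    return problems
```

Feed it the decoded JSON from the curl call above (for example via `json.loads`) and treat any non-empty result as a contract mismatch.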

3. Predict worker · ~30 lines

Mock REP socket that answers ping with a pong and predict with hardcoded concentrations.

# File: inference_worker.py
# Run with: python inference_worker.py
import time, uuid, zmq, msgpack

ctx = zmq.Context(io_threads=1)
sock = ctx.socket(zmq.REP)
sock.bind("tcp://0.0.0.0:5555")
print("Inference worker ready on :5555")

while True:
    req = msgpack.unpackb(sock.recv(), raw=False)
    error = None

    if req["type"] == "ping":
        reply_body = {"server_version": "0.1.0", "protocol_version": 1,
                      "model_version": "v5", "uptime_seconds": 0}
        reply_type = "pong"

    elif req["type"] == "predict":
        # TODO: swap this out with your real PLS+Ridge model
        reply_body = {"concentrations": {"EPA": 5.23, "DHA": 3.19, "DPA": 1.82},
                      "confidence": {"EPA": 0.95, "DHA": 0.92, "DPA": 0.88},
                      "model_version": "v5", "inference_ms": 8.3}
        reply_type = "predict_reply"

    else:
        # A REP socket must answer every request. The reply type and error
        # fields below are placeholders; use the ones from the protocol spec.
        reply_body, reply_type = None, "error"
        error = {"code": "UNKNOWN_TYPE", "retryable": False}

    sock.send(msgpack.packb({
        "v": 1, "type": reply_type,
        "id": str(uuid.uuid4()),
        "correlation_id": req["id"],
        "ts": time.time(), "body": reply_body, "error": error,
    }, use_bin_type=True))
Verify · run this client in another terminal
# File: test_client.py
import time, uuid, zmq, msgpack

ctx = zmq.Context()
sock = ctx.socket(zmq.REQ)
sock.connect("tcp://localhost:5555")

sock.send(msgpack.packb({
    "v": 1, "type": "predict",
    "id": str(uuid.uuid4()), "ts": time.time(),
    "body": {
        "spectrum": {
            "wavenumbers": [200.0 + i*1.5 for i in range(2048)],
            "intensities": [0.0] * 2048,
            "channel": 1, "integration_ms": 1000, "scan_seq": 1,
        },
        "context": {"batch_id": "DEV-001", "port": "extract-E1", "column_index": 3},
    },
}, use_bin_type=True))

reply = msgpack.unpackb(sock.recv(), raw=False)
print("Got:", reply["body"])

Expected output: Got: {'concentrations': {'EPA': 5.23, 'DHA': 3.19, 'DPA': 1.82}, ...}
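Once the round trip works, the ≤ 20 ms p95 target from the hot-path summary can be checked by timing repeated requests. The sketch below is one way to do that with the standard library; `p95` and `measure_round_trips` are illustrative names, the nearest-rank percentile is an assumed definition (the spec may define p95 differently), and `send_once` is a hypothetical callable that you supply to send one predict envelope and wait for its reply.

```python
import time
from typing import Callable


def p95(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # 1-based nearest rank, ceil(0.95 * n), computed with integers only.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def measure_round_trips(send_once: Callable[[], object], n: int = 200) -> list[float]:
    """Call send_once n times and return each call's duration in milliseconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        send_once()  # hypothetical: one REQ send plus the matching recv
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations
```

With the REQ socket from test_client.py wrapped in a function, `p95(measure_round_trips(send_once))` gives a figure to compare against the 20 ms budget. Against a localhost mock it mostly measures serialization and socket overhead, not model inference.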

4. Generate a server stub from the spec

One command generates a FastAPI skeleton with every route and schema pre-wired.

# One-line server stub from the OpenAPI spec
npx @openapitools/openapi-generator-cli generate \
  -i https://jope-docs.pages.dev/specs/openapi.yaml \
  -g python-fastapi \
  -o ./server-stub

Output under ./server-stub includes models/ (Pydantic classes per schema in openapi.yaml) and apis/ (one APIRouter per tag). Open the generated main.py and wire in your handlers.

5. Hosted mock for wire-level testing

Same contract as the real server. Useful when either side is still in progress.

The docs site ships a mock Cloudflare Pages Function answering the exact same contract as the production server. Hit it from your C# Console while the Python side is still being built, or from your Python client to sanity-check request shape before wiring up the real inference engine.

GET https://jope-docs.pages.dev/mock-api/health
GET https://jope-docs.pages.dev/mock-api/model/list
POST https://jope-docs.pages.dev/mock-api/model/load
POST https://jope-docs.pages.dev/mock-api/training/start
GET https://jope-docs.pages.dev/mock-api/training/{id}

Click Try It on any operation inside the REST reference to hit these endpoints live from your browser.
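The same endpoints can be reached from a script. The following is a minimal standard-library sketch, assuming the mock returns JSON bodies for GET requests; `MOCK_BASE`, `mock_url`, and `get_json` are illustrative names, not part of any Jope client library.

```python
import json
import urllib.request

# Base URL taken from the endpoint list above.
MOCK_BASE = "https://jope-docs.pages.dev/mock-api"


def mock_url(path: str) -> str:
    """Join a route such as '/health' onto the hosted mock base URL."""
    return f"{MOCK_BASE}/{path.lstrip('/')}"


def get_json(path: str, timeout: float = 5.0) -> dict:
    """GET a mock endpoint and decode its JSON body (performs a network call)."""
    with urllib.request.urlopen(mock_url(path), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `get_json("/health")` should return a payload with the same fields as the local /health mock, if the hosted mock follows the contract as described.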

6. Implementation notes

Details easy to miss, listed upfront.

  1. Envelope v is always 1. The client checks the version on handshake and refuses mismatched replies, so set v: 1 explicitly.
  2. Echo the request id back as correlation_id, unchanged. The Console matches replies to in-flight requests by this field; drop it and every prediction appears as a timeout.
  3. Pack with use_bin_type=True. MessagePack distinguishes str (UTF-8 text) from bin (raw bytes), and the C# client expects each in its own type. Without the flag, older msgpack releases pack both into the legacy raw type, where bytes and non-ASCII text can be misread.
  4. Long-lived REP socket, not per-request. ZMQ sockets are designed to be reused; re-creating one per message burns the 20 ms latency budget on socket setup.
  5. retryable: false for deterministic errors. MODEL_NOT_LOADED, INVALID_SPECTRUM, and SPECTRUM_OUT_OF_RANGE are not transient, so retrying only burns cycles.
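Notes 1, 2, and 5 can be kept in one place with a small helper. This is a sketch under stated assumptions: the envelope field names come from inference_worker.py, while `build_reply`, `make_error`, the "code" and "message" keys, and the rule that unlisted codes are retryable are all hypothetical and should be replaced by whatever the protocol spec defines.

```python
import time
import uuid
from typing import Optional

PROTOCOL_VERSION = 1  # note 1: the envelope "v" field is always 1

# Note 5: codes the notes above list as deterministic, so not worth retrying.
NON_RETRYABLE_CODES = {"MODEL_NOT_LOADED", "INVALID_SPECTRUM", "SPECTRUM_OUT_OF_RANGE"}


def make_error(code: str, message: str) -> dict:
    """Hypothetical error object; only the "retryable" flag is named in the notes."""
    # ASSUMPTION: the "code"/"message" keys and retryable-by-default for other codes.
    return {"code": code, "message": message,
            "retryable": code not in NON_RETRYABLE_CODES}


def build_reply(request: dict, reply_type: str, body: Optional[dict],
                error: Optional[dict] = None) -> dict:
    """Build a reply envelope with the same fields the mock worker sends."""
    return {
        "v": PROTOCOL_VERSION,
        "type": reply_type,
        "id": str(uuid.uuid4()),          # fresh id for the reply itself
        "correlation_id": request["id"],  # note 2: request id echoed unchanged
        "ts": time.time(),
        "body": body,
        "error": error,
    }
```

In the worker loop, the dict passed to `msgpack.packb` could then be replaced by a call such as `build_reply(req, "pong", reply_body)`, which makes it harder to forget the version or the correlation id.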
