OKTO Server Management
Full-featured remote management of edge devices from the factory server. This document describes the command protocol, WebSocket channel, REST API, RBAC model, firmware workflow, and security model.
1. Architecture
┌────────────────┐ ┌──────────────────────────────────────────┐
│ OKTO Cloud │ │ Management UI (React Admin) │
│ (app.okto.ru) │ │ │
└───────▲────────┘ └────────────────┬─────────────────────────┘
│ POST /companies/{id}/bottles │ REST (JWT) + WS(/dashboard)
│ POST /batches POST /pallets │
│ ▲ │
│ │ CloudSyncService ▼
┌───────┴──┴───────────────────────────────────────────────┐
│ Factory Server (Ktor + Postgres) │
│ • FactoryCloudClient (bearer authToken) │
│ • EdgeSyncService (persists + enqueues) │
│ • DeviceConnectionRegistry │
│ • CommandDispatchService FirmwareService │
│ • AuthService + JwtService │
└───────▲──────────────────────┬───────────────────────────┘
│ POST /api/v1/sync │ WS /ws/device
│ (VIA_LOCAL_SERVER) │ (commands + events)
┌───────┴──────────────────────┴─────────────────┐
│ Edge Service (Kotlin/Ktor + SQLite) │
│ • ServerConnectionService (persistent WS) │
│ • CommandHandlerService │
│ • OfflineQueueService (forwards to factory │
│ OR direct to OKTO Cloud in DIRECT_CLOUD) │
└────────────────────────────────────────────────┘
Two end-to-end paths, factory-mode and direct-cloud, coexist and can be
switched per device via connection-mode:
- VIA_LOCAL_SERVER (default): edge → factory → OKTO cloud. Data is persisted into the aggregated_* tables on the factory server, then enqueued on cloud_sync_queue, then pushed by CloudSyncService using the real OKTO cloud endpoints (/companies/{id}/bottles, /batches, /pallets, /batches/fixate). FactoryCloudClient wraps those calls.
- DIRECT_CLOUD: edge → OKTO cloud (existing OktoCloudClient). The factory server is optional in this mode and is used only for device management (commands, firmware).

- Each edge device opens a single persistent WebSocket to wss://<factory>/ws/device?token=<deviceJwt> and re-establishes it with exponential backoff on failure (1s → 30s cap).
- The server pushes device commands down the socket; the device responds with CommandResult, optional CommandProgress, and arbitrary telemetry (StatusEvent, LogLineEvent, scan/print events, alerts).
- The dashboard subscribes to /ws/dashboard to receive the same telemetry stream, filtered by device or event type.
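The reconnect schedule described above can be sketched as a pure function. This is a minimal illustration, not the code in ServerConnectionService: the doubling factor and the names `reconnectDelayMs`, `baseDelayMs`, and `maxDelayMs` are assumptions, since the text only states the 1s starting point and the 30s cap.

```kotlin
// Sketch only. The source states "1s → 30s cap"; doubling per attempt is an assumption.
val baseDelayMs = 1_000L
val maxDelayMs = 30_000L

/** Delay in milliseconds before reconnect attempt number [attempt] (0-based). */
fun reconnectDelayMs(attempt: Int): Long {
    require(attempt >= 0) { "attempt must be non-negative" }
    // Clamp the shift so the left shift cannot overflow a Long.
    val shift = attempt.coerceAtMost(20)
    return minOf(maxDelayMs, baseDelayMs shl shift)
}
```

Under these assumptions the delays run 1s, 2s, 4s, 8s, 16s and then stay at 30s. A production client would typically add random jitter so that many devices do not reconnect in lockstep after a server restart.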
2. Authentication
Two flavours of JWT, both signed with HS256 (auth.jwtSecret):
| Token | Issued by | Subject | Scope claim |
|---|---|---|---|
| User JWT | POST /api/v1/auth/login | <userId> | user |
| Device JWT | POST /api/v1/devices/{id}/token (with X-Enrollment-Key) | <deviceId> | device |
- User JWTs carry a role claim (ADMIN, MANAGER, OPERATOR, VIEWER).
- /ws/dashboard accepts only user JWTs.
- /ws/device accepts only device JWTs.
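The endpoint rules above amount to a check on the scope claim. The sketch below assumes the token signature has already been verified elsewhere; `TokenClaims`, `Scope`, and `mayConnect` are hypothetical names used for illustration and are not taken from JwtService.

```kotlin
enum class Scope { USER, DEVICE }

// Hypothetical view of an already-verified JWT (HS256 signature checking is out of scope here).
// role is only present on user tokens.
data class TokenClaims(val subject: String, val scope: Scope, val role: String? = null)

/** Returns true if a token with [claims] may open the WebSocket at [path]. */
fun mayConnect(path: String, claims: TokenClaims): Boolean = when (path) {
    "/ws/dashboard" -> claims.scope == Scope.USER
    "/ws/device" -> claims.scope == Scope.DEVICE
    else -> false // unknown paths are refused in this sketch
}
```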
Device enrollment flow
The edge service bootstraps itself on first boot by calling:
POST /api/v1/devices/{deviceId}/token?name=<name>&companyId=<company>&productionLineId=<line>&version=<version>
X-Enrollment-Key: <shared-secret>
- If the device exists, a fresh device JWT is returned.
- If the device is unknown and auth.allowAutoEnrollment = true, the server auto-registers it (OFFLINE status) and returns a token.
- If the enrollment key is missing/wrong, the request is rejected.
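The three outcomes listed above can be written as a small decision function. This is a sketch under stated assumptions, not the server's handler: the key is checked before anything else, and an unknown device with auto-enrollment disabled is rejected (the text does not say what happens in that case). `EnrollmentOutcome` and `decideEnrollment` are hypothetical names.

```kotlin
import java.security.MessageDigest

sealed interface EnrollmentOutcome {
    data class TokenIssued(val deviceId: String, val newlyRegistered: Boolean) : EnrollmentOutcome
    data class Rejected(val reason: String) : EnrollmentOutcome
}

// Constant-time comparison so response timing does not leak how much of the key matched.
fun keysMatch(presented: String?, expected: String): Boolean =
    presented != null &&
        MessageDigest.isEqual(presented.toByteArray(), expected.toByteArray())

fun decideEnrollment(
    deviceId: String,
    presentedKey: String?,
    expectedKey: String,
    knownDevices: Set<String>,
    allowAutoEnrollment: Boolean,
): EnrollmentOutcome = when {
    !keysMatch(presentedKey, expectedKey) ->
        EnrollmentOutcome.Rejected("missing or wrong enrollment key")
    deviceId in knownDevices ->
        EnrollmentOutcome.TokenIssued(deviceId, newlyRegistered = false)
    allowAutoEnrollment ->
        EnrollmentOutcome.TokenIssued(deviceId, newlyRegistered = true)
    else ->
        // Assumption: the source does not describe this branch.
        EnrollmentOutcome.Rejected("unknown device and auto-enrollment disabled")
}
```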
Configure both sides symmetrically:
- Factory: auth.deviceEnrollmentKey
- Edge: factoryServer.enrollmentKey
Default admin credentials on a fresh database:
Change the password immediately and remove the seeded accounts in production.
3. Device commands
All commands extend DeviceCommand (see
common/api/DeviceControl.kt):
| Command | Purpose | Dangerous |
|---|---|---|
| force_sync | Immediately process the offline queue | No |
| clear_queue | Delete pending (+ optionally completed) queue rows | Yes |
| pull_logs | Stream the last N log lines back as LogLineEvents | No |
| restart_service | Gracefully exit edge-service (supervisor restarts it) | Medium |
| reboot_os | systemctl reboot | Yes |
| shutdown_os | systemctl poweroff | Yes |
| push_config | Merge a JSON patch into device config | No |
| update_firmware | Download + sha256-verify + stage a new edge-service binary | Medium |
| exec_shell | Run an allow-listed shell template (see ShellTemplates) | Medium |
| enable_device | Resume production on the device | No |
| disable_device | Pause production (requires a reason) | Yes |
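For client code that needs the "Dangerous" column at runtime, the table can be mirrored as a lookup. This is an illustrative sketch and does not reproduce common/api/DeviceControl.kt; `Danger`, `commandDanger`, and `needsConfirmation` are hypothetical names, and treating unknown command types as dangerous is an assumption.

```kotlin
enum class Danger { NO, MEDIUM, YES }

// Mirrors the "Dangerous" column of the command table.
val commandDanger: Map<String, Danger> = mapOf(
    "force_sync" to Danger.NO,
    "clear_queue" to Danger.YES,
    "pull_logs" to Danger.NO,
    "restart_service" to Danger.MEDIUM,
    "reboot_os" to Danger.YES,
    "shutdown_os" to Danger.YES,
    "push_config" to Danger.NO,
    "update_firmware" to Danger.MEDIUM,
    "exec_shell" to Danger.MEDIUM,
    "enable_device" to Danger.NO,
    "disable_device" to Danger.YES,
)

/** True for commands marked "Yes"; unknown types fail closed (an assumption). */
fun needsConfirmation(type: String): Boolean =
    (commandDanger[type] ?: Danger.YES) == Danger.YES
```

The four commands for which this returns true are the same ones section 8 lists as surfacing a confirmation dialog.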
Dispatch request
POST /api/v1/devices/{id}/commands
Authorization: Bearer <userJwt>
Content-Type: application/json
{
"command": { "type": "force_sync", "id": "cmd-uuid" },
"timeoutMs": 15000
}
Response (200):
{
"success": true,
"data": {
"commandId": "cmd-uuid",
"success": true,
"output": "Force-sync completed. In-progress: 7"
}
}
If the device is offline, the server returns success=false and persists the
command with status FAILED.
Bulk dispatch
Body is identical to the single-device endpoint. The server fans out a fresh
command-id per target and returns Map<deviceId, CommandResult>.
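The fan-out can be pictured roughly as follows. This is a sketch, not CommandDispatchService: `bulkDispatch` and the `dispatchOne` callback are hypothetical, and `CommandResult` here carries only the three fields visible in the single-device response example.

```kotlin
import java.util.UUID

// Fields taken from the "data" object in the single-device response example.
data class CommandResult(val commandId: String, val success: Boolean, val output: String)

/**
 * Sends the same command to every target, minting a fresh command id per device.
 * [dispatchOne] stands in for the real single-device dispatch.
 */
fun bulkDispatch(
    targets: Collection<String>,
    dispatchOne: (deviceId: String, commandId: String) -> CommandResult,
): Map<String, CommandResult> =
    targets.distinct().associateWith { deviceId ->
        dispatchOne(deviceId, UUID.randomUUID().toString())
    }
```

Because each target gets its own id, one offline device yields a failed entry for that device only and the other results are unaffected.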
Command history
- GET /api/v1/devices/{id}/commands?limit=50&offset=0
- GET /api/v1/devices/{id}/commands/{cmdId} (single record)
4. Firmware workflow
- Upload
  POST /api/v1/firmware/releases?version=1.2.3&channel=stable&filename=edge-service.jar
  Authorization: Bearer <userJwt>
  Content-Type: application/octet-stream
  <binary artifact>
  Server stores the artifact in data/firmware/<safeVersion>-<filename>, computes SHA-256, and persists a FirmwareRelease row.
- Deploy
  Server creates a FirmwareDeployment per device (status PENDING → IN_PROGRESS → SUCCESS/FAILED) and dispatches an UpdateFirmwareCmd with the artifact URL and sha256.
- Device behaviour
  - UpdateFirmwareExecutor downloads the artifact, verifies sha256, and stages it at <okto.firmware.staging.dir>/edge-service-<version>.jar.
  - The supervisor (systemd or Docker) is expected to pick up the staged JAR during its next restart.
- Browse
  - GET /api/v1/firmware/releases
  - GET /api/v1/firmware/releases/{id}/artifact (binary download, used by the device during deployment)
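The verify-then-stage step on the device can be sketched with the JDK's MessageDigest. The function names are hypothetical and this is not the body of UpdateFirmwareExecutor; the case-insensitive hex comparison is an assumption about how the digest is encoded.

```kotlin
import java.security.MessageDigest

/** Lower-case hex SHA-256 digest of [bytes]. */
fun sha256Hex(bytes: ByteArray): String =
    MessageDigest.getInstance("SHA-256").digest(bytes)
        .joinToString("") { "%02x".format(it) }

/** True when the downloaded artifact matches the digest carried by the command. */
fun isArtifactValid(artifact: ByteArray, expectedSha256: String): Boolean =
    sha256Hex(artifact).equals(expectedSha256.trim(), ignoreCase = true)

/** File name used under the staging directory, following the pattern given above. */
fun stagedFileName(version: String): String = "edge-service-$version.jar"
```

A caller would write the artifact to the staging directory only after `isArtifactValid` returns true, and report the deployment as FAILED otherwise.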
5. Device groups
Groups enable bulk operations:
POST /api/v1/device-groups — create
GET /api/v1/device-groups — list
PUT /api/v1/device-groups/{id} — update
DELETE /api/v1/device-groups/{id} — delete
POST /api/v1/device-groups/{id}/members — add devices
DELETE /api/v1/device-groups/{id}/members — remove devices
POST /api/v1/device-groups/{id}/commands — bulk dispatch
6. Audit log
Every privileged REST call is recorded in audit_log. Query via:
7. RBAC
Roles and default permissions:
| Role | Can do |
|---|---|
| ADMIN | Everything, including user/terminal CRUD, firmware upload, OS shutdown |
| MANAGER | Device config + safe commands (force_sync, pull_logs, clear_queue); user listing |
| OPERATOR | Dispatch force_sync, pull_logs, enable_device |
| VIEWER | Read-only — list devices, view commands, view audit log |
The Ktor JWT plugin enforces authentication; role-based gatekeeping is applied
in individual route handlers (see AuthRoutes.kt and ServerManagementRoutes.kt).
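A per-handler role check for command dispatch could look like the sketch below. It encodes only the commands the table names explicitly. Whether MANAGER's "Device config" also covers push_config is not stated, so it is left out here. `Role`, `allowedCommands`, and `canDispatch` are hypothetical names, not taken from ServerManagementRoutes.kt.

```kotlin
enum class Role { ADMIN, MANAGER, OPERATOR, VIEWER }

// Commands each non-admin role may dispatch, as listed in the table above.
val allowedCommands: Map<Role, Set<String>> = mapOf(
    Role.MANAGER to setOf("force_sync", "pull_logs", "clear_queue"),
    Role.OPERATOR to setOf("force_sync", "pull_logs", "enable_device"),
    Role.VIEWER to emptySet(),
)

/** ADMIN may dispatch anything; other roles are limited to their listed commands. */
fun canDispatch(role: Role, commandType: String): Boolean =
    role == Role.ADMIN || commandType in allowedCommands[role].orEmpty()
```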
8. Security model
- Device JWTs default to a 1-year expiry. Rotate by calling POST /api/v1/devices/{id}/token and re-provisioning the edge service.
- User JWTs default to 24 hours (auth.tokenExpirationMs).
- exec_shell is strictly allow-listed; see ShellTemplates in CommandHandlerService.kt. Arbitrary shell is NOT accepted.
- Firmware artifacts must pass SHA-256 verification before being staged. An optional signatureBase64 field is reserved for Ed25519 supply-chain signatures; hook it up in production with a keyring bundled in the edge service JAR.
- Dangerous commands (reboot_os, shutdown_os, disable_device, clear_queue) surface a confirmation dialog in the dashboard UI and are recorded in audit_log before dispatch.
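The allow-list idea behind exec_shell can be illustrated as a lookup from template name to a fixed argument vector. The template names and commands below are invented for the example; the real entries live in ShellTemplates and may be structured differently.

```kotlin
// Hypothetical templates. Each maps to a fixed argv, so no caller-supplied
// text is ever interpolated into a shell command line.
val shellTemplates: Map<String, List<String>> = mapOf(
    "disk_usage" to listOf("df", "-h"),
    "uptime" to listOf("uptime"),
)

/** Returns the fixed argv for [templateName], or null if it is not allow-listed. */
fun resolveTemplate(templateName: String): List<String>? = shellTemplates[templateName]

/** Builds (but does not start) a process for an allow-listed template. */
fun processFor(templateName: String): ProcessBuilder? =
    resolveTemplate(templateName)?.let { ProcessBuilder(it) }
```

Passing an argv list to ProcessBuilder bypasses the shell entirely, which is what keeps a request such as "disk_usage; rm -rf /" from matching anything.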
9. Dashboard WebSocket protocol
Connect:
Send a subscription filter:
Receive typed events:
{ "type": "status", "deviceId": "edge-1", "status": "ONLINE", "ts": "...", "metrics": { ... } }
{ "type": "scan", "deviceId": "edge-1", "code": "010...", "valid": true }
{ "type": "log_line", "deviceId": "edge-1", "line": "...", "level": "ERROR", "ts": "..." }
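Filtering the stream by device or event type can be sketched as a predicate over the common fields of the events above. The filter shape is an assumption (a null set means no restriction on that dimension), and `DashboardEvent` and `SubscriptionFilter` are hypothetical names rather than the server's types.

```kotlin
// Only the two fields shared by every event shown above.
data class DashboardEvent(val type: String, val deviceId: String)

// Hypothetical filter: null means "do not restrict on this dimension".
data class SubscriptionFilter(
    val deviceIds: Set<String>? = null,
    val eventTypes: Set<String>? = null,
) {
    fun matches(event: DashboardEvent): Boolean =
        (deviceIds == null || event.deviceId in deviceIds) &&
            (eventTypes == null || event.type in eventTypes)
}
```

A dashboard session would keep one such filter and forward only the events for which `matches` returns true.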
10. Known limitations
The implementation is intentionally scoped to a single factory-server instance per site and a trusted LAN between the server and its edge devices. The following limits are known and documented so you can plan around them:
- In-memory WebSocket registry: DeviceConnectionRegistry is not replicated. Running two factory-server replicas will split the device-session set, so a command issued against replica A for a device connected to replica B will fail with Device offline. Fix: put a shared state layer (e.g. Redis / PostgreSQL LISTEN/NOTIFY) in front of the registry before horizontal scaling.
- In-memory config store on the edge: InMemoryDeviceConfigStore keeps push_config changes for the lifetime of the JVM only. Triggering restart_service afterwards loses the changes unless you also write them through to /etc/okto/application.yaml. For durable config, either update the YAML and restart the service (so the file is re-read) or write a PersistentDeviceConfigStore that maps the patch to the SQLite config table.
- Config hot-reload: most running services (scanner, printer, modbus, cloud client) capture their configuration at startup. push_config without a follow-up restart_service mostly just records intent; it does not reconfigure the hardware stack in place.
- WebSocket device JWT in query string: device tokens are passed as a URL query parameter. Production deployments should terminate TLS on a reverse proxy and strip the query parameter from its access logs, or switch the edge service to a first-message-authentication scheme (open the socket, then send the token as the first text frame before the server registers it).
- No JWT revocation: user logout invalidates the session row but the JWT remains valid until its exp claim. Rotate auth.jwtSecret to force-revoke all tokens; per-user revocation requires a blacklist you'd need to add.
- Cloud auth token is static: cloudSync.authToken is a long-lived bearer. If your OKTO cloud tenant moves to OAuth / short-lived tokens, wrap FactoryCloudClient with an auth refresher.
- Log retention: device_logs grows unbounded. Add a periodic cleanup job (e.g. DELETE FROM device_logs WHERE ts < NOW() - INTERVAL '30 days').
- Firmware signatures: the protocol carries an optional signatureBase64 field but verification isn't wired up by default. If you enable Ed25519 signing, bundle the trusted public key in the edge service JAR and add the check in UpdateFirmwareExecutor before the swap.
- Privilege escalation: reboot/shutdown/firmware swap require the sudoers file at packaging/sudoers/okto. Without it, those commands will return non-zero exit codes. Docker containers without systemd should rely on restart_service + supervisor restart instead.
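For the "No JWT revocation" item, a per-token blacklist could start from something like the sketch below. It assumes each token carries a unique identifier (for example a jti claim), which the current tokens are not documented to have. `TokenDenyList` is a hypothetical class, it is not thread-safe, and being in-memory it would have the same single-instance limitation as the connection registry.

```kotlin
class TokenDenyList {
    // tokenId -> expiry (epoch millis) of the revoked token
    private val revoked = mutableMapOf<String, Long>()

    fun revoke(tokenId: String, expiresAtMs: Long) {
        revoked[tokenId] = expiresAtMs
    }

    fun isRevoked(tokenId: String, nowMs: Long): Boolean {
        // Drop entries whose token has expired: the exp check rejects those anyway.
        revoked.entries.removeAll { it.value <= nowMs }
        return tokenId in revoked
    }

    fun size(): Int = revoked.size
}
```

The authentication path would consult `isRevoked` after signature and exp validation. Pruning on lookup keeps the map bounded by the number of tokens revoked within one expiry window.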
11. Troubleshooting
- Device shows OFFLINE despite being powered on
  - Confirm its connection mode is VIA_LOCAL_SERVER.
  - Check GET /api/v1/devices/connected: does the device identifier appear?
  - Tail the edge-service log for "Connecting to factory server WS".
- Commands always TIMEOUT
  - Either the edge-service is not online, or its CommandHandlerService raised an uncaught exception. Inspect device_logs via the Logs page.
- Firmware deploy shows SUCCESS but device still reports old version
  - The device only stages the artifact; the supervisor must swap the JAR on its next restart. Trigger a restart_service command to force the swap on the next process start.
See also:
- API_REFERENCE.md for the complete endpoint list.
- DEPLOYMENT.md for production hardening guidance.