UNIHODL · Agent Handoff SDK · Technical Specification v1.0
Agent Handoff SDK
The decision-continuity protocol for human-to-agent and agent-to-agent work transfer.
Quickstart · 5 minutes
Install the SDK, mint a scoped key, and pass a UNIHODL resume token into your agent loop. An agent that receives a resume token starts its turn already knowing what the human decided, what they read, and what they intended to do next.
1. Install
npm install @unihodl/agent-sdk
# or
pip install unihodl-agent
2. Mint a scoped resume token (server-side)
curl https://unihodl.app/api/v1/resume_tokens \
-H "Authorization: Bearer $UNIHODL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"session_id": "ses_8f3aZ91b",
"scopes": ["read:context", "read:reasoning"],
"ttl_seconds": 3600,
"audience": "claude.anthropic.com"
}'
3. Hand it off to an agent
import os
from unihodl_agent import Client
from anthropic import Anthropic
uh = Client(api_key=os.environ["UNIHODL_API_KEY"])
claude = Anthropic()
# Hydrate the session — returns a structured ResumeContext
ctx = uh.sessions.hydrate("ses_8f3aZ91b")
# Pass the serialized context as the first system message
resp = claude.messages.create(
model="claude-sonnet-4-7",
max_tokens=2048,
system=ctx.as_system_prompt(),
messages=[
{"role": "user", "content": "Continue Sarah's research on API v3."}
],
)
print(resp.content[0].text)
What just happened
The agent received the human’s open tabs with scroll positions, the AI-tagged decision thread, partial conclusions, and intended next step — not just a list of links. It can pick up the work mid-thought.
1. Overview & Design Principles
Most agentic AI failures are not reasoning failures — they are cold-start failures. The agent begins each session with no idea what the human already concluded, what tabs they had open, or which option they were leaning toward. UNIHODL’s Agent Handoff SDK closes that gap by exposing the human’s working session as a first-class, machine-readable artifact.
The four primitives
| Primitive | Definition |
|---|---|
| Session | An open work context — tabs, scroll positions, video timestamps, partial notes — captured by UNIHODL on the device. |
| Resume Token | A signed, scoped, time-bound credential that grants an agent read (and optionally write) access to a session. |
| Resume Context | The hydrated payload an agent receives — structured, redacted, serializable into any model's prompt format. |
| Reasoning Thread | The why-layer: partial conclusions, decision stance, blockers, next-step intent. |
Design principles
- Audience-bound by default. Every resume token is bound to a single audience (a model vendor, an MCP server URL, or a known agent identity). Tokens cannot be replayed against other audiences.
- Redaction at the boundary. Sensitive content is redacted server-side before serialization based on per-workspace policies — the agent never sees content it isn't entitled to.
- Reasoning is structured, not narrative. Decision threads are serialized as typed graph nodes, not free text, so agents can ground tool calls in specific intent vectors.
- Format-pluggable. The same Resume Context can serialize to a Claude system prompt, a Gemini system instruction, an OpenAI Agents SDK input, or a raw MCP resource.
- Auditable. Every hydration creates an immutable access record: who, when, what scopes, what was redacted, what was returned.
2. Resume Token Data Schema
A resume token is a JWS-signed JWT (RFC 7519) issued by api.unihodl.app and consumed by either UNIHODL’s hydration endpoint or, when scopes permit, directly by an agent that validates against UNIHODL’s JWKS.
Token claims
| Claim | Type | Description |
|---|---|---|
| iss | string | Always https://api.unihodl.app — issuer. |
| sub | string | Workspace-scoped subject, e.g. wks_3kQ:usr_8aZ. |
| aud | string | Bound audience: model vendor or MCP server URL. |
| jti | string | Token ID. Used to revoke. |
| iat / nbf / exp | int | Standard JWT timing claims. exp ≤ iat + 86400. |
| scope | string[] | Array of capability strings (see §5). |
| session_id | string | Bound session, e.g. ses_8f3aZ91b. |
| redaction_policy | string | Named policy applied during hydration. |
| max_hydrations | int | Replay cap. 1 = single-use. |
| delegator | object? | If issued via handoff, the human who consented. |
| nonce | string? | Required for write-scoped tokens. |
Example token (decoded)
{
"iss": "https://api.unihodl.app",
"sub": "wks_3kQ:usr_8aZ",
"aud": "claude.anthropic.com",
"jti": "rt_01HW7KQ8X2N3C9V0F4R6PD",
"iat": 1746480000,
"nbf": 1746480000,
"exp": 1746483600,
"scope": ["read:context", "read:reasoning", "read:redacted"],
"session_id": "ses_8f3aZ91b",
"redaction_policy": "default-strict",
"max_hydrations": 5,
"nonce": null
}
Resume Context payload (returned by /hydrate)
{
"session_id": "ses_8f3aZ91b",
"version": "1.0",
"captured_at": "2026-05-05T22:14:08Z",
"title": "GraphQL migration for API v3",
"summary": "Researching whether to migrate API v3 from REST to GraphQL.",
"tabs": [
{
"url": "https://blog.apollographql.com/rest-vs-graphql",
"title": "REST vs GraphQL — performance",
"scroll_y_pct": 0.62,
"selected_text": "GraphQL batching outperforms ...",
"tab_role": "primary_source"
}
],
"media": [
{ "kind": "video", "url": "youtu.be/abc", "timestamp_s": 1843,
"transcript_anchor": "...the unified resolver pattern ..." }
],
"reasoning_thread": [
{ "kind": "observation", "text": "REST batching wins on read-heavy endpoints." },
{ "kind": "decision_stance", "text": "Leaning hybrid; not full GraphQL." },
{ "kind": "blocker", "text": "Need finance approval for vendor B." },
{ "kind": "next_step", "text": "Draft hybrid RFC; meet with platform team." }
],
"ai_tags": ["api-v3", "graphql", "hybrid-architecture"],
"redactions": [{ "field": "tabs[2].selected_text", "reason": "PII" }]
}
3. Serialization, Compression & Redaction
Resume Contexts are designed to be small enough to fit comfortably in a 200K-token window alongside the agent’s task, structured enough to reason over, and safe enough to send across trust boundaries. The pipeline is deterministic: capture → normalize → redact → compress → serialize.
Wire formats
| Format | MIME type | When to use |
|---|---|---|
| Canonical JSON | application/vnd.unihodl.context+json | Default. Best for inspection, debugging, and most agent inputs. |
| Compact CBOR | application/vnd.unihodl.context+cbor | Mobile clients, low-bandwidth handoffs, edge MCP servers. |
| Prompt-ready text | text/x-unihodl-prompt | Pre-rendered for direct concatenation into a system prompt. |
| MCP resource | application/vnd.mcp.resource+json | When the consumer is an MCP-capable agent. |
Compression
Payloads larger than 32 KB are compressed with zstd level 9 and content-encoded (Content-Encoding: zstd). Clients that send Accept-Encoding: zstd always receive compressed payloads. For prompt-ready text the SDK transparently decompresses before serialization.
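The two rules above (size threshold, client opt-in) can be summarized as a single decision; this is a sketch of the stated policy, not UNIHODL's server code:

```python
THRESHOLD_BYTES = 32 * 1024  # payloads above 32 KB are always zstd-compressed

def should_compress(payload_size: int, accept_encoding: str) -> bool:
    """Compress when the client advertises zstd support, or when the
    payload exceeds the 32 KB threshold regardless of negotiation."""
    encodings = [e.strip() for e in accept_encoding.split(",")]
    if "zstd" in encodings:
        return True
    return payload_size > THRESHOLD_BYTES
```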
Redaction model
Redaction runs on the server during hydration. Policies are named, versioned, and bound to tokens via the redaction_policy claim. A policy is a list of typed rules applied in order. Field-level redactions are reflected in resume_context.redactions[] so the agent knows what was removed and why.
# default-strict policy (excerpt)
- match: { kind: "selected_text", regex: "\\b\\d{3}-\\d{2}-\\d{4}\\b" }
action: { redact: true, reason: "PII:SSN" }
- match: { kind: "tab_url", host_in: ["banking.*", "payroll.*"] }
action: { drop: true, reason: "domain:financial" }
- match: { kind: "reasoning_thread", contains_class: "PROTECTED_HEALTH" }
action: { redact: true, reason: "PHI" }
- match: { kind: "ai_tags", value_in: ["confidential"] }
action: { drop_all_with_tag: true }
Redaction is auditable
Each /hydrate call produces an audit_record with a hash of the unredacted payload, the policy version, and the list of fields removed. The unredacted payload itself is never logged.
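To make the rule semantics concrete, here is a rough sketch of applying the SSN rule from the default-strict excerpt, recording each hit in redactions[]. Field names follow the Resume Context payload in §2; the helper itself is illustrative, not the server's redaction engine:

```python
import re

# Matches the SSN shape from the default-strict policy excerpt.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def apply_ssn_rule(context: dict) -> dict:
    """Redact SSN-shaped substrings in tab selections and record each redaction."""
    redactions = context.setdefault("redactions", [])
    for i, tab in enumerate(context.get("tabs", [])):
        text = tab.get("selected_text")
        if text and SSN_RE.search(text):
            tab["selected_text"] = "[REDACTED]"
            redactions.append({"field": f"tabs[{i}].selected_text", "reason": "PII:SSN"})
    return context
```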
Determinism & versioning
The serializer guarantees byte-stable output for any given (session_version, policy_version, format). The SDK exposes unihodl.context.fingerprint(ctx) — a SHA-256 over the canonical JSON form — so callers can cache hydrations and detect drift.
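Assuming "canonical JSON" means sorted keys and compact separators (the spec does not pin the canonicalization down), the fingerprint can be sketched as:

```python
import hashlib
import json

def fingerprint(ctx: dict) -> str:
    """SHA-256 over a canonical JSON form: sorted keys, no insignificant whitespace."""
    canonical = json.dumps(ctx, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because key order is normalized, two hydrations of the same (session_version, policy_version, format) hash identically, which is what makes fingerprint-based caching safe.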
4. REST + MCP API Design
All HTTP endpoints are versioned at /v1/* and served from https://unihodl.app/api in v0 (with future graduation to https://api.unihodl.app). Every endpoint accepts and returns JSON unless otherwise noted. The MCP server speaks the Model Context Protocol natively over stdio in v0 and will be hosted at mcp.unihodl.app in v1.
Endpoint reference
| Method | Path | Purpose | Auth |
|---|---|---|---|
| POST | /v1/resume_tokens | Mint a scoped resume token bound to a session and audience. | API key |
| GET | /v1/sessions/{id} | Inspect session metadata. Does not return content. | API key |
| POST | /v1/sessions/{id}/hydrate | Hydrate a session into a Resume Context. Requires resume token. | Resume token |
| POST | /v1/sessions/{id}/handoff | Transfer a session to another principal with consent. | API key + step-up |
| POST | /v1/sessions/{id}/notes | Append an agent-authored note (write scope required). | Resume token (write) |
| POST | /v1/resume_tokens/{jti}/revoke | Revoke a token before exp. Idempotent. | API key |
| GET | /.well-known/jwks.json | Public keys for token validation. | Public |
| MCP | mcp.unihodl.app | Tools: hold, resume, hand_off; Resources: unihodl://session/{id}. | Resume token |
MCP surface
UNIHODL exposes itself as a first-class MCP server. Agents that already speak MCP (Claude, Cursor, Cline, Continue, custom MCP clients) can mount UNIHODL with no other code:
{
"mcpServers": {
"unihodl": {
"command": "npx",
"args": ["-y", "@unihodl/mcp-server"],
"env": { "UNIHODL_API_KEY": "uh_live_..." }
}
}
}
5. Authentication & Permissioning
UNIHODL uses a two-tier credential model. API keys identify a workspace and are used by trusted backend code to mint resume tokens, which are short-lived, scoped, audience-bound credentials sent to agents.
Credential types
| Type | Format | Lifetime | Use |
|---|---|---|---|
| API key (live) | uh_live_… | Until rotated | Server-side, mints tokens, never to agents. |
| API key (test) | uh_test_… | Until rotated | Sandbox environment, no real sessions. |
| Resume token | eyJhbGc… | ≤ 24h, 1h default | Sent to agents/MCP clients. Audience-bound. |
| OAuth (delegated) | Bearer (RFC 6749) | Configurable | Third-party apps acting on behalf of users. |
Scope catalog
| Scope | Grants |
|---|---|
| read:context | Tabs, scroll positions, media timestamps. |
| read:reasoning | The reasoning_thread (intent, blockers, next step). |
| read:redacted | Surfaces redacted-but-acknowledged metadata (field paths only). |
| read:transcript | Full media transcripts when present. |
| write:notes | Append agent-authored notes to a session. |
| write:next_step | Mutate the next_step intent on the reasoning thread. |
| session:hand_off | Initiate a handoff to another principal (requires step-up). |
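A scope check on the consuming side is straightforward. This sketch assumes the scope claim arrives as a list of strings, as in the example token in §2; the exception class mirrors the scope_insufficient error code but is not SDK API:

```python
class ScopeInsufficient(Exception):
    """Mirrors the scope_insufficient error code (HTTP 403)."""

def require_scopes(token_scopes: list[str], required: set[str]) -> None:
    """Raise if any required scope is missing from the token."""
    missing = required - set(token_scopes)
    if missing:
        raise ScopeInsufficient(f"missing scopes: {sorted(missing)}")
```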
Step-up authentication
Mutating endpoints (handoff, write:next_step) require the human to confirm in-app within the last 5 minutes. The SDK exposes session.requireStepUp() to trigger the prompt.
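Client code can wrap mutating calls so that a step_up_required rejection triggers the confirmation prompt and a single retry. The exception class and callback below are illustrative (the SDK's actual surface is session.requireStepUp(), per above):

```python
from typing import Callable, TypeVar

T = TypeVar("T")

class StepUpRequired(Exception):
    """Raised when the API returns the step_up_required error code."""

def with_step_up(call: Callable[[], T], prompt_human: Callable[[], None]) -> T:
    """Run a mutating call; on step_up_required, request in-app confirmation and retry once."""
    try:
        return call()
    except StepUpRequired:
        prompt_human()  # e.g. trigger the SDK's step-up prompt
        return call()
```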
6. Human Reasoning State Model
Most context-passing schemes lose the why. Tab lists tell an agent what the human looked at, not what they were thinking. UNIHODL’s reasoning thread is a typed graph of nodes captured during HOLD by the on-device tagger, optionally enriched server-side, and serialized for agent consumption.
Node taxonomy
| kind | Meaning |
|---|---|
| observation | A factual datum extracted from a source (tab, video, transcript). |
| partial_conclusion | A claim the human is forming but has not committed to. |
| decision_stance | Where the human currently leans on a pending decision. |
| blocker | An unresolved dependency stopping forward motion. |
| question | An open question the human is investigating. |
| next_step | The intended next action — agents should ground tool calls here. |
| reference | A pointer back into tabs[] or media[] for evidence. |
Wire format
{
"reasoning_thread": [
{
"id": "rn_01",
"kind": "observation",
"text": "REST batching wins on read-heavy endpoints.",
"evidence": ["tabs[0]", "tabs[2]"],
"confidence": 0.78
},
{
"id": "rn_02",
"kind": "partial_conclusion",
"text": "Pure GraphQL is overkill for our workload.",
"supports": ["rn_01"],
"confidence": 0.66
},
{
"id": "rn_03",
"kind": "decision_stance",
"text": "Leaning hybrid: GraphQL for write paths, REST for hot reads.",
"supports": ["rn_02"]
},
{
"id": "rn_04",
"kind": "blocker",
"text": "Need finance approval for vendor B's cap on Tier-2 throughput."
},
{
"id": "rn_05",
"kind": "next_step",
"text": "Draft hybrid RFC; meet platform team Thursday.",
"depends_on": ["rn_03"]
}
]
}
Prompt-ready serialization
When X-Unihodl-Format: prompt-ready is requested, the reasoning thread is rendered as a structured natural-language block that LLMs reliably parse:
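One plausible rendering pass over the reasoning-thread wire format, grouping nodes by kind into that layout (a sketch; the production renderer presumably handles more node kinds, evidence pointers, and tabs/media context):

```python
def render_prompt(session_id: str, date: str, title: str, thread: list[dict]) -> str:
    """Group reasoning nodes by kind into the prompt-ready block layout."""
    lines = [f"## CONTEXT (UNIHODL session {session_id} \u00b7 {date})",
             f'The user is researching: "{title}."']
    sections = [
        (("observation", "partial_conclusion"), "What they have concluded:", "\u2022 "),
        (("decision_stance",), "Where they currently lean:", "\u2192 "),
        (("blocker",), "Open blockers:", "! "),
        (("next_step",), "Intended next step:", "\u2192 "),
    ]
    for kinds, header, bullet in sections:
        nodes = [n for n in thread if n["kind"] in kinds]
        if not nodes:
            continue  # omit empty sections entirely
        lines.append(header)
        for n in nodes:
            conf = f' (confidence {n["confidence"]})' if "confidence" in n else ""
            lines.append(f'{bullet}{n["text"]}{conf}')
    lines.append("Continue from here.")
    return "\n".join(lines)
```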
## CONTEXT (UNIHODL session ses_8f3aZ91b · 2026-05-05)
The user is researching: "GraphQL migration for API v3."
What they have concluded:
• REST batching wins on read-heavy endpoints. (confidence 0.78)
• Pure GraphQL is overkill for our workload. (confidence 0.66)
Where they currently lean:
→ Hybrid: GraphQL for write paths, REST for hot reads.
Open blockers:
! Finance approval for vendor B's Tier-2 cap.
Intended next step:
→ Draft hybrid RFC; meet platform team Thursday.
Continue from here.
7. Reference: Anthropic Claude
Claude integrates with UNIHODL in two flavors: (a) system prompt injection — the simplest path, hydrate and pass the prompt-ready text as the system prompt — or (b) MCP server mounting, where Claude’s tool loop calls UNIHODL on demand.
Mode A — System prompt injection
import os
from unihodl_agent import Client
from anthropic import Anthropic
uh = Client(api_key=os.environ["UNIHODL_API_KEY"])
claude = Anthropic()
# Mint an audience-bound, single-use token for this turn.
token = uh.resume_tokens.create(
session_id="ses_8f3aZ91b",
audience="claude.anthropic.com",
scopes=["read:context", "read:reasoning"],
ttl_seconds=600,
max_hydrations=1,
)
# Hydrate; ask for prompt-ready format.
ctx = uh.sessions.hydrate(
session_id="ses_8f3aZ91b",
token=token,
format="prompt-ready",
)
resp = claude.messages.create(
model="claude-sonnet-4-7",
max_tokens=4096,
system=ctx.text,
messages=[
{"role": "user", "content": "Continue Sarah's research and draft the hybrid RFC."}
],
)
print(resp.content[0].text)
Mode B — MCP server mount
Claude Desktop and Claude API both speak MCP. Mount UNIHODL once, then Claude can hold, resume, and hand off sessions on its own.
8. Reference: Google Gemini
Gemini integrates via function calling on the google-genai SDK. Declare UNIHODL’s resume tool, register it with the model, and Gemini will call it when its reasoning needs the human’s prior context.
9. Reference: OpenAI Agents SDK & LangGraph
The OpenAI Agents SDK supports first-class tools. Wrap UNIHODL hydration as a tool and pass it to your agent — the agent decides when to fetch context. In LangGraph, hydrate UNIHODL once at graph entry and inject the Resume Context into the shared state.
Cross-framework guarantee
The Resume Context schema is identical across all integrations. An agent built on Claude can hand a session off to a Gemini-based agent, and the receiving agent sees the same fields, the same reasoning thread, the same evidence pointers.
10. Errors, Rate Limits & Versioning
Error envelope
HTTP/1.1 403 Forbidden
Content-Type: application/json
{
"error": {
"code": "audience_mismatch",
"message": "Token audience claude.anthropic.com does not match request origin.",
"audit_id": "aud_01HW7KQ9...",
"hint": "Mint a new resume_token with the correct aud.",
"doc_url": "https://unihodl.app/sdk/spec#errors"
}
}
Error codes
| code | HTTP | Description |
|---|---|---|
| invalid_token | 401 | Signature, exp, or nbf failed verification. |
| audience_mismatch | 403 | Token aud does not match calling origin. |
| scope_insufficient | 403 | Required scope is not present on the token. |
| session_not_found | 404 | session_id is unknown or has been deleted. |
| hydration_exhausted | 429 | max_hydrations cap reached for this token. |
| rate_limited | 429 | Per-workspace or per-token rate limit exceeded. |
| redaction_failed | 422 | Redaction policy could not be applied; see hint. |
| step_up_required | 401 | Mutating call requires recent in-app human confirmation. |
| payload_too_large | 413 | Estimated context exceeds requested max_tokens. |
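Consumers can branch on code to choose a recovery strategy. The grouping below follows the table; the specific actions (remint, retry, step-up) are illustrative client policy, not part of the API:

```python
# Codes where minting a fresh, correctly-scoped token is the sensible recovery.
REMINT = {"invalid_token", "audience_mismatch", "scope_insufficient", "hydration_exhausted"}
RETRYABLE = {"rate_limited"}

def recovery_action(error_envelope: dict) -> str:
    """Map an error envelope to a coarse client action: remint, retry, step_up, or fail."""
    code = error_envelope["error"]["code"]
    if code in REMINT:
        return "remint"
    if code in RETRYABLE:
        return "retry"
    if code == "step_up_required":
        return "step_up"
    return "fail"
```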
Rate limits
| Endpoint group | Limit | Notes |
|---|---|---|
| Token mint | 300 / minute / workspace | Bursts to 600; enforced via leaky bucket. |
| Hydration | 60 / minute / token | Independent of token's max_hydrations cap. |
| Hydration | 10,000 / day / workspace | Enterprise plans negotiate higher limits. |
| Webhook delivery | 20 / second / endpoint | Beyond this, UNIHODL backs off exponentially. |
Versioning
API path is the major version (/v1). Schema evolutions ship as additive minor versions surfaced in the version field of every payload. SDKs follow semver. Breaking changes require a 12-month deprecation window and a new path prefix.
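Because minor versions are additive, a client pinned to a major version can accept any payload sharing that major; a minimal compatibility check:

```python
def compatible(payload_version: str, client_major: int) -> bool:
    """Additive minors: accept any payload whose major version matches the client's."""
    major = int(payload_version.split(".")[0])
    return major == client_major
```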
Continue building. Sample apps and OpenAPI spec land at unihodl.app/sdk. Spec questions: developers@unihodl.app.
Open gaps tracked at /sdk/roadmap.