DWS Spec 2: Context & Knowledge Schema

Digital Worker Standard — DWS Specification

Version: 1.0
Tier: 1 (Core Primitive)
Status: Release Candidate
Dependencies: Spec 1 (Worker Identity)

1. Overview

This specification defines the knowledge model for DWS-compliant systems. It establishes three distinct layers of knowledge that workers access during execution, defines the schema for individual knowledge entries, specifies conflict resolution rules when knowledge sources contradict, and describes the full lifecycle of knowledge from creation through retirement.

Knowledge is the mechanism by which DWS systems compound learning across sessions. Without structured knowledge, every worker session starts from zero. With it, organisations build institutional memory that makes each subsequent execution more informed than the last.

Key Terms

Knowledge Entry: A single unit of structured knowledge with typed metadata.
Knowledge Layer: One of three categories of knowledge, distinguished by scope and persistence.
Conflict: A state where two or more knowledge entries within the same scope make contradictory assertions.
Decay: The reduction in effective confidence of a knowledge entry over time without renewal.

2. Knowledge Layer Taxonomy

DWS defines three knowledge layers. Each layer has distinct persistence characteristics, authorship models, and scoping rules.

2.1 Session Context

Session context is scoped to a single workflow execution. It is ephemeral by default and MUST NOT persist beyond the session boundary unless explicitly promoted to institutional knowledge.

Session context includes:

The intent artifact(s) being executed
Worker assignments and their current state
Artifacts produced during this session
Intermediate results and decision records

Session Context Envelope Schema:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["session_id", "workflow_id", "started_at", "intent_refs", "loaded_entries"],
  "properties": {
    "session_id": {
      "type": "string",
      "format": "uuid"
    },
    "workflow_id": {
      "type": "string"
    },
    "started_at": {
      "type": "string",
      "format": "date-time"
    },
    "intent_refs": {
      "type": "array",
      "items": { "type": "string" },
      "description": "IDs of intent artifacts active in this session."
    },
    "loaded_entries": {
      "type": "array",
      "items": { "type": "string" },
      "description": "IDs of institutional and domain knowledge entries loaded at session start."
    },
    "excluded_entries": {
      "type": "array",
      "items": { "type": "string" },
      "description": "IDs of entries explicitly excluded from this session's scope."
    },
    "worker_assignments": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "worker_id": { "type": "string" },
          "role": { "type": "string" },
          "phase_id": { "type": "string" }
        }
      }
    }
  }
}

Runtimes MUST load session context before worker execution begins. Workers MUST NOT access any knowledge entry other than those listed in loaded_entries or produced during the current session.
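
The access rule above can be sketched as a simple membership check. This is a minimal illustration, not a prescribed runtime API: the SessionContext class, the produced_entries set, and the can_access method are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class SessionContext:
    """Subset of the session context envelope relevant to access checks."""
    session_id: str
    loaded_entries: set = field(default_factory=set)
    # Hypothetical bookkeeping: IDs of entries created during this session.
    produced_entries: set = field(default_factory=set)

    def can_access(self, entry_id: str) -> bool:
        # Allowed only if loaded at session start or produced in this session.
        return entry_id in self.loaded_entries or entry_id in self.produced_entries


ctx = SessionContext(
    session_id="3f2b6c1e-8d4a-4f0b-9c57-2a1e6d9b7c10",
    loaded_entries={"entry-a", "entry-b"},
)
ctx.produced_entries.add("entry-c")
```

A runtime following this sketch would call can_access before returning any entry to a worker and reject the request when it returns False.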

2.2 Institutional Knowledge

Institutional knowledge persists across sessions and represents the compounding layer of DWS. These are the patterns, decisions, failure modes, and conventions that an organisation accumulates as its workers execute work.

Institutional knowledge entries are:

Organisation-specific. They reflect how this organisation works, not universal truths.
Worker-authored or human-authored. Both sources are valid; the author field distinguishes them.
Scoped. An entry relevant to marketing workers SHOULD NOT be loaded into engineering worker sessions unless explicitly requested.

Institutional knowledge entries MUST be indexed by:

type (entry type taxonomy, see Section 3)
scope (which workers, domains, or workflows the entry applies to)
tags (freeform labels for filtering)
created_at and updated_at timestamps

Runtimes SHOULD support retrieval by any combination of these indices.
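
As an illustration of retrieval by combined indices, the sketch below filters in-memory entries by type, worker scope, tags, and timestamp. The matches function and its parameter names are assumptions made for this example; a real runtime would more likely push these filters into its storage layer.

```python
from datetime import datetime, timezone
from typing import Optional


def matches(entry: dict, *, type_: Optional[str] = None, worker: Optional[str] = None,
            tags: Optional[set] = None, updated_since: Optional[datetime] = None) -> bool:
    """Return True if the entry satisfies every filter that was supplied."""
    if type_ is not None and entry["type"] != type_:
        return False
    if worker is not None:
        workers = entry.get("scope", {}).get("workers", [])
        # An empty workers array means the entry applies to all workers.
        if workers and worker not in workers:
            return False
    if tags is not None and not tags.issubset(entry.get("tags", [])):
        return False
    if updated_since is not None:
        stamp = entry.get("updated_at") or entry["created_at"]
        if datetime.fromisoformat(stamp) < updated_since:
            return False
    return True


entries = [
    {"entry_id": "e1", "type": "convention", "scope": {"workers": ["marketing-writer"]},
     "tags": ["tone"], "created_at": "2025-01-10T09:00:00+00:00"},
    {"entry_id": "e2", "type": "failure_mode", "scope": {"workers": []},
     "tags": ["deploy", "rollback"], "created_at": "2025-02-01T09:00:00+00:00",
     "updated_at": "2025-03-01T09:00:00+00:00"},
]
for_backend = [e["entry_id"] for e in entries if matches(e, worker="backend-engineer")]
```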

2.3 Domain Knowledge

Domain knowledge is reference material external to the organisation: documentation, codebases, API specifications, datasets, regulatory texts, and similar resources.

Domain knowledge is:

Not organisation-authored. It originates outside the DWS system.
Versioned. References MUST include a version identifier (commit hash, document version, or retrieval timestamp).
Read-only within DWS. Workers consume domain knowledge but do not modify the source.

Domain Knowledge Reference Schema:

{
  "type": "object",
  "required": ["ref_id", "source_type", "uri", "version"],
  "properties": {
    "ref_id": { "type": "string" },
    "source_type": {
      "type": "string",
      "enum": ["documentation", "codebase", "api_spec", "dataset", "regulatory", "other"]
    },
    "uri": { "type": "string", "format": "uri" },
    "version": { "type": "string" },
    "description": { "type": "string" },
    "scope": {
      "type": "array",
      "items": { "type": "string" }
    },
    "last_verified": { "type": "string", "format": "date-time" }
  }
}
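
A minimal sketch of checking a domain knowledge reference against the required fields and the source_type enumeration above. The reference_problems helper and the sample URI are illustrative assumptions; a production runtime would normally use a full JSON Schema validator instead.

```python
REQUIRED = ("ref_id", "source_type", "uri", "version")
SOURCE_TYPES = {"documentation", "codebase", "api_spec", "dataset", "regulatory", "other"}


def reference_problems(ref: dict) -> list:
    """Return a list of human-readable problems; an empty list means the reference passes."""
    problems = [f"missing required field: {name}" for name in REQUIRED if name not in ref]
    if "source_type" in ref and ref["source_type"] not in SOURCE_TYPES:
        problems.append(f"unknown source_type: {ref['source_type']}")
    return problems


api_ref = {
    "ref_id": "payments-api",
    "source_type": "api_spec",
    "uri": "https://example.com/specs/payments/openapi.yaml",  # placeholder URI
    "version": "2025-03-01T00:00:00Z",  # a retrieval timestamp used as the version identifier
    "scope": ["payments"],
}
```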

3. Knowledge Entry Schema

A knowledge entry is the atomic unit of structured knowledge in DWS.

3.1 Entry Type Taxonomy

Type	Description
`decision`	A specific choice made during execution, with rationale.
`pattern`	A recurring observation across multiple sessions or executions.
`constraint`	A rule or limitation that workers must respect.
`convention`	An agreed practice or standard within the organisation.
`failure_mode`	A documented way in which something has gone wrong.
`lesson`	An insight derived from experience, typically from verification findings.
`reference`	A pointer to external information relevant to future sessions.

3.2 Knowledge Entry Schema

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["entry_id", "type", "content", "source", "confidence", "scope", "created_at", "author"],
  "properties": {
    "entry_id": { "type": "string", "format": "uuid" },
    "type": {
      "type": "string",
      "enum": ["decision", "pattern", "constraint", "convention", "failure_mode", "lesson", "reference"]
    },
    "content": {
      "type": "string",
      "description": "Human-readable description of the knowledge entry."
    },
    "structured_content": {
      "type": "object",
      "description": "Machine-readable structured representation. Schema varies by type."
    },
    "data_classification": {
      "type": "string",
      "enum": ["public", "internal", "confidential", "restricted"],
      "default": "internal",
      "description": "Sensitivity classification of this knowledge entry. Governs export, sharing, and retention behaviour. See Spec 13."
    },
    "source": {
      "type": "object",
      "required": ["source_type", "source_ref"],
      "properties": {
        "source_type": {
          "type": "string",
          "enum": ["session_event", "human_input", "verification_finding", "promotion", "import"]
        },
        "source_ref": { "type": "string" },
        "session_id": { "type": "string", "format": "uuid" }
      }
    },
    "confidence": {
      "type": "number",
      "minimum": 0.0,
      "maximum": 1.0
    },
    "scope": {
      "type": "object",
      "properties": {
        "workers": {
          "type": "array",
          "items": { "type": "string" },
          "description": "Worker roles or IDs this entry applies to. Empty array means all workers."
        },
        "domains": {
          "type": "array",
          "items": { "type": "string" }
        },
        "workflows": {
          "type": "array",
          "items": { "type": "string" }
        }
      }
    },
    "created_at": { "type": "string", "format": "date-time" },
    "updated_at": { "type": "string", "format": "date-time" },
    "author": {
      "type": "object",
      "required": ["author_type", "author_id"],
      "properties": {
        "author_type": {
          "type": "string",
          "enum": ["human", "worker"]
        },
        "author_id": { "type": "string" }
      }
    },
    "expiry": { "type": ["string", "null"], "format": "date-time" },
    "superseded_by": { "type": ["string", "null"] },
    "evidence": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "evidence_type": {
            "type": "string",
            "enum": ["verification_finding", "event", "human_attestation"]
          },
          "ref": { "type": "string" }
        }
      }
    },
    "tags": { "type": "array", "items": { "type": "string" } },
    "decay": {
      "type": "object",
      "properties": {
        "model": {
          "type": "string",
          "enum": ["none", "linear", "exponential"],
          "default": "none"
        },
        "half_life": {
          "type": "string",
          "description": "ISO 8601 duration. Only applicable when model is not 'none'."
        }
      }
    },
    "status": {
      "type": "string",
      "enum": ["active", "promoted", "retired", "superseded"],
      "default": "active"
    }
  }
}
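
To make the schema concrete, the sketch below builds an entry containing every required field and writes out the two schema defaults (data_classification and status). The new_entry helper is a hypothetical convenience function, and the sample values are invented for illustration.

```python
import uuid
from datetime import datetime, timezone

REQUIRED_FIELDS = ("entry_id", "type", "content", "source", "confidence",
                   "scope", "created_at", "author")


def new_entry(type_, content, confidence, source, author, scope=None):
    """Build a knowledge entry dict with all required fields populated."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    return {
        "entry_id": str(uuid.uuid4()),
        "type": type_,
        "content": content,
        "source": source,
        "confidence": confidence,
        "scope": scope or {"workers": [], "domains": [], "workflows": []},
        "created_at": datetime.now(timezone.utc).isoformat(),
        "author": author,
        # Schema defaults, written out explicitly for readability.
        "data_classification": "internal",
        "status": "active",
    }


entry = new_entry(
    type_="failure_mode",
    content="Deploys that skip the migration dry run have failed in staging.",
    confidence=0.8,
    source={"source_type": "verification_finding", "source_ref": "finding-1042"},
    author={"author_type": "worker", "author_id": "release-engineer"},
)
```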

3.3 Weighting and Decay

Knowledge entries MAY be configured to lose effective confidence over time. This prevents stale entries from carrying the same weight as recently validated ones.

Effective confidence is calculated as:

No decay (none): effective_confidence = confidence (constant).
Linear decay: effective_confidence = confidence * max(0, 1 - (elapsed / (2 * half_life))).
Exponential decay: effective_confidence = confidence * 0.5^(elapsed / half_life).

Where elapsed is the duration since updated_at (or created_at if never updated) and half_life is the configured duration.

Entries whose effective confidence falls below a configurable threshold (default: 0.1) are candidates for retirement. Runtimes SHOULD flag these entries during session loading rather than silently excluding them.

Renewal: Any event that references a knowledge entry as relevant or accurate resets its updated_at timestamp, effectively renewing its decay clock.
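
The decay formulas above can be sketched directly in code. This example assumes half_life has already been parsed from its ISO 8601 duration string into a timedelta; the function names are illustrative.

```python
from datetime import timedelta


def effective_confidence(confidence, elapsed, model="none", half_life=None):
    """Apply the configured decay model to a stored confidence value."""
    if model == "none":
        return confidence
    if half_life is None or half_life <= timedelta(0):
        raise ValueError("a positive half_life is required when decay is enabled")
    ratio = elapsed / half_life  # timedelta / timedelta yields a float
    if model == "linear":
        # Reaches half the stored confidence at one half-life and zero at two.
        return confidence * max(0.0, 1.0 - ratio / 2.0)
    if model == "exponential":
        return confidence * 0.5 ** ratio
    raise ValueError(f"unknown decay model: {model}")


def is_retirement_candidate(value, threshold=0.1):
    """Entries below the threshold are flagged, not silently dropped."""
    return value < threshold


half_life = timedelta(days=90)
after_one = effective_confidence(0.8, timedelta(days=90), "exponential", half_life)
after_two_linear = effective_confidence(0.8, timedelta(days=180), "linear", half_life)
```

With a stored confidence of 0.8 and a 90-day half-life, both models yield 0.4 after 90 days. After 180 days the linear model reaches zero, while the exponential model yields 0.2.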

4. Conflict Resolution Rules

4.1 Precedence Hierarchy

When knowledge from different sources conflicts, the following precedence hierarchy applies (highest to lowest):

Explicit intent — Constraints and directives in the active intent artifact.
Institutional knowledge — Organisation-accumulated entries.
Domain knowledge — External reference material.
Worker defaults — Built-in behaviours from the worker definition.

A higher-precedence source MUST override a lower-precedence source when they conflict. Within the same precedence level, the more recently updated entry takes precedence.
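
A minimal sketch of the precedence rule: rank by source level first, then prefer the most recently updated assertion within the same level. The layer labels and the shape of the candidate records are assumptions made for this example.

```python
from datetime import datetime, timezone

# Highest precedence first. The label strings are hypothetical.
PRECEDENCE = ["explicit_intent", "institutional", "domain", "worker_default"]


def resolve(candidates):
    """Return the winning assertion among conflicting candidates."""
    return min(
        candidates,
        key=lambda c: (PRECEDENCE.index(c["layer"]), -c["updated_at"].timestamp()),
    )


candidates = [
    {"id": "d1", "layer": "domain",
     "updated_at": datetime(2025, 6, 1, tzinfo=timezone.utc)},
    {"id": "i1", "layer": "institutional",
     "updated_at": datetime(2025, 1, 1, tzinfo=timezone.utc)},
    {"id": "i2", "layer": "institutional",
     "updated_at": datetime(2025, 3, 1, tzinfo=timezone.utc)},
]
winner = resolve(candidates)
```

Here i2 wins: institutional knowledge outranks the newer domain reference, and i2 is more recent than i1.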

4.2 Conflict Detection

A conflict exists when two or more knowledge entries within the same scope make assertions that cannot simultaneously be true.

Runtimes MUST check for conflicts when:

Loading knowledge entries at session start
A new knowledge entry is created during execution
A worker queries knowledge entries with overlapping scope

The determination of whether content is contradictory is implementation-specific. Runtimes MAY use semantic comparison, keyword matching, or structured field comparison. The spec requires that a conflict detection mechanism exists, not a specific algorithm.
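
One possible mechanism is structured field comparison. The sketch below treats two entries as conflicting when their scopes overlap and their structured_content objects assign different values to the same key. This is only one of the permitted approaches, and it assumes that an empty or missing scope list means "applies to all" for every scope dimension (the schema states this only for workers).

```python
def scopes_overlap(a: dict, b: dict) -> bool:
    for key in ("workers", "domains", "workflows"):
        left, right = a.get(key, []), b.get(key, [])
        # Assumption: an empty or missing list matches everything.
        if left and right and not set(left) & set(right):
            return False
    return True


def find_conflicts(entries):
    """Return (entry_id, entry_id, clashing_keys) for each conflicting pair."""
    conflicts = []
    for i, first in enumerate(entries):
        for second in entries[i + 1:]:
            if not scopes_overlap(first.get("scope", {}), second.get("scope", {})):
                continue
            a = first.get("structured_content", {})
            b = second.get("structured_content", {})
            clashing = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
            if clashing:
                conflicts.append((first["entry_id"], second["entry_id"], clashing))
    return conflicts


sample = [
    {"entry_id": "e1", "scope": {"workers": ["backend"]},
     "structured_content": {"max_retries": 3}},
    {"entry_id": "e2", "scope": {"workers": []},
     "structured_content": {"max_retries": 5}},
    {"entry_id": "e3", "scope": {"workers": ["marketing"]},
     "structured_content": {"max_retries": 1}},
]
conflicts = find_conflicts(sample)
```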

4.3 Resolution Logging

All conflicts and their resolutions MUST produce events conforming to Spec 11 (Events & Telemetry).

5. Knowledge Lifecycle

5.1 Ingestion

Knowledge entries are created through three primary channels:

From events. During execution, workers produce events (Spec 11). Runtimes or post-processing pipelines MAY extract knowledge entries from event streams.
From human input. Humans MAY create knowledge entries directly, typically constraint or convention entries that encode organisational rules.
From verification findings. Verification results (Spec 8) that identify patterns or recurring issues SHOULD be candidates for knowledge entry creation.

Ingestion MUST produce a knowledge.entry_created event (Spec 11).

5.2 Promotion

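Promotion is the process by which a session-scoped observation becomes institutional knowledge. An observation becomes eligible for promotion when it satisfies the organisation's promotion criteria, which conform to the following schema: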

{
  "type": "object",
  "required": ["min_occurrences", "min_confidence", "min_sessions"],
  "properties": {
    "min_occurrences": { "type": "integer", "minimum": 1, "default": 3 },
    "min_confidence": { "type": "number", "minimum": 0.0, "maximum": 1.0, "default": 0.7 },
    "min_sessions": { "type": "integer", "minimum": 1, "default": 2 },
    "requires_human_approval": { "type": "boolean", "default": false }
  }
}

Promotion criteria are configurable per organisation. The defaults above are RECOMMENDED starting points.
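
A sketch of how a runtime might evaluate these criteria. It assumes that min_occurrences counts observations, that min_sessions counts distinct session IDs, and that all thresholds must be met together; the spec text here does not spell out these semantics, so treat them as one plausible reading.

```python
from dataclasses import dataclass


@dataclass
class PromotionCriteria:
    min_occurrences: int = 3
    min_confidence: float = 0.7
    min_sessions: int = 2
    requires_human_approval: bool = False


def eligible_for_promotion(occurrences, confidence, session_ids, criteria,
                           human_approved=False):
    """Return True when every configured threshold is satisfied."""
    if occurrences < criteria.min_occurrences:
        return False
    if confidence < criteria.min_confidence:
        return False
    if len(set(session_ids)) < criteria.min_sessions:
        return False
    if criteria.requires_human_approval and not human_approved:
        return False
    return True


defaults = PromotionCriteria()
ok = eligible_for_promotion(3, 0.75, ["s1", "s2", "s2"], defaults)
```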

5.3 Retirement

Entries are retired when they are no longer relevant:

The entry’s expiry timestamp has passed.
The entry’s superseded_by field references a replacement entry.
The entry’s effective confidence (after decay) falls below the retirement threshold.
A human explicitly retires the entry.

Retired entries MUST NOT be loaded into new sessions. They SHOULD be retained in storage for audit purposes (see Spec 12 for retention requirements).
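
The four retirement conditions can be checked in sequence, as in the sketch below. The reason labels and the retirement_reason function are hypothetical; the effective confidence is passed in as an already computed value.

```python
from datetime import datetime, timezone


def retirement_reason(entry, now, effective_confidence, threshold=0.1,
                      retired_by_human=False):
    """Return a reason label if the entry should be retired, else None."""
    expiry = entry.get("expiry")
    if expiry is not None and datetime.fromisoformat(expiry) < now:
        return "expired"
    if entry.get("superseded_by"):
        return "superseded"
    if effective_confidence < threshold:
        return "decayed"
    if retired_by_human:
        return "retired_by_human"
    return None


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
stale = {"entry_id": "e1", "expiry": "2025-01-01T00:00:00+00:00", "superseded_by": None}
fresh = {"entry_id": "e2", "expiry": None, "superseded_by": None}
```

A session loader following this sketch would skip any entry for which the function returns a reason, while leaving the stored record in place for audit.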

5.4 Export and Import

Knowledge entries MUST be exportable as a JSON array conforming to the entry schema defined in Section 3.2.

Runtimes MAY build implementation-specific indexes, graphs, or embeddings on top of knowledge entries. These derived structures are not part of the spec and are not portable. The raw entries are the interchange format.

Marketplace portability: When a worker definition is published to a marketplace (or transferred between organisations), its associated knowledge entries may be exported alongside it. The data_classification field governs which entries are eligible for export:

public — exportable without restriction.
internal — exportable within the organisation only.
confidential — not exportable. Must be regenerated by the receiving organisation.
restricted — not exportable. Must be explicitly recreated with appropriate authorisation.
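
The classification rules above amount to a filter applied before serialisation. In this sketch the export_entries function and its within_organisation flag are illustrative names, and entries without a data_classification field fall back to the schema default of internal.

```python
import json


def export_entries(entries, within_organisation):
    """Serialise only the entries eligible for the given export target."""
    allowed = {"public", "internal"} if within_organisation else {"public"}
    eligible = [
        e for e in entries
        if e.get("data_classification", "internal") in allowed
    ]
    return json.dumps(eligible, indent=2)


sample = [
    {"entry_id": "e1", "data_classification": "public"},
    {"entry_id": "e2"},  # no classification: treated as internal
    {"entry_id": "e3", "data_classification": "confidential"},
    {"entry_id": "e4", "data_classification": "restricted"},
]
marketplace_payload = export_entries(sample, within_organisation=False)
```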

5.5 Knowledge Transfer Between Worker Versions

When a worker is upgraded (e.g. from v1.0.0 to v2.0.0), institutional knowledge accumulated by the previous version is not automatically discarded. The runtime SHOULD:

Carry forward all knowledge entries whose scope matches the new worker version.
Flag entries that reference skills, workflows, or capabilities removed in the new version.
Emit a knowledge.version_transfer event listing carried-forward and flagged entries.
Allow the new worker version to confirm or retire carried-forward entries during its first execution.
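
The transfer steps above can be sketched as follows. The sketch matches scope on the worker ID only, flags entries whose scope.workflows names a workflow removed in the new version, and assumes flagged entries are still carried forward pending confirmation. The payload shape of the knowledge.version_transfer event is an assumption, since Spec 11 defines the actual event format.

```python
def plan_version_transfer(entries, worker_id, removed_workflows):
    """Build a hypothetical knowledge.version_transfer event payload."""
    carried, flagged = [], []
    for entry in entries:
        scope = entry.get("scope", {})
        workers = scope.get("workers", [])
        # An empty workers array means the entry applies to all workers.
        if workers and worker_id not in workers:
            continue
        carried.append(entry["entry_id"])
        if removed_workflows & set(scope.get("workflows", [])):
            flagged.append(entry["entry_id"])
    return {
        "event_type": "knowledge.version_transfer",
        "carried_forward": carried,
        "flagged": flagged,
    }


entries = [
    {"entry_id": "e1", "scope": {"workers": ["report-writer"], "workflows": ["weekly-report"]}},
    {"entry_id": "e2", "scope": {"workers": [], "workflows": ["legacy-export"]}},
    {"entry_id": "e3", "scope": {"workers": ["data-cleaner"]}},
]
event = plan_version_transfer(entries, "report-writer", {"legacy-export"})
```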

6. Versioning Model

Knowledge entries are runtime artifacts. They are event-sourced, not git-backed.

6.1 Runtime Event Sourcing

Every state change to a knowledge entry produces an event:

knowledge.entry_created
knowledge.entry_promoted
knowledge.entry_retired
knowledge.entry_overridden

The complete history of any knowledge entry can be reconstructed by replaying its event stream.
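
A minimal sketch of reconstruction by replay. The event payload fields (entry, overridden_by) are hypothetical because Spec 11 defines the real event format, and mapping knowledge.entry_overridden to the superseded status is an assumption made only for this illustration.

```python
def replay(events):
    """Fold an ordered event stream into the current state of one entry."""
    state = None
    for event in events:
        kind = event["event_type"]
        if kind == "knowledge.entry_created":
            state = dict(event["entry"])
            state.setdefault("status", "active")
        elif state is None:
            raise ValueError(f"{kind} received before knowledge.entry_created")
        elif kind == "knowledge.entry_promoted":
            state["status"] = "promoted"
        elif kind == "knowledge.entry_retired":
            state["status"] = "retired"
        elif kind == "knowledge.entry_overridden":
            # Assumption: an override records the replacing entry.
            state["status"] = "superseded"
            state["superseded_by"] = event["overridden_by"]
    return state


stream = [
    {"event_type": "knowledge.entry_created",
     "entry": {"entry_id": "e1", "type": "pattern", "content": "Retries mask timeouts."}},
    {"event_type": "knowledge.entry_promoted"},
    {"event_type": "knowledge.entry_retired"},
]
current = replay(stream)
```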

6.2 Git-Committable Snapshots

Knowledge export (Section 5.4) produces a JSON snapshot that MAY be committed to git for portability. This is a deliberate export operation, not the native storage model.

6.3 Reconciliation

The dws reconcile tool MAY propose committing promoted knowledge entries as part of a worker configuration update:

Worker executes work, producing events.
Events generate knowledge entries.
Entries are promoted to institutional knowledge.
dws reconcile proposes a git commit incorporating promoted entries into the worker configuration.
A human reviews and commits.

This is a deliberate human act. Runtime knowledge does not auto-commit.

7. Key Design Decisions

Resolved

Decision	Resolution	Rationale
Knowledge entry structure level	Moderate structure with typed entries and required fields, plus freeform `structured_content`	Too rigid breaks diverse domains. Too loose prevents meaningful indexing.
Confidence scoring model	Spec requires a `confidence` field (0.0-1.0) but does not mandate a scoring algorithm	Confidence sources vary too widely to standardise the algorithm. The field ensures comparability.
Knowledge vs. knowledge graph	The spec defines entries and their schema. Graphs, embeddings, and indexes are implementation-specific.	The entry is the portable primitive. How entries are connected is where implementations differentiate.

8. References

OpenTelemetry Semantic Conventions — Structural model for typed, scoped metadata.
Spec 1: Worker Identity — Defines worker roles and IDs referenced in knowledge entry scope and author fields.
Spec 8: Verification Framework — Verification findings are a primary source of knowledge entries.
Spec 11: Events & Telemetry — All knowledge lifecycle events are emitted to the event stream.
Spec 12: Compliance & Governance — Governs data retention requirements for knowledge entries.
Spec 13: Security Model — Defines data classification levels referenced by data_classification field.