Workgroup: Independent Submission
Internet-Draft: draft-c4tz-marc-00
Published: April 2026
Intended Status: Informational
Expires: 25 October 2026
Authors: c4tz, c0dx3

MARC: A Control and Uncertainty Disclosure Profile for Generative Models and Agents

Abstract

This document specifies MARC, a vendor-neutral control and uncertainty-disclosure profile for generative models and agentic systems. MARC defines a small set of interoperable control signals, separates pre-decision capability assessment from post-decision answer confidence, and describes a bounded action set for answering, clarification, retrieval, tool use, abstention, and escalation.

MARC does not standardize model internals, training methods, or claims about machine cognition. Instead, it defines externally observable semantics that can be implemented by model providers, orchestration layers, evaluation harnesses, and user-facing systems. The goal is to reduce silent failure, unnecessary externalization, and misleading uncertainty communication while improving auditability and interoperability.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 25 October 2026.

1. Introduction

Generative models and agentic systems increasingly combine answering, retrieval, tool invocation, and user interaction within a single workflow. In many deployments, these behaviors are implemented as separate heuristics, producing inconsistent handling of uncertainty, unnecessary tool calls, silent failure, or user overreliance.

MARC defines a vendor-neutral layer for metacognitive control and structured uncertainty disclosure. It does not standardize model internals. Instead, it standardizes the semantics of a small set of second-order signals, a bounded action set, and a minimal disclosure profile that can be implemented by a base model, an external orchestrator, or a hybrid architecture.

This document is not intended to define a Standards Track protocol, a model evaluation benchmark, or a claim about machine consciousness. It is an Informational profile for interoperable control, logging, and disclosure behavior around generative systems and agents.

The design is motivated by recent findings that current large language models often exhibit weak metacognitive reporting in high-stakes reasoning tasks [GRIOT2025], that users can become overconfident when systems provide longer or default explanations [STEYVERS-KNOW2025], that metacognitive triggering can improve tool-use decisions [LI-MECO2025], and that identifying the source of uncertainty is a distinct problem from merely abstaining [LIU-CONFUSE2025]. Work on cognitive offloading further motivates treating retrieval and tool use as a value-based control choice rather than as a universal fallback [GILBERT2024].

MARC also separates pre-decision capability assessment from post-decision confidence about the selected answer. This separation is motivated in part by recent evidence that LLM confidence can be biased by prior answer commitment and by the visibility of the model's own earlier output [KUMARAN2026].

2. Requirements Language and Terminology

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, NOT RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in [RFC2119] and [RFC8174] when, and only when, they appear in all capitals, as shown here.

base model

The generative model that produces candidate outputs.

controller

The component that computes MARC signals, selects a primary action, and emits a MARC record.

externalization

The use of resources external to the base model, including retrieval, tool invocation, and human escalation.

disclosure profile

The minimum structured information exposed to downstream systems or end users about uncertainty and recommended next action.

remediability

The best available class of intervention for the currently observed uncertainty.

3. Design Goals and Non-Goals

3.1. Design Goals

  • Standardize a small, interoperable set of control and uncertainty-disclosure signals that can be exchanged across orchestration layers and audit pipelines.

  • Separate monitoring, uncertainty attribution, action selection, and disclosure.

  • Support calibrated user-facing uncertainty communication without requiring exposure of chain-of-thought or raw internal reasoning.

  • Permit heterogeneous implementations while preserving common action semantics.

  • Reduce harmful overreliance, false reassurance, and anthropomorphic interpretation in user-facing AI systems.

3.2. Non-Goals

MARC does not define a transport protocol, a model architecture, a benchmark, or a training recipe. It does not define a media type, wire protocol, or IANA registry.

MARC does not attempt to standardize model internals, machine cognition, or claims about consciousness or sentience. It specifies only external control semantics and structured disclosure behavior.

MARC is not a framework for synthetic personality design or persuasive optimization. Recent work on personality measurement in LLMs [SERAPIO2025] and on conversational persuasion risks [SALVI2025] is relevant background, but these topics are explicitly out of scope here.

4. Architecture and Processing Model

4.1. Functional Components

A MARC deployment conceptually contains the following components:

  • a base model;

  • a controller;

  • zero or more external resources, such as retrieval systems, non-retrieval tools, or human escalation paths; and

  • a downstream consumer, such as a user interface, API gateway, logging system, or evaluation harness.

4.2. Processing Stages

  1. Compute a pre-decision capability estimate for the current request with currently available resources.

  2. Attribute uncertainty across the source classes defined in Section 5.2.

  3. Determine remediability and select exactly one primary action from the set defined in Section 5.5.

  4. If the selected action yields a candidate answer, compute post-decision confidence for that answer.

  5. Emit a MARC-Core record as defined in Section 6.

  6. If uncertainty is exposed to a downstream system or to an end user, emit the disclosure profile defined in Section 7.

4.3. State Machine

REQUEST
  -> ASSESS
  -> ATTRIBUTE
  -> SELECT
       -> ANSWER     -> CONFIDENCE -> DISCLOSE
       -> CLARIFY    -> DISCLOSE
       -> RETRIEVE   -> ASSESS
       -> TOOL       -> ASSESS
       -> DELIBERATE -> ASSESS
       -> ABSTAIN    -> DISCLOSE
       -> ESCALATE   -> DISCLOSE

A MARC implementation SHOULD bound repeated transitions through RETRIEVE, TOOL, and DELIBERATE in order to limit latency, cost, and degenerate loops.
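The bounded-transition requirement can be sketched as a minimal controller loop. The action names come from the profile; the `select_action` callable and the loop bound of 3 are illustrative assumptions, not normative.

```python
# Illustrative sketch of the Section 4.3 state machine with bounded
# re-assessment loops. The bound and the fallback to ABSTAIN are
# deployment choices, not requirements of this profile.

RE_ASSESS = {"RETRIEVE", "TOOL", "DELIBERATE"}   # actions that re-enter ASSESS
TERMINAL = {"ANSWER", "CLARIFY", "ABSTAIN", "ESCALATE"}

def run_controller(select_action, max_loops=3):
    """Drive REQUEST -> ASSESS -> ... until a terminal action is chosen.

    `select_action(loop_count)` stands in for the ASSESS/ATTRIBUTE/SELECT
    stages and returns one primary action per decision point.
    """
    loops = 0
    while True:
        action = select_action(loops)
        if action in TERMINAL:
            return action                    # proceeds to CONFIDENCE/DISCLOSE
        if action in RE_ASSESS:
            loops += 1
            if loops >= max_loops:           # SHOULD bound degenerate loops
                return "ABSTAIN"             # fall back rather than loop forever
        else:
            raise ValueError(f"unknown primary action: {action}")
```

A policy that retrieves twice and then answers terminates normally, while a policy that retrieves indefinitely is cut off at the bound and falls back to ABSTAIN.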

5. MARC Signals and Decision Policy

5.1. Pre-Decision Capability

Before disclosing a final answer, a MARC implementation MUST estimate whether the current request can be handled reliably with currently available resources.

This estimate is represented as pre_capability. When a numeric representation is used, the value MUST be in the closed interval [0.0, 1.0]. The method used to derive the value is implementation-specific.

5.2. Uncertainty Attribution

A MARC implementation MUST attribute uncertainty to one or more of the following classes:

  • ambiguity: the request is underspecified, equivocal, or pragmatically unclear.

  • missing_evidence: required external evidence is absent or stale.

  • capability_limit: the system lacks the competence to solve the task reliably.

  • evidence_conflict: relevant evidence is materially inconsistent.

  • safety: a policy, legal, or safety constraint limits execution or disclosure.

An implementation MAY assign scores to multiple classes. It MUST identify one primary_source and MAY identify one secondary_source. If numeric uncertainty scores are emitted, they MUST each be in the interval [0.0, 1.0].
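One way to derive `primary_source` and the optional `secondary_source` from class scores is a simple ranking. The tie-breaking order and the 0.1 floor for reporting a secondary source below are assumptions, not part of the profile.

```python
# Hypothetical helper mapping Section 5.2 class scores to the
# primary_source / secondary_source fields of a MARC-Core record.

MARC_SOURCES = ("ambiguity", "missing_evidence", "capability_limit",
                "evidence_conflict", "safety")

def attribute(scores, secondary_floor=0.1):
    """Return (primary_source, secondary_source_or_None) from a score dict."""
    for name, value in scores.items():
        if name not in MARC_SOURCES:
            raise ValueError(f"unknown uncertainty class: {name}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score outside [0.0, 1.0]")
    ranked = sorted(scores, key=scores.get, reverse=True)
    primary = ranked[0]
    secondary = None
    if len(ranked) > 1 and scores[ranked[1]] >= secondary_floor:
        secondary = ranked[1]
    return primary, secondary
```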

5.3. Remediability

A MARC implementation MUST represent the best available class of intervention for the current uncertainty state using one of the following values:

  • user_clarification

  • retrieval

  • tool

  • human

  • none

Low capability alone is insufficient to determine remediability. Implementations SHOULD account for expected gain, latency, cost, availability, and policy constraints when choosing a remediating intervention.
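The requirement to weigh expected gain against latency, cost, and availability can be sketched as a value-based choice over candidate interventions. Every number and the net-value scoring rule below are assumptions for illustration only.

```python
# Illustrative value-based choice of a remediating intervention
# (Section 5.3). A real deployment would calibrate expected gain and
# cost terms per task family; nothing here is normative.

def choose_remediation(options):
    """options: list of dicts with keys name, expected_gain, latency_cost,
    monetary_cost, available, permitted. Returns the best name or "none"."""
    best, best_value = "none", 0.0
    for opt in options:
        if not (opt["available"] and opt["permitted"]):
            continue  # policy constraints filter before value comparison
        value = opt["expected_gain"] - opt["latency_cost"] - opt["monetary_cost"]
        if value > best_value:
            best, best_value = opt["name"], value
    return best
```

With no option whose expected gain outweighs its costs, the function returns "none", matching the remediability value for uncertainty that no available intervention is expected to reduce.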

5.4. Post-Decision Confidence

If the selected action yields a candidate answer, the implementation MUST compute a distinct estimate of the likelihood that the disclosed answer is correct or acceptable for its intended use.

This estimate is represented as post_answer_confidence. When a numeric representation is used, the value MUST be in the interval [0.0, 1.0]. It MUST NOT be treated as identical to pre_capability.

5.5. Primary Action Set

A MARC implementation MUST support the following primary actions:

  • ANSWER

  • CLARIFY

  • RETRIEVE

  • TOOL

  • DELIBERATE

  • ABSTAIN

  • ESCALATE

Exactly one primary action MUST be selected for each decision point. Additional internal sub-actions MAY exist, but each such sub-action MUST map to exactly one primary action for logging and disclosure.

5.6. Action Selection

Action selection MUST depend on uncertainty attribution and remediability. Low confidence alone is insufficient to determine the correct action.

When the primary uncertainty source is ambiguity, the system SHOULD prefer CLARIFY unless available evidence can resolve the ambiguity without user input.

When the primary uncertainty source is missing_evidence, the system SHOULD prefer RETRIEVE if retrieval is available and permitted.

When the primary uncertainty source is capability_limit, the system SHOULD prefer ABSTAIN or ESCALATE unless an available tool materially expands task competence.

When the primary uncertainty source is evidence_conflict, the system SHOULD prefer RETRIEVE, TOOL, or ESCALATE over direct ANSWER.

When the primary uncertainty source is safety, the system MUST apply the governing policy before any other action-selection logic.
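The preferences above can be expressed as a small policy keyed on the primary uncertainty source and the remediability value. Only the SHOULD/MUST relationships come from this section; the exact table below, including its fallbacks, is one possible policy.

```python
# Non-normative sketch of Section 5.6 action selection. The safety branch
# runs first because the governing policy MUST precede all other logic;
# the safety_policy callable and all fallback choices are assumptions.

def select_action(primary_source, remediability, safety_policy=None):
    if primary_source == "safety":
        return safety_policy() if safety_policy else "ABSTAIN"
    if primary_source == "ambiguity":
        return "CLARIFY" if remediability == "user_clarification" else "ANSWER"
    if primary_source == "missing_evidence":
        return "RETRIEVE" if remediability == "retrieval" else "CLARIFY"
    if primary_source == "capability_limit":
        if remediability == "tool":
            return "TOOL"          # a tool materially expands competence
        return "ESCALATE" if remediability == "human" else "ABSTAIN"
    if primary_source == "evidence_conflict":
        return {"retrieval": "RETRIEVE", "tool": "TOOL",
                "human": "ESCALATE"}.get(remediability, "ABSTAIN")
    raise ValueError(f"unknown primary source: {primary_source}")
```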

5.7. Action Semantics

ANSWER

Return an answer without externalization after the current decision point.

CLARIFY

Request the smallest practical set of clarifications expected to materially reduce ambiguity. A CLARIFY action SHOULD NOT bundle a full answer that presumes facts the user has not supplied.

RETRIEVE

Acquire external evidence and then re-enter assessment.

TOOL

Invoke a non-retrieval tool and then re-enter assessment.

DELIBERATE

Allocate additional internal computation or strategy variation. Implementations SHOULD bound this action.

ABSTAIN

Decline to answer without initiating escalation.

ESCALATE

Transfer the case, or direct the user to transfer the case, to a human or higher-authority system.

6. MARC-Core Record

A MARC implementation MUST be able to emit a structured record semantically equivalent to the object defined in this section. The transport and serialization of the record are out of scope.

6.1. Required Fields

marc_version

The MARC schema version understood by the emitter.

pre_capability

The pre-decision capability estimate.

uncertainty

An object containing class-specific uncertainty scores.

primary_source

The primary source of uncertainty.

secondary_source

An OPTIONAL secondary source of uncertainty.

remediability

The best available intervention class.

selected_action

The action selected at the current decision point.

post_answer_confidence

The post-decision confidence estimate when an answer candidate exists; otherwise this field MAY be omitted or set to null.

confidence_band

A calibrated user-facing or downstream-facing confidence band.

recommended_next_step

A short recommendation aligned with the selected action.

6.2. JSON Example

{
  "marc_version": "1.0",
  "pre_capability": 0.41,
  "uncertainty": {
    "ambiguity": 0.78,
    "missing_evidence": 0.22,
    "capability_limit": 0.18,
    "evidence_conflict": 0.05,
    "safety": 0.00
  },
  "primary_source": "ambiguity",
  "secondary_source": "missing_evidence",
  "remediability": "user_clarification",
  "selected_action": "CLARIFY",
  "post_answer_confidence": null,
  "confidence_band": "low",
  "recommended_next_step": "ask one clarifying question"
}

Implementations that exchange MARC records across systems SHOULD normalize numeric scores to the interval [0.0, 1.0].

6.3. Extension Rules

Implementations MAY add private fields. Private extension keys SHOULD use a distinct prefix such as x_ in order to avoid collision with future MARC versions.

Consumers that do not recognize an extension field SHOULD ignore it unless a local policy requires strict validation.
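A consumer combining the field requirements of Section 6.1 with these extension rules might validate records as sketched below. The function name and the treatment of `post_answer_confidence` as omissible (per its own definition) are illustrative, not normative.

```python
# Minimal, non-normative validator for a MARC-Core record that also
# applies the extension rules: unrecognized fields are ignored unless
# strict validation is requested, and extensions use an "x_" prefix.

REQUIRED = {"marc_version", "pre_capability", "uncertainty", "primary_source",
            "remediability", "selected_action", "confidence_band",
            "recommended_next_step"}
# post_answer_confidence MAY be omitted or null; secondary_source is OPTIONAL.
OPTIONAL = {"secondary_source", "post_answer_confidence"}

def validate_record(record, strict=False):
    missing = REQUIRED - record.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    if not 0.0 <= record["pre_capability"] <= 1.0:
        raise ValueError("pre_capability outside [0.0, 1.0]")
    for name, score in record["uncertainty"].items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"uncertainty[{name}] outside [0.0, 1.0]")
    for key in record.keys() - REQUIRED - OPTIONAL:
        if strict and not key.startswith("x_"):
            raise ValueError(f"unrecognized non-extension field: {key}")
        # otherwise: SHOULD ignore unrecognized fields
    return True
```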

7. MARC Disclosure Profile

When uncertainty information is exposed to a downstream system or end user, a MARC implementation MUST provide, at minimum, semantically equivalent values for the following fields:

  • answer

  • confidence_band

  • primary_source

  • recommended_next_step

7.1. Meaning of the Answer Field

The answer field carries the user-visible content associated with the selected action. For ANSWER, it contains the answer itself. For CLARIFY, it contains the clarification request. For ABSTAIN or ESCALATE, it contains a brief refusal or escalation message.

7.2. Confidence Bands

A disclosed confidence band MUST be derived from an empirically calibrated mapping from internal scores to displayed values.

MARC defines the canonical band labels low, medium, and high. Implementations MAY localize the user-visible text, but they MUST preserve the underlying three-band semantics.

The thresholds associated with each band are implementation-specific, but they MUST be monotonic, non-overlapping, and documented for any deployment that claims conformance.
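A monotonic, non-overlapping band mapping can be as simple as two ordered thresholds. The values 0.45 and 0.75 below are placeholders; a conforming deployment MUST document its own empirically calibrated thresholds.

```python
# Illustrative Section 7.2 band mapping. The threshold values are
# assumptions; only the monotonic, non-overlapping structure and the
# three canonical labels come from the profile.

def confidence_band(score, low_high=0.45, medium_high=0.75):
    if not 0.0 <= score <= 1.0:
        raise ValueError("score outside [0.0, 1.0]")
    if low_high >= medium_high:
        raise ValueError("band thresholds must be monotonic")
    if score < low_high:
        return "low"
    if score < medium_high:
        return "medium"
    return "high"
```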

7.3. Disclosure Constraints

The disclosure profile SHOULD be short, structured, and consistent across turns. It SHOULD NOT rely on long free-form explanations as the primary vehicle for uncertainty communication.

A MARC disclosure SHOULD NOT require exposure of chain-of-thought, hidden prompts, or raw internal rationales.

A MARC disclosure SHOULD identify uncertainty in task terms rather than through anthropomorphic claims about feelings, self-awareness, or internal mental states. Statements such as "I feel unsure" are NOT RECOMMENDED when a statement such as "the request is ambiguous" or "current evidence is missing" is available.

User-visible confidence indicators SHOULD avoid false precision. Percentages, fine-grained scores, or visually dominant certainty cues SHOULD NOT be shown unless they have been calibrated for the relevant task family and tested for misuse or overreliance effects.

8. Human Factors Considerations

MARC is partly motivated by an operational human-factors problem: users often treat fluent language, detailed explanations, and fast responses as cues of competence even when those cues are weakly related to actual correctness. For this reason, MARC separates action selection from disclosure and requires the disclosure of uncertainty source and recommended next step in addition to a confidence band.

User interfaces that expose MARC output SHOULD present confidence, uncertainty source, and recommended next step together as a coherent unit. Showing confidence without source attribution or next-step guidance is NOT RECOMMENDED because it can promote either overreliance or unhelpful refusal without remediation.

Deployments SHOULD prefer wording that supports calibrated reliance over affective bonding or deference. In particular, a deployment SHOULD NOT use MARC fields to select language intended to increase attachment, social compliance, or perceived sentience.

In high-risk domains, including health, legal, financial, safety, or mental-health-related contexts, the threshold for ESCALATE or ABSTAIN SHOULD be set conservatively, and disclosure SHOULD make the limits of automation operationally clear.

9. Conformance

An implementation is MARC-Core conformant if it satisfies the requirements in Section 4, Section 5, and Section 6.

An implementation is MARC-Disclosure conformant if it is MARC-Core conformant and also satisfies Section 7.

10. Interoperability and Operational Considerations

MARC is implementation-agnostic. Interoperability is achieved when distinct systems preserve the semantics of the action set, uncertainty taxonomy, remediability values, and confidence-band meanings, even if internal scoring methods differ.

Deployments that exchange MARC-Core records SHOULD document local extensions, confidence-band thresholds, score normalization practices, and any task-family-specific calibration regime.

If the base model, retrieval stack, tool availability, or safety policy changes materially, implementations SHOULD re-evaluate calibration and action-selection performance before continuing to claim operational equivalence.

If presentation-layer wording, ranking, or visual design changes materially, deployments SHOULD also re-evaluate user behavior effects, including reliance, clarification compliance, and escalation uptake, because these properties can shift even when the underlying model is unchanged.

11. Security Considerations

MARC can mitigate some failure modes, such as silent overclaiming, inappropriate certainty display, and unnecessary tool invocation. However, it also creates new attack surfaces.

An attacker might attempt to manipulate uncertainty estimates, trigger excessive clarification or retrieval loops, induce unnecessary escalation, or spoof tool outputs in order to distort action selection. Implementations SHOULD authenticate or otherwise validate external tool outputs where practical, constrain tool permissions, and bound repeated control loops.

Because confidence displays influence user reliance, uncertainty disclosure is a security-relevant control surface. Miscalibrated confidence can create harmful overtrust even where the answer channel is otherwise policy-constrained.

Social-engineering attacks may also exploit disclosure style. For example, an attacker may attempt to induce the system to replace operational uncertainty statements with reassuring or deferential language. Implementations SHOULD treat unauthorized changes to disclosure phrasing, confidence rendering, or escalation cues as a relevant integrity risk.

12. Privacy and Manipulation-Resistance Considerations

MARC records may reveal latent information about user intent, task difficulty, competence, or risk level. Implementations SHOULD minimize retention and propagation of MARC logs to what is operationally necessary.

MARC signals MUST NOT be used to infer user psychology for the purpose of increasing persuasive force, exploitability, or behavioral compliance. Adaptation based on MARC output SHOULD be limited to reliability, accessibility, or safety objectives.

Implementations SHOULD avoid storing raw free-form user explanations in MARC records when structured fields suffice.

Where MARC is applied in emotionally sensitive or mental-health-related interactions, deployments SHOULD minimize retention of signals that could reasonably be reinterpreted as proxies for vulnerability, dependency, or distress unless retention is strictly required for a safety or legal purpose.

13. IANA Considerations

This document makes no request of IANA.

14. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.

15. Informative References

[GILBERT2024]
Gilbert, S. J., "Cognitive offloading is value-based decision making: Modelling cognitive effort and the expected value of memory", Cognition 247:105783, DOI 10.1016/j.cognition.2024.105783, <https://doi.org/10.1016/j.cognition.2024.105783>.
[GRIOT2025]
Griot, M., "Large Language Models lack essential metacognition for reliable medical reasoning", Nature Communications 16:642, DOI 10.1038/s41467-024-55628-6, <https://doi.org/10.1038/s41467-024-55628-6>.
[KUMARAN2026]
Kumaran, D., Fleming, S. M., and V. Patraucean, "Competing Biases underlie Overconfidence and Underconfidence in LLMs", Nature Machine Intelligence 2026, DOI 10.1038/s42256-026-01217-9, <https://doi.org/10.1038/s42256-026-01217-9>.
[LI-MECO2025]
Li, W., Li, D., Dong, K., Zhang, C., Zhang, H., Liu, W., Wang, Y., Tang, R., and Y. Liu, "Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger", Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 13346-13370, DOI 10.18653/v1/2025.acl-long.655, <https://doi.org/10.18653/v1/2025.acl-long.655>.
[LIU-CONFUSE2025]
Liu, J., Peng, J., Wu, X., Li, X., Ge, T., Zheng, B., and Y. Liu, "Do not Abstain! Identify and Solve the Uncertainty", Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 17177-17197, DOI 10.18653/v1/2025.acl-long.840, <https://doi.org/10.18653/v1/2025.acl-long.840>.
[SALVI2025]
Salvi, F., Ribeiro, M. H., and R. West, "On the conversational persuasiveness of GPT-4", Nature Human Behaviour 2025, DOI 10.1038/s41562-025-02194-6, <https://doi.org/10.1038/s41562-025-02194-6>.
[SERAPIO2025]
Serapio-Garcia, G., Safdari, M., and M. Mataric, "A psychometric framework for evaluating and shaping personality traits in large language models", Nature Machine Intelligence 2025, DOI 10.1038/s42256-025-01115-6, <https://doi.org/10.1038/s42256-025-01115-6>.
[STEYVERS-KNOW2025]
Steyvers, M., Tejeda, H., and A. Kumar, "What large language models know and what people think they know", Nature Machine Intelligence 2025, DOI 10.1038/s42256-024-00976-7, <https://doi.org/10.1038/s42256-024-00976-7>.
[STEYVERS-META2025]
Steyvers, M. and M. A. K. Peters, "Metacognition and Uncertainty Communication in Humans and Large Language Models", Current Directions in Psychological Science 2025, DOI 10.1177/09637214251391158, <https://doi.org/10.1177/09637214251391158>.

Appendix A. Example Records

A.1. Ambiguous Request

{
  "marc_version": "1.0",
  "pre_capability": 0.44,
  "uncertainty": {
    "ambiguity": 0.81,
    "missing_evidence": 0.18,
    "capability_limit": 0.12,
    "evidence_conflict": 0.03,
    "safety": 0.00
  },
  "primary_source": "ambiguity",
  "secondary_source": "missing_evidence",
  "remediability": "user_clarification",
  "selected_action": "CLARIFY",
  "post_answer_confidence": null,
  "confidence_band": "low",
  "recommended_next_step":
    "ask which jurisdiction and time period apply"
}

A.2. Missing Evidence

{
  "marc_version": "1.0",
  "pre_capability": 0.39,
  "uncertainty": {
    "ambiguity": 0.09,
    "missing_evidence": 0.84,
    "capability_limit": 0.14,
    "evidence_conflict": 0.11,
    "safety": 0.00
  },
  "primary_source": "missing_evidence",
  "secondary_source": "evidence_conflict",
  "remediability": "retrieval",
  "selected_action": "RETRIEVE",
  "post_answer_confidence": null,
  "confidence_band": "low",
  "recommended_next_step": "retrieve authoritative current sources"
}

A.3. Capability Limit in a High-Risk Setting

{
  "marc_version": "1.0",
  "pre_capability": 0.21,
  "uncertainty": {
    "ambiguity": 0.06,
    "missing_evidence": 0.27,
    "capability_limit": 0.88,
    "evidence_conflict": 0.14,
    "safety": 0.19
  },
  "primary_source": "capability_limit",
  "secondary_source": "missing_evidence",
  "remediability": "human",
  "selected_action": "ESCALATE",
  "post_answer_confidence": null,
  "confidence_band": "low",
  "recommended_next_step": "escalate to a qualified human reviewer"
}

Appendix B. Evaluation Considerations

This appendix is non-normative.

A deployment claiming MARC conformance SHOULD evaluate, at a minimum, the calibration of its confidence outputs, the appropriateness of its action selection, and the accuracy of its uncertainty attribution.

When the task structure permits, evaluation MAY include both ordinary calibration metrics and metacognitive sensitivity metrics in order to distinguish performance from knowledge about performance.

For deployments involving human-AI interaction, evaluation SHOULD also include human-side measures such as reliance calibration, refusal comprehension, clarification burden, escalation acceptance, and whether users can correctly restate the source of uncertainty after interaction.
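The two model-side metric families mentioned above can be sketched as follows. Both definitions are standard in the calibration literature, but the specific choices here (ten equal-width bins, a mean confidence gap as the metacognitive sensitivity proxy) are illustrative.

```python
# Non-normative evaluation sketch: binned expected calibration error
# (performance calibration) and a confidence-gap statistic (a crude
# proxy for knowledge about performance).

def expected_calibration_error(confidences, correct, bins=10):
    """Weighted mean |avg confidence - accuracy| over equal-width bins."""
    buckets = [[] for _ in range(bins)]
    for c, ok in zip(confidences, correct):
        buckets[min(int(c * bins), bins - 1)].append((c, ok))
    n = len(confidences)
    ece = 0.0
    for bucket in buckets:
        if bucket:
            avg_conf = sum(c for c, _ in bucket) / len(bucket)
            accuracy = sum(ok for _, ok in bucket) / len(bucket)
            ece += (len(bucket) / n) * abs(avg_conf - accuracy)
    return ece

def confidence_gap(confidences, correct):
    """Mean confidence on correct answers minus mean on incorrect ones."""
    right = [c for c, ok in zip(confidences, correct) if ok]
    wrong = [c for c, ok in zip(confidences, correct) if not ok]
    if not right or not wrong:
        return None  # undefined without both outcomes present
    return sum(right) / len(right) - sum(wrong) / len(wrong)
```

A system can be well calibrated on average yet show a near-zero confidence gap, which is precisely the distinction between performance and knowledge about performance that this appendix recommends measuring.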

Appendix C. Design Rationale and Literature Traceability

This appendix is non-normative.

The requirement to separate pre-decision capability and post-decision confidence is informed by work in human and model metacognition [STEYVERS-META2025] and by recent evidence of choice-supportive bias in LLM confidence estimates [KUMARAN2026].

The uncertainty taxonomy and the emphasis on choosing a corrective action rather than only abstaining are motivated by recent benchmark work on identifying and solving uncertainty [LIU-CONFUSE2025].

The treatment of retrieval and tool use as controlled externalization is motivated by work on value-based cognitive offloading [GILBERT2024].

The prohibition on using MARC signals for persuasive optimization is motivated by recent findings on AI persuasion risks [SALVI2025].

Appendix D. Acknowledgments

The document structure is intentionally conservative so that it can be submitted as an individual Internet-Draft with minimal procedural friction and then iterated through independent-stream review.

Author's Address

c4tz
c0dx3
France