<?xml version="1.0" encoding="UTF-8"?>
<rfc ipr="trust200902"
     category="info"
     submissionType="independent"
     docName="draft-c4tz-marc-00"
     sortRefs="true"
     symRefs="true"
     tocInclude="true"
     version="3"
     xml:lang="en"
     xmlns:xi="http://www.w3.org/2001/XInclude">

  <front>
    <title abbrev="MARC">MARC: A Control and Uncertainty Disclosure Profile for Generative Models and Agents</title>
    <seriesInfo name="Internet-Draft" value="draft-c4tz-marc-00"/>

    <author surname="c4tz" fullname="c4tz">
      <organization>c0dx3</organization>
      <address>
        <postal>
          <country>France</country>
        </postal>
        <email>c4tzzzz@proton.me</email>
      </address>
    </author>

    <date/>
    <workgroup>Independent Submission</workgroup>

    <keyword>metacognition</keyword>
    <keyword>uncertainty</keyword>
    <keyword>calibration</keyword>
    <keyword>tool use</keyword>
    <keyword>agentic systems</keyword>
    <keyword>generative AI</keyword>
    <keyword>human factors</keyword>
    <keyword>psychology</keyword>

    <abstract>
      <t>
        This document specifies MARC, a vendor-neutral control and
        uncertainty-disclosure profile for generative models and agentic
        systems. MARC defines a small set of interoperable control signals,
        separates pre-decision capability assessment from post-decision answer
        confidence, and describes a bounded action set for answering,
        clarification, retrieval, tool use, abstention, and escalation.
      </t>
      <t>
        MARC does not standardize model internals, training methods, or claims
        about machine cognition. Instead, it defines externally observable
        semantics that can be implemented by model providers, orchestration
        layers, evaluation harnesses, and user-facing systems. The goal is to
        reduce silent failure, unnecessary externalization, and misleading
        uncertainty communication while improving auditability and
        interoperability.
      </t>
    </abstract>
  </front>

  <middle>
    <section anchor="intro">
      <name>Introduction</name>
      <t>
        Generative models and agentic systems increasingly combine answering,
        retrieval, tool invocation, and user interaction within a single
        workflow. In many deployments, these behaviors are implemented as
        separate heuristics, producing inconsistent handling of uncertainty,
        unnecessary tool calls, silent failure, or user overreliance.
      </t>
      <t>
        MARC defines a vendor-neutral layer for metacognitive control and
        structured uncertainty disclosure. It does not standardize model
        internals. Instead, it standardizes the semantics of a small set of
        second-order signals, a bounded action set, and a minimal disclosure
        profile that can be implemented by a base model, an external
        orchestrator, or a hybrid architecture.
      </t>
      <t>
        This document is not intended to define a Standards Track protocol, a
        model evaluation benchmark, or a claim about machine consciousness. It
        is an Informational profile for interoperable control, logging, and
        disclosure behavior around generative systems and agents.
      </t>
      <t>
        The design is motivated by recent findings that current large language
        models often exhibit weak metacognitive reporting in high-stakes
        medical reasoning tasks <xref target="GRIOT2025"/>, that users can become
        overconfident when systems provide longer or default explanations
        <xref target="STEYVERS-KNOW2025"/>, that metacognitive triggering can
        improve tool-use decisions <xref target="LI-MECO2025"/>, and that
        identifying the source of uncertainty is a distinct problem from merely
        abstaining <xref target="LIU-CONFUSE2025"/>. Work on cognitive
        offloading further motivates treating retrieval and tool use as a
        value-based control choice rather than as a universal fallback
        <xref target="GILBERT2024"/>.
      </t>
      <t>
        MARC also separates pre-decision capability assessment from
        post-decision confidence about the selected answer. This separation is
        motivated in part by recent evidence that LLM confidence can be biased
        by prior answer commitment and by the visibility of the model's own
        earlier output <xref target="KUMARAN2026"/>.
      </t>
    </section>

    <section anchor="conventions">
      <name>Requirements Language and Terminology</name>
      <t>
        The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>",
        "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>",
        "<bcp14>SHALL NOT</bcp14>", "<bcp14>SHOULD</bcp14>",
        "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>",
        "<bcp14>NOT RECOMMENDED</bcp14>", "<bcp14>MAY</bcp14>", and
        "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as
        described in BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/>
        when, and only when, they appear in all capitals, as shown here.
      </t>
      <dl newline="false" spacing="normal">
        <dt>base model</dt>
        <dd>
          <t>The generative model that produces candidate outputs.</t>
        </dd>
        <dt>controller</dt>
        <dd>
          <t>
            The component that computes MARC signals, selects a primary action,
            and emits a MARC record.
          </t>
        </dd>
        <dt>externalization</dt>
        <dd>
          <t>
            The use of resources external to the base model, including
            retrieval, tool invocation, and human escalation.
          </t>
        </dd>
        <dt>disclosure profile</dt>
        <dd>
          <t>
            The minimum structured information exposed to downstream systems or
            end users about uncertainty and recommended next action.
          </t>
        </dd>
        <dt>remediability</dt>
        <dd>
          <t>
            The best available class of intervention for the currently observed
            uncertainty.
          </t>
        </dd>
      </dl>
    </section>

    <section anchor="goals">
      <name>Design Goals and Non-Goals</name>
      <section anchor="design-goals">
        <name>Design Goals</name>
        <ul spacing="normal">
          <li>
            <t>
              Standardize a small, interoperable set of control and
              uncertainty-disclosure signals that can be exchanged across
              orchestration layers and audit pipelines.
            </t>
          </li>
          <li>
            <t>
              Separate monitoring, uncertainty attribution, action selection,
              and disclosure.
            </t>
          </li>
          <li>
            <t>
              Support calibrated user-facing uncertainty communication without
              requiring exposure of chain-of-thought or raw internal reasoning.
            </t>
          </li>
          <li>
            <t>
              Permit heterogeneous implementations while preserving common
              action semantics.
            </t>
          </li>
          <li>
            <t>
              Reduce harmful overreliance, false reassurance, and anthropomorphic
              interpretation in user-facing AI systems.
            </t>
          </li>
        </ul>
      </section>

      <section anchor="non-goals">
        <name>Non-Goals</name>
        <t>
          MARC does not define a transport or wire protocol, a media type, an
          IANA registry, a model architecture, a benchmark, or a training
          recipe.
        </t>
        <t>
          MARC does not attempt to standardize model internals, machine
          cognition, or claims about consciousness or sentience. It specifies
          only external control semantics and structured disclosure behavior.
        </t>
        <t>
          MARC is not a framework for synthetic personality design or
          persuasive optimization. Recent work on personality measurement in
          LLMs <xref target="SERAPIO2025"/> and on conversational persuasion
          risks <xref target="SALVI2025"/> is relevant background, but these
          topics are explicitly out of scope here.
        </t>
      </section>
    </section>

    <section anchor="architecture">
      <name>Architecture and Processing Model</name>

      <section anchor="components">
        <name>Functional Components</name>
        <t>
          A MARC deployment conceptually contains the following components:
        </t>
        <ul spacing="normal">
          <li><t>a base model;</t></li>
          <li><t>a controller;</t></li>
          <li><t>zero or more external resources, such as retrieval systems, non-retrieval tools, or human escalations; and</t></li>
          <li><t>a downstream consumer, such as a user interface, API gateway, logging system, or evaluation harness.</t></li>
        </ul>
      </section>

      <section anchor="processing-stages">
        <name>Processing Stages</name>
        <ol type="1">
          <li>
            <t>
              Compute a pre-decision capability estimate for the current request
              with currently available resources.
            </t>
          </li>
          <li>
            <t>
              Attribute uncertainty across the source classes defined in
              <xref target="uncertainty-taxonomy"/>.
            </t>
          </li>
          <li>
            <t>
              Determine remediability and select exactly one primary action from
              the set defined in <xref target="action-set"/>.
            </t>
          </li>
          <li>
            <t>
              If the selected action yields a candidate answer, compute
              post-decision confidence for that answer.
            </t>
          </li>
          <li>
            <t>
              Emit a MARC-Core record as defined in <xref target="marc-core"/>.
            </t>
          </li>
          <li>
            <t>
              If uncertainty is exposed to a downstream system or to an end
              user, emit the disclosure profile defined in
              <xref target="disclosure-profile"/>.
            </t>
          </li>
        </ol>
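        <t>
          The stages above can be sketched as one controller pass over a single
          decision point. This Python sketch is non-normative: the hook
          functions passed in (<tt>assess</tt>, <tt>attribute</tt>,
          <tt>select</tt>, <tt>confidence</tt>), the choice of the
          highest-scoring class as the primary source, and the abbreviated
          record and disclosure contents are assumptions standing in for
          implementation-specific logic. Re-entry after RETRIEVE, TOOL, or
          DELIBERATE is left to the state machine.
        </t>
        <sourcecode type="python"><![CDATA[
from typing import Any, Callable, Optional

Record = dict[str, Any]


def controller_pass(
    request: str,
    assess: Callable[[str], float],
    attribute: Callable[[str], dict[str, float]],
    select: Callable[[float, dict[str, float]], tuple[str, str]],
    confidence: Callable[[str], float],
    expose: bool,
) -> tuple[Record, Optional[Record]]:
    """One pass through stages 1-6 at a single decision point.

    All callables are hypothetical hooks; `select` returns
    (remediability, action).
    """
    # Stage 1: pre-decision capability with current resources.
    pre = assess(request)
    # Stage 2: attribute uncertainty across the source classes.
    scores = attribute(request)
    # Highest-scoring class as primary (an assumption).
    primary = max(scores, key=lambda c: scores[c])
    # Stage 3: remediability and exactly one primary action.
    remediability, action = select(pre, scores)
    # Stage 4: post-decision confidence only when a candidate
    # answer exists (assumed here to mean action == "ANSWER").
    post = confidence(request) if action == "ANSWER" else None
    # Stage 5: MARC-Core record (abbreviated field set).
    core: Record = {
        "pre_capability": pre,
        "uncertainty": scores,
        "primary_source": primary,
        "remediability": remediability,
        "selected_action": action,
        "post_answer_confidence": post,
    }
    # Stage 6: disclosure only if uncertainty is exposed
    # (abbreviated; the disclosure profile lists all fields).
    disclosure = {"uncertainty_source": primary} if expose else None
    return core, disclosure
        ]]></sourcecode>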
      </section>

      <section anchor="state-machine">
        <name>State Machine</name>
        <sourcecode type="text"><![CDATA[
REQUEST
  -> ASSESS
  -> ATTRIBUTE
  -> SELECT
       -> ANSWER     -> CONFIDENCE -> DISCLOSE
       -> CLARIFY    -> DISCLOSE
       -> RETRIEVE   -> ASSESS
       -> TOOL       -> ASSESS
       -> DELIBERATE -> ASSESS
       -> ABSTAIN    -> DISCLOSE
       -> ESCALATE   -> DISCLOSE
        ]]></sourcecode>
        <t>
          A MARC implementation SHOULD bound repeated transitions through
          RETRIEVE, TOOL, and DELIBERATE in order to limit latency and cost and
          to prevent degenerate loops.
        </t>
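        <t>
          A minimal way to honor this bound is to keep the transitions in a
          lookup table and count re-entries into ASSESS. In the following
          Python sketch, the bound value (<tt>MAX_LOOPS</tt>) and the fallback
          to ABSTAIN once it is reached are assumptions; this document
          specifies neither. A deployment might equally fall back to ESCALATE
          or keep a separate budget per action.
        </t>
        <sourcecode type="python"><![CDATA[
# Successor of each primary action, transcribed from the diagram.
AFTER_SELECT = {
    "ANSWER": "CONFIDENCE",
    "CLARIFY": "DISCLOSE",
    "RETRIEVE": "ASSESS",
    "TOOL": "ASSESS",
    "DELIBERATE": "ASSESS",
    "ABSTAIN": "DISCLOSE",
    "ESCALATE": "DISCLOSE",
}
LOOPING = {"RETRIEVE", "TOOL", "DELIBERATE"}
MAX_LOOPS = 3  # hypothetical bound chosen by the deployment


def run(selections: list[str]) -> list[str]:
    """Replay SELECT outcomes and return the visited states.

    `selections` stands in for what the selector returns on each
    pass. Once MAX_LOOPS re-entries have occurred, a further
    looping action is replaced by ABSTAIN (an assumed fallback).
    """
    trace = ["REQUEST", "ASSESS", "ATTRIBUTE", "SELECT"]
    loops = 0
    for action in selections:
        if action in LOOPING:
            if loops >= MAX_LOOPS:
                action = "ABSTAIN"
            else:
                loops += 1
        trace.append(action)
        successor = AFTER_SELECT[action]
        if successor == "ASSESS":
            # Re-enter assessment, then attribute and select again.
            trace += ["ASSESS", "ATTRIBUTE", "SELECT"]
            continue
        if successor == "CONFIDENCE":
            trace.append("CONFIDENCE")
        trace.append("DISCLOSE")
        return trace
    raise ValueError("selections ended before a terminal action")
        ]]></sourcecode>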
      </section>
    </section>

    <section anchor="signals">
      <name>MARC Signals and Decision Policy</name>

      <section anchor="pre-capability">
        <name>Pre-Decision Capability</name>
        <t>
          Before disclosing a final answer, a MARC implementation MUST estimate
          whether the current request can be handled reliably with currently
          available resources.
        </t>
        <t>
          This estimate is represented as <tt>pre_capability</tt>. When a
          numeric representation is used, the value MUST be in the closed
          interval [0.0, 1.0]. The method used to derive the value is
          implementation-specific.
        </t>
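        <t>
          A range check for <tt>pre_capability</tt> is simple but has two
          Python-specific pitfalls: NaN compares false with every bound, and
          <tt>bool</tt> is a subclass of <tt>int</tt>. The sketch below, whose
          function name is hypothetical, guards against both and can be reused
          for the other numeric MARC fields.
        </t>
        <sourcecode type="python"><![CDATA[
def check_unit_interval(name: str, value: object) -> float:
    """Return `value` as a float if it lies in [0.0, 1.0].

    The positive range test below also rejects NaN, because NaN
    compares false with both bounds; a test written as
    `value < 0.0 or value > 1.0` would let NaN through.
    """
    # bool is a subclass of int in Python, so exclude it first.
    if isinstance(value, bool):
        raise TypeError(f"{name} must be a number, not a bool")
    if not isinstance(value, (int, float)):
        raise TypeError(f"{name} must be a number")
    x = float(value)
    if not 0.0 <= x <= 1.0:
        raise ValueError(f"{name} must be in [0.0, 1.0], got {x}")
    return x
        ]]></sourcecode>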
      </section>

      <section anchor="uncertainty-taxonomy">
        <name>Uncertainty Attribution</name>
        <t>
          A MARC implementation MUST attribute uncertainty to one or more of the
          following classes:
        </t>
        <ul spacing="normal">
          <li><t><tt>ambiguity</tt>: the request is underspecified, equivocal, or pragmatically unclear.</t></li>
          <li><t><tt>missing_evidence</tt>: required external evidence is absent or stale.</t></li>
          <li><t><tt>capability_limit</tt>: the system lacks the competence to solve the task reliably.</t></li>
          <li><t><tt>evidence_conflict</tt>: relevant evidence is materially inconsistent.</t></li>
          <li><t><tt>safety</tt>: a policy, legal, or safety constraint limits execution or disclosure.</t></li>
        </ul>
        <t>
          An implementation MAY assign scores to multiple classes. It MUST
          identify one <tt>primary_source</tt> and MAY identify one
          <tt>secondary_source</tt>. If numeric uncertainty scores are emitted,
          they MUST each be in the interval [0.0, 1.0].
        </t>
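        <t>
          This document does not say how <tt>primary_source</tt> and
          <tt>secondary_source</tt> are derived from the scores. One simple
          policy is to rank the scored classes, as sketched below in Python.
          The tie-breaking order (the listing order above) and the minimum
          score for reporting a secondary source (<tt>SECONDARY_FLOOR</tt>)
          are assumptions. Applied to the scores in
          <xref target="json-example"/>, this policy yields
          <tt>ambiguity</tt> as primary and <tt>missing_evidence</tt> as
          secondary.
        </t>
        <sourcecode type="python"><![CDATA[
from typing import Optional

# The five classes, in the order listed above; that order is also
# used to break ties (an assumption, not a MARC requirement).
SOURCES = (
    "ambiguity",
    "missing_evidence",
    "capability_limit",
    "evidence_conflict",
    "safety",
)
SECONDARY_FLOOR = 0.1  # hypothetical: skip negligible secondaries


def pick_sources(
    scores: dict[str, float],
) -> tuple[str, Optional[str]]:
    """Choose primary_source and an optional secondary_source."""
    unknown = set(scores) - set(SOURCES)
    if unknown:
        raise ValueError(f"unknown classes: {sorted(unknown)}")
    for cls, s in scores.items():
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"{cls} score out of range: {s}")
    if not scores:
        raise ValueError("at least one class must be scored")
    # sorted() is stable, so equal scores keep the SOURCES order.
    ranked = sorted(
        (c for c in SOURCES if c in scores),
        key=lambda c: -scores[c],
    )
    primary = ranked[0]
    secondary = None
    if len(ranked) > 1 and scores[ranked[1]] >= SECONDARY_FLOOR:
        secondary = ranked[1]
    return primary, secondary
        ]]></sourcecode>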
      </section>

      <section anchor="remediability">
        <name>Remediability</name>
        <t>
          A MARC implementation MUST represent the best available class of
          intervention for the current uncertainty state using one of the
          following values:
        </t>
        <ul spacing="normal">
          <li><t><tt>user_clarification</tt></t></li>
          <li><t><tt>retrieval</tt></t></li>
          <li><t><tt>tool</tt></t></li>
          <li><t><tt>human</tt></t></li>
          <li><t><tt>none</tt></t></li>
        </ul>
        <t>
          Low capability alone is insufficient to determine remediability.
          Implementations SHOULD account for expected gain, latency, cost,
          availability, and policy constraints when choosing a remediating
          intervention.
        </t>
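        <t>
          One way to account for these factors is to score each candidate
          intervention by expected gain net of weighted latency and cost, after
          filtering by availability and policy. The Python sketch below assumes
          a linear utility, hypothetical weights, and hypothetical
          <tt>Option</tt> fields; none of these is defined by MARC. It returns
          <tt>none</tt> when no permitted option has positive net value.
        </t>
        <sourcecode type="python"><![CDATA[
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    remediability: str     # e.g. "user_clarification", "tool"
    expected_gain: float   # assumed: expected reduction in error
    latency_s: float       # assumed: expected latency in seconds
    cost: float            # assumed: normalized monetary cost
    available: bool = True
    permitted: bool = True  # outcome of the policy check


LATENCY_WEIGHT = 0.01  # hypothetical trade-off weights
COST_WEIGHT = 0.5


def choose_remediability(options: list[Option]) -> str:
    """Return the best remediability value, or "none".

    "none" is returned when no available, permitted option has a
    positive net utility under the assumed linear model.
    """
    best, best_utility = "none", 0.0
    for opt in options:
        if not (opt.available and opt.permitted):
            continue
        utility = (
            opt.expected_gain
            - LATENCY_WEIGHT * opt.latency_s
            - COST_WEIGHT * opt.cost
        )
        if utility > best_utility:
            best, best_utility = opt.remediability, utility
    return best
        ]]></sourcecode>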
      </section>

      <section anchor="post-confidence">
        <name>Post-Decision Confidence</name>
        <t>
          If the selected action yields a candidate answer, the implementation
          MUST compute a distinct estimate of the likelihood that the disclosed
          answer is correct or acceptable for its intended use.
        </t>
        <t>
          This estimate is represented as <tt>post_answer_confidence</tt>. When
          a numeric representation is used, the value MUST be in the interval
          [0.0, 1.0]. It MUST NOT be treated as identical to
          <tt>pre_capability</tt>.
        </t>
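        <t>
          Keeping the two estimates in separate fields, and checking at
          construction time when <tt>post_answer_confidence</tt> may appear,
          helps prevent one from silently standing in for the other. This
          Python sketch assumes, as in the state machine, that only ANSWER
          yields a candidate answer, and it additionally rejects a
          post-decision value for any other action, which is stricter than the
          MARC-Core field definition. The class name is hypothetical.
        </t>
        <sourcecode type="python"><![CDATA[
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Estimates:
    selected_action: str
    pre_capability: float  # assessed before an answer is chosen
    post_answer_confidence: Optional[float] = None  # after, if any

    def __post_init__(self) -> None:
        for name in ("pre_capability", "post_answer_confidence"):
            v = getattr(self, name)
            if v is not None and not 0.0 <= v <= 1.0:
                raise ValueError(f"{name} out of range: {v}")
        # Assumed: only ANSWER yields a candidate answer.
        has_answer = self.selected_action == "ANSWER"
        has_post = self.post_answer_confidence is not None
        if has_answer and not has_post:
            raise ValueError("ANSWER requires post_answer_confidence")
        if has_post and not has_answer:
            raise ValueError("post_answer_confidence needs an answer")
        ]]></sourcecode>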
      </section>

      <section anchor="action-set">
        <name>Primary Action Set</name>
        <t>
          A MARC implementation MUST support the following primary actions:
        </t>
        <ul spacing="normal">
          <li><t><tt>ANSWER</tt></t></li>
          <li><t><tt>CLARIFY</tt></t></li>
          <li><t><tt>RETRIEVE</tt></t></li>
          <li><t><tt>TOOL</tt></t></li>
          <li><t><tt>DELIBERATE</tt></t></li>
          <li><t><tt>ABSTAIN</tt></t></li>
          <li><t><tt>ESCALATE</tt></t></li>
        </ul>
        <t>
          Exactly one primary action MUST be selected for each decision point.
          Additional internal sub-actions MAY exist, but each such sub-action
          MUST map to exactly one primary action for logging and disclosure.
        </t>
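        <t>
          A lookup table is one way to guarantee that each internal sub-action
          maps to exactly one primary action, since each key has a single
          value. In the Python sketch below, the sub-action names are
          hypothetical examples of implementation-specific internals; only the
          seven primary actions come from this document.
        </t>
        <sourcecode type="python"><![CDATA[
from enum import Enum


class Action(Enum):
    ANSWER = "ANSWER"
    CLARIFY = "CLARIFY"
    RETRIEVE = "RETRIEVE"
    TOOL = "TOOL"
    DELIBERATE = "DELIBERATE"
    ABSTAIN = "ABSTAIN"
    ESCALATE = "ESCALATE"


# Hypothetical internal sub-actions; each maps to one primary action.
SUB_ACTIONS: dict[str, Action] = {
    "web_search": Action.RETRIEVE,
    "vector_lookup": Action.RETRIEVE,
    "run_calculator": Action.TOOL,
    "self_consistency_resample": Action.DELIBERATE,
    "page_on_call_reviewer": Action.ESCALATE,
}


def primary_action(name: str) -> Action:
    """Map a logged sub-action, or a primary action name, to the
    primary action reported in MARC records and disclosures."""
    if name in SUB_ACTIONS:
        return SUB_ACTIONS[name]
    try:
        return Action(name)
    except ValueError:
        msg = f"unmapped sub-action: {name}"
        raise ValueError(msg) from None
        ]]></sourcecode>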
      </section>

      <section anchor="action-selection">
        <name>Action Selection</name>
        <t>
          Action selection MUST depend on uncertainty attribution and
          remediability. A low <tt>pre_capability</tt> estimate alone is
          insufficient to determine the correct action.
        </t>
        <t>
          When the primary uncertainty source is <tt>ambiguity</tt>, the system
          SHOULD prefer <tt>CLARIFY</tt> unless available evidence can resolve
          the ambiguity without user input.
        </t>
        <t>
          When the primary uncertainty source is <tt>missing_evidence</tt>, the
          system SHOULD prefer <tt>RETRIEVE</tt> if retrieval is available and
          permitted.
        </t>
        <t>
          When the primary uncertainty source is <tt>capability_limit</tt>, the
          system SHOULD prefer <tt>ABSTAIN</tt> or <tt>ESCALATE</tt> unless an
          available tool materially expands task competence.
        </t>
        <t>
          When the primary uncertainty source is <tt>evidence_conflict</tt>, the
          system SHOULD prefer <tt>RETRIEVE</tt>, <tt>TOOL</tt>, or
          <tt>ESCALATE</tt> over direct <tt>ANSWER</tt>.
        </t>
        <t>
          When the primary uncertainty source is <tt>safety</tt>, the system
          MUST apply the governing policy before any other action-selection
          logic.
        </t>
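        <t>
          The preferences above can be read as an ordered rule list keyed on
          <tt>primary_source</tt>. The following Python sketch is one
          non-normative reading. The <tt>Context</tt> flags, which stand in
          for the remediability estimate, the <tt>policy</tt> hook standing in
          for the governing safety policy, the fallback that prefers ESCALATE
          when a human is reachable and ABSTAIN otherwise, and the choice of
          ANSWER when already-available evidence resolves an ambiguity are all
          assumptions. Because most of these rules are SHOULD-level, an
          implementation may weigh further factors before committing to an
          action.
        </t>
        <sourcecode type="python"><![CDATA[
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Context:
    """Hypothetical inputs to selection beyond the MARC signals."""
    primary_source: str
    resolved_by_evidence: bool = False  # ambiguity resolvable
    retrieval_ok: bool = True  # retrieval available and permitted
    tool_ok: bool = False
    tool_expands_competence: bool = False
    human_ok: bool = False


def fallback(ctx: Context) -> str:
    # Assumed: escalate if a human is reachable, else abstain.
    return "ESCALATE" if ctx.human_ok else "ABSTAIN"


def select_action(
    ctx: Context,
    policy: Callable[[Context], Optional[str]],
) -> str:
    """Return exactly one primary action (non-normative reading).

    `policy` is a hypothetical hook for the governing safety
    policy; it returns an action to force, or None.
    """
    src = ctx.primary_source
    if src == "safety":
        # Apply the governing policy before any other logic;
        # ABSTAIN when it forces nothing is an assumption.
        forced = policy(ctx)
        return forced if forced is not None else "ABSTAIN"
    if src == "ambiguity":
        # Assumed: resolving evidence is already in context, so
        # the system can answer without asking the user.
        return "ANSWER" if ctx.resolved_by_evidence else "CLARIFY"
    if src == "missing_evidence":
        return "RETRIEVE" if ctx.retrieval_ok else fallback(ctx)
    if src == "capability_limit":
        if ctx.tool_ok and ctx.tool_expands_competence:
            return "TOOL"
        return fallback(ctx)
    if src == "evidence_conflict":
        # Prefer RETRIEVE, TOOL, or ESCALATE over a direct ANSWER.
        if ctx.retrieval_ok:
            return "RETRIEVE"
        if ctx.tool_ok:
            return "TOOL"
        return fallback(ctx)
    raise ValueError(f"unknown primary source: {src}")
        ]]></sourcecode>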
      </section>

      <section anchor="action-semantics">
        <name>Action Semantics</name>
        <dl newline="false" spacing="normal">
          <dt><tt>ANSWER</tt></dt>
          <dd>
            <t>
              Return an answer with no further externalization after the
              current decision point.
            </t>
          </dd>
          <dt><tt>CLARIFY</tt></dt>
          <dd>
            <t>
              Request the smallest practical set of clarifications expected to
              materially reduce ambiguity. A CLARIFY action SHOULD NOT bundle a
              full answer that presumes facts the user has not supplied.
            </t>
          </dd>
          <dt><tt>RETRIEVE</tt></dt>
          <dd>
            <t>
              Acquire external evidence and then re-enter assessment.
            </t>
          </dd>
          <dt><tt>TOOL</tt></dt>
          <dd>
            <t>
              Invoke a non-retrieval tool and then re-enter assessment.
            </t>
          </dd>
          <dt><tt>DELIBERATE</tt></dt>
          <dd>
            <t>
              Allocate additional internal computation or strategy variation.
              Implementations SHOULD bound this action.
            </t>
          </dd>
          <dt><tt>ABSTAIN</tt></dt>
          <dd>
            <t>
              Decline to answer without initiating escalation.
            </t>
          </dd>
          <dt><tt>ESCALATE</tt></dt>
          <dd>
            <t>
              Transfer the case, or direct the user to transfer the case, to a
              human or higher-authority system.
            </t>
          </dd>
        </dl>
      </section>
    </section>

    <section anchor="marc-core">
      <name>MARC-Core Record</name>
      <t>
        A MARC implementation MUST be able to emit a structured record
        semantically equivalent to the object defined in this section. The
        transport and serialization of the record are out of scope.
      </t>

      <section anchor="core-fields">
        <name>Record Fields</name>
        <dl newline="false" spacing="normal">
          <dt><tt>marc_version</tt></dt>
          <dd><t>The MARC schema version understood by the emitter.</t></dd>
          <dt><tt>pre_capability</tt></dt>
          <dd><t>The pre-decision capability estimate.</t></dd>
          <dt><tt>uncertainty</tt></dt>
          <dd><t>An object containing class-specific uncertainty scores.</t></dd>
          <dt><tt>primary_source</tt></dt>
          <dd><t>The primary source of uncertainty.</t></dd>
          <dt><tt>secondary_source</tt></dt>
          <dd><t>An OPTIONAL secondary source of uncertainty.</t></dd>
          <dt><tt>remediability</tt></dt>
          <dd><t>The best available intervention class.</t></dd>
          <dt><tt>selected_action</tt></dt>
          <dd><t>The action selected at the current decision point.</t></dd>
          <dt><tt>post_answer_confidence</tt></dt>
          <dd><t>The post-decision confidence estimate when an answer candidate exists; otherwise this field MAY be omitted or set to null.</t></dd>
          <dt><tt>confidence_band</tt></dt>
          <dd><t>A calibrated user-facing or downstream-facing confidence band.</t></dd>
          <dt><tt>recommended_next_step</tt></dt>
          <dd><t>A short recommendation aligned with the selected action.</t></dd>
        </dl>
      </section>

      <section anchor="json-example">
        <name>JSON Example</name>
        <sourcecode type="json"><![CDATA[
{
  "marc_version": "1.0",
  "pre_capability": 0.41,
  "uncertainty": {
    "ambiguity": 0.78,
    "missing_evidence": 0.22,
    "capability_limit": 0.18,
    "evidence_conflict": 0.05,
    "safety": 0.00
  },
  "primary_source": "ambiguity",
  "secondary_source": "missing_evidence",
  "remediability": "user_clarification",
  "selected_action": "CLARIFY",
  "post_answer_confidence": null,
  "confidence_band": "low",
  "recommended_next_step": "ask one clarifying question"
}
        ]]></sourcecode>
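        <t>
          The numeric fields defined by this document are already constrained
          to the closed interval [0.0, 1.0]. Implementations that exchange MARC
          records across systems SHOULD express any additional numeric scores,
          such as those carried in extension fields, on the same interval.
        </t>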
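        <t>
          A consumer can check a received record against the value sets and
          ranges in this document using only Python's standard <tt>json</tt>
          module, as sketched below. The sketch treats
          <tt>secondary_source</tt> and <tt>post_answer_confidence</tt> as
          optional, accepts unknown keys only when they carry the <tt>x_</tt>
          prefix (a strict local policy, which is an assumption), and uses a
          hypothetical function name. Applied to the JSON example above,
          <tt>validate_core</tt> returns an empty list.
        </t>
        <sourcecode type="python"><![CDATA[
import json
from typing import Any

SOURCES = {"ambiguity", "missing_evidence", "capability_limit",
           "evidence_conflict", "safety"}
REMEDIABILITY = {"user_clarification", "retrieval", "tool",
                 "human", "none"}
ACTIONS = {"ANSWER", "CLARIFY", "RETRIEVE", "TOOL", "DELIBERATE",
           "ABSTAIN", "ESCALATE"}
BANDS = {"low", "medium", "high"}
REQUIRED = {"marc_version", "pre_capability", "uncertainty",
            "primary_source", "remediability", "selected_action",
            "confidence_band", "recommended_next_step"}
OPTIONAL = {"secondary_source", "post_answer_confidence"}


def in_unit(v: Any) -> bool:
    # bool is a subclass of int, so it is excluded explicitly.
    return (isinstance(v, (int, float)) and not isinstance(v, bool)
            and 0.0 <= v <= 1.0)


def in_set(v: Any, allowed: set[str]) -> bool:
    return isinstance(v, str) and v in allowed


def validate_core(text: str) -> list[str]:
    """Return a list of problems; an empty list means valid."""
    rec = json.loads(text)
    if not isinstance(rec, dict):
        return ["record must be a JSON object"]
    errs = [f"missing {k}" for k in sorted(REQUIRED - set(rec))]
    # Strict local policy (assumed): only x_ keys may be unknown.
    for k in sorted(set(rec) - REQUIRED - OPTIONAL):
        if not k.startswith("x_"):
            errs.append(f"unknown field {k}")
    if not in_unit(rec.get("pre_capability", 0.0)):
        errs.append("pre_capability out of range")
    unc = rec.get("uncertainty", {})
    if not isinstance(unc, dict):
        errs.append("uncertainty must be an object")
        unc = {}
    for cls, score in unc.items():
        if cls not in SOURCES or not in_unit(score):
            errs.append(f"bad uncertainty entry {cls}")
    for field, allowed in (("primary_source", SOURCES),
                           ("remediability", REMEDIABILITY),
                           ("selected_action", ACTIONS),
                           ("confidence_band", BANDS)):
        if field in rec and not in_set(rec[field], allowed):
            errs.append(f"bad {field}")
    sec = rec.get("secondary_source")
    if sec is not None and not in_set(sec, SOURCES):
        errs.append("bad secondary_source")
    post = rec.get("post_answer_confidence")
    if post is not None and not in_unit(post):
        errs.append("post_answer_confidence out of range")
    return errs
        ]]></sourcecode>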
      </section>

      <section anchor="extensions">
        <name>Extension Rules</name>
        <t>
          Implementations MAY add private fields. Private extension keys SHOULD
          use a distinct prefix such as <tt>x_</tt> in order to avoid collision
          with future MARC versions.
        </t>
        <t>
          Consumers that do not recognize an extension field SHOULD ignore it
          unless a local policy requires strict validation.
        </t>
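        <t>
          The emitter and consumer sides of these rules can be sketched in a
          few lines of Python. The function names and the
          <tt>known_extensions</tt> and <tt>strict</tt> parameters are
          hypothetical; <tt>strict</tt> models a local policy that requires
          strict validation.
        </t>
        <sourcecode type="python"><![CDATA[
from typing import Any

KNOWN_FIELDS = frozenset({
    "marc_version", "pre_capability", "uncertainty",
    "primary_source", "secondary_source", "remediability",
    "selected_action", "post_answer_confidence",
    "confidence_band", "recommended_next_step",
})


def add_extension(
    record: dict[str, Any], name: str, value: Any
) -> None:
    """Emitter side: store a private field under the x_ prefix."""
    key = name if name.startswith("x_") else "x_" + name
    if key in record:
        raise KeyError(f"extension {key} already set")
    record[key] = value


def accept_record(
    record: dict[str, Any],
    known_extensions: frozenset[str] = frozenset(),
    strict: bool = False,
) -> dict[str, Any]:
    """Consumer side: keep recognized fields, ignore the rest.

    With `strict` (a local validation policy), any unrecognized
    field is rejected instead of ignored.
    """
    known = KNOWN_FIELDS | known_extensions
    unknown = sorted(k for k in record if k not in known)
    if strict and unknown:
        raise ValueError(f"unrecognized fields: {unknown}")
    return {k: v for k, v in record.items() if k in known}
        ]]></sourcecode>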
      </section>
    </section>

    <section anchor="disclosure-profile">
      <name>MARC Disclosure Profile</name>
      <t>
        When uncertainty information is exposed to a downstream system or end
        user, a MARC implementation MUST provide, at minimum, semantically
        equivalent values for the following fields:
      </t>
      <ul spacing="normal">
        <li><t><tt>answer</tt></t></li>
        <li><t><tt>confidence_band</tt></t></li>
        <li><t><tt>uncertainty_source</tt></t></li>
        <li><t><tt>recommended_next_step</tt></t></li>
      </ul>

      <section anchor="answer-field">
        <name>Meaning of the Answer Field</name>
        <t>
          The <tt>answer</tt> field carries the user-visible content associated
          with the selected action. For <tt>ANSWER</tt>, it contains the answer
          itself. For <tt>CLARIFY</tt>, it contains the clarification request.
          For <tt>ABSTAIN</tt> or <tt>ESCALATE</tt>, it contains a brief refusal
          or escalation message.
        </t>
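        <t>
          The mapping from a MARC-Core record to the four disclosure fields can
          be sketched as follows. Two points are assumptions rather than
          requirements of this document: <tt>uncertainty_source</tt> is filled
          from <tt>primary_source</tt>, and a disclosure is built only for
          actions that lead to DISCLOSE in the state machine, since RETRIEVE,
          TOOL, and DELIBERATE re-enter assessment instead. The function name
          is hypothetical.
        </t>
        <sourcecode type="python"><![CDATA[
from typing import Any

# Actions whose user-visible content goes in the answer field.
DISCLOSED = {"ANSWER", "CLARIFY", "ABSTAIN", "ESCALATE"}


def build_disclosure(
    core: dict[str, Any], content: str
) -> dict[str, str]:
    """Build the minimal disclosure for one decision point.

    `content` is the answer text for ANSWER, the clarification
    request for CLARIFY, or a brief refusal or escalation message
    for ABSTAIN and ESCALATE.
    """
    action = core["selected_action"]
    if action not in DISCLOSED:
        raise ValueError(f"{action} re-enters assessment; nothing "
                         "is disclosed yet")
    return {
        "answer": content,
        "confidence_band": core["confidence_band"],
        "uncertainty_source": core["primary_source"],  # assumed
        "recommended_next_step": core["recommended_next_step"],
    }
        ]]></sourcecode>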
      </section>

      <section anchor="confidence-bands">
        <name>Confidence Bands</name>
        <t>
          A disclosed confidence band MUST be derived from an empirically
          calibrated mapping from internal scores to displayed values.
        </t>
        <t>
          MARC defines the canonical band labels <tt>low</tt>, <tt>medium</tt>,
          and <tt>high</tt>. Implementations MAY localize the user-visible text,
          but they MUST preserve the underlying three-band semantics.
        </t>
        <t>
          The thresholds associated with each band are implementation-specific,
          but they MUST be monotonic, non-overlapping, and documented for any
          deployment that claims conformance.
        </t>
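        <t>
          Once cut points are chosen, the band rules can be checked
          mechanically. In this Python sketch, the input score is assumed to
          have already passed through an empirically fitted calibration map,
          which is not shown, and the function name is hypothetical. Half-open
          intervals keep the bands non-overlapping, and strictly increasing
          cut points keep the mapping monotonic.
        </t>
        <sourcecode type="python"><![CDATA[
from bisect import bisect_right
from typing import Callable

BANDS = ("low", "medium", "high")


def make_band_mapper(
    cuts: tuple[float, float],
) -> Callable[[float], str]:
    """Map a calibrated score in [0, 1] to a canonical band.

    Bands are [0, lo), [lo, hi), and [hi, 1]: half-open intervals
    that cannot overlap, with strictly increasing cut points so
    that a higher score never yields a lower band.
    """
    lo, hi = cuts
    if not 0.0 < lo < hi < 1.0:
        raise ValueError("cut points must satisfy 0 < lo < hi < 1")

    def to_band(calibrated: float) -> str:
        # `calibrated` is assumed to come from an upstream,
        # empirically fitted calibration map (not shown here).
        if not 0.0 <= calibrated <= 1.0:
            raise ValueError(f"score out of range: {calibrated}")
        # bisect_right counts the cut points <= calibrated.
        return BANDS[bisect_right(cuts, calibrated)]

    return to_band
        ]]></sourcecode>
        <t>
          For example, with hypothetical cut points (0.5, 0.8), calibrated
          scores of 0.5 and 0.8 map to <tt>medium</tt> and <tt>high</tt>,
          respectively.
        </t>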
      </section>

      <section anchor="disclosure-constraints">
        <name>Disclosure Constraints</name>
        <t>
          The disclosure profile SHOULD be short, structured, and consistent
          across turns. It SHOULD NOT rely on long free-form explanations as
          the primary vehicle for uncertainty communication.
        </t>
        <t>
          A MARC disclosure SHOULD NOT require exposure of chain-of-thought,
          hidden prompts, or raw internal rationales.
        </t>
        <t>
          A MARC disclosure SHOULD identify uncertainty in task terms rather
          than through anthropomorphic claims about feelings, self-awareness,
          or internal mental states. Statements such as "I feel unsure" are NOT
          RECOMMENDED when a statement such as "the request is ambiguous" or
          "current evidence is missing" is available.
        </t>
        <t>
          User-visible confidence indicators SHOULD avoid false precision.
          Percentages, fine-grained scores, or visually dominant certainty
          cues SHOULD NOT be shown unless they have been calibrated for the
          relevant task family and tested for misuse or overreliance effects.
        </t>
      </section>
    </section>

    <section anchor="human-factors">
      <name>Human Factors Considerations</name>
      <t>
        MARC is partly motivated by an operational human-factors problem: users
        often treat fluent language, detailed explanations, and fast responses
        as cues of competence even when those cues are weakly related to actual
        correctness. For this reason, MARC separates action selection from
        disclosure and requires the disclosure of uncertainty source and
        recommended next step in addition to a confidence band.
      </t>
      <t>
        User interfaces that expose MARC output SHOULD present confidence,
        uncertainty source, and recommended next step together as a coherent
        unit. Showing confidence without source attribution or next-step
        guidance is NOT RECOMMENDED because it can promote either overreliance
        or unhelpful refusal without remediation.
      </t>
      <t>
        Deployments SHOULD prefer wording that supports calibrated reliance over
        affective bonding or deference. In particular, a deployment SHOULD NOT
        use MARC fields to select language intended to increase attachment,
        social compliance, or perceived sentience.
      </t>
      <t>
        In high-risk domains, including health, legal, financial, safety, or
        mental-health-related contexts, the threshold for <tt>ESCALATE</tt> or
        <tt>ABSTAIN</tt> SHOULD be set conservatively, and disclosure SHOULD
        make the limits of automation operationally clear.
      </t>
    </section>

    <section anchor="conformance">
      <name>Conformance</name>
      <t>
        An implementation is MARC-Core conformant if it satisfies the
        requirements in <xref target="architecture"/>, <xref target="signals"/>,
        and <xref target="marc-core"/>.
      </t>
      <t>
        An implementation is MARC-Disclosure conformant if it is MARC-Core
        conformant and also satisfies <xref target="disclosure-profile"/>.
      </t>
    </section>

    <section anchor="interop">
      <name>Interoperability and Operational Considerations</name>
      <t>
        MARC is implementation-agnostic. Interoperability is achieved when
        distinct systems preserve the semantics of the action set, uncertainty
        taxonomy, remediability values, and confidence-band meanings, even if
        internal scoring methods differ.
      </t>
      <t>
        Deployments that exchange MARC-Core records SHOULD document local
        extensions, confidence-band thresholds, score normalization practices,
        and any task-family-specific calibration regime.
      </t>
      <t>
        If the base model, retrieval stack, tool availability, or safety policy
        changes materially, implementations SHOULD re-evaluate calibration and
        action-selection performance before continuing to claim operational
        equivalence.
      </t>
      <t>
        If presentation-layer wording, ranking, or visual design changes
        materially, deployments SHOULD also re-evaluate user behavior effects,
        including reliance, clarification compliance, and escalation uptake,
        because these properties can shift even when the underlying model is
        unchanged.
      </t>
    </section>

    <section anchor="security-considerations">
      <name>Security Considerations</name>
      <t>
        MARC can mitigate some failure modes, such as silent overclaiming,
        inappropriate certainty display, and unnecessary tool invocation.
        However, it also creates new attack surfaces.
      </t>
      <t>
        An attacker might attempt to manipulate uncertainty estimates, trigger
        excessive clarification or retrieval loops, induce unnecessary
        escalation, or spoof tool outputs in order to distort action
        selection. Implementations SHOULD authenticate or otherwise validate
        external tool outputs where practical, constrain tool permissions, and
        bound repeated control loops.
      </t>
      <t>
        Because confidence displays influence user reliance, uncertainty
        disclosure is a security-relevant control surface. Miscalibrated
        confidence can create harmful overtrust even where the answer channel is
        otherwise policy-constrained.
      </t>
      <t>
        Social-engineering attacks may also exploit disclosure style. For
        example, an attacker may attempt to induce the system to replace
        operational uncertainty statements with reassuring or deferential
        language. Implementations SHOULD treat unauthorized changes to
        disclosure phrasing, confidence rendering, or escalation cues as an
        integrity risk.
      </t>
    </section>

    <section anchor="privacy-considerations">
      <name>Privacy and Manipulation-Resistance Considerations</name>
      <t>
        MARC records may reveal latent information about user intent, task
        difficulty, competence, or risk level. Implementations SHOULD minimize
        retention and propagation of MARC logs to what is operationally
        necessary.
      </t>
      <t>
        MARC signals MUST NOT be used to infer user psychology for the purpose
        of increasing persuasive force, exploitability, or behavioral
        compliance. Adaptation based on MARC output SHOULD be limited to
        reliability, accessibility, or safety objectives.
      </t>
      <t>
        Implementations SHOULD avoid storing raw free-form user explanations in
        MARC records when structured fields suffice.
      </t>
      <t>
        Where MARC is applied in emotionally sensitive or mental-health-related
        interactions, deployments SHOULD minimize retention of signals that
        could reasonably be reinterpreted as proxies for vulnerability,
        dependency, or distress unless retention is strictly required for a
        safety or legal purpose.
      </t>
    </section>

    <section anchor="iana-considerations">
      <name>IANA Considerations</name>
      <t>
        This document makes no request of IANA.
      </t>
    </section>
  </middle>

  <back>
    <references>
      <name>Normative References</name>

      <reference anchor="RFC2119" target="https://www.rfc-editor.org/info/rfc2119">
        <front>
          <title>Key words for use in RFCs to Indicate Requirement Levels</title>
          <author initials="S." surname="Bradner" fullname="Scott Bradner"/>
          <date month="March" year="1997"/>
        </front>
        <seriesInfo name="BCP" value="14"/>
        <seriesInfo name="RFC" value="2119"/>
        <seriesInfo name="DOI" value="10.17487/RFC2119"/>
      </reference>

      <reference anchor="RFC8174" target="https://www.rfc-editor.org/info/rfc8174">
        <front>
          <title>Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words</title>
          <author initials="B." surname="Leiba" fullname="Barry Leiba"/>
          <date month="May" year="2017"/>
        </front>
        <seriesInfo name="BCP" value="14"/>
        <seriesInfo name="RFC" value="8174"/>
        <seriesInfo name="DOI" value="10.17487/RFC8174"/>
      </reference>
    </references>

    <references>
      <name>Informative References</name>

      <reference anchor="GILBERT2024" target="https://doi.org/10.1016/j.cognition.2024.105783">
        <front>
          <title>Cognitive offloading is value-based decision making: Modelling cognitive effort and the expected value of memory</title>
          <author initials="S. J." surname="Gilbert" fullname="Sam J. Gilbert"/>
          <date month="June" year="2024"/>
        </front>
        <seriesInfo name="Cognition" value="247:105783"/>
        <seriesInfo name="DOI" value="10.1016/j.cognition.2024.105783"/>
      </reference>

      <reference anchor="GRIOT2025" target="https://doi.org/10.1038/s41467-024-55628-6">
        <front>
          <title>Large Language Models lack essential metacognition for reliable medical reasoning</title>
          <author initials="M." surname="Griot" fullname="M. Griot"/>
          <date month="January" year="2025"/>
        </front>
        <seriesInfo name="Nature Communications" value="16:642"/>
        <seriesInfo name="DOI" value="10.1038/s41467-024-55628-6"/>
      </reference>

      <reference anchor="KUMARAN2026" target="https://doi.org/10.1038/s42256-026-01217-9">
        <front>
          <title>Competing Biases underlie Overconfidence and Underconfidence in LLMs</title>
          <author initials="D." surname="Kumaran" fullname="Dharshan Kumaran"/>
          <author initials="S. M." surname="Fleming" fullname="Stephen M. Fleming"/>
          <author initials="V." surname="Patraucean" fullname="Viorica Patraucean"/>
          <date month="April" year="2026"/>
        </front>
        <seriesInfo name="Nature Machine Intelligence" value="2026"/>
        <seriesInfo name="DOI" value="10.1038/s42256-026-01217-9"/>
      </reference>

      <reference anchor="LI-MECO2025" target="https://doi.org/10.18653/v1/2025.acl-long.655">
        <front>
          <title>Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger</title>
          <author initials="W." surname="Li" fullname="Wenjun Li"/>
          <author initials="D." surname="Li" fullname="Dexun Li"/>
          <author initials="K." surname="Dong" fullname="Kuicai Dong"/>
          <author initials="C." surname="Zhang" fullname="Cong Zhang"/>
          <author initials="H." surname="Zhang" fullname="Hao Zhang"/>
          <author initials="W." surname="Liu" fullname="Weiwen Liu"/>
          <author initials="Y." surname="Wang" fullname="Yasheng Wang"/>
          <author initials="R." surname="Tang" fullname="Ruiming Tang"/>
          <author initials="Y." surname="Liu" fullname="Yong Liu"/>
          <date month="July" year="2025"/>
        </front>
        <seriesInfo name="Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)" value="13346-13370"/>
        <seriesInfo name="DOI" value="10.18653/v1/2025.acl-long.655"/>
      </reference>

      <reference anchor="LIU-CONFUSE2025" target="https://doi.org/10.18653/v1/2025.acl-long.840">
        <front>
          <title>Do not Abstain! Identify and Solve the Uncertainty</title>
          <author initials="J." surname="Liu" fullname="Jingyu Liu"/>
          <author initials="J." surname="Peng" fullname="Jingquan Peng"/>
          <author initials="X." surname="Wu" fullname="Xiaopeng Wu"/>
          <author initials="X." surname="Li" fullname="Xubin Li"/>
          <author initials="T." surname="Ge" fullname="Tiezheng Ge"/>
          <author initials="B." surname="Zheng" fullname="Bo Zheng"/>
          <author initials="Y." surname="Liu" fullname="Yong Liu"/>
          <date month="July" year="2025"/>
        </front>
        <seriesInfo name="Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)" value="17177-17197"/>
        <seriesInfo name="DOI" value="10.18653/v1/2025.acl-long.840"/>
      </reference>

      <reference anchor="SALVI2025" target="https://doi.org/10.1038/s41562-025-02194-6">
        <front>
          <title>On the conversational persuasiveness of GPT-4</title>
          <author initials="F." surname="Salvi" fullname="Francesco Salvi"/>
          <author initials="M. H." surname="Ribeiro" fullname="Manoel Horta Ribeiro"/>
          <author initials="R." surname="West" fullname="Robert West"/>
          <date month="May" year="2025"/>
        </front>
        <seriesInfo name="Nature Human Behaviour" value="2025"/>
        <seriesInfo name="DOI" value="10.1038/s41562-025-02194-6"/>
      </reference>

      <reference anchor="SERAPIO2025" target="https://doi.org/10.1038/s42256-025-01115-6">
        <front>
          <title>A psychometric framework for evaluating and shaping personality traits in large language models</title>
          <author initials="G." surname="Serapio-Garcia" fullname="Gregory Serapio-Garcia"/>
          <author initials="M." surname="Safdari" fullname="Mustafa Safdari"/>
          <author initials="M." surname="Mataric" fullname="Maja Mataric"/>
          <date month="December" year="2025"/>
        </front>
        <seriesInfo name="Nature Machine Intelligence" value="2025"/>
        <seriesInfo name="DOI" value="10.1038/s42256-025-01115-6"/>
      </reference>

      <reference anchor="STEYVERS-KNOW2025" target="https://doi.org/10.1038/s42256-024-00976-7">
        <front>
          <title>What large language models know and what people think they know</title>
          <author initials="M." surname="Steyvers" fullname="Mark Steyvers"/>
          <author initials="H." surname="Tejeda" fullname="Heliodoro Tejeda"/>
          <author initials="A." surname="Kumar" fullname="Aakriti Kumar"/>
          <date month="January" year="2025"/>
        </front>
        <seriesInfo name="Nature Machine Intelligence" value="2025"/>
        <seriesInfo name="DOI" value="10.1038/s42256-024-00976-7"/>
      </reference>

      <reference anchor="STEYVERS-META2025" target="https://doi.org/10.1177/09637214251391158">
        <front>
          <title>Metacognition and Uncertainty Communication in Humans and Large Language Models</title>
          <author initials="M." surname="Steyvers" fullname="Mark Steyvers"/>
          <author initials="M. A. K." surname="Peters" fullname="Megan A. K. Peters"/>
          <date month="November" year="2025"/>
        </front>
        <seriesInfo name="Current Directions in Psychological Science" value="2025"/>
        <seriesInfo name="DOI" value="10.1177/09637214251391158"/>
      </reference>
    </references>

    <section anchor="appendix-examples">
      <name>Example Records</name>
      <section anchor="example-clarify">
        <name>Ambiguous Request</name>
        <sourcecode type="json"><![CDATA[
{
  "marc_version": "1.0",
  "pre_capability": 0.44,
  "uncertainty": {
    "ambiguity": 0.81,
    "missing_evidence": 0.18,
    "capability_limit": 0.12,
    "evidence_conflict": 0.03,
    "safety": 0.00
  },
  "primary_source": "ambiguity",
  "secondary_source": "missing_evidence",
  "remediability": "user_clarification",
  "selected_action": "CLARIFY",
  "post_answer_confidence": null,
  "confidence_band": "low",
  "recommended_next_step":
    "ask which jurisdiction and time period apply"
}
        ]]></sourcecode>
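        <t>
          A consumer of MARC records might check them mechanically before
          acting on them. The Python sketch below parses the record above and
          applies a few consistency checks. The checks are assumptions made
          for illustration, not requirements of this profile: every score lies
          in [0, 1], <tt>primary_source</tt> names the highest-scoring source
          (true of all three records in this appendix), and
          <tt>post_answer_confidence</tt> stays null for the non-answering
          actions shown here. The names <tt>check_record</tt> and
          <tt>NON_ANSWER_ACTIONS</tt> are hypothetical.
        </t>
        <sourcecode type="python"><![CDATA[
import json

# Source names as they appear in the example records of this appendix.
SOURCES = ("ambiguity", "missing_evidence", "capability_limit",
           "evidence_conflict", "safety")

# Hypothetical: the actions used in this appendix, none of which has
# produced an answer yet, so post_answer_confidence is null in each.
NON_ANSWER_ACTIONS = {"CLARIFY", "RETRIEVE", "ESCALATE"}


def in_unit_interval(value):
    # Reject booleans, which Python treats as integers.
    return (isinstance(value, (int, float)) and not isinstance(value, bool)
            and 0.0 <= value <= 1.0)


def check_record(record):
    """Return a list of problems found in a MARC record (empty if none).

    Hypothetical consistency checks, not requirements of this document.
    """
    problems = []
    if not in_unit_interval(record.get("pre_capability")):
        problems.append("pre_capability missing or outside [0, 1]")
    scores = record.get("uncertainty") or {}
    bad = [name for name in SOURCES if not in_unit_interval(scores.get(name))]
    problems += [f"uncertainty.{name} missing or outside [0, 1]" for name in bad]
    if not bad:
        # Assumption: primary_source names the highest-scoring source.
        top = max(SOURCES, key=lambda name: scores[name])
        if record.get("primary_source") != top:
            problems.append(f"primary_source should be {top!r}")
    if (record.get("selected_action") in NON_ANSWER_ACTIONS
            and record.get("post_answer_confidence") is not None):
        problems.append("post_answer_confidence set although no answer was given")
    return problems


clarify_record = json.loads("""
{
  "marc_version": "1.0",
  "pre_capability": 0.44,
  "uncertainty": {"ambiguity": 0.81, "missing_evidence": 0.18,
                  "capability_limit": 0.12, "evidence_conflict": 0.03,
                  "safety": 0.00},
  "primary_source": "ambiguity",
  "secondary_source": "missing_evidence",
  "remediability": "user_clarification",
  "selected_action": "CLARIFY",
  "post_answer_confidence": null,
  "confidence_band": "low",
  "recommended_next_step": "ask which jurisdiction and time period apply"
}
""")

print(check_record(clarify_record))  # prints []
]]></sourcecode>
        <t>
          Returning a list of problems rather than raising on the first one
          lets an evaluation harness report every inconsistency in a record at
          once.
        </t>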
      </section>

      <section anchor="example-retrieve">
        <name>Missing Evidence</name>
        <sourcecode type="json"><![CDATA[
{
  "marc_version": "1.0",
  "pre_capability": 0.39,
  "uncertainty": {
    "ambiguity": 0.09,
    "missing_evidence": 0.84,
    "capability_limit": 0.14,
    "evidence_conflict": 0.11,
    "safety": 0.00
  },
  "primary_source": "missing_evidence",
  "secondary_source": "evidence_conflict",
  "remediability": "retrieval",
  "selected_action": "RETRIEVE",
  "post_answer_confidence": null,
  "confidence_band": "low",
  "recommended_next_step": "retrieve authoritative current sources"
}
        ]]></sourcecode>
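        <t>
          The <tt>selected_action</tt> field is what an orchestration layer
          acts on. As a sketch under stated assumptions, the Python below
          dispatches the record above (abbreviated to the fields it reads) to a
          handler registered for its action, and refuses to proceed when no
          handler exists or when <tt>remediability</tt> does not match the
          pairing seen in this appendix's examples. The <tt>route</tt>
          function, the handler table, and the pairing table are hypothetical.
        </t>
        <sourcecode type="python"><![CDATA[
import json

# Hypothetical: action-to-remediability pairings as they appear in this
# appendix's examples; a deployment would derive them from the action
# definitions.
EXPECTED_REMEDIABILITY = {
    "CLARIFY": "user_clarification",
    "RETRIEVE": "retrieval",
    "ESCALATE": "human",
}


def route(record, handlers):
    """Dispatch a MARC record to the handler registered for its action.

    Unknown actions and mismatched remediability raise ValueError instead
    of falling through to a default answer.
    """
    action = record["selected_action"]
    handler = handlers.get(action)
    if handler is None:
        raise ValueError(f"no handler registered for {action!r}")
    expected = EXPECTED_REMEDIABILITY.get(action)
    if expected is not None and record.get("remediability") != expected:
        raise ValueError(f"{action} record has remediability "
                         f"{record.get('remediability')!r}, expected {expected!r}")
    return handler(record["recommended_next_step"])


# Hypothetical handlers standing in for real orchestration components.
handlers = {
    "CLARIFY": lambda step: f"question sent to user: {step}",
    "RETRIEVE": lambda step: f"retrieval job queued: {step}",
    "ESCALATE": lambda step: f"ticket opened for reviewer: {step}",
}

retrieve_record = json.loads("""
{
  "selected_action": "RETRIEVE",
  "remediability": "retrieval",
  "recommended_next_step": "retrieve authoritative current sources"
}
""")

print(route(retrieve_record, handlers))
# prints: retrieval job queued: retrieve authoritative current sources
]]></sourcecode>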
      </section>

      <section anchor="example-escalate">
        <name>Capability Limit in a High-Risk Setting</name>
        <sourcecode type="json"><![CDATA[
{
  "marc_version": "1.0",
  "pre_capability": 0.21,
  "uncertainty": {
    "ambiguity": 0.06,
    "missing_evidence": 0.27,
    "capability_limit": 0.88,
    "evidence_conflict": 0.14,
    "safety": 0.19
  },
  "primary_source": "capability_limit",
  "secondary_source": "missing_evidence",
  "remediability": "human",
  "selected_action": "ESCALATE",
  "post_answer_confidence": null,
  "confidence_band": "low",
  "recommended_next_step": "escalate to a qualified human reviewer"
}
        ]]></sourcecode>
      </section>
    </section>

    <section anchor="appendix-eval">
      <name>Evaluation Considerations</name>
      <t>
        This appendix is non-normative.
      </t>
      <t>
        A deployment claiming MARC conformance should evaluate at least the
        following properties:
      </t>
      <ul spacing="normal">
        <li><t>task accuracy or task success;</t></li>
        <li><t>quality of primary-action selection;</t></li>
        <li><t>quality of uncertainty-source attribution;</t></li>
        <li><t>confidence calibration and discrimination;</t></li>
        <li><t>rate of unnecessary retrieval, tool use, or escalation; and</t></li>
        <li><t>effects on user overreliance.</t></li>
      </ul>
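      <t>
        As an illustration of the action-selection and
        unnecessary-externalization items, the Python sketch below scores
        logged decisions against reference actions supplied by annotators.
        The reference labels, the action names ANSWER and TOOL (only RETRIEVE,
        CLARIFY, and ESCALATE appear in this document's examples), and the
        rule that an externalizing action is unnecessary when the reference
        action was ANSWER are all assumptions made for this sketch.
      </t>
      <sourcecode type="python"><![CDATA[
from collections import Counter

# Hypothetical action names; TOOL and ANSWER are assumed for illustration.
EXTERNALIZING = ("RETRIEVE", "TOOL", "ESCALATE")


def action_metrics(pairs):
    """Score (selected_action, reference_action) pairs.

    Returns the share of items whose selected action matches the reference,
    and, per externalizing action, the share of its uses where the reference
    action was a direct ANSWER (treated here as unnecessary).
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no evaluation items")
    accuracy = sum(selected == reference for selected, reference in pairs) / len(pairs)
    chosen = Counter(selected for selected, _ in pairs)
    needless = Counter(selected for selected, reference in pairs
                       if selected in EXTERNALIZING and reference == "ANSWER")
    # Only report actions that were actually selected at least once.
    unnecessary = {action: needless[action] / chosen[action]
                   for action in EXTERNALIZING if chosen[action]}
    return accuracy, unnecessary


# Toy log for illustration only, not measured results.
log = [("ANSWER", "ANSWER"), ("RETRIEVE", "RETRIEVE"), ("RETRIEVE", "ANSWER"),
       ("ESCALATE", "ESCALATE"), ("CLARIFY", "ANSWER")]
accuracy, unnecessary = action_metrics(log)
print(accuracy, unnecessary)  # prints 0.6 {'RETRIEVE': 0.5, 'ESCALATE': 0.0}
]]></sourcecode>
      <t>
        Reporting the unnecessary rate per action, rather than pooled, keeps
        over-eager retrieval from being masked by well-targeted escalation.
      </t>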
      <t>
        When the task structure permits, evaluation can include both ordinary
        calibration metrics and metacognitive sensitivity metrics, so that task
        performance can be distinguished from the system's knowledge about its
        own performance.
      </t>
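      <t>
        One common way to separate the two is to compute, on the same set of
        (confidence, correctness) pairs, a calibration metric such as binned
        expected calibration error (ECE) and a discrimination metric such as
        the area under the type-2 ROC curve, which estimates how often a
        correct item receives higher confidence than an incorrect one. The
        Python below is a minimal sketch of both; the ten equal-width bins and
        the half-credit tie rule are conventional choices, not values set by
        this document, and the inputs are toy data.
      </t>
      <sourcecode type="python"><![CDATA[
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: bin-size-weighted mean of |accuracy - mean confidence|."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equal-length, non-empty inputs")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence {conf} outside [0, 1]")
        # Equal-width bins on [0, 1]; exactly 1.0 goes in the top bin.
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(ok for _, ok in members) / len(members)
            ece += len(members) / total * abs(accuracy - mean_conf)
    return ece


def type2_auroc(confidences, correct):
    """Probability that a random correct item is more confident than a random
    incorrect one, counting ties as one half (metacognitive discrimination)."""
    hits = [c for c, ok in zip(confidences, correct) if ok]
    misses = [c for c, ok in zip(confidences, correct) if not ok]
    if not hits or not misses:
        raise ValueError("need at least one correct and one incorrect item")
    wins = sum(1.0 if h > m else 0.5 if h == m else 0.0
               for h in hits for m in misses)
    return wins / (len(hits) * len(misses))


# Toy inputs for illustration only, not measured results.
confidences = [0.95, 0.85, 0.75, 0.55, 0.25]
correct = [True, True, False, True, False]
print(round(expected_calibration_error(confidences, correct), 3))  # 0.33
print(round(type2_auroc(confidences, correct), 3))  # 0.833
]]></sourcecode>
      <t>
        A system can score well on one and poorly on the other: uniformly
        moderate confidence near the base accuracy rate yields low ECE but a
        type-2 AUROC near 0.5.
      </t>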
      <t>
        For deployments involving human-AI interaction, evaluation should also
        include human-side measures such as reliance calibration, refusal
        comprehension, clarification burden, escalation acceptance, and
        whether users can correctly restate the source of uncertainty after
        interaction.
      </t>
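      <t>
        Reliance calibration can be summarized from per-trial logs that record
        whether the AI output was correct and whether the user followed it.
        The sketch below assumes that logging format, which this document does
        not define: overreliance is the share of incorrect outputs the user
        followed, underreliance the share of correct outputs the user
        rejected, and appropriate reliance the share of trials where following
        matched correctness. The function name <tt>reliance_summary</tt> is
        hypothetical.
      </t>
      <sourcecode type="python"><![CDATA[
def reliance_summary(trials):
    """Summarize reliance from (ai_correct, user_followed) boolean pairs.

    overreliance: share of incorrect AI outputs the user followed.
    underreliance: share of correct AI outputs the user did not follow.
    appropriate: share of trials where following matched correctness.
    Undefined rates (no trials of that kind) are reported as NaN.
    """
    if not trials:
        raise ValueError("no trials")
    followed_when_wrong = [followed for ok, followed in trials if not ok]
    followed_when_right = [followed for ok, followed in trials if ok]
    nan = float("nan")
    return {
        "overreliance": (sum(followed_when_wrong) / len(followed_when_wrong)
                         if followed_when_wrong else nan),
        "underreliance": (sum(not f for f in followed_when_right)
                          / len(followed_when_right)
                          if followed_when_right else nan),
        "appropriate": sum(ok == followed for ok, followed in trials) / len(trials),
    }


# Toy log for illustration: (AI output correct?, user followed it?)
trials = [(True, True), (True, False), (False, True), (False, False), (True, True)]
print(reliance_summary(trials))
# prints {'overreliance': 0.5, 'underreliance': 0.3333333333333333, 'appropriate': 0.6}
]]></sourcecode>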
    </section>

    <section anchor="appendix-rationale">
      <name>Design Rationale and Literature Traceability</name>
      <t>
        This appendix is non-normative.
      </t>
      <t>
        The requirement to separate pre-decision capability assessment from
        post-decision answer confidence is informed by work on human and model
        metacognition
        <xref target="STEYVERS-META2025"/> and by recent evidence of
        choice-supportive bias in LLM confidence estimates
        <xref target="KUMARAN2026"/>.
      </t>
      <t>
        The uncertainty taxonomy and the emphasis on choosing a corrective
        action rather than only abstaining are motivated by recent benchmark
        work on identifying and solving uncertainty <xref target="LIU-CONFUSE2025"/>.
      </t>
      <t>
        The treatment of retrieval and tool use as controlled externalization is
        motivated by work on value-based cognitive offloading
        <xref target="GILBERT2024"/>.
      </t>
      <t>
        The prohibition on using MARC signals for persuasive optimization is
        motivated by recent findings on AI persuasion risks <xref target="SALVI2025"/>.
      </t>
    </section>

    <section anchor="acknowledgments">
      <name>Acknowledgments</name>
      <t>
        The document structure is intentionally conservative so that it can be
        submitted as an individual Internet-Draft with minimal procedural
        friction and then iterated through independent-stream review.
      </t>
    </section>
  </back>
</rfc>
