The Program

One Instrument, Four Questions

These proposals share a single backbone. The garbling gym's default payoff hands the sender +10 for any sale regardless of the true state — a state-independent payoff, which is exactly Lipnowski–Ravid's transparent motives. So the gym is, by construction, a cheap-talk world whose value is the quasiconcave envelope. Add a commitment dial and it becomes the weak-institutions world, sweeping all the way up to the concave envelope of full Bayesian persuasion. Every proposal is a position on that qcav → cav spine.

Map · the credibility spine & where each proposal sits · Schematic
From cheap talk (qcav) to full commitment (cav). 08 builds the ruler; 09 asks whether reputation slides the system rightward; 10 makes both sides learn; 11 turns commitment into a controlled treatment crossed with verifiability.

A What the gym is — and the four gaps

HAS TODAY

Discrete 3×3 garbling game · 6 fixed sender matrices · a strong zoo of ~10 receiver learners (Bayesian, regret, bandit, level-k, LLM) · LLM hooks · per-round state revelation · tested storage.

THE FOUR GAPS

1. the sender never learns · 2. no equilibrium solver (no ground truth) · 3. single sender + single receiver · 4. no commitment / credibility primitive.

B How the proposals compose

Dependency · 08 is the foundation the rest stand on · Schematic
08 supplies the benchmarks every experiment measures against. 09 defines the commitment dial (reused by 11); 10 defines the sender-learner interface (reused by 09 and 11).
| # | Proposal | Core question | Anchors |
|---|---|---|---|
| 08 | Theory baseline | Build the solver + metrics so outcomes are measured against computed benchmarks. | KG · LR · LRS · Blackwell |
| 09 | Reputation as credibility | Does reputation create an effective credibility $\chi_{eff}$, tracing the LRS curve? | LRS · KG · LR |
| 10 | Two-sided learning | When the sender also learns, does the garbling rate converge, cycle, or collapse? | CS · KG/LR |
| 11 | Verifiability × commitment | Do LLM agents show "commitment blindness", and in a novel direction? | FLP · Milgrom |

The recurring empirical hook

Across the LLM literature, models are systematically too honest — safety-tuned agents cooperate and disclose where theory prescribes strategic concealment. The gym, once it has a benchmark, is the cleanest place to measure that honesty gap and ask what closes it.

Proposal 08 · Foundation

The Theory Baseline

Every experiment the gym runs is descriptive until it can be compared to what theory says should happen. This builds the missing ruler — a solver for the persuasion benchmarks, and metrics that mean something.

Kamenica–Gentzkow 2011 | Lipnowski–Ravid 2020 | Lipnowski–Ravid–Shishkin 2022 | Blackwell 1951

01 Background

The geometry of persuasion is a story about envelopes of the sender's value over beliefs. With commitment, the achievable value is the concave envelope $\operatorname{cav}$ (Kamenica–Gentzkow). With transparent-motive cheap talk, it is the quasiconcave envelope $\operatorname{qcav}$ (Lipnowski–Ravid). Between them, weak institutions trace a capped-concavification curve in credibility χ (LRS). These are exact, computable objects, but the gym computes none of them.

02 The gap

  • "Informativeness" is currently a Frobenius distance from the identity matrix — a geometric proxy with no decision-theoretic meaning, and no comparability across the planned continuous and natural-language channels.
  • There is no commitment value, no cheap-talk value, no LRS curve to measure a run against. We cannot say whether a learning receiver reaches qcav, cav, or neither.
  • Without a benchmark, the celebrated "LLMs are too honest" claim has no optimum to subtract from — it stays anecdotal.

03 The build

A theory package, side-effect-free and additive, computing on the belief simplex: the babbling value, the commitment value $V_{cav}=\operatorname{cav}v(\mu_0)$, the cheap-talk value $V_{qcav}=\operatorname{qcav}v(\mu_0)$, and the weak-institution curve

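$$v^*_\chi(\mu_0)=\max_{\beta,\gamma,k}\big[k\,\operatorname{cav}(v^{\wedge\gamma})(\beta)+(1-k)\,v^{CT}(\gamma)\big]\quad\text{s.t.}\quad k\beta+(1-k)\gamma=\mu_0,\;\;(1-k)\gamma\ge(1-\chi)\mu_0,$$

$$\text{where}\quad k\in[0,1],\quad v^{CT}=\operatorname{qcav}v,\quad v^{\wedge\gamma}=\min(v,\,v^{CT}(\gamma)).$$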

Plus an information-metric module — mutual information $I(\theta;s)$, a Blackwell garbling test, and a posterior mean-preserving-spread check — replacing the Frobenius proxy with quantities the literature speaks.
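To make the contrast with the Frobenius proxy concrete, here is a minimal sketch of the mutual-information metric. It assumes a row-stochastic channel with `channel[i, j] = P(signal j | state i)`; that orientation and the function names are illustrative choices, not the gym's actual API.

```python
import numpy as np

def mutual_information(prior, channel):
    """I(theta; s) in bits for a prior over states and a row-stochastic channel.

    Assumed convention (hypothetical): channel[i, j] = P(signal j | state i).
    """
    prior = np.asarray(prior, dtype=float)
    channel = np.asarray(channel, dtype=float)
    joint = prior[:, None] * channel            # P(theta, s)
    p_signal = joint.sum(axis=0)                # P(s)
    indep = prior[:, None] * p_signal[None, :]  # P(theta) * P(s)
    mask = joint > 0                            # treat 0 * log 0 as 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / indep[mask])))

def frobenius_from_identity(channel):
    """The geometric proxy: Frobenius distance from the identity matrix."""
    channel = np.asarray(channel, dtype=float)
    return float(np.linalg.norm(channel - np.eye(channel.shape[0])))

prior = np.full(3, 1 / 3)
relabel = np.roll(np.eye(3), 1, axis=1)  # fully revealing, signals merely relabeled
babble = np.full((3, 3), 1 / 3)          # signal independent of the state

for name, ch in [("relabel", relabel), ("babble", babble)]:
    print(name, round(mutual_information(prior, ch), 3),
          round(frobenius_from_identity(ch), 3))
```

In this toy case the relabeled channel reveals the state perfectly yet sits farther from the identity than pure babbling does, which is the sense in which the proxy lacks decision-theoretic meaning.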

Figure 08 · the benchmark ladder the gym lacks · Exact
For a non-concave sender value v, the solver returns the whole ladder at the prior: babbling ≤ cheap talk (qcav) ≤ commitment (cav), an ordering that holds weakly in general and can be strict when v is non-concave. The gap between qcav and cav is the value of commitment, the quantity proposals 09–11 are about. Envelopes computed exactly via convex hull / running maxima.
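The ladder can be sketched for the simplest case, a binary state with beliefs on a grid over [0, 1]. This is an illustration under stated assumptions rather than the proposed solver: the gym's three-state simplex would need a two-dimensional hull, and the function names are hypothetical. The test case is the prosecutor–judge step value (payoff 1 when the belief reaches 0.5) at the prior 0.3 used in Kamenica–Gentzkow's example.

```python
import numpy as np

def concave_envelope(mu, v):
    """Upper concave envelope of the points (mu[i], v[i]); mu sorted ascending."""
    hull = []  # indices of the upper-hull vertices, left to right
    for i in range(len(mu)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (mu[b] - mu[a]) * (v[i] - v[a]) - (v[b] - v[a]) * (mu[i] - mu[a])
            if cross >= 0:   # b lies on or below the chord from a to i
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(mu, mu[hull], v[hull])

def quasiconcave_envelope(v):
    """Least quasiconcave majorant in 1D: min of prefix and suffix running maxima."""
    left = np.maximum.accumulate(v)
    right = np.maximum.accumulate(v[::-1])[::-1]
    return np.minimum(left, right)

mu = np.linspace(0.0, 1.0, 101)
v = (mu >= 0.5 - 1e-12).astype(float)  # sender gets 1 iff the belief reaches 0.5
prior = 0.3

v_babble = float(np.interp(prior, mu, v))
v_qcav = float(np.interp(prior, mu, quasiconcave_envelope(v)))
v_cav = float(np.interp(prior, mu, concave_envelope(mu, v)))
print(v_babble, v_qcav, v_cav)
```

Because the step function is already quasiconcave, cheap talk adds nothing at this prior, while commitment lifts the value to the chord through (0, 0) and (0.5, 1).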

04 Why it is trustworthy

The solver ships with an oracle: it must reproduce closed-form results we already hold — the Kamenica–Gentzkow prosecutor–judge value of 0.60, the LRS central-bank curve $(\tfrac32,\,2\chi,\,1)$ with its discontinuity at $\chi=\tfrac23$, and the Crawford–Sobel collapse of $N(b)$. These were computed while building the literature-review figures, so there is a reference from day one.

Contribution

  • Turns the gym from a simulator into an instrument — the prerequisite for every benchmark-anchored claim.
  • A reusable "theoretical baseline" methods section for any paper that follows.
Artifact / methods contribution; foundation for 09–11.

Proposal 09 · Flagship

Reputation as Endogenous Credibility

LRS prove a sender's value moves along a sharp, non-smooth curve as institutional credibility χ runs from cheap talk to full commitment. We turn χ from a knob into something agents earn — and ask where reputation lands them on that curve.

Lipnowski–Ravid–Shishkin 2022 | Kamenica–Gentzkow 2011 | Lipnowski–Ravid 2020 | Repeated games · LLMs

01 Background

Classical Bayesian persuasion assumes the sender can commit to a signal. The gym is the opposite extreme — transparent-motive cheap talk, whose one-shot value is the quasiconcave envelope. Between them sits the weak-institution model: the announced rule is honored only with probability χ. At $\chi=1$ we recover concavification; at $\chi=0$, quasiconcavification; in between, a curve with a productive-mistrust ramp and a genuine discontinuity.

02 The gap

The weak-institutions curve has only ever been drawn analytically. No one has asked whether emergent reputation among adaptive or LLM agents reproduces it: whether repeated interaction manufactures an effective credibility $\chi_{eff}$ without any enforced commitment, and whether that lift shows the same cliff the theory predicts.

03 The mechanism — a commitment dial

One primitive, reused by Proposal 11: each round the sender announces a garbling rule; with probability χ it is honored, and with probability 1−χ the sender may deviate after seeing the state; the receiver sees the message, not its origin. In the flagship's second phase we remove the enforcement and let the receiver maintain a trust state, so credibility must now be earned. We then recover $\chi_{eff}$ three independent ways and triangulate: value-fit, behavioral honesty rate, and revealed trust.
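A single round of the dial might look like the sketch below. The function name, the `deviation` callback, and the matrix orientation (`announced_rule[state]` is the message distribution for that state) are all hypothetical; the only behavior taken from the text is that the rule is honored with probability χ and that the receiver observes the message alone.

```python
import numpy as np

def play_round(state, announced_rule, deviation, chi, rng):
    """One round of the commitment dial (illustrative sketch).

    Returns (message, honored). Only `message` would be shown to the receiver;
    `honored` is kept for the experimenter's logs.
    """
    honored = bool(rng.random() < chi)
    if honored:
        probs = announced_rule[state]              # P(message | state) as announced
        message = int(rng.choice(len(probs), p=probs))
    else:
        message = int(deviation(state))            # sender's free choice after seeing the state
    return message, honored

rng = np.random.default_rng(0)
rule = np.eye(3)                  # announced rule: reveal the state truthfully
always_top = lambda state: 2      # hypothetical deviation: always claim state 2

# Logged honoring rate over many rounds at chi = 0.7.
rounds = [play_round(int(rng.integers(3)), rule, always_top, 0.7, rng) for _ in range(2000)]
honoring_rate = sum(h for _, h in rounds) / len(rounds)
print(round(honoring_rate, 3))
```

With the enforced draw in place the logged honoring rate simply estimates χ; in the reputation phase the analogous quantity would have to come from observed behavior instead.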

Figure 09 · does reputation climb the LRS curve? · Illustrative
Left: the LRS value curve $v^*_\chi$ (shape exact); the dot marks the $\chi_{eff}$ that reputation is hypothesized to reach. Right: the anticipated round-by-round climb of realized value from the cheap-talk floor toward that target. Short horizons fall off a reputational cliff back to qcav, the repeated-game echo of the LRS discontinuity. Trajectory shapes are illustrative predictions, not data.
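One way the value-fit estimate of $\chi_{eff}$ could work is to invert a tabulated benchmark curve: report the smallest χ whose benchmark value is at least the realized value. The sketch assumes the curve is non-decreasing in χ and has been computed on a grid elsewhere; the function name and the toy numbers are made up for illustration and are not LRS values.

```python
import numpy as np

def fit_chi_eff(realized_value, chi_grid, value_curve):
    """Smallest grid chi whose benchmark value reaches the realized value.

    Assumes value_curve is non-decreasing in chi; jumps in the curve are fine.
    Values above the top of the curve are clipped to the largest grid chi.
    """
    chi_grid = np.asarray(chi_grid, dtype=float)
    value_curve = np.asarray(value_curve, dtype=float)
    if np.any(np.diff(value_curve) < 0):
        raise ValueError("benchmark curve must be non-decreasing in chi")
    idx = int(np.searchsorted(value_curve, realized_value, side="left"))
    idx = min(idx, len(chi_grid) - 1)
    return float(chi_grid[idx])

# Toy step-shaped curve (made-up numbers).
chi_grid = [0.0, 0.25, 0.5, 0.75, 1.0]
toy_curve = [0.0, 0.0, 0.3, 0.3, 0.6]
print(fit_chi_eff(0.3, chi_grid, toy_curve))
```

On flat stretches of the curve the inverse is not unique, which is one reason to triangulate the value-fit against the behavioral and revealed-trust estimates.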

04 Hypotheses

H1 · validation

Mechanical χ reproduces $v^*_\chi$, discontinuity included.

H2 · partial substitution

Reputation yields $0<\chi_{eff}<1$: above the cheap-talk floor, below full commitment.

H3 · a repeated-game cliff

Below a horizon threshold, reputation collapses and value drops discontinuously to qcav.

H4 · the honesty bias

LLM senders overshoot $\chi_{eff}$: they are "too credible," leaving sender value on the table.

Contribution

  • The first agent-based trace of the weak-institutions value function.
  • A definition of endogenous credibility as a measurable quantity, not a modeling primitive — bridging LRS theory and the empirical "LLMs in repeated games" literature with an exact benchmark.
Target: economics-of-AI / computational social science. Phase-A validation supports a shorter tools note.

Proposal 10

Two-Sided Learning

The gym has a rich zoo of receiver learners but a sender that never learns. Make the sender a first-class learner, and the central dynamic of agentic persuasion finally becomes observable.

Crawford–Sobel 1982 | Kamenica–Gentzkow / Lipnowski–Ravid | Folk theorem · repeated games | LLM collusion

01 Background

Persuasion is two-sided: the sender chooses how much to garble, the receiver how much to trust, and each adapts to the other. The gym freezes one side — the sender picks a fixed matrix or a hand-written heuristic. That hides the most interesting question in the system.

02 The gap

When a self-interested sender adapts against a learning receiver, does the realized garbling rate converge to the cheap-talk value, settle on a repeated-game outcome, or fall into a limit cycle of exploit-and-collapse? Nobody has mapped this for the gym, because the sender cannot yet learn. It is the single biggest capability gap in the framework.

03 The build

A $\texttt{SenderStrategy}$ interface mirroring the receiver zoo — regret-matching, multiplicative weights, bandits over noise level, best-response-to-empirical-receiver, and an LLM sender. Plus an optional misalignment knob $b$ that tilts the sender's ideal action with the state (toward Crawford–Sobel bias), so "garbling rate" acquires a partition meaning and learners can be asked to discover the CS structure $N(b)$.
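As a rough picture of what such an interface could look like, the sketch below pairs a `SenderStrategy` protocol with a multiplicative-weights (Hedge) sender over discrete noise levels. Only the interface name comes from the proposal; the method names `act` and `update`, the full-information payoff vector, and the learning rate are assumptions made for illustration.

```python
from typing import Protocol, Sequence
import numpy as np

class SenderStrategy(Protocol):
    """Hypothetical interface: pick a noise level, then learn from payoffs."""
    def act(self, rng: np.random.Generator) -> int: ...
    def update(self, payoffs: Sequence[float]) -> None: ...

class HedgeSender:
    """Multiplicative weights over noise levels with full-information feedback.

    Assumes the sender is told the payoff each noise level would have earned
    this round; a bandit sender would see only the chosen level's payoff.
    """
    def __init__(self, n_levels: int, eta: float = 0.1) -> None:
        self.log_w = np.zeros(n_levels)  # log-weights, to avoid overflow
        self.eta = eta

    def probs(self) -> np.ndarray:
        w = np.exp(self.log_w - self.log_w.max())
        return w / w.sum()

    def act(self, rng: np.random.Generator) -> int:
        return int(rng.choice(len(self.log_w), p=self.probs()))

    def update(self, payoffs: Sequence[float]) -> None:
        self.log_w += self.eta * np.asarray(payoffs, dtype=float)

rng = np.random.default_rng(0)
sender: SenderStrategy = HedgeSender(n_levels=3)
for _ in range(200):
    level = sender.act(rng)
    sender.update([0.2, 0.8, 0.5])  # toy payoffs per noise level
```

Keeping the sender behind the same two-method shape as a receiver learner would let any sender be paired with any receiver in the zoo without special cases.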

Figure 10 · three fates of two-sided adaptation · Illustrative
The garbling rate over rounds for three sender×receiver pairings. Converge: a regret/bandit sender settles to the cheap-talk-optimal garbling. Cycle: build trust, exploit it, get punished by a change-point receiver, repeat. Collapse: mutual drift to full garbling (babbling). Dynamics are anticipated archetypes the experiments will classify, not measured runs.

04 Hypotheses

H1 · convergence

Against a fixed Bayesian receiver, regret/bandit/best-response senders approach the feasible optimum (qcav without reputation).

H2 · co-adaptive cycles

Against an adaptive receiver, some pairings never settle — they cycle through exploit-and-punish.

H3 · CS recovery

As bias $b$ rises, the learned garbling coarsens, tracking $N(b)$ without being told it.
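For reference, in the canonical uniform-quadratic Crawford–Sobel example the maximal number of partition cells is $N(b)=\lceil -\tfrac12+\tfrac12\sqrt{1+2/b}\,\rceil$. The sketch below evaluates that formula; it is a reference point only, since the gym's discrete 3×3 game is not the uniform-quadratic model, and the function name is hypothetical.

```python
import math

def cs_max_partition_size(b: float) -> int:
    """N(b) for the uniform-quadratic Crawford-Sobel example.

    Smallest integer >= -1/2 + (1/2) * sqrt(1 + 2/b). Values of b exactly at a
    boundary between two sizes may be affected by floating-point rounding.
    """
    if b <= 0:
        raise ValueError("bias b must be positive")
    return math.ceil(-0.5 + 0.5 * math.sqrt(1.0 + 2.0 / b))

for b in (0.01, 0.05, 0.1, 0.25):
    print(b, cs_max_partition_size(b))
```

The partition coarsens as the bias grows and reaches a single cell (babbling) at b = 1/4, which is the collapse the learned garbling would be compared against.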

H4 · persistent honesty gap

LLM senders under-garble vs. the learned optimum even with reward feedback; persona conditioning shrinks but doesn't close it.

Contribution

  • Upgrades the gym from a receiver-learning testbed to a two-sided information-design testbed — reused by every future direction (competition, multi-receiver, mediation).
  • A benchmark-anchored map of when agentic persuasion converges vs. cycles vs. collapses, and a measurement of the honesty gap that survives reward feedback.
Target: learning-dynamics / multi-agent venue; the capability itself is an artifact.

Proposal 11

Verifiability × Commitment

Fréchette–Lizzeri–Perego showed, with people, that commitment helps or hurts communication depending on verifiability — and that subjects misperceive commitment. We replicate the design with LLM agents, and predict their blindness has a different signature.

Fréchette–Lizzeri–Perego 2022 | Milgrom 1981 | KG / LR endpoints | LLM over-honesty

01 Background

FLP nest cheap talk, disclosure, and Bayesian persuasion in one framework with two axes — verifiability (can the sender make false state-specific claims?) and commitment ρ. Their headline: informativeness rises in ρ under unverifiability but falls under verifiability, converging at full commitment. Their surprise: people are commitment-blind — they act as if commitment were weaker or different than it is.

02 The gap

No one has run this with LLM agents. And the gym's documented over-honesty makes a sharp, partly novel prediction: under verifiable rules LLMs should over-communicate like humans — but under unverifiable rules they may over-communicate too, breaking the human pattern of under-communication. If so, LLM commitment blindness is not a copy of the human kind; it is over-honesty wearing its mask.

03 The design — the FLP 2×2 with LLMs

Reusing the commitment dial (Proposal 09) plus a verifiability mask: a channel-level constraint forbidding false state-specific claims (built on Milgrom's hard-evidence logic), not a prompt instruction. Two arms: game-theoretic agents validate the opposite-sign comparative statics against the solver; LLM agents are the new science, and we recover an effective perceived commitment $\rho_{perceived}$.
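A channel-level mask can be sketched by treating each message as a claim "the state is in this set" and, under verifiability, allowing only claims that contain the true state. Modeling messages as nonempty subsets of states is an assumption borrowed from the hard-evidence tradition, not a description of the FLP protocol or of the gym's channel; the function name is hypothetical.

```python
from itertools import combinations

def allowed_messages(state, n_states, verifiable):
    """Claims the channel will transmit when the true state is `state`.

    Each message is a frozenset of states ("the state is one of these").
    Under verifiability the channel drops any claim that excludes the truth,
    so the sender can be vague but cannot make a false state-specific claim.
    """
    all_claims = [
        frozenset(c)
        for r in range(1, n_states + 1)
        for c in combinations(range(n_states), r)
    ]
    if not verifiable:
        return all_claims
    return [claim for claim in all_claims if state in claim]

print(len(allowed_messages(0, 3, verifiable=False)))
print(len(allowed_messages(0, 3, verifiable=True)))
```

Because the filter sits in the channel rather than in the prompt, an LLM sender cannot evade it by ignoring instructions: a disallowed claim is never available to send.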

Figure 11a · opposite slopes & the LLM signature · Theory exact · LLM illustrative
Solid: theory. Informativeness rises in ρ when unverifiable, falls when verifiable, meeting at Bayesian persuasion. Dashed: the hypothesized LLM arm, over-communicating in both regimes (H3). The vertical gap between an LLM curve and its theory curve is commitment blindness; the horizontal offset is $\rho_{perceived}-\rho$.

The four corners

Figure 11b · what the framework nests · Schematic
The same primitives generate four classic models at the corners; commitment is horizontal, verifiability vertical.

04 Hypotheses

H1 · replication

Game-theoretic agents show the opposite-sign comparative statics, converging at ρ=1.

H2 · blindness

LLM behavior is best fit by $\rho_{perceived}\neq\rho_{true}$.

H3 · novel direction

LLMs over-communicate under both regimes — unlike humans, who under-communicate when unverifiable.

H4 · persona modulates

Strategic personas pull $\rho_{perceived}$ toward ρ under unverifiability.

Contribution

  • The first LLM replication-and-extension of a named Econometrica experiment, with a theoretical baseline the original lab study could only approximate.
  • A novel claim — LLM commitment blindness has a different signature than human blindness — that is informative whether or not it is confirmed.
Target: economics-of-AI / experimental venue; reuses the FLP figures from the literature review.