The Program
One Instrument, Four Questions
These proposals share a single backbone. The garbling gym's default payoff hands the sender +10 for any sale regardless of the true state. That is a state-independent payoff, which is exactly the transparent-motives assumption of Lipnowski–Ravid. So the gym is, by construction, a cheap-talk world whose sender-optimal value is the quasiconcave envelope (qcav). Add a commitment dial and it becomes the weak-institutions world, sweeping all the way up to the concave envelope (cav) of full Bayesian persuasion. Every proposal is a position on that qcav → cav spine.
A What the gym is — and the four gaps
Discrete 3×3 garbling game · 6 fixed sender matrices · a strong zoo of ~10 receiver learners (Bayesian, regret, bandit, level-k, LLM) · LLM hooks · per-round state revelation · tested storage.
1. the sender never learns · 2. no equilibrium solver (no ground truth) · 3. single sender + single receiver · 4. no commitment / credibility primitive.
B How the proposals compose
| # | Proposal | Core question | Anchors |
|---|---|---|---|
| 08 | Theory baseline | Build the solver + metrics so outcomes are measured against computed benchmarks. | KG · LR · LRS · Blackwell |
| 09 | Reputation as credibility | Does reputation create an effective credibility $\chi_{eff}$, tracing the LRS curve? | LRS · KG · LR |
| 10 | Two-sided learning | When the sender also learns, does the garbling rate converge, cycle, or collapse? | CS · KG/LR |
| 11 | Verifiability × commitment | Do LLM agents show "commitment blindness" — and in a novel direction? | FLP · Milgrom |
The recurring empirical hook
A recurring finding in the LLM literature is that models are systematically too honest: safety-tuned agents cooperate and disclose where theory prescribes strategic concealment. The gym, once it has a benchmark, is the cleanest place to measure that honesty gap and ask what closes it.
Proposal 08 · Foundation
The Theory Baseline
Every experiment the gym runs is descriptive until it can be compared to what theory says should happen. This builds the missing ruler — a solver for the persuasion benchmarks, and metrics that mean something.
01 Background
The geometry of persuasion is a story about envelopes of the sender's value over beliefs. With commitment, the achievable value is the concave envelope $\operatorname{cav}$ (Kamenica–Gentzkow). With transparent-motive cheap talk, it is the quasiconcave envelope $\operatorname{qcav}$ (Lipnowski–Ravid). Between them, weak institutions trace a capped-concavification curve in credibility $\chi$ (LRS). These are exact, computable objects, but the gym computes none of them.
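For reference, the two envelopes can be written out as follows. This is a sketch under simplifying assumptions that are not in the text above: $\hat v(\mu)$ denotes the sender's value when the receiver best-responds at posterior $\mu$, treated as single-valued, and only finite splittings $(\tau_i,\mu_i)$ of the prior $\mu_0$ are considered.

$$
\operatorname{cav}\hat v(\mu_0)=\sup\Big\{\sum_i \tau_i\,\hat v(\mu_i)\;:\;\tau_i>0,\ \sum_i\tau_i=1,\ \sum_i\tau_i\mu_i=\mu_0\Big\}
$$

$$
\operatorname{qcav}\hat v(\mu_0)=\sup\Big\{\min_i \hat v(\mu_i)\;:\;\tau_i>0,\ \sum_i\tau_i=1,\ \sum_i\tau_i\mu_i=\mu_0\Big\}
$$

Since a minimum never exceeds a weighted average, and the trivial splitting $\mu_1=\mu_0$ is always feasible, the babbling, cheap-talk, and commitment values are ordered: $\hat v(\mu_0)\le\operatorname{qcav}\hat v(\mu_0)\le\operatorname{cav}\hat v(\mu_0)$.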
02 The gap
- "Informativeness" is currently a Frobenius distance from the identity matrix — a geometric proxy with no decision-theoretic meaning, and no comparability across the planned continuous and natural-language channels.
- There is no commitment value, no cheap-talk value, no LRS curve to measure a run against. We cannot say whether a learning receiver reaches qcav, cav, or neither.
- Without a benchmark, the celebrated "LLMs are too honest" claim has no optimum to subtract from — it stays anecdotal.
03 The build
A theory package, side-effect-free and additive, computing on the belief simplex: the babbling value, the commitment value $V_{cav}=\operatorname{cav}\hat v(\mu_0)$, the cheap-talk value $V_{qcav}=\operatorname{qcav}\hat v(\mu_0)$, and the weak-institution curve $\chi\mapsto v^*_\chi(\mu_0)$.
Plus an information-metric module (mutual information $I(\theta;s)$, a Blackwell garbling test, and a posterior mean-preserving-spread check) that replaces the Frobenius proxy with quantities the literature actually uses.
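To make the contrast with the Frobenius proxy concrete, here is a minimal sketch of two of the metrics. The function names are hypothetical, not the gym's API, and the garbling matrix is assumed to be row-stochastic with `G[i, j] = P(s = j | theta = i)`. The Blackwell check covers only the special case of an invertible finer matrix; a general test would need a linear program.

```python
import numpy as np

def mutual_information(prior, G):
    """I(theta; s) in bits, where G[i, j] = P(s = j | theta = i)."""
    prior = np.asarray(prior, dtype=float)
    G = np.asarray(G, dtype=float)
    joint = prior[:, None] * G              # P(theta = i, s = j)
    p_s = joint.sum(axis=0)                 # marginal distribution of the signal
    indep = prior[:, None] * p_s[None, :]   # product of the two marginals
    mask = joint > 0                        # convention: 0 * log 0 = 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / indep[mask])))

def frobenius_from_identity(G):
    """The current proxy: Frobenius distance between G and the identity."""
    G = np.asarray(G, dtype=float)
    return float(np.linalg.norm(G - np.eye(G.shape[0])))

def is_garbling_of(G_coarse, G_fine, tol=1e-9):
    """Blackwell check, invertible case only (an assumption of this sketch).

    G_coarse is a garbling of G_fine if G_coarse = G_fine @ M for some
    row-stochastic M. Rows of M sum to one automatically when both inputs
    are row-stochastic, so only non-negativity needs checking.
    """
    M = np.linalg.solve(np.asarray(G_fine, float), np.asarray(G_coarse, float))
    return bool(np.all(M >= -tol))

uniform = np.full(3, 1.0 / 3.0)
relabel = np.roll(np.eye(3), 1, axis=1)     # fully informative, labels permuted
babble = np.full((3, 3), 1.0 / 3.0)         # completely uninformative
noisy = 0.7 * np.eye(3) + 0.1 * np.ones((3, 3))
```

Under a uniform prior, `relabel` carries the full $\log_2 3$ bits while `babble` carries none, yet the Frobenius proxy places `relabel` farther from the identity than `babble`. That is the sense in which the proxy has no decision-theoretic meaning.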
04 Why it is trustworthy
The solver ships with an oracle: it must reproduce closed-form results we already hold — the Kamenica–Gentzkow prosecutor–judge value of 0.60, the LRS central-bank curve $(\tfrac32,\,2\chi,\,1)$ with its discontinuity at $\chi=\tfrac23$, and the Crawford–Sobel collapse of $N(b)$. These were computed while building the literature-review figures, so there is a reference from day one.
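As an illustration of what one oracle check could look like, the sketch below computes a concave envelope on a belief grid with an upper-hull scan and evaluates it on the prosecutor–judge example. The setup is the standard textbook one and is an assumption here, not taken from the gym: a binary state, prior 0.3 on guilt, a judge who convicts when the posterior is at least 0.5, and a prosecutor who earns 1 per conviction. The function name and grid size are illustrative.

```python
import numpy as np

def concave_envelope_at(mu0, mu, v):
    """Evaluate the concave envelope of the points (mu[i], v[i]) at mu0.

    mu must be sorted ascending. The scan keeps only the vertices of the
    upper convex hull, then interpolates linearly between them.
    """
    hull = []
    for i in range(len(mu)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            # Drop b if it lies on or below the chord from a to i.
            cross = (mu[b] - mu[a]) * (v[i] - v[a]) - (v[b] - v[a]) * (mu[i] - mu[a])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return float(np.interp(mu0, mu[hull], v[hull]))

mu = np.linspace(0.0, 1.0, 1001)            # belief that the defendant is guilty
v = (mu >= 0.5 - 1e-12).astype(float)       # sender value: 1 if the judge convicts
prior = 0.3

babbling_value = float(np.interp(prior, mu, v))
commitment_value = concave_envelope_at(prior, mu, v)
```

With these assumptions `commitment_value` comes out at 0.60 against a babbling value of 0, matching the closed form. Because this step function is already quasiconcave, the cheap-talk value at the same prior is also 0, so the example separates qcav from cav cleanly.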
Contribution
- Turns the gym from a simulator into an instrument — the prerequisite for every benchmark-anchored claim.
- A reusable "theoretical baseline" methods section for any paper that follows.
Proposal 09 · Flagship
Reputation as Endogenous Credibility
LRS prove a sender's value moves along a sharp, non-smooth curve as institutional credibility χ runs from cheap talk to full commitment. We turn χ from a knob into something agents earn — and ask where reputation lands them on that curve.
01 Background
Classical Bayesian persuasion assumes the sender can commit to a signal. The gym is the opposite extreme — transparent-motive cheap talk, whose one-shot value is the quasiconcave envelope. Between them sits the weak-institution model: the announced rule is honored only with probability χ. At $\chi=1$ we recover concavification; at $\chi=0$, quasiconcavification; in between, a curve with a productive-mistrust ramp and a genuine discontinuity.
02 The gap
The weak-institutions curve has only ever been drawn analytically. No one has asked whether emergent reputation among adaptive or LLM agents reproduces it: whether repeated interaction manufactures an effective credibility $\chi_{eff}$ without any enforced commitment, and whether that lift shows the same cliff the theory predicts.
03 The mechanism — a commitment dial
One primitive, reused by Proposal 11: each round the sender announces a garbling rule; with probability $\chi$ it is honored, and with probability $1-\chi$ the sender may deviate after seeing the state. The receiver sees the message but not whether it came from the honored rule or from a deviation. In the flagship's second phase we remove the enforced honoring and let the receiver maintain a trust state, so credibility must now be earned. We then recover $\chi_{eff}$ in three independent ways and triangulate: value-fit, behavioral honesty rate, and revealed trust.
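A single round of the commitment dial might be sketched as follows. Everything here is illustrative: `play_round`, its arguments, and the `deviate` callback are hypothetical names, and the announced rule is assumed to be a row-stochastic matrix indexed by state.

```python
import numpy as np

def play_round(rng, state, announced, chi, deviate):
    """One round of the commitment dial (illustrative sketch).

    announced : row-stochastic matrix, announced[state] is the message distribution
    chi       : probability that the announced rule is honored
    deviate   : callable state -> message, used when the rule is not honored
    Returns (message, honored). Only `message` should be shown to the receiver;
    `honored` is kept for logging and later analysis.
    """
    honored = bool(rng.random() < chi)
    if honored:
        # The channel draws the message from the announced rule.
        message = int(rng.choice(announced.shape[1], p=announced[state]))
    else:
        # The sender picks the message freely after seeing the state.
        message = int(deviate(state))
    return message, honored

rng = np.random.default_rng(0)
truthful = np.eye(3)                       # announced rule: fully revealing
always_push = lambda state: 2              # hypothetical deviation: always send message 2

honest_rounds = [play_round(rng, s, truthful, 1.0, always_push) for s in (0, 1, 2)]
cheap_rounds = [play_round(rng, s, truthful, 0.0, always_push) for s in (0, 1, 2)]
```

At `chi = 1.0` every message is drawn from the announced rule, and at `chi = 0.0` every message is the sender's free choice, which are the two ends of the dial described above. Logging `honored` separately is one way to compute a behavioral honesty rate later without leaking the bit to the receiver.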
04 Hypotheses
Mechanical χ reproduces $v^*_\chi$, discontinuity included.
Reputation yields $0<\chi_{eff}<1$: above the cheap-talk floor, below full commitment.
Below a horizon threshold, reputation collapses and value drops discontinuously to qcav.
LLM senders overshoot $\chi_{eff}$: they are "too credible," leaving sender value on the table.
Contribution
- The first agent-based trace of the weak-institutions value function.
- A definition of endogenous credibility as a measurable quantity, not a modeling primitive — bridging LRS theory and the empirical "LLMs in repeated games" literature with an exact benchmark.
Proposal 10
Two-Sided Learning
The gym has a rich zoo of receiver learners but a sender that never learns. Make the sender a first-class learner, and the central dynamic of agentic persuasion finally becomes observable.
01 Background
Persuasion is two-sided: the sender chooses how much to garble, the receiver how much to trust, and each adapts to the other. The gym freezes one side — the sender picks a fixed matrix or a hand-written heuristic. That hides the most interesting question in the system.
02 The gap
When a self-interested sender adapts against a learning receiver, does the realized garbling rate converge to the cheap-talk value, settle on a repeated-game outcome, or fall into a limit cycle of exploit-and-collapse? Nobody has mapped this for the gym, because the sender cannot yet learn. It is the single biggest capability gap in the framework.
03 The build
A $\texttt{SenderStrategy}$ interface mirroring the receiver zoo — regret-matching, multiplicative weights, bandits over noise level, best-response-to-empirical-receiver, and an LLM sender. Plus an optional misalignment knob $b$ that tilts the sender's ideal action with the state (toward Crawford–Sobel bias), so "garbling rate" acquires a partition meaning and learners can be asked to discover the CS structure $N(b)$.
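One possible shape for the interface, with a bandit over noise levels as the simplest learner. This is a sketch under assumptions: the two-method protocol, the epsilon-greedy rule, and the one-parameter noise family `noisy_garbling` are illustrative choices, not the gym's existing code.

```python
from typing import Protocol
import numpy as np

class SenderStrategy(Protocol):
    """Hypothetical interface mirroring a receiver learner: choose, then update."""
    def choose(self, rng: np.random.Generator) -> int: ...
    def update(self, arm: int, payoff: float) -> None: ...

def noisy_garbling(noise: float, n: int = 3) -> np.ndarray:
    """Assumed noise family: mix the identity with the uniform matrix."""
    return (1.0 - noise) * np.eye(n) + noise * np.full((n, n), 1.0 / n)

class EpsilonGreedySender:
    """Bandit over a fixed menu of noise levels (epsilon-greedy)."""

    def __init__(self, noise_levels, epsilon=0.1):
        self.noise_levels = list(noise_levels)
        self.epsilon = epsilon
        self.counts = np.zeros(len(self.noise_levels))
        self.means = np.zeros(len(self.noise_levels))

    def choose(self, rng):
        # Explore with probability epsilon, otherwise play the best arm so far.
        if rng.random() < self.epsilon:
            return int(rng.integers(len(self.noise_levels)))
        return int(np.argmax(self.means))

    def update(self, arm, payoff):
        # Incremental running mean of the payoff observed for this arm.
        self.counts[arm] += 1
        self.means[arm] += (payoff - self.means[arm]) / self.counts[arm]

sender = EpsilonGreedySender([0.0, 0.5, 1.0], epsilon=0.0)
sender.update(1, 10.0)                     # arm 1 earned the +10 sale payoff
chosen = sender.choose(np.random.default_rng(0))
G = noisy_garbling(sender.noise_levels[chosen])
```

Regret-matching, multiplicative-weights, and LLM senders would implement the same two methods, so an experiment loop can swap learners without changing anything else.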
04 Hypotheses
Against a fixed Bayesian receiver, regret/bandit/best-response senders approach the feasible optimum (qcav without reputation).
Against an adaptive receiver, some pairings never settle — they cycle through exploit-and-punish.
As bias $b$ rises, the learned garbling coarsens, tracking $N(b)$ without being told it.
LLM senders under-garble relative to the learned optimum even with reward feedback; persona conditioning shrinks the gap but does not close it.
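For the coarsening hypothesis, a reference value of $N(b)$ is needed. The sketch below uses the textbook uniform-quadratic Crawford–Sobel case, where an $N$-interval equilibrium exists exactly when $2N(N-1)b<1$. Treating that case as the benchmark is an assumption: the gym's discrete 3×3 game would need its own computation, and the function name is illustrative.

```python
def cs_max_partition_size(b: float) -> int:
    """Largest N with 2 * N * (N - 1) * b < 1 (uniform-quadratic Crawford-Sobel)."""
    if b <= 0:
        raise ValueError("bias b must be positive; at b = 0 full revelation is possible")
    n = 1
    # Keep adding a partition element while the next size is still feasible.
    while 2 * n * (n + 1) * b < 1:
        n += 1
    return n

partition_sizes = {b: cs_max_partition_size(b) for b in (0.01, 0.1, 0.25)}
```

The count falls as the bias grows, reaching a single element (babbling) once $b\ge\tfrac14$. That collapse is what the learned garbling would be compared against.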
Contribution
- Upgrades the gym from a receiver-learning testbed to a two-sided information-design testbed — reused by every future direction (competition, multi-receiver, mediation).
- A benchmark-anchored map of when agentic persuasion converges vs. cycles vs. collapses, and a measurement of the honesty gap that survives reward feedback.
Proposal 11
Verifiability × Commitment
Fréchette–Lizzeri–Perego showed, with people, that commitment helps or hurts communication depending on verifiability — and that subjects misperceive commitment. We replicate the design with LLM agents, and predict their blindness has a different signature.
01 Background
FLP nest cheap talk, disclosure, and Bayesian persuasion in one framework with two axes — verifiability (can the sender make false state-specific claims?) and commitment ρ. Their headline: informativeness rises in ρ under unverifiability but falls under verifiability, converging at full commitment. Their surprise: people are commitment-blind — they act as if commitment were weaker or different than it is.
02 The gap
No one has run this with LLM agents. And the gym's documented over-honesty makes a sharp, partly novel prediction: under verifiable rules LLMs should over-communicate like humans — but under unverifiable rules they may over-communicate too, breaking the human pattern of under-communication. If so, LLM commitment blindness is not a copy of the human kind; it is over-honesty wearing its mask.
03 The design — the FLP 2×2 with LLMs
The design reuses the commitment dial (Proposal 09) and adds a verifiability mask: a constraint forbidding false state-specific claims, built on Milgrom's hard-evidence logic and enforced by the channel rather than by a prompt instruction. There are two arms. Game-theoretic agents validate the opposite-sign comparative statics against the solver; LLM agents are the new science, and from their behavior we recover an effective perceived commitment $\rho_{perceived}$.
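A channel-level mask could be as simple as filtering the message set before the sender chooses. The sketch below rests on an assumption that is not in the design text: a message is modeled as a claim "the state is in this set", so a claim is false exactly when it excludes the true state. The function names are hypothetical.

```python
from itertools import combinations

def all_claims(n_states):
    """Every non-empty subset of states, read as the claim 'the state is in this set'."""
    states = range(n_states)
    return [frozenset(c)
            for k in range(1, n_states + 1)
            for c in combinations(states, k)]

def allowed_claims(state, n_states, verifiable):
    """Messages the channel will accept from a sender who observed `state`."""
    claims = all_claims(n_states)
    if not verifiable:
        return claims                      # cheap talk: any claim goes through
    # Hard evidence: only claims that contain the true state are deliverable.
    return [c for c in claims if state in c]

unverifiable_menu = allowed_claims(0, 3, verifiable=False)
verifiable_menu = allowed_claims(0, 3, verifiable=True)
```

Under the mask the sender can still be vague, since the full set of states is always allowed, but cannot name a state that is not the true one. Because the filter is applied by the channel, an LLM sender cannot get around it through wording.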
The four corners
04 Hypotheses
Game-theoretic agents show the opposite-sign comparative statics, converging at ρ=1.
LLM behavior is best fit by $\rho_{perceived}\neq\rho_{true}$.
LLMs over-communicate under both regimes — unlike humans, who under-communicate when unverifiable.
Strategic personas pull $\rho_{perceived}$ toward $\rho_{true}$ under unverifiability.
Contribution
- The first LLM replication-and-extension of a named Econometrica experiment, with a theoretical baseline the original lab study could only approximate.
- A novel claim, that LLM commitment blindness has a different signature from the human kind, which is informative whether or not it is confirmed.