Monopoly Benchmark

NYU Shanghai · Jan 2026 – Present · with Lorenzo Xiao & Echo Huang

Intro

An agentic benchmark that uses a modified Monopoly game — with Nomic-style rule-changing parliament phases — to probe how LLMs negotiate, form coalitions, and strategically rewrite the rules they play under.

Research goals

Evaluate LLM agents not on static reasoning tasks, but on long-horizon, multi-agent strategic behavior.
Surface behaviors that only appear under social pressure: persuasion, deception, coalition stability, entitlement.
Create a reproducible, replayable environment where every turn, belief, and chat message is logged.

Research questions

RQ1 — Do LLMs discover mutually beneficial trades, or default to greedy play?
RQ2 — Do stable coalitions emerge, and under what conditions do they break?
RQ3 — When agents can rewrite the rules, do proposals trend self-serving or fair?
RQ4 — How well do agents model opponents (theory of mind) to predict accept/reject?

Plan

Build the game engine and a web UI, plug in any OpenAI-compatible model as an agent, run controlled matches against a RandomAgent baseline, and export full JSON game logs for post-hoc analysis. The initial focus is on negotiation and coalition metrics; rule-manipulation dynamics follow in a later phase.
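
As a rough illustration of the baseline-plus-logging part of the plan, here is a minimal Python sketch. Only the name RandomAgent and the idea of JSON game logs come from the plan itself; the TradeOffer and GameLog classes, their fields, the respond_to_trade method, and the run_demo helper are hypothetical and are not the project's actual engine or log schema.

```python
import json
import random
from dataclasses import dataclass, field, asdict


@dataclass
class TradeOffer:
    # Hypothetical trade proposal; field names are illustrative only.
    proposer: str
    receiver: str
    cash_offered: int
    property_requested: str


@dataclass
class GameLog:
    # Append-only event list (assumed schema) so a match can be inspected later.
    seed: int
    events: list = field(default_factory=list)

    def record(self, turn, actor, kind, payload):
        self.events.append(
            {"turn": turn, "actor": actor, "kind": kind, "payload": payload}
        )

    def to_json(self):
        return json.dumps({"seed": self.seed, "events": self.events}, indent=2)


class RandomAgent:
    """Baseline that accepts or rejects trades uniformly at random."""

    def __init__(self, name, seed):
        self.name = name
        # A per-agent seeded RNG keeps repeated runs identical.
        self.rng = random.Random(seed)

    def respond_to_trade(self, offer):
        return self.rng.choice(["accept", "reject"])


def run_demo(seed=0, turns=3):
    # Stand-in for a match: a scripted proposer sends one offer per turn.
    log = GameLog(seed=seed)
    agent = RandomAgent("random_1", seed)
    for turn in range(turns):
        offer = TradeOffer("llm_1", agent.name, 100 + 50 * turn, "Boardwalk")
        log.record(turn, offer.proposer, "trade_offer", asdict(offer))
        decision = agent.respond_to_trade(offer)
        log.record(turn, agent.name, "trade_response", {"decision": decision})
    return log


if __name__ == "__main__":
    print(run_demo(seed=0).to_json())
```

Seeding the agent's random generator from the match seed means two runs with the same seed produce identical logs, which is one simple way to get the reproducibility the research goals call for. An LLM-backed agent could expose the same respond_to_trade method so both agent types share one interface.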
