Publications
Evaluating Large Language Model Biases in Persona-Steered Generation
Andy Liu, Mona Diab, Daniel Fried
In Findings of the Association for Computational Linguistics: ACL 2024. [pdf] [code]
We study the task of persona-steered text generation, where models must generate text that reflects the distribution of views that an individual fitting a persona could have. We find that models are worse at representing multifaceted personas whose dimensions are incongruous with each other, and that preference-based fine-tuning improves LLM steerability at the cost of diversity.
Computational Language Acquisition with Theory of Mind
Andy Liu, Emmy Liu, Hao Zhu, Yonatan Bisk, Graham Neubig
In The Eleventh International Conference on Learning Representations, 2023. [pdf] [code]
We equip language-learning agents with theory of mind (ToM), operationalized as an internal model of a teacher agent that is trained alongside the learner. We find that both including ToM and increasing environment difficulty lead to improved language acquisition in an image referential game setting.
Dynamic Coalition Structure Detection in Natural Language-based Interactions
Andy Liu*, Abhishek Kulkarni*, Jean-Raphaël Gaglione, Daniel Fried, Ufuk Topcu
To appear in The 24th International Conference on Autonomous Agents and Multiagent Systems, 2025.
We present a novel framework combining language models and game theory to predict coalition formation in Diplomacy by analyzing natural language negotiations between players. By evaluating both the content of agreements and players’ strategic incentives to honor them, we can effectively identify which coalitions are likely to be negotiated and upheld during gameplay.
BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data
Wenkai Li*, Jiarui Liu*, Andy Liu, Xuhui Zhou, Mona Diab, Maarten Sap
arXiv preprint. [pdf]
We create a large-scale, human-grounded dialogue dataset that shows how social media users express their personality in text. We then align language models to various personality traits, finding that our methods outperform prompting on personality assessments and yield models whose trait-ability correlations on downstream tasks resemble those reported in human studies.