Kendro, Kelly; Maloney, Jeffrey; Jarvis, Scott

Lexical diversity in human- and LLM-generated text

2024

Creative Commons 'BY' version 4.0 license

Abstract

Despite the widespread adoption of public-facing large language models (LLMs) over the past several months, we still know little about the complexities of machine-generated language in comparison to human-generated language. To better understand how lexical complexity differs between human- and LLM-produced texts, we elicited responses from four commercially available LLMs (ChatGPT 3.5, ChatGPT 4.0, Claude, and Bard) and compared them to writing by humans from different backgrounds (i.e., L1 and L2 English users) and education levels. We also investigated whether the LLMs demonstrated a consistent style across targeted prompts, as compared to the human participants. A preliminary analysis of six dimensions of lexical diversity (volume, abundance, variety-repetition, evenness, disparity, dispersion) suggests that LLM-generated text differs from human-generated text with regard to lexical diversity, and that texts created by LLMs demonstrate less variation than human-written text. We will discuss the implications of these differences for future research and education in applied linguistics.

Proceedings of the Annual Meeting of the Cognitive Science Society
