Exploring the capabilities of Walk-the-Talk and its potential applications in autonomous systems, robotics, and animation.
A non-attentive pedestrian
Cop monitoring an accident
A jaywalker emerging between parked vehicles
An intoxicated vulnerable road user (VRU) causing a nuisance
In the field of autonomous driving, a key challenge is the "reality gap": transferring knowledge gained in simulation to real-world settings. Despite various approaches to mitigating this gap, there is a notable absence of solutions targeting agent behavior generation, which is crucial for mimicking the spontaneous, erratic, and realistic actions of traffic participants. Recent advancements in Generative AI have made it possible to represent human activities in semantic space and to generate realistic human motion from textual descriptions. Despite current limitations such as modality constraints, motion sequence length, resource demands, and data specificity, there is an opportunity to innovate and apply these techniques in the intelligent vehicles domain. We propose Walk-the-Talk, a motion generator utilizing Large Language Models (LLMs) to produce reliable pedestrian motions for high-fidelity simulators like CARLA. In doing so, we contribute to autonomous driving simulation by aiming to scale realistic, diverse long-tail agent motion data, which is currently a gap in training datasets. We employ Motion Capture (MoCap) techniques to develop the Walk-the-Talk dataset, which covers a broad spectrum of pedestrian behaviors in street-crossing scenarios, ranging from standard walking patterns to extreme behaviors such as drunk walking and near-crash incidents. By utilizing this new dataset within an LLM, we facilitate the creation of realistic pedestrian motion sequences, a capability previously unattainable. Additionally, our findings demonstrate that leveraging the Walk-the-Talk dataset enhances cross-domain generalization and significantly improves the Fréchet Inception Distance (FID) score by approximately 15% on the HumanML3D dataset.
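For readers unfamiliar with the metric, the following is a minimal sketch of how an FID-style score and a relative improvement are typically computed. It is not the authors' evaluation code. It assumes that feature vectors for real and generated motions have already been extracted by some pretrained encoder (the abstract does not specify which), and the function names `frechet_distance` and `relative_improvement` are hypothetical. Lower FID is better, so an improvement corresponds to a reduction of the score.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, n_features), assumed to
    come from the same (unspecified) pretrained feature extractor.
    """
    feats_a = np.asarray(feats_a, dtype=np.float64)
    feats_b = np.asarray(feats_b, dtype=np.float64)
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))
    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite inputs (up to numerical noise).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


def relative_improvement(baseline, improved):
    """Fractional reduction of a lower-is-better score such as FID."""
    return (baseline - improved) / baseline
```

With purely illustrative values that are not taken from the paper, a drop in FID from 2.0 to 1.7 would give `relative_improvement(2.0, 1.7)` of about 0.15, which is how a statement such as "approximately 15%" is usually read.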
@INPROCEEDINGS{10588860,
author={Ramesh, Mohan and Flohr, Fabian B.},
booktitle={2024 IEEE Intelligent Vehicles Symposium (IV)},
title={{Walk-the-Talk}: {LLM} driven pedestrian motion generation},
year={2024},
volume={},
number={},
pages={3057-3062},
keywords={Legged locomotion;Training;Pedestrians;Generative AI;Large language models;Semantics;Motion capture},
doi={10.1109/IV55156.2024.10588860}
}