
CVPR2024 Tutorial on Generalist Agent AI

Time and Venue

Date and Time:
8:30am - 12:00pm, June 18, 2024

Room Summit 446, Seattle Convention Center (Summit)


Generalist Agent AI (GAA) is a family of systems that generate effective actions in a given environment based on an understanding of multimodal sensory input. With the advent of large foundation models, numerous GAA systems have been proposed in fields ranging from basic research to applications. While these research areas are growing rapidly by integrating with the traditional technologies of each domain, they share common interests such as data collection, benchmarking, and ethical perspectives. In this tutorial, we focus on several representative research areas of Embodied GAA, including embodied multimodality, robotics, gaming (VR/AR/MR), and healthcare, and we aim to provide comprehensive knowledge of the common concerns discussed in these fields. As a result, we expect participants to learn the fundamentals of GAA and gain insights to further advance their research. Specific learning outcomes include:

  • GAA Overview: A deep dive into its principles and roles in contemporary applications, providing attendees with a thorough grasp of its importance and uses.
  • Methodologies: Detailed examples of how large foundation models enhance GAAs, illustrated through case studies in embodied virtual and real-world settings, e.g., robotics, gaming, and healthcare.
  • Performance Evaluation: Guidance on the assessment of GAAs with relevant datasets, focusing on their effectiveness and generalization.
  • Ethical Considerations: A discussion on the societal impacts and ethical challenges of deploying Agent AI, highlighting responsible development practices.
  • Emerging Trends and Future Challenges: A categorization of the latest developments in each domain and a discussion of future directions.

Led by experts from academia and industry, the tutorial combines lectures, case studies, and Q&A sessions, and we expect it to be an interactive and engaging learning experience for all participants.

Schedule

Time Slot | Speaker | Talk Title | Tutorial Materials
08:30 - 08:40 | Jianfeng Gao | Opening Remarks | Slides
08:40 - 09:30 | Talk 1: Juan Carlos Niebles | Language-based AI Agents and Large Action Models (LAMs) | Slides
09:30 - 09:50 | Coffee Break | |
09:50 - 10:40 | Talk 2: Yong Jae Lee | Generalist Multimodal Models | Slides
10:40 - 11:30 | Talk 3: Katsushi Ikeuchi | Agent Robotics: Learning-from-observation | Slides
11:30 - 11:40 | Naoki Wake | Ending Remarks | Slides