CVPR 2024 Tutorial on Generalist Agent AI

Generalist Agent AI (GAA) is a family of systems that generate effective actions in a given environment based on their understanding of multimodal sensory input. With the advent of large foundation models, numerous GAA systems have been proposed in fields ranging from basic research to applications. While these research areas are growing rapidly by integrating with the traditional technologies of each domain, they share common interests such as data collection, benchmarking, and ethical perspectives. In this tutorial, we focus on some representative research areas of embodied GAA, including embodied multimodality, robotics, gaming (VR/AR/MR), and healthcare, and we aim to provide comprehensive knowledge on the common concerns discussed in these fields. As a result, we expect participants to learn the fundamentals of GAA and gain insights to further advance their research. Specific learning outcomes include:

GAA Overview: A deep dive into its principles and roles in contemporary applications, providing attendees with a thorough grasp of its importance and uses.
Methodologies: Detailed examples of how large foundation models enhance GAAs, illustrated through case studies in embodied virtual and real-world settings, e.g., robotics, gaming, and healthcare.
Performance Evaluation: Guidance on the assessment of GAAs with relevant datasets, focusing on their effectiveness and generalization.
Ethical Considerations: A discussion on the societal impacts and ethical challenges of deploying Agent AI, highlighting responsible development practices.
Emerging Trends and Future Challenges: A categorization of the latest developments in each domain and a discussion of future directions.

Led by esteemed experts from academia and industry, the tutorial is intended to be interactive, combining lectures, case studies, and Q&A sessions to provide a comprehensive and engaging learning experience for all participants.

Time Slot	Speaker	Talk Title	Tutorial Materials
08:30 - 08:40	Jianfeng Gao	Opening Remarks	Slides
08:40 - 09:30	Talk1: Juan Carlos Niebles	Language-based AI Agents and Large Action Models (LAMs)	Slides
09:30 - 09:50	Coffee Break
09:50 - 10:40	Talk2: Yong Jae Lee	Generalist Multimodal Models	Slides
10:40 - 11:30	Talk3: Katsushi Ikeuchi	Agent Robotics: Learning-from-observation	Slides
11:30 - 11:40	Naoki Wake	Ending Remarks	Slides
