The Embodied AI direction led by Prof. Wang Liwei at CPII focuses on natural language processing and computer vision research. His research aims to create intelligent machines that can understand the surrounding visual world, communicate in natural language, and interact with the environment. On the perception level, intelligent machines should recognize visual content and describe it in natural language. On the interaction level, they should understand the visual scene and complete tasks by navigating and carrying out actions. More importantly, so that these machines can consider and plan for the long-term consequences of their actions, the group is also developing algorithms that reason over multi-modal inputs. Beyond these directions, the group is devoted to solving some of the most challenging problems in real-world scenarios, such as understanding large AI models and adapting them to many demanding tasks.