Georgia Gkioxari

I am an assistant professor in the Computing & Mathematical Sciences department at Caltech. Previously, I was a research scientist at Meta's FAIR team. I completed my PhD at UC Berkeley with Jitendra Malik and my undergraduate studies at NTUA, Greece, where I worked with Petros Maragos.

I am a Packard Fellow (2025) and a recipient of the PAMI Young Researcher Award (2021), a Google Faculty Award (2024), the Okawa Research Award (2024), and the Amazon Research Award (2024). My teammates and I received the PAMI Mark Everingham Award (2021) for the Detectron Library Suite. I was named one of 30 influential women advancing AI in 2019 by ReWork and was nominated for the Women in AI Awards in 2020 by VentureBeat. Read more about me and my work in this Q&A.

Research

The goal of our work is to design advanced visual perception models that extend the boundaries of current visual capabilities. My group currently focuses on four directions: 2D Perception, 3D Perception, Spatial Reasoning, and Tools.

2D perception

Recognition and segmentation in images.

3D perception

3D scene reconstruction and understanding.

spatial reasoning

Agents that reason about space and time.

tools

Tools for 3D deep learning, such as PyTorch3D.

Highlights

3D perception

SAM 3D: 3Dfy Anything in Images

SAM 3D Team

paper project code demo

3D perception

Steer3D: Feedforward 3D Editing via Text-Steerable Image-to-3D

Ziqi Ma, Hongqiao Chen, Yisong Yue, Georgia Gkioxari

paper project code

2D perception

Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision

Aadarsh Sahoo, Georgia Gkioxari

paper project code

spatial reasoning

Same or Not? Enhancing Visual Perception in Vision-Language Models

Damiano Marsili, Aditya Mehta, Ryan Y. Lin, Georgia Gkioxari

paper project code

spatial reasoning

No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers

Damiano Marsili, Georgia Gkioxari

paper project code

3D perception

Aligning Text, Images, and 3D Structure Token-by-Token

Aadarsh Sahoo, Vansh Tibrewal, Georgia Gkioxari

paper project code

3D perception

Find3D: Find Any Part in 3D

Ziqi Ma, Yisong Yue, Georgia Gkioxari

ICCV 2025, Highlight ✨

paper project code

spatial reasoning

VADAR: Visual Agentic AI for Spatial Reasoning with a Dynamic API

Damiano Marsili, Rohun Agrawal, Yisong Yue, Georgia Gkioxari

CVPR 2025

paper project code data

3D perception

Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild

Garrick Brazil, Abhinav Kumar, Julian Straub, Nikhila Ravi, Justin Johnson, Georgia Gkioxari

CVPR 2023

paper project code

tools

PyTorch3D

Nikhila Ravi, Jeremy Reizenstein, David Novotny, Taylor Gordon, Wan-Yen Lo, Justin Johnson, Georgia Gkioxari

Open-source library for 3D deep learning in PyTorch

paper project code

2D perception

Mask R-CNN

Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick

ICCV 2017, Marr Prize

paper

Join Us

Caltech students (undergrads & grads): If you wish to work with me, please read this information.

Prospective post-docs: Interested in computer vision, 3D, representation learning, or perception? Email me your CV and a short research statement.

Prospective PhD students: Apply directly to the CMS department and mention my name in your statement of purpose. No separate email needed.
