Social group detection, the identification of humans involved in reciprocal interpersonal interactions (e.g., family members, friends, or customers and merchants), is a crucial component of the social intelligence needed by agents interacting with the world. The few existing benchmarks for social group detection are limited by low scene diversity and reliance on third-person camera sources (e.g., surveillance footage). Consequently, these benchmarks generally lack real-world evaluation of how groups form and evolve in diverse cultural contexts and unconstrained settings. To address this gap, we introduce EgoGroups, a first-person view dataset that captures social dynamics in cities around the world. EgoGroups spans 64 countries and covers low-, medium-, and high-crowd settings under four weather/time-of-day conditions. We include dense human annotations for people and social groups, along with rich geographic and scene metadata. Using this dataset, we perform an extensive evaluation of the group detection capabilities of state-of-the-art VLMs, LLMs, and supervised models. Among our findings, VLMs and LLMs can outperform supervised baselines in a zero-shot setting, and crowd density and cultural region clearly influence model performance.

| Model | Params | Type | Scattered G1 | Scattered G2 | Scattered G3 | Scattered G4 | Scattered G5 | Scattered AP | Moderate G1 | Moderate G2 | Moderate G3 | Moderate G4 | Moderate G5 | Moderate AP | Crowded G1 | Crowded G2 | Crowded G3 | Crowded G4 | Crowded G5 | Crowded AP | All AP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cosmos-Reason2 | 8B | VLM | 57.18 | 74.42 | 76.36 | 75.35 | 80.6 | 72.78 | 28.92 | 45.28 | 59.54 | 68.02 | 68.55 | 54.06 | 12.8 | 30.12 | 50.05 | 61.53 | 73.18 | 45.53 | 51.07 |
| Cosmos-Reason2 | 8B | LLM | 60.72 | 71.03 | 72.84 | 70.49 | 75.0 | 70.02 | 32.47 | 49.61 | 65.81 | 62.54 | 76.02 | 57.29 | 20.58 | 40.56 | 55.9 | 59.69 | 65.15 | 48.38 | 53.38 |
| Qwen2.5 | 32B | VLM | 58.94 | 83.37 | 86.92 | 81.6 | 92.27 | 80.62 | 30.28 | 62.72 | 77.55 | 84.21 | 79.38 | 66.83 | 19.11 | 52.42 | 73.26 | 76.05 | 71.56 | 58.48 | 63.37 |
| Qwen2.5 | 32B | LLM | 63.71 | 85.28 | 82.38 | 79.51 | 62.73 | 74.72 | 40.49 | 73.14 | 75.28 | 73.02 | 70.62 | 66.51 | 30.45 | 64.66 | 72.69 | 68.27 | 64.38 | 60.09 | 63.54 |
| Qwen2.5 | 72B | VLM | 60.53 | 85.1 | 85.11 | 79.17 | 84.54 | 78.89 | 37.18 | 66.91 | 79.14 | 78.45 | 82.76 | 68.89 | 25.89 | 59.79 | 73.79 | 77.81 | 74.18 | 62.29 | 66.00 |
| Qwen2.5 | 72B | LLM | 63.97 | 85.84 | 80.24 | 75.0 | 59.55 | 72.92 | 38.73 | 73.68 | 76.4 | 71.85 | 69.28 | 65.99 | 28.02 | 65.93 | 73.18 | 68.64 | 61.92 | 59.54 | 62.93 |
| Qwen3 | 30B | VLM | 71.59 | 80.28 | 74.26 | 75.0 | 59.09 | 72.04 | 50.29 | 66.02 | 71.33 | 71.07 | 70.36 | 65.81 | 36.89 | 57.26 | 63.7 | 60.82 | 58.39 | 55.41 | 63.23 |
| Qwen3 | 30B | LLM | 69.41 | 83.56 | 79.37 | 78.47 | 62.73 | 74.71 | 54.94 | 69.94 | 71.17 | 72.14 | 70.15 | 67.67 | 47.59 | 62.84 | 69.35 | 63.25 | 59.25 | 60.46 | 64.23 |
| Gemini-3-Pro | — | VLM | 82.67 | 80.85 | 73.92 | 76.39 | 68.79 | 76.52 | 70.37 | 75.5 | 68.24 | 67.24 | 57.07 | 67.69 | 63.09 | 69.78 | 64.16 | 54.43 | 47.9 | 59.87 | 64.05 |
| Gemini-3-Pro | — | LLM | 77.66 | 81.85 | 75.33 | 75.35 | 59.55 | 73.95 | 60.29 | 74.18 | 73.23 | 70.9 | 62.12 | 68.14 | 50.95 | 69.0 | 68.63 | 63.71 | 54.88 | 61.44 | 64.85 |
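
As a quick sanity check on the table above, each per-setting AP value appears consistent with the unweighted mean of its G1 to G5 columns. The sketch below recomputes it for one row. The helper name `mean_ap` is hypothetical, and the averaging rule is an assumption inferred from the numbers, not a definition stated in the source. The All AP column is left out because, in this row, it is not the simple average of the three per-setting APs.

```python
from statistics import mean

# G1..G5 values copied from the Cosmos-Reason2 (8B, VLM) row, Scattered setting.
scattered = [57.18, 74.42, 76.36, 75.35, 80.6]
reported_ap = 72.78  # Scattered AP reported in the same row


def mean_ap(per_group):
    """Unweighted mean of per-group AP values, rounded to two decimals.

    Assumption: the AP column is a plain average over G1..G5.
    """
    return round(mean(per_group), 2)


computed = mean_ap(scattered)
print(f"computed={computed:.2f} reported={reported_ap:.2f}")
```

The same check can be repeated on any other row by swapping in its five G values.
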

| Model | Params | Type | AF | AN | CA | EU | GE | LA | LE | ME | NE | SA | O |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cosmos-Reason2 | 8B | VLM | 47.19 | 50.49 | 54.01 | 51.69 | 40.07 | 51.12 | 52.98 | 47.05 | 49.86 | 52.86 | 58.89 |
| Cosmos-Reason2 | 8B | LLM | 45.95 | 56.89 | 55.25 | 54.59 | 50.95 | 54.17 | 54.40 | 49.18 | 51.21 | 52.95 | 53.37 |
| Qwen2.5 | 32B | VLM | 61.99 | 64.63 | 63.57 | 58.82 | 60.48 | 66.55 | 64.09 | 57.98 | 71.75 | 65.75 | 50.00 |
| Qwen2.5 | 32B | LLM | 61.87 | 65.01 | 62.16 | 61.69 | 62.84 | 66.89 | 63.84 | 59.75 | 69.36 | 63.28 | 61.63 |
| Qwen2.5 | 72B | VLM | 58.64 | 67.24 | 65.78 | 64.54 | 67.06 | 70.06 | 67.69 | 59.42 | 71.45 | 67.30 | 65.80 |
| Qwen2.5 | 72B | LLM | 61.94 | 65.22 | 61.88 | 62.32 | 64.28 | 65.04 | 62.61 | 57.33 | 67.31 | 64.49 | 54.21 |
| Qwen3 | 30B | VLM | 56.07 | 64.76 | 58.49 | 60.26 | 65.85 | 59.92 | 63.42 | 53.94 | 56.30 | 64.20 | 63.10 |
| Qwen3 | 30B | LLM | 62.64 | 65.61 | 64.43 | 65.95 | 66.34 | 66.73 | 63.30 | 57.70 | 67.66 | 66.31 | 55.60 |
| Gemini-3-Pro | — | VLM | 62.41 | 66.70 | 64.01 | 68.23 | 65.53 | 64.79 | 64.32 | 60.24 | 71.84 | 66.03 | 62.31 |
| Gemini-3-Pro | — | LLM | 60.87 | 64.59 | 64.69 | 68.61 | 64.37 | 62.55 | 66.47 | 59.51 | 62.44 | 66.69 | 64.70 |

| Model | Params | Type | Detection G1 | Detection G2 | Detection G3 | Detection G4 | Detection G5 | Detection AP | Group ID Precision | Group ID Recall | Group ID F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| JLSG | — | — | 8.0 | 29.30 | 37.5 | 65.40 | 67.0 | 41.4 | — | — | — |
| JRDB-Act | — | — | 81.40 | 64.80 | 49.10 | 63.20 | 37.20 | 59.2 | — | — | — |
| DVT3 | — | — | — | — | — | — | — | — | 61.16 | 31.06 | 41.19 |
| Cosmos-Reason2 | 8B | VLM | 44.88 | 34.7 | 30.96 | 36.83 | 55.0 | 40.47 | 19.83 | 5.83 | 9.02 |
| Cosmos-Reason2 | 8B | LLM | 48.47 | 36.19 | 29.05 | 31.71 | 56.26 | 40.34 | 24.7 | 6.93 | 10.82 |
| Qwen2.5 | 32B | VLM | 25.54 | 47.44 | 47.23 | 54.03 | 69.3 | 48.71 | 22.39 | 16.81 | 19.2 |
| Qwen2.5 | 32B | LLM | 35.13 | 56.78 | 54.11 | 56.96 | 69.49 | 54.49 | 28.45 | 23.88 | 25.97 |
| Qwen2.5 | 72B | VLM | 28.4 | 45.44 | 43.62 | 51.26 | 71.99 | 48.14 | 23.54 | 17.28 | 19.93 |
| Qwen2.5 | 72B | LLM | 32.95 | 55.42 | 55.34 | 60.99 | 66.86 | 54.31 | 28.98 | 25.46 | 27.11 |
| Qwen3 | 30B | VLM | 30.5 | 44.2 | 41.5 | 44.71 | 72.48 | 46.68 | 30.19 | 17.45 | 22.12 |
| Qwen3 | 30B | LLM | 47.15 | 57.38 | 54.45 | 57.97 | 66.62 | 56.71 | 40.25 | 27.84 | 32.91 |
| Qwen3 | 235B | VLM | 31.54 | 44.28 | 41.36 | 50.59 | 77.24 | 49.0 | 36.80 | 21.51 | 27.15 |
| Qwen3 | 235B | LLM | 31.0 | 51.29 | 60.4 | 69.63 | 79.95 | 58.46 | 34.03 | 25.53 | 29.17 |
| Gemini-3-Pro | — | VLM | 56.66 | 71.79 | 79.33 | 79.7 | 57.49 | 68.99 | 42.99 | 46.38 | 44.62 |
| Gemini-3-Pro | — | LLM | 51.65 | 69.97 | 82.52 | 81.8 | 64.08 | 70.0 | 40.89 | 43.97 | 42.37 |

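
The Group ID Prediction columns in the table above can be cross-checked with the standard F1 definition, the harmonic mean of precision and recall. The sketch below is a minimal illustration: the function name `f1_score` is hypothetical, and we assume the reported F1 follows the standard formula (small differences are expected because the table values are rounded).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the Group ID Prediction columns.
rows = {
    "DVT3": (61.16, 31.06, 41.19),
    "Qwen3 30B (LLM)": (40.25, 27.84, 32.91),
    "Gemini-3-Pro (VLM)": (42.99, 46.38, 44.62),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: computed {f1_score(p, r):.2f}, reported {reported:.2f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two inputs, which is why rows with high precision but low recall still report a modest F1.
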
@article{Murrugarra_2026_egogroups,
author = {Murrugarra-Llerena, Jeffri and Chitale, Pranav and Liu, Zicheng and Ao, Kai and Ham, Yujin and Balakrishnan, Guha and Cascante-Bonilla, Paola},
title = {EgoGroups: A Benchmark For Detecting Social Groups of People in the Wild},
journal = {TBD},
year = {2026},
}