The Fused Kernel Library: A C++ API to Develop Highly-Efficient GPU Libraries
O. Amoros, A. Andaluz, J. Nunez, A.J. Pena. (2025). "The Fused Kernel Library: A C++ API to Develop Highly-Efficient GPU Libraries." arXiv preprint arXiv:2508.07071.
Universitat Autònoma de Barcelona / Computer Vision Center. M.S. in Computer Vision, Barcelona, Sep 2023. Relevant courses: Computer Vision, Video Analysis, 3D Vision, Visual Detection.
Universitat de Barcelona. M.S. in Fundamental Principles of Data Science, Barcelona, July 2022. Honors: Computer Vision.
Enabling next-gen NVIDIA hardware across the open-source AI stack:
| Repository | PR | Description |
|---|---|---|
| Dao-AILab/flash-attention | #1904 | CUDA 13 + sm12x support |
| Dao-AILab/flash-attention | #2222 | Hybrid Flash Attention |
| vllm-project/flash-attention | #95 | CUDA 13 for vLLM FA |
| sgl-project/sglang | #11299 | Enable Thor/Spark/GB300 |
| sgl-project/sgl-flash-attn | #8 | SGLang Flash Attention CUDA 13 |
| pytorch/pytorch | #165048 | Fix cuDSS for Thor/Spark |
| facebookresearch/xformers | #1344 | Blackwell support |
| state-spaces/mamba | #776 | Mamba CUDA 13 |
| bitsandbytes-foundation/bitsandbytes | #1491 | Blackwell binaries |
| opencv/opencv | #27537 | Refactor Blackwell |
| kvcache-ai/Mooncake | #344 | Enable SBSA |

Robotics & Edge AI:
| Repository | PR | Description |
|---|---|---|
| NVIDIA-AI-IOT/jetson-containers | #258 | Isaac Sim, Isaac Lab, Newton, Warp, CuPy, Numba |
| NVIDIA-AI-IOT/jetson-containers | #240 | Jetson Thor GR00T + LeRobot |
| dusty-nv/jetson-containers | #1391 | SGLang, vLLM, TF, Triton Blackwell |
| NVIDIA/Isaac-GR00T | #212 | decord2 integration |
| NVIDIA-AI-IOT/jetson-ai-lab | #343 | Cosmos Reasoning 2B |
| rbonghi/jetson_stats | #718 | Fix jtop for Thor/Spark |

Own projects: decord2 (47+ stars), edge2cloud, DGX Spark Playbook

Industry impact: Contributions to enabling the Grace Hopper (GH200) architecture helped major cloud GPU providers sell out their GH200 fleet capacity, showing the direct business value of making new hardware platforms accessible to the developer community.
O. Amoros, A. Andaluz, J. Nunez, A.J. Pena. (2025). "The Fused Kernel Library: A C++ API to Develop Highly-Efficient GPU Libraries." arXiv preprint arXiv:2508.07071.
G. Martínez, A. Trujillo, J. Núñez, J.C.S. Jacques, A. Clapés, et al. (2025). "Enhancing clinical psychology practice through data-driven machine learning monitoring systems."
Johnny Núñez, Zenjie Li, Sergio Escalera, Kamal Nasrollahi. (2024). "Identifying Loitering Behavior with Trajectory Analysis." Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). 251-259.
Johnny Núñez. (2022). "Comparison of Spatio-Temporal Hand Pose Denoising Models." Universitat de Barcelona.
Cristina Palmero, German Barquero, Julio CS Jacques Junior, Albert Clapés, Johnny Núñez, David Curto, Sorina Smeureanu, Javier Selva, Zejian Zhang, David Saeteros, David Gallardo-Pujol, Georgina Guilera, David Leiva, Feng Han, Xiaoxue Feng, Jennifer He, Wei-Wei Tu, Thomas B Moeslund, Isabelle Guyon, Sergio Escalera. (2022). "Chalearn LAP challenges on self-reported personality recognition and non-verbal behavior forecasting during social dyadic interactions: Dataset, design, and results." Understanding Social Behavior in Dyadic and Small Group Interactions. 4-52.
German Barquero, Johnny Núñez, Zhen Xu, Sergio Escalera, Wei-Wei Tu, Isabelle Guyon, Cristina Palmero. (2022). "Comparison of Spatio-Temporal Models for Human Motion and Pose Forecasting in Face-to-Face Interaction Scenarios: Supplementary Material."
German Barquero, Johnny Núñez, Sergio Escalera, Zhen Xu, Wei-Wei Tu, Isabelle Guyon, Cristina Palmero. (2022). "Didn’t see that coming: a survey on non-verbal social human behavior forecasting." Understanding Social Behavior in Dyadic and Small Group Interactions. 139-178.
Tutorial at IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026), Denver, Colorado, USA
Conference at NVIDIA GTC 2026, San José, California, USA
Conference at Mobile World Congress (MWC) 2026, Barcelona, Spain
Workshop at Humanoid Forum, Zurich, Switzerland
Demo at NeurIPS 2025, San Diego, California, USA
Conference at Smart City Expo World Congress 2025, Barcelona, Spain
Conference at ROSCon Spain 2025, Barcelona, Spain
Demo at CES 2025, Las Vegas, Nevada, USA
Conference at NVIDIA GTC 2024, San José, California, USA
Conference at Mobile World Congress (MWC) 2024, Barcelona, Spain