Comparison of Spatio-Temporal Hand Pose Denoising Models

Published in Universitat de Barcelona, 2022

In computer vision, human pose estimation locates a person or object in an image or video by predicting keypoints (landmarks or joints) in image space, which are connected to resemble the structure of the human body. The technique is important in applications such as the metaverse, video games, and film, where creators benefit from accurately identifying a person's pose and orientation relative to the camera. Existing methods work well on individual images, but video sequences are harder because the temporal context of human movement must also be accounted for, and the resulting pose estimates are often noisy. This paper discusses the need for pose refinement (denoising) methods that address problems such as noisy keypoints and incomplete skeletons in human pose estimation from video.

Recommended citation: Johnny Núnez. (2022). "Comparison of Spatio-Temporal Hand Pose Denoising Models." Universitat de Barcelona.
Download Paper | Download Slides