PIFU-RGBD: Single-view RGB-D Pixel-aligned Implicit Function for 3D Human Reconstruction--Institute of Semiconductors

Wang, Yingli; Zhang, Liping; Li, Weijun; Dong, Xiaoli; Li, Li; Qin, Hong. Source: 2023 International Conference on High Performance Big Data and Intelligent Systems, HDIS 2023, p 93-99, 2023.

Abstract:

Recent advances in image-based parsing of human bodies have been driven by significant improvements in deep learning methods for 2D image processing. Although current methods demonstrate outstanding global reconstruction capability, they still fail to resolve the depth ambiguity inherent in 2D images. In this paper, we propose PIFU-RGBD, a new pixel-aligned implicit function representation for reconstructing a complete and detailed 3D human from a single RGB-D image. The method has two stages. In the first stage, a single RGB-D image is transformed into a single-view human point cloud, a single-view mesh is modeled from the point cloud, and a binocular view is rendered. In the second stage, the depth information and voxel-aligned features of the binocular view are obtained through a stereo vision network and fed into the implicit function estimation network. The Marching Cubes algorithm is then used to obtain a complete 3D reconstruction of the human body model. Notably, RGB-D images captured by any camera can be converted in the first stage into input with unified camera parameters, which makes the depth information and voxel-aligned features extracted in the second stage camera-independent. The trained network therefore performs depth-aware reconstruction under unified parameter settings. Compared with previous works, the proposed method effectively alleviates the pose ambiguity of human model reconstruction from single-view input and significantly improves reconstruction accuracy. Compared with the current SOTA method that reconstructs the complete human body from single-view RGB-D input, the proposed scheme reconstructs human body models with accurate posture on data captured by cameras with different parameters, and thus generalizes better.
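The first-stage idea of converting an RGB-D image into a point cloud and then viewing it under unified camera parameters can be illustrated with a minimal sketch. It assumes a standard pinhole camera model, which the abstract does not confirm. The function names and intrinsics (fx, fy, cx, cy) are hypothetical, and the mesh modeling and binocular rendering steps are not covered.

```python
# Minimal sketch, assuming a pinhole camera model (not confirmed by the abstract).
# All names and intrinsic values are hypothetical and for illustration only.

def backproject_depth(depth, fx, fy, cx, cy):
    """Back-project a depth map (list of rows) into camera-space 3D points.

    Pixels with depth <= 0 are treated as invalid and skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def project_points(points, fx, fy, cx, cy):
    """Project camera-space points to pixel coordinates with the given intrinsics."""
    return [(fx * x / z + cx, fy * y / z + cy) for x, y, z in points]


# Depth captured by some source camera (assumed intrinsics).
depth = [[0.0, 2.0],
         [2.0, 4.0]]
cloud = backproject_depth(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)

# Re-project the same points under a different, "unified" set of intrinsics
# (values are assumptions); the 3D points themselves do not depend on the camera.
unified_pixels = project_points(cloud, fx=4.0, fy=4.0, cx=1.0, cy=1.0)
```

Because the point cloud lives in metric camera space, any later stage that consumes the re-projected view sees the same intrinsics regardless of which sensor captured the original image. This is one plausible reading of the camera independence described in the abstract.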

Institute of Semiconductors

Chinese Academy of Sciences
