NeRF View Synthesis: Subjective Quality Assessment and Objective Metrics Evaluation

Neural radiance fields (NeRF) are a groundbreaking computer vision technology that enables the generation of high-quality, immersive visual content from multiple viewpoints. This capability offers significant advantages for applications such as virtual/augmented reality, 3D modelling, and content creation for the film and entertainment industry. However, the evaluation of NeRF methods poses several challenges, including the lack of comprehensive datasets, reliable assessment methodologies, and objective quality metrics. This paper addresses NeRF view synthesis (NVS) quality assessment thoroughly by conducting a rigorous subjective quality assessment test that covers several scene classes and recently proposed NVS methods. Additionally, the performance of a wide range of state-of-the-art conventional and learning-based full-reference 2D image and video quality assessment metrics is evaluated against the subjective scores collected in this study. The study found that errors in camera pose estimation can result in spatial misalignments between synthesized and reference images, which must be corrected before an objective quality metric is applied. The experimental results are analyzed in depth, providing a comparative evaluation of several NVS methods and objective quality metrics across different classes of visual scenes, including real and synthetic content captured with front-facing and 360° camera trajectories.
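As an illustration of the misalignment issue noted above, the following minimal sketch (not taken from the paper; the alignment procedure actually used in the study may differ) searches over small integer pixel shifts of the synthesized view and computes PSNR only on the best-aligned overlap, so that small pose-induced offsets do not dominate the metric.

import numpy as np

def psnr(ref, syn):
    # Peak signal-to-noise ratio for images with values in [0, 1].
    mse = np.mean((ref - syn) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(1.0 / mse)

def aligned_psnr(ref, syn, max_shift=3):
    # Try small integer (dy, dx) translations of the synthesized view and
    # keep the best PSNR computed on the overlapping region only.
    h, w = ref.shape[:2]
    best = -np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            r = ref[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
            s = syn[max(0, -dy):h + min(0, -dy), max(0, -dx):w + min(0, -dx)]
            best = max(best, psnr(r, s))
    return best

In practice a sub-pixel or feature-based registration step could be used instead, but the principle is the same: align the synthesized and reference views first, then score them.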



A Cascaded Multimodal Natural User Interface to Reduce Driver Distraction

Natural user interfaces (NUIs) have been used to reduce driver distraction while drivers operate in-vehicle infotainment systems (IVIS), and multimodal interfaces have been applied to compensate for the shortcomings of any single modality in NUIs. These multimodal NUIs have variable effects on different types of driver distraction and on different stages of drivers’ secondary tasks. However, current studies provide only a limited understanding of NUIs: the design of multimodal NUIs is typically based on evaluating the strengths of individual modalities, and studies of multimodal NUIs are not based on equivalent comparison conditions. To address this gap, we compared five single modalities commonly used for NUIs (touch, mid-air gesture, speech, gaze, and physical buttons on a steering wheel) during a lane change task (LCT) to provide a more holistic view of driver distraction. Our findings suggest that the best approach is a combined cascaded multimodal interface that accounts for the characteristics of each single modality. We therefore compared several combinations of cascaded multimodalities by considering the characteristics of each modality in the sequential phases of the command input process. Our results show that the combinations speech + button, speech + touch, and gaze + button represent the best cascaded multimodal interfaces for reducing driver distraction while using IVIS.
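As a purely illustrative sketch (not part of the study), a cascaded interface can be pictured as a two-phase command input in which one modality proposes a command and a second modality confirms it, roughly in the spirit of the speech + button combination above; the command vocabulary and handler names below are hypothetical.

from dataclasses import dataclass
from typing import Optional

@dataclass
class CascadedCommandInput:
    # Command proposed by the first-phase modality, awaiting confirmation.
    pending: Optional[str] = None

    def on_speech(self, utterance: str) -> None:
        # Phase 1: speech selects a command (hypothetical vocabulary).
        if utterance in {"navigation", "music", "call"}:
            self.pending = utterance

    def on_steering_wheel_button(self, confirm: bool) -> Optional[str]:
        # Phase 2: a steering-wheel button confirms (or cancels) the command.
        command, self.pending = self.pending, None
        return command if confirm else None

A speech + touch or gaze + button cascade would follow the same pattern, with the proposing and confirming modalities swapped accordingly.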
