MMNeRF: Multi-Modal and Multi-View Optimized Cross-Scene Neural Radiance Fields

We present MMNeRF, a simple yet powerful learning framework for highly photo-realistic novel view synthesis that learns multi-modal and multi-view features to guide neural radiance fields toward a generic model. Novel view synthesis has improved greatly with the significant success of NeRF-series methods; however, making these methods generalize across scenes remains challenging. A natural idea is to introduce 2D image features as prior knowledge for adaptive modeling, yet RGB features lack geometry and 3D spatial information, which causes shape-radiance ambiguity and leads to blurry, low-resolution synthesized images. We propose a multi-modal, multi-view method to address these shortcomings. Specifically, we introduce depth features alongside RGB features and fuse these multi-modal features effectively with modality-based attention. Furthermore, our framework adopts a transformer encoder to fuse multi-view features and a transformer decoder to adaptively incorporate the target view with global memory. Extensive experiments on both category-specific and category-agnostic benchmarks demonstrate that MMNeRF achieves state-of-the-art neural rendering performance.
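
The modality-based attention the abstract describes can be pictured as a learned softmax weighting over the RGB and depth feature streams for each sampled point. The sketch below is a minimal illustration, not the paper's implementation: the scoring vector `w_score` stands in for learned parameters, and random features stand in for real network outputs.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def modality_attention_fuse(rgb_feat, depth_feat, w_score):
    """Fuse per-point RGB and depth features with modality-based attention.

    rgb_feat, depth_feat: (N, D) feature arrays for N sampled points.
    w_score: (D,) scoring vector (learned in practice; random here).
    """
    feats = np.stack([rgb_feat, depth_feat], axis=1)    # (N, 2, D)
    scores = feats @ w_score                            # (N, 2) per-modality scores
    weights = softmax(scores, axis=1)                   # attention over modalities
    fused = (weights[..., None] * feats).sum(axis=1)    # (N, D) weighted sum
    return fused, weights

rng = np.random.default_rng(0)
N, D = 4, 8
fused, w = modality_attention_fuse(rng.normal(size=(N, D)),
                                   rng.normal(size=(N, D)),
                                   rng.normal(size=D))
```

Because the weights sum to one per point, the fused feature stays in the same scale as its inputs while letting the model lean on depth where RGB is ambiguous.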

View this article on IEEE Xplore


Tool Wear Monitoring Based on Transfer Learning and Improved Deep Residual Network

Existing deep-learning-based tool wear state monitoring models have complex structures with many weights, are prone to over-fitting, and require large amounts of training data. To address this, a monitoring method based on transfer learning and an improved deep residual network is proposed. First, the data are preprocessed: one-dimensional cutting-force signals are transformed into two-dimensional time-frequency spectra by wavelet transform. Then, the improved deep residual network is built with an optimized residual module structure; a Dropout layer is introduced, and global average pooling replaces the fully connected layer. Finally, the improved deep residual network serves as the pre-training model, and the tool wear state monitoring model is constructed with a model-based transfer learning method. The results show that the accuracy of the proposed monitoring method reaches 99.74%. The presented network has a simple structure, a small number of parameters, and good robustness and reliability, and it achieves the desired classification performance with fewer training iterations.
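
One concrete reason this network has few parameters is the substitution of global average pooling for the fully connected layer. The sketch below (illustrative sizes, random data; not the paper's architecture) shows the operation and the resulting parameter saving for a hypothetical 64-channel, 7x7 feature map and 3 wear states.

```python
import numpy as np

def global_average_pool(fmaps):
    """Collapse a (C, H, W) stack of feature maps to a C-vector by averaging
    over the spatial dimensions -- replacing a parameter-heavy FC layer."""
    return fmaps.mean(axis=(-2, -1))

C, H, W, n_classes = 64, 7, 7, 3   # hypothetical sizes
rng = np.random.default_rng(1)
fmaps = rng.normal(size=(C, H, W))
pooled = global_average_pool(fmaps)            # (64,)
W_cls = rng.normal(size=(C, n_classes)) * 0.01
logits = pooled @ W_cls                        # tool-wear state logits

# Parameter comparison: FC on flattened maps vs. GAP + small linear head
fc_params = C * H * W * n_classes              # 64*7*7*3
gap_params = C * n_classes                     # 64*3
```

GAP contributes no trainable parameters itself, so the classifier head shrinks from thousands of weights to a few hundred, which also acts as a structural regularizer against over-fitting.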

View this article on IEEE Xplore


A Novel Symmetric Stacked Autoencoder for Adversarial Domain Adaptation Under Variable Speed

At present, most fault diagnosis methods with extensive research behind them and good diagnostic performance rest on the premise that the sample distributions are consistent. In reality, however, the sample distribution of rotating machinery is inconsistent under variable working conditions, and most fault diagnosis algorithms then perform poorly or even fail. To address these problems, a novel symmetric stacked autoencoder (NSSAE) for adversarial domain adaptation is proposed. First, a symmetric stacked autoencoder network with shared weights is used as the feature extractor to extract features that better express the original signal. Second, a domain discriminator is added and trained adversarially against the feature extractor, enhancing the extractor's ability to produce domain-invariant features that confuse the discriminator so it cannot correctly distinguish the features of the two domains. Finally, to assist the adversarial training, the maximum mean discrepancy (MMD) is applied to the last layer of the feature extractor to align the features of the two domains in the high-dimensional space. The experimental results show that, under variable-speed conditions, the NSSAE model extracts domain-invariant features that enable transfer between domains, with high diagnosis accuracy and strong stability.
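
The MMD term used to align the two domains has a standard kernel form: the squared distance between the mean embeddings of source and target features. A minimal numpy sketch with an RBF kernel (the bandwidth `gamma` and the random "features" are illustrative, not the paper's values):

```python
import numpy as np

def rbf_kernel(x, y, gamma):
    """Gaussian kernel matrix between two (N, D) feature sets."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(xs, xt, gamma=0.05):
    """Squared maximum mean discrepancy between source and target features.
    Zero when the two sets share a distribution; grows with domain shift."""
    return (rbf_kernel(xs, xs, gamma).mean()
            + rbf_kernel(xt, xt, gamma).mean()
            - 2 * rbf_kernel(xs, xt, gamma).mean())

rng = np.random.default_rng(2)
src = rng.normal(0.0, 1.0, size=(100, 16))
tgt_near = rng.normal(0.0, 1.0, size=(100, 16))   # same distribution
tgt_far = rng.normal(2.0, 1.0, size=(100, 16))    # shifted domain
```

Minimizing this quantity on the extractor's last layer pulls the two feature clouds together, complementing the adversarial loss, which only asks that the discriminator be confused.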

*Published in the IEEE Reliability Society Section within IEEE Access.

View this article on IEEE Xplore


Robust Stereo Visual SLAM for Dynamic Environments With Moving Object

The accuracy of localization and mapping of automated guided vehicles (AGVs) using visual simultaneous localization and mapping (SLAM) is significantly reduced in a dynamic environment compared to a static environment due to incorrect data association caused by dynamic objects. To solve this problem, a robust stereo SLAM algorithm based on dynamic region rejection is proposed. The algorithm first detects dynamic feature points from the fundamental matrix of consecutive frames, then divides the current frame into superpixels and labels their boundaries with disparity. Finally, dynamic regions are obtained from the dynamic feature points and the superpixel boundary types, and only the static area is used for pose estimation, improving the localization accuracy and robustness of the algorithm. Experiments show that the proposed algorithm outperforms ORB-SLAM2 on the KITTI dataset, and the absolute trajectory error in an actual dynamic environment is reduced by 84% compared with conventional ORB-SLAM2, effectively improving the localization and mapping accuracy of AGVs in dynamic environments.
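
Detecting dynamic feature points from the fundamental matrix typically means checking the epipolar constraint: a static point's match must lie on its epipolar line, so a large point-to-line distance flags motion. The sketch below is a toy illustration (a hand-built F for a pure x-translation, made-up matches), not the paper's pipeline:

```python
import numpy as np

def epipolar_distances(F, pts1, pts2):
    """Point-to-epipolar-line distance for matched points (N x 2 pixel coords).
    Matches whose distance exceeds a threshold violate the static-scene
    epipolar constraint and can be flagged as dynamic."""
    p1 = np.hstack([pts1, np.ones((len(pts1), 1))])   # homogeneous coords
    p2 = np.hstack([pts2, np.ones((len(pts2), 1))])
    lines = p1 @ F.T                                  # epipolar lines in image 2
    num = np.abs((p2 * lines).sum(axis=1))
    den = np.sqrt(lines[:, 0] ** 2 + lines[:, 1] ** 2)
    return num / den

# Toy setup: camera translating along x gives F = [t]_x, so epipolar lines
# are horizontal -- static points keep their row, a moving object does not.
F = np.array([[0., 0., 0.],
              [0., 0., -1.],
              [0., 1., 0.]])
pts1 = np.array([[10., 20.], [30., 40.], [50., 60.]])
pts2 = np.array([[12., 20.], [33., 40.], [55., 72.]])  # last match moved vertically
dynamic = epipolar_distances(F, pts1, pts2) > 1.0
```

The per-point flags are then propagated to whole superpixel regions, which is what lets the method reject entire moving objects rather than isolated features.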

*Published in the IEEE Vehicular Technology Society Section within IEEE Access.

View this article on IEEE Xplore


Novel Multi Center and Threshold Ternary Pattern Based Method for Disease Detection Method Using Voice

Smart health is one of the most popular and important components of smart cities. It is a relatively new context-aware healthcare paradigm influenced by several fields of expertise, such as medical informatics, communications and electronics, bioengineering, and ethics, to name a few. Smart health is used to improve healthcare by providing many services such as patient monitoring and early diagnosis of disease. The artificial neural network (ANN), support vector machine (SVM), and deep learning models, especially the convolutional neural network (CNN), are the most commonly used machine learning approaches, and they have proven to perform well in most cases. Voice disorders are spreading rapidly, especially with the development of medical diagnostic systems, although they are often underestimated. Smart health systems can provide easy and fast support for voice pathology detection, and an algorithm that discriminates between pathological and healthy voices with higher accuracy is needed to obtain a smart and precise mobile health system. The main contribution of this paper is a multiclass pathological-voice classification using novel multileveled textural feature extraction with an iterative feature selector. Our approach is a simple and efficient voice-based algorithm built on a multi-center and multi-threshold ternary pattern (MCMTTP). More compact multileveled features are then obtained by sample-based discretization techniques, and neighborhood component analysis (NCA) is applied to select features iteratively. These features are finally integrated with MCMTTP to achieve accurate voice-based disease detection. Experimental results with six classifiers on three diagnostic diseases (frontal resection, cordectomy, and spastic dysphonia) show that the fused features are more suitable for describing voice-based disease detection.
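
The ternary-pattern idea behind MCMTTP can be illustrated on a single signal frame: each sample is coded +1, 0, or -1 against a center value and a threshold, and the ternary code splits into upper/lower binary patterns that feed histogram features. The choice of center (median here) and threshold is illustrative; the paper's method repeats this over multiple centers and thresholds.

```python
import numpy as np

def ternary_pattern(frame, center, threshold):
    """Ternary coding of a signal frame: +1 above center+threshold,
    -1 below center-threshold, 0 otherwise."""
    codes = np.zeros(len(frame), dtype=int)
    codes[frame > center + threshold] = 1
    codes[frame < center - threshold] = -1
    return codes

def ternary_to_features(codes):
    """Split a ternary code into upper/lower binary patterns and read each
    as an integer -- one histogram-bin pair per frame."""
    upper = (codes == 1).astype(int)
    lower = (codes == -1).astype(int)
    to_int = lambda bits: int("".join(map(str, bits)), 2)
    return to_int(upper), to_int(lower)

frame = np.array([0.9, 0.1, 0.5, -0.7, 0.55, 0.2, -0.2, 0.5])
codes = ternary_pattern(frame, center=np.median(frame), threshold=0.3)
upper_code, lower_code = ternary_to_features(codes)
```

The ternary split makes the descriptor tolerant of small amplitude fluctuations near the center value, which binary patterns would flip on.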

*Published in the IEEE Electronics Packaging Society Section within IEEE Access.

View this article on IEEE Xplore


An In-Depth Study on Open-Set Camera Model Identification

Camera model identification refers to the problem of linking a picture to the camera model used to shoot it. Because this can be an enabling factor in forensic applications that single out possible suspects (e.g., detecting the author of child abuse or terrorist propaganda material), many accurate camera model attribution methods have been developed. One of their main drawbacks, however, is the typical closed-set formulation of the problem: an investigated photograph is always assigned to one camera model within the set of known models available during the investigation, i.e., at training time. The possibility that a picture comes from a completely unrelated camera model during actual testing is usually ignored, yet under realistic conditions it cannot be assumed that every picture under analysis belongs to one of the available camera models. To deal with this issue, we present an in-depth study of solving the camera model identification problem in open-set scenarios. Given a photograph, we aim at detecting whether it comes from one of the known camera models of interest or from an unknown one. We compare different feature extraction algorithms and classifiers specifically targeting open-set recognition. We also evaluate possible open-set training protocols that can be applied along with any open-set classifier, observing that a simple alternative among those selected obtains the best results. Thorough testing on independent datasets shows that a recently proposed convolutional neural network can be leveraged as a feature extractor, paired with a properly trained open-set classifier, to solve the open-set camera model attribution problem even on small-scale image patches, improving over the state-of-the-art solutions.
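
The essence of an open-set classifier is the reject option: rather than forcing every sample into a known class, it labels samples that fit no known class as "unknown". A minimal nearest-centroid sketch with a distance-based reject threshold (random blobs stand in for CNN features; the threshold and data are illustrative, not the paper's classifier):

```python
import numpy as np

def fit_centroids(features, labels):
    """One centroid per known camera model in feature space."""
    classes = np.unique(labels)
    return classes, np.stack([features[labels == c].mean(axis=0)
                              for c in classes])

def open_set_predict(x, classes, centroids, reject_dist):
    """Nearest-centroid prediction with a reject option: a sample farther
    than reject_dist from every known centroid is labelled unknown (-1)."""
    d = np.linalg.norm(centroids - x, axis=1)
    return int(classes[d.argmin()]) if d.min() <= reject_dist else -1

rng = np.random.default_rng(3)
# Two known camera models as well-separated Gaussian blobs in feature space
feats = np.vstack([rng.normal(0, 0.1, (50, 4)), rng.normal(5, 0.1, (50, 4))])
labels = np.array([0] * 50 + [1] * 50)
classes, centroids = fit_centroids(feats, labels)

known_sample = rng.normal(0, 0.1, 4)       # near model 0
unknown_sample = rng.normal(-10, 0.1, 4)   # far from both known models
```

The reject threshold trades off false alarms against missed unknowns, which is exactly what the open-set training protocols compared in the paper aim to tune.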

View this article on IEEE Xplore

A Study on the Elimination of Thermal Reflections

Recently, thermal cameras have been used in various surveillance and monitoring systems. In particular, in camera-based surveillance systems, algorithms are being developed for detecting and recognizing objects in images acquired in dark environments. However, the thermal reflections generated in images obtained from a thermal camera make objects difficult to detect and recognize. For example, thermal reflection often occurs on a structure or on the floor near an object, similar to shadows or mirror reflections; in this case, the object and the thermal-reflection areas overlap or are connected to each other and are difficult to separate. Thermal reflection also occurs on nearby walls, where it can be detected as an artifact even when no object is associated with the phenomenon. In addition, the size and pixel values of the thermal-reflection area vary greatly depending on the surface material and the environmental temperature, and the patterns and pixel values of the reflection and the object can be similar to each other and difficult to differentiate. These problems reduce the accuracy of object detection and recognition methods, and no previous studies have addressed the elimination of thermal reflections of objects under different environmental conditions. To address these challenges, we propose a method for detecting reflections in thermal images based on deep learning and eliminating them via post-processing. Experiments using self-collected databases (the Dongguk thermal image database (DTh-DB) and the Dongguk items and vehicles database (DI&V-DB)) and an open database show that the performance of the proposed method is superior to that of other state-of-the-art approaches.

View this article on IEEE Xplore