Contrastive Self-Supervised Learning for Globally Distributed Landslide Detection
The Remote Sensing (RS) field continuously grapples with the challenge of transforming satellite data into actionable information. This ongoing issue results in an ever-growing accumulation of unlabeled data, complicating interpretation efforts. The situation becomes even more challenging when satellite data must be used immediately to identify the effects of a natural hazard. Self-supervised learning (SSL) offers a promising approach for learning image representations without labeled data. Once trained, an SSL model can address various tasks with significantly reduced requirements for labeled data. Despite advancements in SSL models, particularly those using contrastive learning methods such as MoCo, SimCLR, and SwAV, their potential remains largely unexplored for instance segmentation and semantic segmentation of satellite imagery. This study integrates SwAV within an auto-encoder framework to detect landslides using deca-metric-resolution multi-spectral images from the globally distributed, large-scale Landslide4Sense (L4S) 2022 benchmark dataset, employing only 1% and 10% of the labeled data. Our proposed SSL auto-encoder model features two modules: SwAV, which assigns features to prototype vectors to generate encoder codes, and ResNets, serving as the decoder for the downstream task. With just 1% of the labeled data, our SSL model performs comparably to ten state-of-the-art deep learning segmentation models that use 100% of the labeled data in a fully supervised manner; with 10% of the labeled data, it outperforms all ten fully supervised counterparts.
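To make the prototype-assignment step concrete, here is a minimal PyTorch sketch of a SwAV-style swapped-prediction loss with Sinkhorn-Knopp code assignment; the batch size, prototype count, temperature, and Sinkhorn settings are illustrative assumptions, not the configuration used in the paper.

```python
# Minimal sketch of SwAV-style swapped prediction (illustrative, not the
# authors' exact implementation).
import torch
import torch.nn.functional as F

@torch.no_grad()
def sinkhorn(scores, eps=0.05, iters=3):
    """Sinkhorn-Knopp normalization: turn prototype scores into soft codes."""
    Q = torch.exp(scores / eps).T            # (K prototypes, B samples)
    Q /= Q.sum()
    K, B = Q.shape
    for _ in range(iters):
        Q /= Q.sum(dim=1, keepdim=True); Q /= K   # normalize rows
        Q /= Q.sum(dim=0, keepdim=True); Q /= B   # normalize columns
    return (Q * B).T                          # (B, K), each row sums to 1

def swav_loss(z1, z2, prototypes, temp=0.1):
    """Swapped prediction: the code from one view supervises the other."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    p = F.normalize(prototypes, dim=1)
    s1, s2 = z1 @ p.T, z2 @ p.T               # prototype scores per view
    q1, q2 = sinkhorn(s1), sinkhorn(s2)       # soft cluster assignments
    l1 = -(q2 * F.log_softmax(s1 / temp, dim=1)).sum(dim=1).mean()
    l2 = -(q1 * F.log_softmax(s2 / temp, dim=1)).sum(dim=1).mean()
    return 0.5 * (l1 + l2)

# Usage: embeddings of two augmented views of the same multi-spectral patches.
B, D, K = 32, 128, 50
z1, z2 = torch.randn(B, D), torch.randn(B, D)
prototypes = torch.nn.Parameter(torch.randn(K, D))
loss = swav_loss(z1, z2, prototypes)
loss.backward()
```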
View this article on IEEE Xplore
StrikeNet: Deep Convolutional LSTM-Based Road Lane Reconstruction With Spatiotemporal Inference for Lane Keeping Control
This paper presents a Spatio-Temporal Road Inference for a KEeping NETwork (StrikeNet), aimed at enhancing Road Lane Reconstruction (RLR) and lateral motion control in Autonomous Vehicles (AV) using deep neural networks. Accurate road lane model coefficients are essential for an effective Lane Keeping System (LKS), but traditional vision systems often fail when lane markers are absent or faint and cannot be properly recognized. To overcome this, a driving dataset was restructured, combining road information from a vision system with forward images for spatial training of RLR. Sequential spatial learning outputs were then processed with in-vehicle sensor data for temporal inference via Long Short-Term Memory (LSTM). StrikeNet was rigorously tested in both typical and uncertain driving environments, and comprehensive statistical and visualization analyses were conducted to evaluate the performance of various RLR methods and lateral motion control strategies. Remarkably, the RLR demonstrated its capability to derive reliable road coefficients even in the absence of lane markers. In a performance comparison with four alternative techniques, our method yielded the lowest error and variance between human steering inputs and the control input. Specifically, under high and low lane quality conditions, the proposed method reduced the control input error by up to 72% and 66%, respectively, and decreased the variance by 54% and 94%, respectively. The findings highlight StrikeNet's effectiveness in bolstering the fail-operational performance and reliability of lane-keeping and lane departure warning systems in autonomous driving, thereby enhancing control continuity and mitigating path deviation-induced traffic accidents.
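As a rough sketch of the spatial-then-temporal pipeline described above, the following PyTorch module encodes each forward image with a small CNN and fuses the per-frame features with in-vehicle sensor data through an LSTM to regress lane-model coefficients; the layer sizes, sensor dimension, and four-coefficient cubic lane model are assumptions, not StrikeNet's published architecture.

```python
# Hypothetical CNN+LSTM lane-coefficient regressor in the spirit of StrikeNet.
import torch
import torch.nn as nn

class LaneCoeffNet(nn.Module):
    def __init__(self, sensor_dim=4, hidden=64, n_coeffs=4):
        super().__init__()
        # Spatial stage: encode each forward image frame.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Temporal stage: fuse image features with in-vehicle sensor data.
        self.lstm = nn.LSTM(32 + sensor_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_coeffs)  # assumed cubic lane model c0..c3

    def forward(self, frames, sensors):
        # frames: (B, T, 3, H, W); sensors: (B, T, sensor_dim)
        B, T = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(B, T, -1)
        out, _ = self.lstm(torch.cat([feats, sensors], dim=-1))
        return self.head(out[:, -1])              # coefficients for latest frame

model = LaneCoeffNet()
coeffs = model(torch.randn(2, 8, 3, 64, 64), torch.randn(2, 8, 4))
print(coeffs.shape)  # (2, 4)
```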
View this article on IEEE Xplore
Object Detection Using Deep Learning, CNNs and Vision Transformers: A Review
Detecting objects remains one of the most fundamental and challenging problems in computer vision and image understanding. Significant advances in object detection have been achieved through improved object representations and the use of deep neural network models. This paper examines more closely how object detection has evolved in the era of deep learning over the past years. We present a literature review of various state-of-the-art object detection algorithms and the underlying concepts behind these methods. We classify these methods into three main groups: anchor-based, anchor-free, and transformer-based detectors, which differ in how they identify objects in an image. We discuss the insights behind these algorithms and experimental analyses comparing quality metrics, speed/accuracy trade-offs, and training methodologies. The survey compares the major convolutional neural networks for object detection, covers the strengths and limitations of each object detector model, and draws significant conclusions. We provide simple graphical illustrations summarizing the development of object detection methods under deep learning. Finally, we identify promising directions for future research.
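For readers wanting a concrete starting point, the snippet below runs one representative anchor-based detector (Faster R-CNN via torchvision) of the kind the survey compares; it is generic example code, not tied to the paper's experiments, and the score threshold is arbitrary.

```python
# Run a pretrained anchor-based detector on a dummy image (illustrative only).
import torch
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn, FasterRCNN_ResNet50_FPN_Weights,
)

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()

image = torch.rand(3, 480, 640)              # dummy RGB image in [0, 1]
with torch.no_grad():
    pred = model([image])[0]                  # dict of boxes, labels, scores

keep = pred["scores"] > 0.5                   # arbitrary confidence cutoff
names = [weights.meta["categories"][i] for i in pred["labels"][keep]]
print(pred["boxes"][keep], names)
```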
View this article on IEEE Xplore
MMNeRF: Multi-Modal and Multi-View Optimized Cross-Scene Neural Radiance Fields
We present MMNeRF, a simple yet powerful learning framework for highly photo-realistic novel view synthesis that learns multi-modal and multi-view features to guide neural radiance fields toward a generic model. Novel view synthesis has improved greatly with the significant success of NeRF-series methods. However, making such methods generalize across scenes has remained a challenging task. A natural idea is to introduce 2D image features as prior knowledge for adaptive modeling, yet RGB features lack geometry and 3D spatial information, which causes shape-radiance ambiguity and leads to blurry, low-resolution synthesized images. We propose a multi-modal, multi-view method to remedy these shortcomings. Specifically, we introduce depth features alongside RGB features and effectively fuse these multi-modal features through modality-based attention. Furthermore, our framework adopts a transformer encoder to fuse multi-view features and a transformer decoder to adaptively incorporate the target view with global memory. Extensive experiments on both category-specific and category-agnostic benchmarks demonstrate that MMNeRF achieves state-of-the-art neural rendering performance.
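A minimal sketch of the modality-based attention idea, treating the RGB and depth features of each sample point as a two-token sequence fused by multi-head attention; the feature dimension, head count, and pooling are assumptions, not MMNeRF's exact design.

```python
# Hypothetical modality-based attention fusing RGB and depth features.
import torch
import torch.nn as nn

class ModalityFusion(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb_feat, depth_feat):
        # Stack the two modalities as a length-2 token sequence per point.
        tokens = torch.stack([rgb_feat, depth_feat], dim=1)   # (N, 2, dim)
        fused, weights = self.attn(tokens, tokens, tokens)
        # Average the attended modality tokens into one fused feature.
        return self.norm(fused.mean(dim=1)), weights           # (N, dim)

fusion = ModalityFusion()
fused, w = fusion(torch.randn(1024, 64), torch.randn(1024, 64))
print(fused.shape)  # (1024, 64)
```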
View this article on IEEE Xplore
Tool Wear Monitoring Based on Transfer Learning and Improved Deep Residual Network
Existing deep-learning-based tool wear state monitoring models tend to have complex, heavyweight structures, are prone to over-fitting, and require large amounts of training data. To address this, a monitoring method based on Transfer Learning and an Improved Deep Residual Network is proposed. First, the data are preprocessed: one-dimensional cutting-force signals are transformed into two-dimensional spectra by wavelet transform. Then, the Improved Deep Residual Network is built with an optimized residual module structure: a Dropout layer is introduced, and global average pooling is used instead of the fully connected layer. Finally, the Improved Deep Residual Network serves as the pre-training network, and the tool wear state monitoring model is constructed with a model-based Transfer Learning method. The results show that the accuracy of the proposed monitoring method reaches 99.74%. The presented network model has a simple structure, a small number of parameters, and good robustness and reliability, achieving strong classification performance in fewer iterations.
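The described modifications can be sketched as follows: a residual block with an added Dropout layer, and global average pooling in place of the fully connected stack; channel counts, depth, and the number of wear states are illustrative assumptions.

```python
# Sketch of the described residual-block changes (Dropout inside the block,
# global average pooling instead of a fully connected stack).
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch, p_drop=0.3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(),
            nn.Dropout2d(p_drop),                     # regularize the block
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))           # identity shortcut

class WearNet(nn.Module):
    def __init__(self, n_states=3):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(1, 32, 7, stride=2, padding=3), nn.ReLU())
        self.blocks = nn.Sequential(ResidualBlock(32), ResidualBlock(32))
        self.gap = nn.AdaptiveAvgPool2d(1)             # replaces the FC stack
        self.out = nn.Linear(32, n_states)             # wear-state classifier

    def forward(self, x):                              # x: wavelet spectrogram
        return self.out(self.gap(self.blocks(self.stem(x))).flatten(1))

logits = WearNet()(torch.randn(4, 1, 64, 64))
print(logits.shape)  # (4, 3)
```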
View this article on IEEE Xplore
A Novel Symmetric Stacked Autoencoder for Adversarial Domain Adaptation Under Variable Speed
At present, most fault diagnosis methods that are widely studied and diagnostically effective rest on the premise that the sample distributions are consistent. In reality, however, the sample distribution of rotating machinery is inconsistent due to variable working conditions, and most fault diagnosis algorithms then perform poorly or even fail. To address these problems, a novel symmetric stacked autoencoder (NSSAE) for adversarial domain adaptation is proposed. First, a symmetric stacked autoencoder network with shared weights is used as the feature extractor to extract features that better express the original signal. Second, a domain discriminator is trained adversarially against the feature extractor, strengthening the extractor's ability to produce domain-invariant features that confuse the discriminator so it cannot correctly distinguish the features of the two domains. Finally, to assist the adversarial training, the maximum mean discrepancy (MMD) is added at the last layer of the feature extractor to align the features of the two domains in the high-dimensional space. The experimental results show that, under variable speed conditions, the NSSAE model can extract domain-invariant features to achieve transfer between domains, with high transfer diagnosis accuracy and strong stability.
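The MMD alignment term added at the feature extractor's last layer can be sketched as a multi-kernel Gaussian MMD penalty between source- and target-domain feature batches; the kernel bandwidths and feature dimension below are assumptions.

```python
# Multi-kernel Gaussian MMD penalty between two feature batches (sketch).
import torch

def gaussian_mmd(src, tgt, sigmas=(1.0, 2.0, 4.0)):
    """Biased estimate of MMD^2 between source and target feature batches."""
    x = torch.cat([src, tgt], dim=0)
    d2 = torch.cdist(x, x).pow(2)                  # pairwise squared distances
    k = sum(torch.exp(-d2 / (2 * s**2)) for s in sigmas) / len(sigmas)
    n = src.size(0)
    k_ss, k_tt, k_st = k[:n, :n], k[n:, n:], k[:n, n:]
    return k_ss.mean() + k_tt.mean() - 2 * k_st.mean()

# Usage: penalize distribution mismatch between features at two speeds.
src_feat, tgt_feat = torch.randn(32, 128), torch.randn(32, 128)
print(gaussian_mmd(src_feat, tgt_feat).item())
```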
*Published in the IEEE Reliability Society Section within IEEE Access.
View this article on IEEE Xplore
Robust Stereo Visual SLAM for Dynamic Environments With Moving Object
The accuracy of localization and mapping of automated guided vehicles (AGVs) using visual simultaneous localization and mapping (SLAM) is significantly reduced in dynamic environments compared to static ones, owing to incorrect data association caused by dynamic objects. To solve this problem, a robust stereo SLAM algorithm based on dynamic region rejection is proposed. The algorithm first detects dynamic feature points from the fundamental matrix of consecutive frames, then divides the current frame into superpixels and labels their boundaries with disparity. Finally, dynamic regions are obtained from the dynamic feature points and superpixel boundary types, and only the static area is used for pose estimation, improving the localization accuracy and robustness of the algorithm. Experiments show that the proposed algorithm outperforms ORB-SLAM2 on the KITTI dataset, and the absolute trajectory error in a real dynamic environment can be reduced by 84% compared with conventional ORB-SLAM2, effectively improving the localization and mapping accuracy of AGVs in dynamic environments.
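The first step, flagging dynamic feature points via the fundamental matrix, can be sketched with OpenCV as a check of the epipolar constraint between consecutive frames; the RANSAC parameters and the distance threshold are assumptions.

```python
# Flag feature points that violate the epipolar constraint (sketch).
import cv2
import numpy as np

def dynamic_point_mask(pts_prev, pts_curr, thresh=1.0):
    """Points far from their epipolar line violate the static-scene model."""
    F, _ = cv2.findFundamentalMat(pts_prev, pts_curr, cv2.FM_RANSAC, 1.0, 0.999)
    ones = np.ones((len(pts_prev), 1), dtype=np.float32)
    p1 = np.hstack([pts_prev, ones])          # homogeneous previous points
    p2 = np.hstack([pts_curr, ones])          # homogeneous current points
    lines = p1 @ F.T                          # epipolar lines in current frame
    num = np.abs(np.sum(lines * p2, axis=1))
    dist = num / np.linalg.norm(lines[:, :2], axis=1)
    return dist > thresh                      # True = likely dynamic point

pts_prev = (np.random.rand(100, 2) * 640).astype(np.float32)
pts_curr = pts_prev + np.random.randn(100, 2).astype(np.float32)
print(dynamic_point_mask(pts_prev, pts_curr).sum(), "suspected dynamic points")
```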
*Published in the IEEE Vehicular Technology Society Section within IEEE Access.
View this article on IEEE Xplore
Novel Multi Center and Threshold Ternary Pattern Based Method for Disease Detection Using Voice
Smart health is one of the most popular and important components of smart cities. It is a relatively new context-aware healthcare paradigm influenced by several fields of expertise, such as medical informatics, communications and electronics, bioengineering, and ethics. Smart health improves healthcare by providing services such as patient monitoring and early disease diagnosis. Artificial neural networks (ANN), support vector machines (SVM), and deep learning models, especially convolutional neural networks (CNN), are the most commonly used machine learning approaches and have proved performant in most cases. Voice disorders are spreading rapidly, yet they are often underestimated, even as medical diagnostic systems develop. Smart health systems can provide easy and fast support for voice pathology detection. An algorithm that discriminates between pathological and healthy voices with higher accuracy is needed to obtain a smart and precise mobile health system. The main contribution of this paper is a multiclass pathological voice classification using novel multileveled textural feature extraction with an iterative feature selector. Our approach is a simple and efficient voice-based algorithm in which a multi-center and multi-threshold ternary pattern (MCMTTP) is used. More compact multileveled features are then obtained by sample-based discretization techniques, and Neighborhood Component Analysis (NCA) is applied to select features iteratively. These features are finally integrated with MCMTTP to achieve accurate voice-based disease detection. Experimental results with six classifiers on three diagnostic conditions (frontal resection, cordectomy, and spastic dysphonia) show that the fused features are well suited to describing voice-based disease detection.
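To illustrate the ternary-pattern idea that MCMTTP generalizes, here is a plain one-dimensional local ternary pattern (single center, single threshold) in NumPy; the window length and threshold are assumptions, and the multi-center, multi-threshold extension is the paper's contribution.

```python
# One-dimensional local ternary pattern over a voice signal (sketch).
import numpy as np

def local_ternary_pattern_1d(signal, threshold=0.01):
    """Encode each 9-sample window into lower/upper ternary-pattern codes."""
    windows = np.lib.stride_tricks.sliding_window_view(signal, 9)
    center = windows[:, 4:5]
    diff = np.delete(windows, 4, axis=1) - center  # 8 neighbors per window
    upper = (diff > threshold).astype(int)         # +1 region of the pattern
    lower = (diff < -threshold).astype(int)        # -1 region of the pattern
    weights = 2 ** np.arange(8)
    return upper @ weights, lower @ weights        # two 0..255 code streams

sig = np.random.randn(16000) * 0.05                # stand-in voice frame
up, lo = local_ternary_pattern_1d(sig)
hist = np.bincount(up, minlength=256)              # histogram = texture feature
print(hist.shape)  # (256,)
```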
*Published in the IEEE Electronics Packaging Society Section within IEEE Access.
View this article on IEEE Xplore
An In-Depth Study on Open-Set Camera Model Identification
Camera model identification refers to the problem of linking a picture to the camera model used to shoot it. Because this can be an enabling factor in forensic applications for singling out possible suspects (e.g., detecting the author of child abuse or terrorist propaganda material), many accurate camera model attribution methods have been developed. One of their main drawbacks, however, is the typical closed-set formulation of the problem: an investigated photograph is always assigned to one camera model within the set of known models available during the investigation, i.e., at training time. The possibility that a picture comes from a completely unrelated camera model during actual testing is usually ignored, yet under realistic conditions it is not possible to assume that every picture under analysis belongs to one of the available camera models. To deal with this issue, we present an in-depth study on solving the camera model identification problem in open-set scenarios. Given a photograph, we aim to detect whether it comes from one of the known camera models of interest or from an unknown one. We compare different feature extraction algorithms and classifiers specifically targeting open-set recognition. We also evaluate possible open-set training protocols that can be applied along with any open-set classifier, observing that a simple alternative among the selected ones obtains the best results. Thorough testing on independent datasets shows that a recently proposed convolutional neural network can be leveraged as a feature extractor, paired with a properly trained open-set classifier, to solve the open-set camera model attribution problem even on small-scale image patches, improving over the state-of-the-art available solutions.
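One simple open-set strategy of the kind such studies evaluate can be sketched by fitting a one-class model on features of known camera models and rejecting outliers as "unknown"; the feature vectors below are random stand-ins for CNN embeddings, and the one-class SVM is an assumed example classifier, not the paper's selected method.

```python
# Open-set rejection via a one-class model on known-camera features (sketch).
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
known_feats = rng.normal(0.0, 1.0, size=(500, 64))   # patches from known models
test_feats = np.vstack([
    rng.normal(0.0, 1.0, size=(10, 64)),             # known-like patches
    rng.normal(5.0, 1.0, size=(10, 64)),             # unknown-camera patches
])

clf = OneClassSVM(nu=0.05, kernel="rbf", gamma="scale").fit(known_feats)
pred = clf.predict(test_feats)                       # +1 = known, -1 = unknown
print(pred)
```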
View this article on IEEE Xplore
A Study on the Elimination of Thermal Reflections
Recently, thermal cameras have been used in various surveillance and monitoring systems. In particular, in camera-based surveillance systems, algorithms are being developed for detecting and recognizing objects in images acquired in dark environments. However, it is difficult to detect and recognize an object due to the thermal reflections generated in images obtained from a thermal camera. For example, thermal reflection often occurs on a structure or the floor near an object, similar to a shadow or mirror reflection; the object and the thermal reflection then overlap or are connected and are difficult to separate. Thermal reflections also occur on nearby walls, where they can be falsely detected as objects even when no object is associated with the phenomenon. In addition, the size and pixel values of the thermal reflection area vary greatly depending on the surface material and the environmental temperature, and the patterns and pixel values of the reflection and the object can be similar and hard to differentiate. These problems reduce the accuracy of object detection and recognition methods, and no prior studies have addressed the elimination of thermal reflections of objects under different environmental conditions. Therefore, to address these challenges, we propose a method for detecting reflections in thermal images based on deep learning and eliminating them via post-processing. Experiments using a self-collected database (the Dongguk thermal image database (DTh-DB) and the Dongguk items and vehicles database (DI&V-DB)) and an open database showed that the performance of the proposed method is superior to that of other state-of-the-art approaches.
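A hypothetical post-processing step in this spirit: given a reflection mask predicted by a segmentation network (not shown), clean the mask morphologically and suppress the reflection region by inpainting with OpenCV; the frame, mask, and parameters below are illustrative stand-ins.

```python
# Suppress a predicted thermal-reflection region via inpainting (sketch).
import cv2
import numpy as np

thermal = (np.random.rand(240, 320) * 255).astype(np.uint8)  # stand-in frame
mask = np.zeros_like(thermal)
mask[150:200, 80:160] = 255          # assumed network-predicted reflection area

# Morphologically close small holes in the mask, then inpaint the region.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
cleaned = cv2.inpaint(thermal, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
print(cleaned.shape)  # reflection region replaced by surrounding texture
```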