Object Detection Using Deep Learning, CNNs and Vision Transformers: A Review
Object detection remains one of the most fundamental and challenging tasks in computer vision and image understanding. Significant advances in object detection have been achieved through improved object representation and the use of deep neural network models. This paper examines how object detection has evolved in the deep learning era over recent years. We present a literature review of various state-of-the-art object detection algorithms and the underlying concepts behind these methods. We classify these methods into three main groups: anchor-based, anchor-free, and transformer-based detectors; these approaches differ in how they identify objects in an image. We discuss the insights behind these algorithms and present experimental analyses comparing quality metrics, speed/accuracy trade-offs, and training methodologies. The survey compares the major convolutional neural networks for object detection, covers the strengths and limitations of each object detector model, and draws significant conclusions. We provide simple graphical illustrations summarising the development of object detection methods under deep learning. Finally, we identify promising directions for future research.
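To make the anchor-based category concrete, the sketch below shows the anchor-to-ground-truth matching step that underlies detectors of that family; the (x1, y1, x2, y2) box format and the IoU thresholds are common conventions assumed for illustration, not values taken from any surveyed method.

```python
import numpy as np

def iou(boxes_a, boxes_b):
    """Pairwise intersection-over-union for boxes in (x1, y1, x2, y2) format."""
    # Intersection corners: broadcast (N, 1, 2) against (1, M, 2).
    lt = np.maximum(boxes_a[:, None, :2], boxes_b[None, :, :2])
    rb = np.minimum(boxes_a[:, None, 2:], boxes_b[None, :, 2:])
    wh = np.clip(rb - lt, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def match_anchors(anchors, gt_boxes, pos_thr=0.5, neg_thr=0.4):
    """Label each anchor as positive (index of the matched ground-truth box),
    background (-1), or ignored (-2), as typical anchor-based detectors do."""
    overlaps = iou(anchors, gt_boxes)            # (num_anchors, num_gt)
    best_gt = overlaps.argmax(axis=1)
    best_iou = overlaps.max(axis=1)
    labels = np.full(len(anchors), -2, dtype=int)
    labels[best_iou < neg_thr] = -1
    labels[best_iou >= pos_thr] = best_gt[best_iou >= pos_thr]
    return labels
```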
View this article on IEEE Xplore
DNN Partitioning for Inference Throughput Acceleration at the Edge
Deep neural network (DNN) inference on streaming data requires computing resources that satisfy inference throughput requirements. However, latency- and privacy-sensitive deep learning applications cannot afford to offload computation to remote clouds because of the implied transmission cost and the lack of trust in third-party cloud providers. Among the solutions for increasing performance while keeping computation in a constrained environment, hardware acceleration can be onerous, and model optimization requires extensive design effort while hindering accuracy. DNN partitioning is a third, complementary approach: it distributes the inference workload over several available edge devices, taking into account the edge network properties and the DNN structure, with the objective of maximizing the inference throughput (number of inferences per second). This paper introduces a method to predict inference and transmission latencies for multi-threaded distributed DNN deployments and defines an optimization process to maximize the inference throughput. A branch-and-bound solver is then presented and analyzed to quantify the achieved performance and complexity. This analysis leads to the definition of the acceleration region, which describes deterministic conditions on the DNN and network properties under which DNN partitioning is beneficial. Finally, experimental results confirm the simulations and show inference throughput improvements in sample edge deployments.
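As a rough illustration of the optimization objective, the sketch below estimates the throughput of a chain DNN split across devices and searches cut points by brute force. The latency values and the brute-force search are assumptions made for illustration; the paper itself relies on its own latency prediction method and a branch-and-bound solver.

```python
from itertools import combinations

def pipeline_throughput(layer_times, link_times, cut_points):
    """Estimate throughput (inferences/s) of a chain DNN split at cut_points.

    layer_times[i]: compute latency (s) of layer i on its host device.
    link_times[i]:  latency (s) to send layer i's output to the next device.
    With multi-threaded stages running concurrently, the pipeline is limited
    by its slowest stage, whether compute or transmission.
    """
    bounds = [0, *sorted(cut_points), len(layer_times)]
    stage_times = [sum(layer_times[a:b]) for a, b in zip(bounds, bounds[1:])]
    transfer_times = [link_times[c - 1] for c in sorted(cut_points)]
    return 1.0 / max(stage_times + transfer_times)

def best_partition(layer_times, link_times, num_devices):
    """Brute-force search over cut points (fine for small chains)."""
    best = (pipeline_throughput(layer_times, link_times, ()), ())
    for k in range(1, num_devices):
        for cuts in combinations(range(1, len(layer_times)), k):
            t = pipeline_throughput(layer_times, link_times, cuts)
            best = max(best, (t, cuts))
    return best  # (throughput, cut points)

# Example: 6 layers, uniform link latency, up to 3 devices.
print(best_partition([0.02, 0.05, 0.08, 0.03, 0.04, 0.02], [0.01] * 5, 3))
```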
View this article on IEEE Xplore
Deep Embedded Clustering Framework for Mixed Data
Deep embedded clustering (DEC) is a representative clustering algorithm that leverages deep-learning frameworks. DEC jointly learns low-dimensional feature representations and optimizes the clustering goals but only works with numerical data. However, in practice, the real-world data to be clustered includes not only numerical features but also categorical features that DEC cannot handle. In addition, if the difference between the soft assignment and target values is large, DEC applications may suffer from convergence problems. In this study, to overcome these limitations, we propose a deep embedded clustering framework that can utilize mixed data to increase the convergence stability using soft-target updates; a concept that is borrowed from an improved deep Q learning algorithm used in reinforcement learning. To evaluate the performance of the framework, we utilized various benchmark datasets composed of mixed data and empirically demonstrated that our approach outperformed existing clustering algorithms in most standard metrics. To the best of our knowledge, we state that our work achieved state-of-the-art performance among its contemporaries in this field.
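For readers unfamiliar with DEC, the sketch below reproduces its soft assignment and auxiliary target distribution, together with a soft-target update in the spirit of DQN target networks; the update rule and the tau value are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def soft_assignment(z, centroids, alpha=1.0):
    """DEC soft assignment q: Student's t similarity between embedded points z
    (shape N x d) and cluster centroids (shape K x d)."""
    dist2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    q = (1.0 + dist2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """DEC auxiliary target p: sharpen q while normalizing per cluster."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)

def soft_target_update(p_target, q, tau=0.1):
    """Soft-target update borrowed from target networks in deep Q-learning:
    instead of replacing the target with the freshly sharpened distribution,
    move it only a fraction tau toward it, damping abrupt changes when the
    soft assignment and the target diverge strongly. Illustrative only."""
    return (1.0 - tau) * p_target + tau * target_distribution(q)
```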
View this article on IEEE Xplore
A Hybrid Model-Based Approach on Prognostics for Railway HVAC
Prognostics and health management (PHM) of systems usually depends on appropriate prior knowledge and sufficient condition monitoring (CM) data on the degradation process of critical components to estimate the remaining useful life (RUL). A failure of complex or critical systems such as the heating, ventilation, and air conditioning (HVAC) systems installed in passenger train carriages may adversely affect people or the environment. Critical systems must meet restrictive regulations and standards, which usually results in early replacement of components. Consequently, CM datasets lack data on advanced stages of degradation, which significantly hampers the development of robust diagnostics and prognostics processes; as a result, PHM is rarely implemented in HVAC systems. This paper proposes a methodology for implementing a hybrid model-based approach (HyMA) to overcome the limited representativeness of the training dataset when developing a prognostic model. The proposed methodology is evaluated by building a HyMA that fuses information from a physics-based model with a deep learning algorithm to implement a prognostics process for a complex and critical system. The physics-based model of the HVAC system is used to generate run-to-failure data. This model is built and validated using information and data from the real asset; the failures are modelled according to expert knowledge and an experimental test that evaluates the behaviour of the HVAC system operating with the air filter at different levels of degradation. In addition to using the sensors located in the real system, we model virtual sensors to observe parameters related to the health of system components. The generated run-to-failure datasets are normalized and used directly as inputs to a deep convolutional neural network (CNN) for RUL estimation. The effectiveness of the proposed methodology and approach is evaluated on datasets containing the air filter's run-to-failure data. The experimental results show remarkable accuracy in the RUL estimation, suggesting that the proposed HyMA and methodology offer a promising approach for PHM.
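A minimal sketch of the final step, RUL regression with a CNN on normalized run-to-failure windows, is given below; the channel count, window length, layer sizes, and class name are placeholders, not the architecture used in the paper.

```python
import torch
from torch import nn

class RULRegressor(nn.Module):
    """Illustrative 1D CNN mapping a window of normalized sensor channels
    (real and virtual) to a remaining-useful-life estimate."""

    def __init__(self, n_channels: int = 8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),        # pool over the time axis
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(64, 1))

    def forward(self, x):                   # x: (batch, channels, time)
        return self.head(self.features(x)).squeeze(-1)

# Example: batch of 4 windows, 8 sensor channels, 200 time steps.
model = RULRegressor(n_channels=8)
rul = model(torch.randn(4, 8, 200))        # -> tensor of 4 RUL estimates
```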
View this article on IEEE Xplore
Tool Wear Monitoring Based on Transfer Learning and Improved Deep Residual Network
Existing deep-learning-based tool wear state monitoring models tend to have complex, heavy structures, are prone to over-fitting, and require large amounts of training data. To address these issues, a monitoring method based on transfer learning and an improved deep residual network is proposed. First, the data are preprocessed: the one-dimensional cutting force signals are transformed into two-dimensional spectra by wavelet transform. Then, the improved deep residual network is built and its residual module structure is optimized: a Dropout layer is introduced, and global average pooling is used in place of the fully connected layer. Finally, the improved deep residual network is used as the pre-trained model, and the tool wear state monitoring model is constructed by combining it with a model-based transfer learning method. The results show that the accuracy of the proposed monitoring method reaches 99.74%. The presented network model has a simple structure, a small number of parameters, and good robustness and reliability, and it achieves the desired classification performance with fewer training iterations.
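The sketch below illustrates the two architectural changes mentioned, a residual module with a Dropout layer and a global-average-pooling head in place of a large fully connected layer; the channel counts, dropout rate, and number of wear states are assumptions for illustration, not the paper's settings.

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    """Residual module with an added Dropout layer (illustrative sizes)."""

    def __init__(self, channels: int, p_drop: float = 0.3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))     # skip connection

# Global average pooling replaces the large fully connected head, keeping the
# parameter count small; 64 backbone channels and 3 wear states are assumed.
n_wear_states = 3
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                     nn.Linear(64, n_wear_states))
```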
View this article on IEEE Xplore
Video Based Mobility Monitoring of Elderly People Using Deep Learning Models
In recent years, the number of older people living alone has increased rapidly. Innovative vision systems for remotely assessing people's mobility can support healthy, active, and happy aging. As reported in the related literature, the mobility assessment of older people is not yet widespread in clinical practice. In addition, the poor availability of data typically restricts the analyses to binary classification, e.g. normal/anomalous behavior, instead of processing exhaustive medical protocols. In this paper, real videos of elderly people performing three mobility tests of a clinical protocol are automatically categorized, emulating the complex evaluation process of expert physiotherapists. Videos acquired using low-cost cameras are first processed to obtain skeletal information. A suitable data augmentation technique is then used to enlarge the dataset variability. Next, significant features are extracted to generate a set of inputs in the form of time series. Four deep neural network architectures with feedback connections, optionally aided by a preliminary convolutional layer, are proposed to label the input features with discrete classes or to estimate a continuous mobility score as a regression task. The best results are achieved by the proposed Conv-BiLSTM classifier, with accuracy ranging between 88.12% and 90%. Further comparisons with shallow learning classifiers confirm the superiority of the deep Conv-BiLSTM classifier in assessing people's mobility, since deep networks can evaluate the quality of test executions.
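A minimal sketch of a Conv-BiLSTM of the kind described, a preliminary 1D convolution feeding a bidirectional LSTM over skeletal-feature time series, is shown below; the feature, class, and hidden-size dimensions are illustrative, not those of the paper.

```python
import torch
from torch import nn

class ConvBiLSTM(nn.Module):
    """Sketch of a Conv-BiLSTM classifier for skeletal-feature time series."""

    def __init__(self, n_features: int, n_classes: int, hidden: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1), nn.ReLU())
        self.bilstm = nn.LSTM(32, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                      # x: (batch, time, features)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)   # conv along time
        out, _ = self.bilstm(h)
        return self.fc(out[:, -1])             # class scores from last step

# Example: 2 clips, 150 frames, 30 skeletal features, 4 mobility classes.
logits = ConvBiLSTM(30, 4)(torch.randn(2, 150, 30))
```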
View this article on IEEE Xplore
Network Representation Learning: From Traditional Feature Learning to Deep Learning
Network representation learning (NRL) is an effective graph analytics technique that helps users gain a deep understanding of the hidden characteristics of graph data. It has been successfully applied in many real-world tasks related to network science, such as social network data processing, biological information processing, and recommender systems. Deep learning is a powerful tool for learning data features. However, it is non-trivial to generalize deep learning to graph-structured data, which differs from regular data such as images, with their spatial structure, and sound, with its temporal structure. Recently, researchers have proposed many deep learning-based methods in the area of NRL. In this survey, we investigate classical NRL, from traditional feature learning methods to deep learning-based models, analyze the relationships between them, and summarize the latest progress. Finally, we discuss open issues in NRL and point out future directions in this field.
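As one concrete example of classical NRL covered by such surveys, the sketch below generates DeepWalk-style truncated random walks that would then be embedded with a skip-gram model; the graph, walk parameters, and the use of word2vec are illustrative assumptions, not techniques specific to this survey.

```python
import random
import networkx as nx

def random_walks(graph, walk_len=10, walks_per_node=5, seed=0):
    """DeepWalk-style truncated random walks: each walk is treated as a
    'sentence' and later fed to a skip-gram model to learn node embeddings."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for node in graph.nodes:
            walk = [node]
            while len(walk) < walk_len:
                neighbors = list(graph.neighbors(walk[-1]))
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append([str(n) for n in walk])
    return walks

walks = random_walks(nx.karate_club_graph())
# walks can then be embedded, e.g. with gensim's Word2Vec in skip-gram mode.
```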
View this article on IEEE Xplore
Text Detection and Recognition for Images of Medical Laboratory Reports With a Deep Learning Approach
The adoption of electronic health records (EHRs) is an important step in the development of modern medicine. However, complete health records are often unavailable during treatment because of functional problems in EHR systems or information barriers. This paper presents a deep-learning-based approach for extracting textual information from images of medical laboratory reports, which may help physicians solve the data-sharing problem. The approach consists of two modules: text detection and text recognition. In text detection, a patch-based training strategy is applied, which achieves a recall of 99.5% in the experiments. For text recognition, a concatenation structure is designed to combine features from both shallow and deep layers of the neural network. The experimental results demonstrate that the text recognizer in our approach improves the accuracy of multi-lingual text recognition. The approach will be beneficial for integrating historical health records and engaging patients in their own health care.
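A minimal sketch of the shallow/deep feature concatenation idea is given below: the deep, low-resolution feature map is upsampled to the shallow map's spatial size and concatenated channel-wise, so the recognizer sees both fine detail and high-level context. The tensor shapes are hypothetical, not those of the paper's network.

```python
import torch
from torch import nn

def concat_shallow_deep(shallow, deep):
    """Upsample the deep feature map to the shallow map's spatial size and
    concatenate along the channel axis."""
    deep_up = nn.functional.interpolate(
        deep, size=shallow.shape[-2:], mode="bilinear", align_corners=False)
    return torch.cat([shallow, deep_up], dim=1)

# Example: shallow (B, 64, 32, 128) and deep (B, 256, 8, 32) feature maps.
fused = concat_shallow_deep(torch.randn(1, 64, 32, 128),
                            torch.randn(1, 256, 8, 32))
print(fused.shape)   # torch.Size([1, 320, 32, 128])
```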