Object Detection Using Deep Learning, CNNs and Vision Transformers: A Review

Detecting objects remains one of computer vision and image understanding applications’ most fundamental and challenging aspects. Significant advances in object detection have been achieved through improved object representation and the use of deep neural network models. This paper examines more closely how object detection has evolved in the era of deep learning over the past years. We present a literature review on various state-of-the-art object detection algorithms and the underlying concepts behind these methods. We classify these methods into three main groups: anchor-based, anchor-free, and transformer-based detectors. Those approaches are distinct in the way they identify objects in the image. We discuss the insights behind these algorithms and experimental analyses to compare quality metrics, speed/accuracy tradeoffs, and training methodologies. The survey compares the major convolutional neural networks for object detection. It also covers the strengths and limitations of each object detector model and draws significant conclusions. We provide simple graphical illustrations summarising the development of object detection methods under deep learning. Finally, we identify where future research will be conducted.

View this article on IEEE Xplore


A Metaverse: Taxonomy, Components, Applications, and Open Challenges

Unlike previous studies on the Metaverse based on Second Life, the current Metaverse is based on the social value of Generation Z that online and offline selves are not different. With the technological development of deep learning-based high-precision recognition models and natural generation models, Metaverse is being strengthened with various factors, from mobile-based always-on access to connectivity with reality using virtual currency. The integration of enhanced social activities and neural-net methods requires a new definition of Metaverse suitable for the present, different from the previous Metaverse. This paper divides the concepts and essential techniques necessary for realizing the Metaverse into three components (i.e., hardware, software, and contents) and three approaches (i.e., user interaction, implementation, and application) rather than marketing or hardware approach to conduct a comprehensive analysis. Furthermore, we describe essential methods based on three components and techniques to Metaverse’s representative Ready Player One, Roblox, and Facebook research in the domain of films, games, and studies. Finally, we summarize the limitations and directions for implementing the immersive Metaverse as social influences, constraints, and open challenges.

View this article on IEEE Xplore

 

Autonomous Detection and Deterrence of Pigeons on Buildings by Drones

Pigeons may transmit diseases to humans and cause damages to buildings, monuments, and other infrastructure. Therefore, several control strategies have been developed, but they have been found to be either ineffective or harmful to animals and often depend on human operation. This study proposes a system capable of autonomously detecting and deterring pigeons on building roofs using a drone. The presence and position of pigeons were detected in real time by a neural network using images taken by a video camera located on the roof. Moreover, a drone was utilized to deter the animals. Field experiments were conducted in a real-world urban setting to assess the proposed system by comparing the number of animals and their stay durations for over five days against the 21-day-trial experiment without the drone. During the five days of experiments, the drone was automatically deployed 55 times and was significantly effective in reducing the number of birds and their stay durations without causing any harm to them. In conclusion, this study has proven the effectiveness of this system in deterring birds, and this approach can be seen as a fully autonomous alternative to the already existing methods.

View this article on IEEE Xplore

 

Clinical Micro-CT Empowered by Interior Tomography, Robotic Scanning, and Deep Learning

While micro-CT systems are instrumental in preclinical research, clinical micro-CT imaging has long been desired with cochlear implantation as a primary application. The structural details of the cochlear implant and the temporal bone require a significantly higher image resolution than that (about 0.2 mm ) provided by current medical CT scanners. In this paper, we propose a clinical micro-CT (CMCT) system design integrating conventional spiral cone-beam CT, contemporary interior tomography, deep learning techniques, and the technologies of a micro-focus X-ray source, a photon-counting detector (PCD), and robotic arms for ultrahigh-resolution localized tomography of a freely-selected volume of interest (VOI) at a minimized radiation dose level. The whole system consists of a standard CT scanner for a clinical CT exam and VOI specification, and a robotic micro-CT scanner for a local scan of high spatial and spectral resolution at minimized radiation dose. The prior information from the global scan is also fully utilized for background compensation of the local scan data for accurate and stable VOI reconstruction. Our results and analysis show that the proposed hybrid reconstruction algorithm delivers accurate high-resolution local reconstruction, and is insensitive to the misalignment of the isocenter position, initial view angle and scale mismatch in the data/image registration. These findings demonstrate the feasibility of our system design. We envision that deep learning techniques can be leveraged for optimized imaging performance. With high-resolution imaging, high dose efficiency and low system cost synergistically, our proposed CMCT system has great promise in temporal bone imaging as well as various other clinical applications.

*The video published with this article received a promotional prize for the 2020 IEEE Access Best Multimedia Award (Part 2).

View this article on IEEE Xplore