Deep Embedded Clustering Framework for Mixed Data

Deep embedded clustering (DEC) is a representative clustering algorithm that leverages deep-learning frameworks. DEC jointly learns low-dimensional feature representations and optimizes a clustering objective, but it only works with numerical data. In practice, however, real-world data to be clustered contains not only numerical features but also categorical features that DEC cannot handle. In addition, if the difference between the soft assignment and the target distribution is large, DEC may suffer from convergence problems. In this study, to overcome these limitations, we propose a deep embedded clustering framework that can handle mixed data and that increases convergence stability through soft-target updates, a concept borrowed from an improved deep Q-learning algorithm used in reinforcement learning. To evaluate the framework, we used various benchmark datasets composed of mixed data and empirically demonstrated that our approach outperforms existing clustering algorithms on most standard metrics. To the best of our knowledge, our work achieves state-of-the-art performance among contemporary methods in this field.
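
As a rough illustration of the idea, the sketch below combines DEC's standard soft assignment and sharpened target distribution with a Polyak-style soft-target update in the spirit of DQN target networks. The function names, the mixing rate `tau`, and the NumPy formulation are assumptions made for illustration; the paper's exact update rule may differ.

```python
import numpy as np

def soft_assignments(z, centroids, alpha=1.0):
    """Student's t soft assignment q_ij used by DEC."""
    # squared distances between embedded points z (n, d) and centroids (k, d)
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened auxiliary target p_ij from DEC."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def soft_target_update(p_old, q, tau=0.05):
    """Soft-target update in the spirit of DQN target networks:
    move the target only a fraction tau toward the freshly sharpened
    distribution instead of replacing it outright."""
    p_new = target_distribution(q)
    return (1.0 - tau) * p_old + tau * p_new
```

Keeping `tau` small means the target the network is trained against drifts slowly, which is the mechanism that is meant to damp large discrepancies between the soft assignment and the target.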

View this article on IEEE Xplore


Software Fault-Proneness Analysis based on Composite Developer-Module Networks

Existing software fault-proneness analysis and prediction models can be categorized into software-metrics-based and visualization-based approaches. However, metrics-based studies rely solely on quantified data, while the latter fail to reflect the human aspect, which has been shown to be a main cause of failures in various domains. In this paper, we propose a new analysis model built on an improved software network called the Composite Developer-Module Network. The network links developers to software modules as well as software modules to one another, reflecting the characteristics of developers and the interactions between them. After the networks of the studied projects are built, several sub-graphs are derived from them; analyzing the structures of these sub-graphs identifies those that are more fault-prone and further determines whether the software development is organized in a problematic structure, thus predicting fault-proneness. Our research shows not only that the different sub-structures are a factor in fault-proneness, but also that the complexity of a sub-structure can affect the production of bugs.
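
A minimal sketch of how such a composite network could be assembled and its sub-graphs inspected is shown below, using networkx. The developer names, module names, and edge sets are purely illustrative placeholders; a real study would mine them from version-control history and code dependencies, and the density measure merely stands in for whatever structural complexity metric the analysis uses.

```python
import networkx as nx

# Illustrative composite developer-module network.
G = nx.Graph()
G.add_nodes_from(["dev_a", "dev_b", "dev_c"], kind="developer")
G.add_nodes_from(["core.py", "api.py", "db.py"], kind="module")

# developer-to-module edges (e.g., commits or ownership)
G.add_edges_from([("dev_a", "core.py"), ("dev_a", "api.py"),
                  ("dev_b", "api.py"), ("dev_c", "db.py")])
# module-to-module edges (e.g., dependencies or co-changes)
G.add_edges_from([("core.py", "api.py"), ("api.py", "db.py")])

# For each module, derive the sub-graph around it and report simple
# structural measures as a rough proxy for fault-proneness.
for m in (n for n, d in G.nodes(data=True) if d["kind"] == "module"):
    sub = nx.ego_graph(G, m, radius=1)
    print(m, "nodes:", sub.number_of_nodes(),
          "edges:", sub.number_of_edges(),
          "density:", round(nx.density(sub), 2))
```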

*Published in the IEEE Reliability Society Section within IEEE Access.

View this article on IEEE Xplore


Breast Cancer Histopathology Image Super-Resolution Using Wide-Attention GAN With Improved Wasserstein Gradient Penalty and Perceptual Loss

In the realm of image processing, enhancing the quality of images is known as the super-resolution (SR) problem. Among SR methods, the super-resolution generative adversarial network, or SRGAN, was introduced to generate SR images from low-resolution images. Because it is of the utmost importance to preserve the size and shape of structures while enlarging medical images, we propose a novel super-resolution model with a generative adversarial network that generates SR images with finer details, higher quality, and less blurring. By widening the residual blocks and using a self-attention layer, our model becomes robust and generalizable, as it is able to extract the most important parts of the images before up-sampling. We name our proposed model wide-attention SRGAN (WA-SRGAN). Moreover, we apply an improved Wasserstein loss with gradient penalty to stabilize the model during training. To train our model, we used images from the Camylon 16 database and enlarged them by 2×, 4×, 8×, and 16× upscale factors, with ground-truth images of size 256 × 256 × 3. Furthermore, two normalization methods, batch normalization and weight normalization, were applied, and we observed that weight normalization is an enabling factor for improving performance in terms of SSIM. Several evaluation metrics, including PSNR, MSE, SSIM, MS-SSIM, and QILV, were used for a comprehensive objective comparison with other methods, including SRGAN, A-SRGAN, and bicubic interpolation. We also performed classification using a deep-learning model, ResNeXt-101 (32 × 8d), on the super-resolution, high-resolution, and low-resolution images and compared the outcomes in terms of accuracy. Finally, the results on breast cancer histopathology images show the superiority of our model, using weight normalization and a batch size of one, in restoring color and texture details.
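
The sketch below shows, in PyTorch, the kinds of building blocks the abstract describes: a widened residual block with weight normalization, a SAGAN-style self-attention layer, and a WGAN-GP gradient penalty. Channel counts, the widening factor, and module names are illustrative assumptions, not the paper's exact WA-SRGAN architecture.

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

class WideResidualBlock(nn.Module):
    """Residual block widened by a channel multiplier, using weight
    normalization instead of batch normalization (illustrative sizes)."""
    def __init__(self, channels=64, widen=2):
        super().__init__()
        wide = channels * widen
        self.body = nn.Sequential(
            weight_norm(nn.Conv2d(channels, wide, 3, padding=1)),
            nn.ReLU(inplace=True),
            weight_norm(nn.Conv2d(wide, channels, 3, padding=1)),
        )

    def forward(self, x):
        return x + self.body(x)

class SelfAttention(nn.Module):
    """SAGAN-style self-attention over spatial positions."""
    def __init__(self, channels=64):
        super().__init__()
        self.q = nn.Conv2d(channels, channels // 8, 1)
        self.k = nn.Conv2d(channels, channels // 8, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)     # (b, hw, c//8)
        k = self.k(x).flatten(2)                     # (b, c//8, hw)
        attn = torch.softmax(q @ k, dim=-1)          # (b, hw, hw)
        v = self.v(x).flatten(2)                     # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x

def gradient_penalty(critic, real, fake):
    """WGAN-GP penalty on the critic's gradients at interpolated samples."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(critic(x_hat).sum(), x_hat, create_graph=True)[0]
    return ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
```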

View this article on IEEE Xplore

Improving Predictability of User-Affecting Metrics to Support Anomaly Detection in Cloud Services

Anomaly detection systems aim to detect and report attacks or unexpected behavior in networked systems. Previous work has shown that anomalies have an impact on system performance and that performance signatures can be effectively used for implementing an intrusion detection system (IDS). In this paper, we present an analytical and experimental study of the trade-off between anomaly detection based on performance signatures and system scalability. The proposed approach combines analytical modeling and load testing to find optimal configurations for the signature-based IDS. We apply a heavy-tail bi-modal modeling approach, where “long” jobs represent large resource-consuming transactions, e.g., those generated by DDoS attacks; the model was parametrized using results obtained from controlled experiments. For performance purposes, mean response time is the key metric to be minimized, whereas for security purposes, response time variance and classification accuracy must be taken into account. The key insights from our analysis are: (i) there is an optimal number of servers that minimizes the response time variance, and (ii) the sweet-spot number of servers that minimizes response time variance and maximizes classification accuracy is typically smaller than or equal to the one that minimizes mean response time. Therefore, for security purposes, it may be worth slightly sacrificing performance to increase classification accuracy.
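
To make the bi-modal, heavy-tail idea concrete, the sketch below simulates a multi-server FCFS queue in which most jobs are short and a small fraction are long, attack-like transactions, and reports how the mean and variance of the response time change with the number of servers. The arrival rate, job mix, and service times are illustrative assumptions, not the parameters estimated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(servers, n_jobs=50_000, lam=8.0,
             p_long=0.05, mean_short=0.1, mean_long=2.0):
    """FCFS multi-server queue with Poisson arrivals and a bi-modal
    (hyperexponential) service time: mostly short jobs plus rare 'long'
    resource-consuming ones. All rates are illustrative."""
    arrivals = np.cumsum(rng.exponential(1.0 / lam, n_jobs))
    is_long = rng.random(n_jobs) < p_long
    service = np.where(is_long,
                       rng.exponential(mean_long, n_jobs),
                       rng.exponential(mean_short, n_jobs))
    free_at = np.zeros(servers)          # next time each server becomes idle
    resp = np.empty(n_jobs)
    for i in range(n_jobs):
        s = np.argmin(free_at)           # earliest available server (FCFS)
        start = max(arrivals[i], free_at[s])
        free_at[s] = start + service[i]
        resp[i] = free_at[s] - arrivals[i]
    return resp.mean(), resp.var()

for k in range(2, 9):
    m, v = simulate(k)
    print(f"servers={k}  mean response={m:.3f}  variance={v:.3f}")
```

Sweeping the server count in this way is one simple way to see that the configuration minimizing response-time variance need not coincide with the one minimizing mean response time.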

View this article on IEEE Xplore