Effect of Data Characteristics Inconsistency on Medium and Long-Term Runoff Forecasting by Machine Learning

In the application of medium and long-term runoff forecasting, machine learning suffers from problems such as high learning cost, limited computing resources, and difficulty in satisfying statistical assumptions about the data in some regions, which make it hard to popularize in the hydrology industry. When only a small amount of data is available, analyzing the consistency of data characteristics is one way to address this problem. This paper analyzes the statistical assumptions of machine learning together with runoff data characteristics such as periodicity and mutation. Targeting the effect of data-characteristic inconsistency on three representative machine learning models (multiple linear regression, random forest, and back-propagation neural network), a simple correction/improvement method suitable for engineering practice is proposed. The model results were verified in the Danjiangkou area, China. The results show that the errors of the three models are distributed in line with the periodic characteristics of the runoff, and that correction/improvement based on the periodicity and mutation characteristics improves the forecasting accuracy of all three models. The back-propagation neural network model is the most sensitive to the consistency of data characteristics.
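For readers unfamiliar with the three baseline models compared in this paper, the minimal sketch below shows how multiple linear regression, a random forest, and a back-propagation neural network (here scikit-learn's MLPRegressor) could be fitted to lagged monthly runoff data. It is an illustration only, assuming scikit-learn, synthetic runoff data, and a hypothetical 12-month lag window; it is not the authors' implementation and does not include their periodicity/mutation-based correction.

```python
# Minimal sketch (assumptions: scikit-learn available, synthetic monthly runoff,
# simple lag features). Not the paper's implementation or correction scheme.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
months = np.arange(600)
# Synthetic runoff with an annual cycle plus noise (stand-in for observed data).
runoff = 100 + 40 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 10, months.size)

# Forecast next month's runoff from the previous 12 months (hypothetical lag window).
lags = 12
X = np.array([runoff[i - lags:i] for i in range(lags, runoff.size)])
y = runoff[lags:]
split = int(0.8 * len(y))
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

models = {
    "multiple linear regression": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "BP neural network": MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                                      random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name}: test MAE = {mae:.2f}")
```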

View this article on IEEE Xplore

 

Software Fault-Proneness Analysis based on Composite Developer-Module Networks

Existing software fault-proneness analysis and prediction models can be categorized into software-metric approaches and visualized approaches. However, the former rely solely on quantified data, while the latter fail to reflect the human aspect, which has been shown to be a main cause of failures in many domains. In this paper, we propose a new analysis model built on an improved software network called the Composite Developer-Module Network. The network combines links from developers to software modules with links between software modules, so that it reflects both the characteristics of developers and the interactions among them. After the networks of the studied projects are built, several different sub-graphs are derived from them; by analyzing which sub-graph structures are more fault-prone, we can determine whether the software development is in a bad structure and thereby predict fault-proneness. Our research shows that not only are the different sub-structures a factor in fault-proneness, but the complexity of a sub-structure also affects the production of bugs.
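As an illustration of the kind of composite network described above, the sketch below builds a graph that combines developer-to-module edges (who committed to what) with module-to-module edges (e.g., dependency relations) and then extracts a small sub-graph around each module. The toy edge data, node names, and the use of networkx are assumptions made for illustration; the paper's actual network construction and fault-proneness analysis may differ.

```python
# Minimal sketch (assumptions: networkx available; toy developer/module data).
# Illustrates the idea of a composite developer-module network, not the paper's model.
import networkx as nx

G = nx.Graph()
# Developer -> module edges (hypothetical commit history).
commits = [("dev_alice", "mod_auth"), ("dev_alice", "mod_db"),
           ("dev_bob", "mod_auth"), ("dev_bob", "mod_ui"),
           ("dev_carol", "mod_ui")]
G.add_edges_from(commits, kind="develops")
# Module -> module edges (hypothetical dependencies between modules).
deps = [("mod_auth", "mod_db"), ("mod_ui", "mod_auth")]
G.add_edges_from(deps, kind="depends")

# For each module, derive the sub-graph induced by the module, its developers,
# and its neighbouring modules; its size/density could serve as a fault-proneness signal.
modules = [n for n in G if n.startswith("mod_")]
for m in modules:
    sub = G.subgraph([m] + list(G.neighbors(m)))
    print(m, "sub-graph nodes:", sub.number_of_nodes(),
          "edges:", sub.number_of_edges(),
          "density:", round(nx.density(sub), 2))
```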

Published in the IEEE Reliability Society Section within IEEE Access.

View this article on IEEE Xplore

 

A Simple Sum of Products Formula to Compute the Reliability of the KooN System

The reliability block diagram (RBD) is a well-known, high-level abstract modeling method for calculating system reliability. Increasing redundancy is the most important way to increase the fault tolerance and reliability of dependable systems, and K-out-of-N (KooN) is one of the well-known redundancy models. Redundancy causes repeated events and increases the complexity of computing the system's reliability, and researchers use techniques such as factorization to overcome this. Current methods lead to cumbersome formulas that need a great deal of simplification to be put into Sum of Products (SoP) form in terms of the reliabilities of the constituent components. In this paper, a technique for extracting a simple formula for the KooN system's reliability in SoP form using the Venn diagram is presented. Then, the shortcoming of the Venn diagram, namely that it masks some joint events when the number of independent components is large, is explained. We propose replacing Venn diagrams with Lattices to overcome this weakness. The Lattice of reliabilities, which is the dual of the power-set Lattice of the components, is then introduced. Using the basic properties of the Lattice of reliabilities and its inclusion relationships, we propose an algorithm for deriving a general formula for the KooN system's reliability in SoP form. The proposed algorithm obtains the SoP formula coefficients by computing the elements of the main diagonal and the elements below it in a square matrix. The computational and space complexity of this algorithm is $\theta((n-k)^{2}/2)$, where n is the number of different components and k denotes the number of components that must function. A lemma and a theorem are defined and proved as the basis of the proposed general formula for computing the coefficients of the SoP formula of the KooN system. Using this formula, the computational and space complexity of computing all coefficients of the KooN reliability formula is reduced to $\theta(n-k)$. The proposed formula is simple, is in SoP form, and its computation is less error-prone.
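To make the SoP idea concrete, the sketch below computes KooN reliability in two baseline ways: by brute-force enumeration of all component states, and with the textbook inclusion-exclusion SoP expansion $R = \sum_{j=k}^{n} (-1)^{j-k} \binom{j-1}{k-1} \sum_{|S|=j} \prod_{i \in S} r_i$. This is only an illustrative baseline for a KooN SoP formula; the coefficients and the Lattice-based derivation proposed in the paper are not reproduced here.

```python
# Minimal sketch: two baseline ways to compute KooN reliability for independent
# components with reliabilities rel[i]. Illustrative only; not the paper's
# Lattice-based algorithm.
from itertools import combinations, product
from math import comb, prod

def koon_bruteforce(k, rel):
    """Exact reliability by summing the probability of every state with >= k working."""
    n = len(rel)
    total = 0.0
    for state in product((0, 1), repeat=n):
        if sum(state) >= k:
            total += prod(r if up else 1.0 - r for up, r in zip(state, rel))
    return total

def koon_sop(k, rel):
    """Textbook inclusion-exclusion SoP form:
    R = sum_{j=k..n} (-1)^(j-k) * C(j-1, k-1) * sum_{|S|=j} prod_{i in S} rel[i]."""
    n = len(rel)
    return sum((-1) ** (j - k) * comb(j - 1, k - 1)
               * sum(prod(rel[i] for i in s) for s in combinations(range(n), j))
               for j in range(k, n + 1))

rel = [0.9, 0.85, 0.95, 0.8]          # hypothetical component reliabilities
print(koon_bruteforce(2, rel))        # e.g. a 2-out-of-4 system
print(koon_sop(2, rel))               # should match the brute-force value
```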

View this article on IEEE Xplore

Improving Predictability of User-Affecting Metrics to Support Anomaly Detection in Cloud Services

Anomaly detection systems aim to detect and report attacks or unexpected behavior in networked systems. Previous work has shown that anomalies have an impact on system performance and that performance signatures can be used effectively to implement an intrusion detection system (IDS). In this paper, we present an analytical and experimental study of the trade-off between anomaly detection based on performance signatures and system scalability. The proposed approach combines analytical modeling and load testing to find optimal configurations for the signature-based IDS. We apply a heavy-tail bi-modal modeling approach, where "long" jobs represent large, resource-consuming transactions, e.g., those generated by DDoS attacks; the model was parameterized using results obtained from controlled experiments. For performance purposes, mean response time is the key metric to be minimized, whereas for security purposes, response time variance and classification accuracy must also be taken into account. The key insights from our analysis are: (i) there is an optimal number of servers that minimizes the response time variance, and (ii) the sweet-spot number of servers that minimizes response time variance and maximizes classification accuracy is typically smaller than or equal to the one that minimizes mean response time. Therefore, for security purposes, it may be worth slightly sacrificing performance to increase classification accuracy.
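The trade-off described above can be reproduced in spirit with a small simulation: a multi-server FCFS queue fed by Poisson arrivals with a bi-modal service-time mix, where rare "long" jobs stand in for resource-consuming (e.g., DDoS-like) transactions, and the server count is swept to see how mean response time and response-time variance evolve. All numbers below (arrival rate, job mix, service means) are hypothetical, and the paper itself combines an analytical model with load testing rather than this toy simulation.

```python
# Minimal sketch (hypothetical parameters): FCFS multi-server queue with a
# bi-modal service-time mix; sweeps the server count to expose the trade-off
# between mean response time and response-time variance.
import heapq
import random
import statistics

def simulate_fcfs(num_servers, arrival_rate, service_sampler, num_jobs=50_000, seed=1):
    rng = random.Random(seed)
    free_at = [0.0] * num_servers          # min-heap of times each server is next free
    t, response_times = 0.0, []
    for _ in range(num_jobs):
        t += rng.expovariate(arrival_rate)       # Poisson arrivals
        earliest = heapq.heappop(free_at)        # earliest-available server
        start = max(t, earliest)                 # queue if every server is busy
        finish = start + service_sampler(rng)
        heapq.heappush(free_at, finish)
        response_times.append(finish - t)
    return statistics.mean(response_times), statistics.variance(response_times)

def bimodal_service(rng, p_long=0.05, short_mean=0.05, long_mean=2.0):
    """Mostly short jobs; occasional heavy, attack-like jobs (hypothetical mix)."""
    mean = long_mean if rng.random() < p_long else short_mean
    return rng.expovariate(1.0 / mean)

for servers in range(2, 11):
    m, v = simulate_fcfs(servers, arrival_rate=10.0, service_sampler=bimodal_service)
    print(f"servers={servers:2d}  mean RT={m:.3f}  RT variance={v:.3f}")
```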

View this article on IEEE Xplore

An In-Depth Study on Open-Set Camera Model Identification


Camera model identification refers to the problem of linking a picture to the camera model used to shoot it. As this can be an enabling factor in forensic applications that single out possible suspects (e.g., detecting the author of child-abuse or terrorist-propaganda material), many accurate camera model attribution methods have been developed. One of their main drawbacks, however, is the typical closed-set formulation of the problem: an investigated photograph is always assigned to one camera model within the set of known models available during the investigation, i.e., at training time, while the fact that a picture may come from a completely unrelated camera model during actual testing is usually ignored. Under realistic conditions, it is not possible to assume that every picture under analysis belongs to one of the available camera models. To deal with this issue, in this paper we present an in-depth study on the possibility of solving the camera model identification problem in open-set scenarios. Given a photograph, we aim at detecting whether it comes from one of the known camera models of interest or from an unknown one. We compare different feature extraction algorithms and classifiers specifically targeting open-set recognition. We also evaluate possible open-set training protocols that can be applied along with any open-set classifier, observing that a simple alternative among the selected ones obtains the best results. Thorough testing on independent datasets shows that a recently proposed convolutional neural network can be leveraged as a feature extractor and paired with a properly trained open-set classifier to solve the open-set camera model attribution problem even on small-scale image patches, improving over the available state-of-the-art solutions.
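One common way to turn a closed-set camera-model classifier into an open-set one is to reject samples that are too far from every known class. The sketch below illustrates that idea with a simple nearest-class-mean rule and a distance threshold on placeholder feature vectors; the feature extraction (the paper leverages CNN features computed on image patches), the rejection rule, and the threshold value are all assumptions made for illustration, not the open-set classifiers evaluated in the paper.

```python
# Minimal sketch (assumption: features already extracted, e.g., by a CNN, into
# fixed-length vectors). Illustrates distance-based rejection of unknown camera
# models with a nearest-class-mean rule; not the classifiers evaluated in the paper.
import numpy as np

rng = np.random.default_rng(0)
# Placeholder "features" for three known camera models (clusters in feature space).
X_known = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 16)) for c in (0.0, 3.0, 6.0)])
y_known = np.repeat([0, 1, 2], 100)

# Per-class mean feature vectors act as a tiny closed-set "model".
class_means = np.stack([X_known[y_known == c].mean(axis=0) for c in (0, 1, 2)])

def predict_open_set(x, threshold=4.0):
    """Return the nearest known model id, or -1 ('unknown') if every class is too far."""
    dists = np.linalg.norm(class_means - x, axis=1)
    return int(np.argmin(dists)) if dists.min() <= threshold else -1

print(predict_open_set(rng.normal(3.0, 0.5, 16)))   # near a known model -> its id
print(predict_open_set(rng.normal(20.0, 0.5, 16)))  # far from all known models -> -1
```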

View this article on IEEE Xplore