Effect of Data Characteristics Inconsistency on Medium and Long-Term Runoff Forecasting by Machine Learning

In medium- and long-term runoff forecasting, machine learning faces several problems that hinder its adoption in the hydrology industry: high learning cost, limited computing resources, and, in some regions, data that do not satisfy its statistical assumptions. When data are scarce, analyzing the consistency of data characteristics is one way to address these problems. This paper analyzes the statistical hypotheses underlying machine learning and the characteristics of runoff data, such as periodicity and mutation (abrupt change). To counter the effect of inconsistent data characteristics on three representative machine learning models (multiple linear regression, random forest, and back propagation neural network), a simple correction/improvement method suited to engineering practice is proposed. The models were validated in the Danjiangkou area, China. The results show that the errors of all three models are distributed in the same pattern as the periodic characteristics of runoff, and that correction/improvement based on periodicity and mutation characteristics improves the forecasting accuracy of all three models. The back propagation neural network model is the most sensitive to the consistency of data characteristics.
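The abstract does not describe how the periodicity-based correction works. As a minimal sketch only, assuming monthly data and a simple additive period-wise bias correction (a common post-processing idea, not necessarily the authors' method), the finding that model errors follow the periodic pattern of runoff suggests estimating the mean forecast error for each calendar month on training data and adding it back to new forecasts. The function names `monthly_bias` and `correct_forecasts` and all numbers below are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def monthly_bias(months, observed, predicted):
    """Mean forecast error (observed - predicted) for each calendar month.

    Hypothetical helper: assumes errors repeat with the annual cycle,
    so one additive offset per month captures the periodic bias.
    """
    errors = defaultdict(list)
    for month, obs, pred in zip(months, observed, predicted):
        errors[month].append(obs - pred)
    return {month: mean(errs) for month, errs in errors.items()}


def correct_forecasts(months, predicted, bias):
    """Add each month's mean training error to the raw forecast.

    Months that never appeared in the training data get no correction.
    """
    return [pred + bias.get(month, 0.0) for month, pred in zip(months, predicted)]


# Toy training data (hypothetical): the model underestimates month 1
# by 1.0 and overestimates month 2 by 3.0.
train_months = [1, 2, 1, 2]
train_obs = [10.0, 20.0, 12.0, 22.0]
train_pred = [9.0, 23.0, 11.0, 25.0]

bias = monthly_bias(train_months, train_obs, train_pred)
corrected = correct_forecasts([1, 2, 3], [10.0, 20.0, 5.0], bias)
print(bias)       # {1: 1.0, 2: -3.0}
print(corrected)  # [11.0, 17.0, 5.0]
```

Because the correction is applied after the model, the same wrapper works unchanged for multiple linear regression, random forest, or a neural network; a correction for mutation (abrupt change) would additionally require splitting the record at the detected change point, which this sketch does not attempt.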

View this article on IEEE Xplore