Predicting high-frequency variation in stream solute concentrations with water quality sensors and machine learning

Title: Predicting high-frequency variation in stream solute concentrations with water quality sensors and machine learning
Publication Type: Journal Article
Year of Publication: 2021
Authors: Green, MB, Pardo, LH, Bailey, SW, Campbell, JL, McDowell, WH, Bernhardt, ES, Rosi, EJ
Journal: Hydrological Processes
Date Published: 2021
ISBN Number: 1099-1085
Keywords: Biogeochemistry, LTER-HBR, machine learning, stream solutes, water quality

Stream solute monitoring has produced many insights into ecosystem and Earth system functions. Although new sensors have provided novel information about the fine-scale temporal variation of some stream water solutes, we lack adequate sensor technology to gain the same insights for many other solutes. We used two machine learning algorithms – Support Vector Machine and Random Forest – to predict concentrations at 15-min resolution for 10 solutes, of which eight lack specific sensors. The algorithms were trained with data from intensive stream sensing and manual stream sampling (weekly) for four full years in a hydrologic reference stream within the Hubbard Brook Experimental Forest in New Hampshire, USA. The Random Forest algorithm was slightly better at predicting solute concentrations than the Support Vector Machine algorithm (Nash-Sutcliffe efficiencies ranged from 0.35 to 0.78 for Random Forest compared to 0.29 to 0.79 for Support Vector Machine). Solute predictions were most sensitive to the removal of fluorescent dissolved organic matter, pH and specific conductance as independent variables for both algorithms, and least sensitive to dissolved oxygen and turbidity. The predicted concentrations of calcium and monomeric aluminium were used to estimate catchment solute yield, which changed most dramatically for aluminium because it concentrates with stream discharge. These results show great promise for using a combined approach of stream sensing and intensive stream discrete sampling to build information about the high-frequency variation of solutes for which an appropriate sensor or proxy is not available.
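
The abstract compares the two algorithms by Nash-Sutcliffe efficiency (NSE). As a minimal sketch of how that metric is commonly computed, and not the authors' own code, the Python function below takes paired observed and predicted concentrations. The function name and the sample values in the usage note are hypothetical and chosen only for illustration.

```python
import numpy as np


def nash_sutcliffe_efficiency(observed, predicted):
    """Return the Nash-Sutcliffe efficiency of predictions against observations.

    NSE = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
    A value of 1 is a perfect match; 0 means the predictions are no better
    than always predicting the observed mean; negative values are worse.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if observed.shape != predicted.shape:
        raise ValueError("observed and predicted must have the same shape")

    # Squared error of the model predictions.
    residual_ss = np.sum((observed - predicted) ** 2)
    # Squared deviation of the observations from their own mean.
    total_ss = np.sum((observed - observed.mean()) ** 2)
    if total_ss == 0.0:
        raise ValueError("NSE is undefined when all observations are equal")

    return 1.0 - residual_ss / total_ss
```

For example, `nash_sutcliffe_efficiency([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])` gives 1.0 for a perfect prediction, while predicting the observed mean everywhere, `[2.0, 2.0, 2.0]`, gives 0.0. On that scale, the reported range of 0.35 to 0.78 for Random Forest indicates predictions that beat the observed mean for every solute, by a margin that varies among solutes.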