ML in Freshwater Management

Session #1: Sept 6th, 6PM ET.
Satellite data has enabled new ways of monitoring water levels around the world. However, all satellite measurements have a certain amount of error & this group will use ML techniques to decrease that error so better water management policies might result!
Pre-Requisites - This engagement requires a working understanding of Python, data science practices, and machine learning models. No domain-specific expertise in environmental sciences is necessary, but interest is preferred.

Leads / Advisors

~Debjani Mukherjee | Lead Consultant Green Connect Solutions
~Jesse Passmore | Lead Global Resources Analysis Group
~Dr. Zhauqin Li | Advisor
Prior Experience with Post Doctoral Research with Natural Resources Canada
❗ Weekly Meeting Time: 6-7 PM ET on Tuesdays
❗ Slack channel [Communication point]: click here to join [If issues, drop an email]


Lake water level monitoring is fundamental for our precious freshwater resources management and hydrological research, especially in the context of climate change and intensified anthropogenic activities.
In situ hydrological measurement (i.e. measuring bodies of water manually) is either not possible or not reliable for most of Earth’s freshwater. As a result, altimetry measurements (measurements of height/altitude) from satellites such as the European Space Agency’s ‘Sentinel’ series are relied upon. Such measurements contain irreducible hidden variables, an unknowable error, that need to be taken into account to ensure the readings are useful. Currently, these irreducible limitations are addressed with an algorithm called a ‘Kalman Filter’.
A Kalman Filter is one of the most important and common estimation algorithms. The Kalman Filter produces estimates of hidden variables based on inaccurate and uncertain measurements. Also, the Kalman Filter predicts the future system state based on past estimations.
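As a rough illustration of the predict/update cycle described above, here is a minimal scalar Kalman filter sketch in plain Python. It assumes the water level follows a constant-level (random walk) model; the function name `kalman_1d`, the `process_var` default, and the sample readings are invented for illustration and are not taken from the working group's notebooks.

```python
def kalman_1d(measurements, meas_vars, process_var=0.01, x0=0.0, p0=1.0):
    """Scalar Kalman filter assuming a constant-level (random walk) state."""
    x, p = x0, p0  # state estimate and its error variance
    estimates, gains = [], []
    for z, r in zip(measurements, meas_vars):
        # Predict: the level is assumed unchanged, so only uncertainty grows.
        p = p + process_var
        # Update: the gain weighs the new measurement against the prediction.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
        gains.append(k)
    return estimates, gains


# Hypothetical water level readings in meters, with an assumed error variance.
readings = [217.9, 217.4, 218.3, 217.6, 217.8]
variances = [0.25] * len(readings)
levels, gains = kalman_1d(readings, variances, x0=readings[0])
```

A gain near 1 means the filter follows the measurement; a gain near 0 means it trusts its own prediction.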
In more recent years there have been attempts to make the estimations inside a Kalman Filter more accurate through deep learning. Types of recurrent neural networks (RNNs) can be used to learn local or global states from given measurements, with accuracy honed enough that they are used by stock market brokerages. However, in the satellite altimetry domain, attempts to combine an RNN with a Kalman Filter have so far provided only questionable improvement over a regular Kalman Filter. In addition, these attempts are so computationally expensive as to be beyond the reach of most people’s accessible hardware.
In the previous Working Group [attending not a pre-requisite], we explored the KalmanNet paper, which used a Neural Network at iterative stages to reduce the estimated elements as much as possible. Though the KalmanNet paper showed incredible promise, the associated GitHub repo did not deliver commensurately.
In the end, both teams of the Working Group opted to create a linear Kalman Filter and achieved reasonable results, with plenty of room for improvement.
Even had the KalmanNet repo functioned, we concluded that running a Neural Network at every single timestep, to study the state represented in a series of measurements, would cost far more computationally than the tiny possible improvements in accuracy would justify. A different way of combining a neural network with a Kalman Filter would be needed.
If we’re able to improve on what’s currently available, this could lead to better water management data, and therefore better policy making in this arena ‼️

Minimum Success

Minimum Goal:
  • Build a Kalman filter whose measurement and error estimates are more accurate than those of industry standard Kalman filters, while remaining as computationally inexpensive as possible.
Stretch Goal:
  • Expand the use and robustness of our Kalman Filter to water levels in a variety of lakes around the world, focusing on bodies of water rendered critically low by changing ecosystems (whether due to climate or human interference).

Early Audience Hypothesis

The target audience for this working group is any public or private agency with a satellite whose purpose, at least in part, is to take altimetry measurements, as well as any agency that procures the services of such satellites to monitor water levels, flood threats, and changing water tables. Examples would be NASA, ESA (European Space Agency), NRCan (Natural Resources Canada), and governments of countries with rapidly disappearing fresh water such as South Africa and the USA.

Starting Dataset

Four datasets total: Sentinel A, Sentinel B, Sentinel A+B, and Cleaned Sentinel A+B (two standard deviations of water levels taken).
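One plausible reading of the "two standard deviations" cleaning step is to keep only readings within two standard deviations of the mean water level. The sketch below assumes that interpretation; the function name `within_n_std` and the sample values are hypothetical, and the actual cleaning procedure used for the dataset may differ.

```python
from statistics import mean, stdev


def within_n_std(levels, n_std=2.0):
    """Keep readings within n_std sample standard deviations of the mean."""
    mu = mean(levels)
    sigma = stdev(levels)
    return [v for v in levels if abs(v - mu) <= n_std * sigma]


# Hypothetical water levels in meters; the last value mimics a bad reading.
raw = [217.5, 217.6, 217.4, 217.7, 217.5, 217.6, 217.3, 217.5, 217.6, 230.0]
cleaned = within_n_std(raw)
```

Note that a filter like this also narrows the apparent spread of the data, which affects how much a Kalman Filter trusts the measurements.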
M Sat water level & error readings for Lake Winnipeg: M_Sat. (Already cleaned, with non-useful data removed and error calculated per timestep. Timesteps have been trimmed to those of the Sentinel Satellites.)

Starting Recipes (please be familiar with these before session 1)

  • The KalmanNet paper: This was the scholarly article that sparked the original Machine Learning in Freshwater Management group’s initiative. Though the associated code repository did not prove fruitful, spending a few minutes understanding what the KalmanNet team were attempting to do will give good insight into what this working group aims at.
  • PyTorch Basics: For those who are interested in using deep learning to improve the accuracy of Kalman Filters, having a baseline understanding of PyTorch or TensorFlow is mandatory.
  • Getting Started with Google Colab: For those who do not have access to a local system capable of adequate data science work, Google Colab is a must (and it’s free).

Recipes Relevant to the Working Group, throughout our 8 weeks:


Tentative Timeline

Major Milestones (Expected Finish)
  • Get familiar with the domain and previous work. (Week 1)
  • Download sample data, explore the data, clean and structure the data, identify the correlations, perform feature extraction and run baseline model [based on colab with all the code needed]. (Week 1)
  • Collaborative optimization and experimentation [playing with classes, features, and models for improvement]. (4 weeks)
  • Expanding the model(s) to lakes around the world, continued exploration of model(s) as results come in. (3 weeks)

Why join?

Aggregate Intellect hosts one of the most diverse ML communities in the world. Over the course of the working group, you will:
  • Get an immersion into that community & walk out with some cool new friends.
  • Get spotlighted for your efforts on our community website!
  • Advance your ML skills in remote sensing for the water resources management domain.


Week 1 Results

  • A Random Forest Regressor & Linear Kalman Filter hybrid. Taking the example Kalman Filter and Random Forest Regression models from above, and combining them at uniformly distributed time steps.
    • The KF trusts its own predictions more, and longer.
    • The amplitude of errors remains constant, instead of climbing.
    • The Kalman Gain remains constant, instead of narrowing in on 1.

    Week 2 Results

    • Sanjay B’s Introduction to TensorFlow and LSTMs. An excellent guide to learn from, accessible to beginners.
      • Examples of changes in the parameters within a Deep Learning model are visually explored.
      • Any questions, please ask Sanjay. He’s very knowledgeable.
    • An LSTM & Linear Kalman Filter hybrid. Taking the example LSTM from the Deep Learning intro Colab Notebook, and injecting it in a similar manner to the Random Forest Regressor from last week.
      • The model error predictions are averaged with the Kalman Filter predictions every 39th time step, to help offset the overconfidence issue of the previous hybrid.
      • Akin to the previous hybrid, this one trusts itself throughout the measurement process. (But does so while still incorporating more of the recorded results into its predictions.)
      • The Kalman Gains are now gently narrowing in on ‘0’, though are extremely erratic.
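The averaging step described above can be sketched roughly as follows. This is not the group's notebook code: `model_error` is a hypothetical callable standing in for a trained LSTM's error prediction, the filter is a simplified scalar one, and the sample measurements are invented. Only the interval of 39 comes from the notes above.

```python
def hybrid_kalman(measurements, meas_vars, model_error, interval=39,
                  process_var=0.01, x0=0.0, p0=1.0):
    """Scalar Kalman filter whose predicted error is blended with a model's."""
    x, p = x0, p0
    estimates, gains = [], []
    for t, (z, r) in enumerate(zip(measurements, meas_vars), start=1):
        p = p + process_var  # predict step (constant-level assumption)
        if t % interval == 0:
            # Every `interval` steps, average the filter's predicted error
            # with the external model's predicted error.
            p = 0.5 * (p + model_error(t))
        k = p / (p + r)  # Kalman gain
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
        gains.append(k)
    return estimates, gains


# Invented measurements (meters) and a stand-in model that always predicts 1.0.
measurements = [217.5 + 0.1 * ((t % 5) - 2) for t in range(80)]
variances = [0.25] * len(measurements)
levels, gains = hybrid_kalman(measurements, variances, lambda t: 1.0,
                              x0=measurements[0])
```

In this sketch, a model error larger than the filter's own raises the gain at the blended step, so the filter briefly leans more on the measurement.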

    Week 3 Results

    • Several ML/DL - KF hybrids. Experimenting with various ML and DL models with the initial linear Kalman filter model.
      • The hunt continues for the elusive ideal of a Kalman Filter narrowing in on ‘0’ in Kalman Gains, with steady and low error, and a predicted water level line graph that equally ignores and considers the measured results.
      • Introducing more intentional bias to correct overconfidence or lack-of-confidence problems in the KF produces bizarre results.
      • Robustness needs to be introduced to the overarching model, whether through a differing KF or in the ML/DL aspect. This could range from different feature engineering to a multivariate LSTM.

    Week 4 Results

    • Anonymous Bob’s time series ML approach to ML/KF hybrid. Using the RFR and KF models from this landing page, an intrepid exploration into a hybrid Kalman Filter is made.
      • Instead of partially/wholly replacing the error prediction within a Kalman Filter every X time steps, this model partially replaces the water level prediction every X time steps.
      • Unique feature engineering is executed, more closely resembling stock market features.
      • The optimal X for interference frequency is investigated, with clear results abounding from the foray.
      • The author mostly copied and pasted code from this page, demonstrating the power of teamwork and how anyone can start contributing no matter how new to the data science field they are.
    • Anonymous Homer’s LSTM/GRU KF replacement. Using the LSTM and GRU posted in earlier weeks, an attempt to bypass the Kalman Filter altogether is made.
      • The author made the LSTM/GRU much larger, with more layers and neurons, and a longer lookback sequence for prediction.
      • If the measurements provided by the satellite were accurate, this approach would be ideal. However, the reason for a Kalman Filter is that the measurements aren’t accurate, and we don’t know how inaccurate.
      • Using the scoring matrix for next week, it might be possible for the author to take this route in the future.
    • An LSTM/GRU to replace the error prediction, in combination with a given time step’s Kalman Gain. Instead of replacing the error prediction strictly, the confidence of the Kalman Filter is taken into account.
      • This provided little improvement over replacing the error without considering Kalman Gain.
      • As the Kalman Gain is sensitive to a timestep’s error coefficient, using Kalman Gain to tie an ML/DL prediction to the current state of the model is unreliable. Our data has massively varying amplitudes of error, almost entirely due to environmental noise.
      • It is unlikely that the current Kalman Filter equation will allow for results much better than those experienced, and as such it will need to be modified.
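To illustrate the sensitivity noted above: in the standard scalar Kalman update the gain is K = P / (P + R), where P is the predicted error variance and R is the measurement error variance. The sketch below assumes the group's "error coefficient" plays the role of R, which is an interpretation rather than something stated in the notes; the values are invented.

```python
def kalman_gain(pred_var, meas_var):
    """Standard scalar Kalman gain: K = P / (P + R)."""
    return pred_var / (pred_var + meas_var)


# With the predicted variance held fixed, the gain swings widely as the
# measurement error changes from one time step to the next.
pred_var = 1.0
gain_by_error = {r: kalman_gain(pred_var, r) for r in (0.1, 1.0, 10.0, 100.0)}
for meas_var, gain in gain_by_error.items():
    print(f"R = {meas_var:>6}: K = {gain:.3f}")
```

When R varies by orders of magnitude between time steps, as environmental noise can cause, the gain at any single step says little about the filter's longer-term state.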

    Week 5 Results

    • The second generation Kalman Filter, and new EDA. The previous model was very sensitive to the value of error in a given time step. This has been addressed.
      • In this Colab Notebook, the raw dataset is used. It is demonstrated to be ~5x noisier than the two standard deviation cleaned dataset.
        • Good) We have 16% more data for our models, much of which isn’t outlier.
        • Bad) The dataset is ~5x noisier. On Lake Winnipeg some error values are over 40 meters.
      • Despite the extra noise, the upgraded Kalman Filter’s estimated errors remain constant, in a range around 3 meters. In this scenario, the algorithm believes the data is only ever around 3 meters away from the ground truth.
      • However, bereft of ML/DL influence, the model has confidence and Kalman Gain struggles similar to those of the model from week 1.
    • Whereas this is the original filter, on the raw data.
      • Eureka! The original Kalman Filter on the raw data demonstrates that a two standard deviation dataset leaves too much useful information behind.
      • The 2 std dev dataset is artificially making the satellite readings too accurate. Thus the KF believes the measurements too much.
      • However, the raw data does have outliers that aren’t true to the Sentinel’s capabilities, as these outlier measurements are made near shorelines.
    • Scoring! Until people are comfortable with column vector deep learning models, this notebook outlines how to find the values to train a KF-replacement model upon. We will be using sMAPE scores for comparing such models.
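For reference, one common form of sMAPE (symmetric mean absolute percentage error) is sketched below. Several variants of sMAPE exist, and the notes do not say which one the scoring notebook uses, so treat this definition (range 0 to 200) as an assumption; the sample values are invented.

```python
def smape(actual, forecast):
    """sMAPE in percent: mean of |F - A| / ((|A| + |F|) / 2), times 100."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        # Treat a 0/0 term as zero error to avoid division by zero.
        total += 0.0 if denom == 0 else abs(f - a) / denom
    return 100.0 * total / len(actual)


# Hypothetical reference levels and model predictions in meters.
reference = [217.5, 217.6, 217.4]
predicted = [217.4, 217.9, 217.4]
score = smape(reference, predicted)
```

Lower scores indicate predictions closer to the reference values; identical series score 0.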

    Week 6 Results

    • Team Anonymous machine learning replacement of the Kalman Filter.
      • A superb first step. The hypothesis: “Is a Kalman Filter the best option if a machine learning ensemble can produce the same or better predictions?”
      • Your first week’s results cover a wide foundation, establishing a firm foothold on the path where you wish to explore.
        • With your permission, feedback was given in this week’s meeting, which benefitted many people.

    Week 7 Results

    • Team Anonymous’s ML replacement of a KF. Version 2.
      • A much more professional looking and executed notebook attempting to replace the Kalman Filter with an ensemble of machine learning methods.
      • The progress made in a single week is palpable. Feel proud.
    • Ela Najaf's Hybrid Kalman Filter. Athabasca. Great Slave. Wood. Cedar. Winnipeg.
      • Phenomenal progress and findings on all of the lakes.
        • The same filter performs with varying accuracy on different lakes.
        • Tuning the interval of influence offers visibly improved results in Kalman Gains and Uncertainty.
