NWC REU 2016
May 23 - July 29




Verification of Automated Hail Forecasts From the 2016 Hazardous Weather Testbed Spring Experiment

Joseph Nardi, Amy McGovern, Nate Snook, and David John Gagne II


What is already known:

  • It is challenging to forecast severe hail due to uncertainties in numerical weather prediction models and observations.
  • HAILCAST, a popular hail prediction model, shows considerable skill in forecasting hail size.
  • Machine learning approaches show some advantages over physics-based hail forecasts.

What this study adds:

  • The Gagne Machine Learning Method has slightly higher skill and discrimination than HAILCAST or the Thompson Hail Size Method for forecasts of both 25 mm and 50 mm hail.
  • HAILCAST performed better at forecasting hail greater than 50 mm in the case study; however, it also had a higher false alarm rate.
  • The Gagne Machine Learning Method is more consistent across the microphysics schemes because the model is calibrated to each scheme.


Every spring, the Storm Prediction Center (SPC) and the National Severe Storms Laboratory (NSSL) run the Hazardous Weather Testbed, an experiment to improve the prediction of severe weather. One of the major goals of the experiment is to forecast individual hazards, such as hail. The hail forecasts are produced from the Center for Analysis and Prediction of Storms (CAPS) mixed-physics ensemble, which is run using the Advanced Research Weather Research and Forecasting (WRF-ARW) numerical weather prediction model with 9 ensemble members and a horizontal grid spacing of 3 km. Automated hail forecasts are run for a 24-hour period using three different methods: HAILCAST, the Thompson Hail Size Method, and the Gagne Machine Learning Method.


To verify the three hail forecasting methods, neighborhood ensemble probabilities were calculated over a 24-hour period for both 25 mm and 50 mm hail. The forecasts were verified against data from the NSSL Multi-Radar Multi-Sensor (MRMS) radar mosaic using the Maximum Expected Size of Hail (MESH) method. Relative Operating Characteristic (ROC) curves and attribute diagrams were created, and the ROC Area Under the Curve (ROC AUC) and Brier Skill Score were calculated. A case study of May 26, 2016 was also performed; on this day a large complex of storms moved over Nebraska, Kansas, Oklahoma, and Texas, producing 204 reports of severe hail, 183 reports of severe wind, and 21 tornado reports.
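
To make the idea of a neighborhood ensemble probability concrete, the following is a minimal sketch and not the code used in the study. It assumes one common definition: at each grid point, the probability is the fraction of ensemble members that forecast hail at or above the threshold anywhere within a circular neighborhood. The function names, the toy 3 x 3 grids, and the radius of one grid point are all hypothetical; the study's actual neighborhood radius and any smoothing are not stated here.

```python
def exceeds_nearby(grid, i, j, threshold_mm, radius):
    """Return True if any point within `radius` grid points of (i, j)
    has a forecast hail size at or above `threshold_mm`."""
    n_rows, n_cols = len(grid), len(grid[0])
    for di in range(-radius, radius + 1):
        for dj in range(-radius, radius + 1):
            if di * di + dj * dj > radius * radius:
                continue  # outside the circular neighborhood
            r, c = i + di, j + dj
            if 0 <= r < n_rows and 0 <= c < n_cols and grid[r][c] >= threshold_mm:
                return True
    return False


def neighborhood_ensemble_probability(members, threshold_mm, radius):
    """Fraction of ensemble members with a threshold exceedance near each point.

    `members` is a list of 2D grids (lists of rows) of forecast maximum
    hail size in mm, one grid per ensemble member (assumed layout).
    """
    n_rows, n_cols = len(members[0]), len(members[0][0])
    prob = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            hits = sum(
                1 for grid in members
                if exceeds_nearby(grid, i, j, threshold_mm, radius)
            )
            prob[i][j] = hits / len(members)
    return prob


# Toy three-member ensemble: two members forecast hail at the center point.
member_a = [[0, 0, 0], [0, 30, 0], [0, 0, 0]]
member_b = [[0, 0, 0], [0, 60, 0], [0, 0, 0]]
member_c = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
members = [member_a, member_b, member_c]

prob_25mm = neighborhood_ensemble_probability(members, threshold_mm=25, radius=1)
prob_50mm = neighborhood_ensemble_probability(members, threshold_mm=50, radius=1)
```

With these toy inputs, the neighborhood spreads the center-point forecasts to the four adjacent points but not to the corners, which lie farther than one grid point away.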


Overall, the Gagne Machine Learning Method has greater skill, in terms of the Brier Skill Score, than the other two hail forecasting methods. The Gagne Machine Learning Method also exhibits better discrimination for 25 mm hail in terms of the ROC AUC score. Lastly, the Gagne Machine Learning Method performs consistently well across all microphysics schemes because it is calibrated to each microphysics scheme. For the May 26, 2016 case study, the Gagne Machine Learning Method exhibited greater capability to predict hail exceeding 25 mm in diameter while producing relatively few false alarms.
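
For readers unfamiliar with the two scores, the sketch below shows how a Brier Skill Score and a ROC AUC can be computed from paired forecast probabilities and binary observations. It is an illustration under stated assumptions, not the study's verification code: the reference forecast for the skill score is assumed to be the sample base rate (climatology), the ROC AUC is computed through its pairwise-ranking interpretation, and the function names and four-point sample are hypothetical.

```python
def brier_score(probs, obs):
    """Mean squared difference between forecast probability and outcome (0 or 1)."""
    return sum((p - o) ** 2 for p, o in zip(probs, obs)) / len(probs)


def brier_skill_score(probs, obs):
    """Skill relative to always forecasting the sample base rate (assumed reference).

    1 is a perfect forecast, 0 matches the reference, negative is worse.
    """
    base_rate = sum(obs) / len(obs)
    reference = brier_score([base_rate] * len(obs), obs)
    if reference == 0:
        raise ValueError("observations are all events or all non-events")
    return 1.0 - brier_score(probs, obs) / reference


def roc_auc(probs, obs):
    """Chance that a random event point gets a higher probability than a
    random non-event point, counting ties as half. This equals the area
    under the ROC curve."""
    events = [p for p, o in zip(probs, obs) if o == 1]
    non_events = [p for p, o in zip(probs, obs) if o == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(events) * len(non_events))


# Hypothetical sample: forecast probabilities and whether hail was observed.
forecast_probs = [0.9, 0.6, 0.4, 0.1]
observed = [1, 0, 1, 0]

bss = brier_skill_score(forecast_probs, observed)
auc = roc_auc(forecast_probs, observed)
```

The Brier Skill Score rewards both reliability and sharpness of the probabilities, while the ROC AUC depends only on how well the forecasts rank events above non-events, which is why the two scores are reported separately as skill and discrimination.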

Full Paper [PDF]