
Automated Classification of Convective Areas in Reflectivity Using Decision Trees

David John Gagne II, Amy McGovern, and Jerry Brotzge

Abstract:

This paper presents an automated approach to classifying storms based on their structure using decision trees. When dealing with large datasets, manually classifying storms quickly becomes a repetitive and time-consuming task. An automated system can sort through large quantities of data more quickly and efficiently and return value-added output in a form that is easier to manipulate and understand. Our method of storm classification combines two machine learning techniques, k-means clustering and decision trees. K-means segments the reflectivity data into clusters, and decision trees classify each cluster. We chose decision trees for their simplicity and their ability to screen out unimportant attributes.

We used a k-means clustering algorithm derived from Lakshmanan (2001) to divide the reflectivity into different regions. Each cluster was sorted as convective or stratiform based on reflectivity. Each convective cluster was hand-labeled at both a general and a specific level. The two general classifications were storm cells and linear systems. The specific classifications for cells were isolated severe, isolated non-severe, and circular Mesoscale Convective System (MCS). The specific classifications for linear systems were trailing stratiform, leading stratiform, and no or parallel stratiform. We used the Waikato Environment for Knowledge Analysis (WEKA), a machine learning suite, to develop the decision trees (Witten and Frank, 2005).
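As a rough illustration of the segmentation step, the sketch below runs plain k-means on scalar reflectivity values and then sorts each cluster as convective or stratiform by its mean reflectivity. It is a simplification under stated assumptions, not the algorithm derived from Lakshmanan (2001): it ignores spatial position, the 40 dBZ cutoff is a hypothetical threshold chosen only for the example, and the names `kmeans_1d` and `sort_clusters` are illustrative.

```python
def kmeans_1d(values, k, iterations=50):
    """Plain Lloyd-style k-means on scalar values; returns (centroids, labels)."""
    if k < 2:
        raise ValueError("this sketch assumes k >= 2")
    lo, hi = min(values), max(values)
    # Deterministic start: spread the initial centroids evenly over the data range.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def sort_clusters(centroids, threshold_dbz=40.0):
    """Label each cluster by mean reflectivity (threshold is an assumption)."""
    return ["convective" if c >= threshold_dbz else "stratiform" for c in centroids]


# Toy reflectivity samples in dBZ (made up for illustration).
reflectivity = [5, 8, 10, 12, 22, 25, 27, 30, 48, 52, 55, 60]
centroids, labels = kmeans_1d(reflectivity, k=3)
kinds = sort_clusters(centroids)
for centroid, kind in zip(centroids, kinds):
    print(f"cluster mean {centroid:.2f} dBZ -> {kind}")
```

In this toy run only the highest-reflectivity cluster passes the assumed cutoff, so it alone would go on to the labeling and decision-tree stage.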

We constructed multiple decision trees with both morphological and reflectivity attributes for both the general and specific classifications. The training and test data sets came from Advanced Regional Prediction System (ARPS) simulated reflectivity data (Xue et al., 2001, 2002, 2003), and we created an additional data set from a collection of composite reflectivity mosaics from the CASA IP1 network (Brotzge et al., 2006). Overall, the best accuracy for the general-type trees stayed in the 90% range for all three test sets, indicating a very reliable classification tree. By verifying the trees learned on simulated data with observations from the CASA network, we demonstrated that the knowledge gained from simulation can be applied to real situations. For the specific type, the accuracy ranged from 55% to 80% across the test sets, indicating that additional work is needed.
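To illustrate how a decision tree can screen out unimportant attributes, the sketch below picks the single best binary split by information gain and scores the resulting one-level tree on a small held-out set. It is a minimal sketch, not the WEKA learner used in the paper: the attribute names `aspect_ratio` and `max_dbz`, the toy values, and the helper names are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_split(rows, labels, attributes):
    """Return (gain, attribute, threshold) for the best binary split."""
    base = entropy(labels)
    best = (0.0, None, None)
    for attr in attributes:
        values = sorted({row[attr] for row in rows})
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [lab for row, lab in zip(rows, labels) if row[attr] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[attr] > threshold]
            remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            gain = base - remainder
            if gain > best[0]:
                best = (gain, attr, threshold)
    return best


# Toy training clusters (values invented for illustration).
train_rows = [
    {"aspect_ratio": 1.1, "max_dbz": 55}, {"aspect_ratio": 1.3, "max_dbz": 48},
    {"aspect_ratio": 1.6, "max_dbz": 60}, {"aspect_ratio": 3.8, "max_dbz": 52},
    {"aspect_ratio": 4.5, "max_dbz": 58}, {"aspect_ratio": 6.0, "max_dbz": 50},
]
train_labels = ["cell", "cell", "cell", "linear", "linear", "linear"]

gain, attr, threshold = best_split(train_rows, train_labels, ["aspect_ratio", "max_dbz"])

# Build a one-level tree: each side predicts its majority training label.
left = [lab for row, lab in zip(train_rows, train_labels) if row[attr] <= threshold]
right = [lab for row, lab in zip(train_rows, train_labels) if row[attr] > threshold]
left_label = Counter(left).most_common(1)[0][0]
right_label = Counter(right).most_common(1)[0][0]

# Held-out toy test set.
test_rows = [{"aspect_ratio": 1.2, "max_dbz": 51}, {"aspect_ratio": 5.0, "max_dbz": 57}]
test_labels = ["cell", "linear"]
predictions = [left_label if row[attr] <= threshold else right_label for row in test_rows]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
print(f"split on {attr} at {threshold:.2f}, gain {gain:.2f}, test accuracy {accuracy:.0%}")
```

In this toy data the morphological attribute separates the two general classes cleanly while the reflectivity attribute does not, so the split search selects the former and never uses the latter.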
