Automated Scalable Detection of Location-Specific Santa Ana Conditions from Weather Data using Unsupervised Learning
Nguyen, M., Crawl, D., Li, J., Uys, D., Altintas, I., Automated Scalable Detection of Location-Specific Santa Ana Conditions from Weather Data using Unsupervised Learning, In Proceedings of the 2017 IEEE International Conference on Big Data.
Southern California's dry climate and fire-prone vegetation make the area vulnerable to extreme wildfire conditions. These conditions are exacerbated by Santa Ana weather patterns, which are characterized by very low humidity and gusty winds blowing in from the deserts. We present an approach that uses unsupervised learning to model and detect Santa Ana conditions from weather station sensor measurements. Our approach uses cluster analysis to capture weather patterns specific to the region surrounding each weather station. We provide a method that automatically identifies the Santa Ana cluster in each cluster model using dynamic, data-driven criteria. The resulting cluster models are applied to real-time sensor measurements to provide location-specific and time-specific detection of Santa Ana conditions. We leverage the Spark distributed platform to scale the system to large datasets from multiple weather stations, and use the Kepler workflow system to provide a GUI-based, easy-to-use interface to the underlying system. We present results from testing our approach on an existing network of weather stations. Our scalability experiment shows that the approach can process up to one million live sensor measurements in less than one minute on a single machine. The proposed system can aid wildfire management and prevention by focusing firefighting efforts on regions with increased wildfire risk.
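The per-station idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a small pure-Python k-means instead of Spark, only two features (relative humidity and wind speed), made-up sample readings, and a hypothetical selection rule (the cluster whose centroid has the highest standardized wind speed minus standardized humidity) standing in for the paper's data-driven criteria, which the abstract does not spell out. All function names are illustrative.

```python
import math


def standardize(rows):
    """Z-score each feature; return scaled rows and the (mean, std) per feature."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        stats.append((mean, math.sqrt(var) or 1.0))  # guard against zero std
    scaled = [tuple((v - m) / s for v, (m, s) in zip(row, stats)) for row in rows]
    return scaled, stats


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iterations=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        updated = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:  # assignments stopped changing
            break
        centroids = updated
    return centroids


def fit_station_model(history, k=2):
    """Fit one cluster model for one station from (humidity_pct, wind_speed) rows."""
    scaled, stats = standardize(history)
    centroids = kmeans(scaled, k)
    # Hypothetical criterion (an assumption, not the paper's): the Santa Ana
    # cluster is the one that is windiest and driest in standardized units.
    santa_ana = max(range(k), key=lambda i: centroids[i][1] - centroids[i][0])
    return {"stats": stats, "centroids": centroids, "santa_ana": santa_ana}


def is_santa_ana(model, reading):
    """Assign a new reading to its nearest cluster and compare with the flagged one."""
    scaled = tuple((v - m) / s for v, (m, s) in zip(reading, model["stats"]))
    return nearest(scaled, model["centroids"]) == model["santa_ana"]


# Made-up historical readings for a single station: (relative humidity %, wind speed).
history = [
    (65.0, 3.0), (70.0, 2.0), (60.0, 4.0), (68.0, 5.0), (72.0, 3.0),
    (8.0, 15.0), (6.0, 18.0), (10.0, 12.0), (5.0, 16.0),
]
model = fit_station_model(history)
print(is_santa_ana(model, (7.0, 14.0)))   # dry and windy reading
print(is_santa_ana(model, (66.0, 4.0)))   # humid and calm reading
```

Fitting a separate model per station keeps the notion of "dry and windy" relative to that station's own history, which is the location-specific aspect the abstract emphasizes. A production system along the lines described would replace the toy k-means with a distributed clustering implementation and use more sensor variables.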