For space scientists analyzing vast datasets from increasingly powerful space instrumentation, big data has become a significant challenge. To address this, a team from the Southwest Research Institute created a machine learning tool that efficiently labels large, complex datasets, allowing deep learning models to sift through and identify potentially hazardous solar events. The new labelling tool can be used or adapted to solve other problems involving large datasets.
As space instrument packages collect increasingly complex data in ever-increasing volumes, scientists are finding it more difficult to process and analyse relevant trends. Machine learning (ML), in which algorithms learn from existing data to make decisions or predictions, is becoming an important tool for processing large, complex datasets because it can factor in more information at once than humans can. However, to use ML techniques, humans must first label all of the data, which is often a monumental task.
“Labeling data with meaningful annotations is a critical step in supervised machine learning. Labeling datasets, however, is tedious and time consuming,” said Dr. Subhamoy Chatterjee, a postdoctoral researcher at SwRI who specialises in solar astronomy and instrumentation and is the lead author of a paper about these findings published in the journal Nature Astronomy. “New research demonstrates how convolutional neural networks (CNNs) trained on crudely labelled astronomical videos can be used to improve data labelling quality and breadth while reducing the need for human intervention.”
Deep learning techniques, by extracting and learning complex patterns, can automate the processing and interpretation of large amounts of complex data. The SwRI team used solar magnetic field videos to identify areas on the solar surface where strong, complex magnetic fields emerge, which are the main precursors of space weather events.
“We trained CNNs with crude labels, manually verifying only our disagreements with the machine,” explained co-author Dr. Andrés Muñoz-Jaramillo, a SwRI solar physicist with machine learning expertise. “The algorithm was then retrained with the corrected data, and the process was repeated until we were all in agreement. While most flux emergence labelling is done by hand, this iterative interaction between the human and the ML algorithm reduces manual verification by 50 percent.”
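The loop described in the quote (train on crude labels, check only the disagreements, retrain, repeat) can be sketched in a few lines of Python. This is a minimal illustration, not the SwRI team's code: a one-dimensional threshold classifier stands in for the CNN, and the names `train_threshold`, `iterative_relabel`, and `ask_human` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def train_threshold(xs: Sequence[float], labels: Sequence[int]) -> float:
    """Toy stand-in for CNN training (an assumption for illustration):
    pick the cut-off on a single feature that best reproduces the labels."""
    best_t, best_hits = xs[0], -1
    for t in sorted(set(xs)):
        hits = sum((x >= t) == bool(y) for x, y in zip(xs, labels))
        if hits > best_hits:
            best_t, best_hits = t, hits
    return best_t


def iterative_relabel(
    xs: Sequence[float],
    crude_labels: Sequence[int],
    ask_human: Callable[[int], int],
    max_rounds: int = 10,
) -> Tuple[List[int], int]:
    """Retrain on the current labels, send only model/label disagreements
    to a human, and stop once no unverified disagreement remains."""
    labels = list(crude_labels)
    verified = set()  # indices a human has already checked
    for _ in range(max_rounds):
        t = train_threshold(xs, labels)
        preds = [int(x >= t) for x in xs]
        disagreements = [
            i for i, (p, y) in enumerate(zip(preds, labels))
            if p != y and i not in verified
        ]
        if not disagreements:
            break
        for i in disagreements:
            labels[i] = ask_human(i)  # human verdict replaces the crude label
            verified.add(i)
    return labels, len(verified)


# Usage with made-up data: two of the eight crude labels are wrong.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
truth = [0, 0, 0, 0, 1, 1, 1, 1]
crude = [0, 1, 0, 0, 1, 1, 0, 1]
labels, n_checked = iterative_relabel(xs, crude, ask_human=lambda i: truth[i])
```

In this toy run the human is consulted only for the samples where the model and the crude label disagree, rather than for every sample, which is the source of the savings the quote describes.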
Iterative labelling approaches, such as active learning, can save significant time, lowering the cost of preparing big data for ML. SwRI scientists also used the trained ML algorithm to provide an even richer and more useful database by gradually masking the videos and looking for the moment when the ML algorithm changes its classification.
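One plausible reading of the masking step is sketched below in Python. It is an assumption-laden toy, not the published method: a video is reduced to a list of per-frame numbers, `toy_classifier` is a hypothetical stand-in for the trained network, frames are masked from the end of the sequence, and masked frames are replaced by a `fill` value of zero.

```python
from typing import Callable, Optional, Sequence


def toy_classifier(frames: Sequence[float], threshold: float = 5.0) -> int:
    """Hypothetical stand-in for a trained video classifier:
    returns 1 if any frame value exceeds the threshold, else 0."""
    return int(max(frames, default=0.0) > threshold)


def classification_change_frame(
    frames: Sequence[float],
    classifier: Callable[[Sequence[float]], int],
    fill: float = 0.0,
) -> Optional[int]:
    """Mask one more trailing frame at each step and return the index of
    the frame whose masking first changes the classifier's output.
    Returns None if the output never changes."""
    n = len(frames)
    full_label = classifier(list(frames))
    for m in range(1, n + 1):
        # Keep the first n - m frames, replace the last m with the fill value.
        masked = list(frames[: n - m]) + [fill] * m
        if classifier(masked) != full_label:
            return n - m
    return None


# Usage with made-up per-frame values: the signal crosses the toy
# threshold at frame index 3.
emerging = [1.0, 1.5, 2.0, 6.0, 8.0, 9.0]
change_at = classification_change_frame(emerging, toy_classifier)
```

The returned index marks the earliest frame the classifier needs to see in order to keep its original decision, which is the kind of timing information that could enrich a labelled database.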
“We developed an end-to-end, deep-learning approach for classifying videos of magnetic patch evolution without explicitly supplying segmented images, tracking algorithms, or other handcrafted features,” said SwRI co-author Dr. Derek Lamb, who specializes in the evolution of magnetic fields on the Sun’s surface. “This database will be essential in the development of new methodologies for forecasting the emergence of complex regions conducive to space weather events, potentially increasing the amount of time we have to prepare for space weather.”