Warning: We observe an increase of emails from fake travel portals like . "travelhosting.co.uk". We never send links to such portals so be vigilant!

8–10 Apr 2024
Bürgerhaus Garching
Europe/Berlin timezone
Event fully booked +++ Registration closed!

MLExchange: A Machine Learning Platform for On-the-fly Data Analysis at Scientific User Facilities

8 Apr 2024, 16:50
30m
Bürgerhaus 1 - Bürgerhaus Main ball room (Bürgerhaus Garching)

Bürgerhaus 1 - Bürgerhaus Main ball room

Bürgerhaus Garching

Bürgerplatz 9 and Telschowstraße 4, 85748 Garching bei München and MLZ Lichtenbergstr. 1 85747 Garching
200
Show room on map
Invited talk MLC Session 4

Speaker

Tanny Chavez (Lawrence Berkeley National Laboratory)

Description

With the continuous enhancement of experimental capabilities at scientific user facilities, the demand for computational tools that seamlessly guide users through their data lifecycle grows exponentially. These tools play an important role in facilitating the application of machine learning (ML) techniques to accelerate materials discovery. In light of this, MLExchange introduces a collaborative web-based platform to democratize diverse workflows for on-the-fly data visualization, rapid ML-based data analysis, automated experiments, and other applications. Currently, MLExchange offers a selection of web-based graphical user interfaces (GUI) for image segmentation, latent space exploration, data labeling, and classification [1].

In particular, its data labeling pipeline, Label Maker, aims to accelerate the demanding and time-consuming process of labeling scientific data sets through similarity-based querying, clustering, and classification approaches. To achieve this, its architecture connects four independent GUIs: (1) Data Clinic for latent space extraction, (2) MLCoach for data classification, (3) Latent space explorer for dimension reduction, latent space visualization, and clustering, and (4) Label Maker for data visualization and label assignment. Across this pipeline, the web applications make use of an assortment of ML-based techniques, including principal component analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP) for dimension reduction, Density-based spatial clustering (DBSCAN) and Mini Batch K-means for data clustering, and tunable deep learning algorithms for latent space extraction and data classification. Label Maker has shown potential applications for cross-facility learning by using Tiled for data access, which has enabled the visualization of Resonant Soft X-ray Scattering data collected at the National Synchrotron Light Source II. Furthermore, we have successfully demonstrated its effectiveness in enhancing the fine-tuning process of foundational models with human feedback.

Overall, the MLExchange platform offers a collaborative ecosystem to easily deploy ML-based algorithms for scientific data analysis. Among these efforts, MLExchange aims to enhance its capabilities to handle complex workflows, such as mitigating training biases with foundational models and enabling cross-facility model training.

[1] Z. Zhao, T. Chavez, E. A. Holman, G. Hao, A. Green, H. Krishnan, D. McReynolds, R. J. Pandolfi, E. J. Roberts, P. H. Zwart, H. Yanxon, N. Schwarz, S. Sankaranarayanan, S. V. Kalinin, A. Mehta, S. I. Campbell, and A. Hexemer, “MLExchange: A web-based platform enabling exchangeable machine learning workflows for scientific studies,'' in 2022 4th Annual Workshop on Extreme-scale Experiment-in-the-Loop Computing (XLOOP), 2022, pp. 10–15. doi: 10.1109/XLOOP56614.2022.00007

Primary author

Tanny Chavez (Lawrence Berkeley National Laboratory)

Co-authors

Zhuowen Zhao (Lawrence Berkeley National Laboratory) Runbo Jiang (Lawrence Berkeley National Laboratory) Elizabeth A. Holman (Lawrence Berkeley National Laboratory) Adam Green (Lawrence Berkeley National Laboratory) Harinarayan Krishnan (Lawrence Berkeley National Laboratory, Center for Advanced Mathematics in Energy Research Applications) Wiebke Koepp (Lawrence Berkeley National Lab) Dylan McReynolds (Lawrence Berkeley National Lab) Ronald Pandolfi (Lawrence Berkeley National Laboratory, Center for Advanced Mathematics in Energy Research Applications) Eric J. Roberts (Lawrence Berkeley National Laboratory, Center for Advanced Mathematics in Energy Research Applications) Petrus H. Zwart (Lawrence Berkeley National Laboratory, Center for Advanced Mathematics in Energy Research Applications) Guanhua Hao (Lawrence Berkeley National Laboratory) Howard Yanxon (Argonne National Laboratory) Nicholas Schwarz (Argonne National Laboratory) Eliot H. Gann (Brookhaven National Laboratory) Daniel B. Allan (Brookhaven National Laboratory) Daniela Ushizima (Lawrence Berkeley National Laboratory, Center for Advanced Mathematics in Energy Research Applications) Edward Barnard (Lawrence Berkeley National Laboratory) Apurva Mehta (SLAC National Accelerator Laboratory) Subramanian Sankaranarayanan (Argonne National Laboratory, University of Illinois Chicago) Alexander Hexemer (Lawrence Berkeley National Lab)

Presentation materials

There are no materials yet.