DOI: 10.65398/WWUM1964
Prof. Željko Ivezić, Professor of Astronomy, University of Washington, Director of the Vera C. Rubin Observatory Construction Project
Changing Astronomy and AI: The Case of Rubin Observatory and its Legacy Survey of Space and Time
1. Changing Astronomy and Astrophysics
Major advances in our understanding of the universe have historically arisen from dramatic improvements in our ability to “see”. With the development of advanced instrumentation we have been able to parse radiation detected from distant sources over the full electromagnetic spectrum in increasingly subtle ways. These data have provided the detailed information needed to construct physical models of planets, stars, galaxies, quasars, and larger structures and to probe the new physics of dark matter and dark energy.
Over the past few decades, advances in technology have made it possible to move beyond the traditional observational paradigm, focused on small samples of cosmic sources or individual objects, and to undertake large-scale sky surveys. During the last decade, sky surveys across the electromagnetic spectrum have collected petabytes of astronomical data for billions of sources. These survey projects, based on a synergy of advances in telescope construction, detectors, and, above all, information technology, have dramatically impacted nearly all fields of astronomy and astrophysics.
With the increase in data volume and complexity, modern computational technologies, such as machine learning and AI methods, have become necessary tools to reduce, analyze, and comprehend these data. Modern algorithms can classify comets and asteroids and predict their paths, uncover subtle patterns in the large-scale sky distribution of galaxies, and help with mundane tasks that would otherwise require prohibitively large or expensive human effort. As an example, a recently created large language model called astroBERT can help researchers search and navigate a collection of 15 million scientific papers on astronomy (Grezes et al. 2022).
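Under the hood, tools of this kind rely on semantic textual similarity: queries and papers are mapped to embedding vectors, and relevance is measured by how close those vectors lie. The minimal sketch below illustrates that general idea using the open-source sentence-transformers library with a generic embedding model as a stand-in; the model name and the toy abstracts are purely illustrative and do not represent the actual astroBERT model or its deployment.

```python
# A minimal sketch of semantic literature search with sentence embeddings.
# The model name and the toy abstracts are illustrative stand-ins, not the
# actual astroBERT model or the way it is deployed in practice.
from sentence_transformers import SentenceTransformer, util

abstracts = [
    "We measure weak lensing shear around galaxy clusters to constrain dark matter.",
    "A new survey of near-Earth asteroids discovered with nightly difference imaging.",
    "Spectroscopic follow-up of Type Ia supernovae for dark energy cosmology.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # generic stand-in model
query = "constraints on dark energy from supernova observations"

# Embed the query and the abstracts, then rank abstracts by cosine similarity.
query_vec = model.encode(query, convert_to_tensor=True)
abstract_vecs = model.encode(abstracts, convert_to_tensor=True)
scores = util.cos_sim(query_vec, abstract_vecs)[0]

for score, text in sorted(zip(scores.tolist(), abstracts), reverse=True):
    print(f"{score:.2f}  {text}")
```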
Astronomers have been successfully using such algorithms for over three decades (for a compendium of the most popular machine learning algorithms in astronomy, see Ivezić et al. 2019a). The rapid recent development of machine learning and AI methods has gone hand in hand with the rapid development of astronomical surveys. I will illustrate this synergy here by focusing on the NSF-DOE Vera C. Rubin Observatory, which is being constructed in Chile.
The Rubin Observatory will revolutionize the way we explore the cosmos. Its first 10-year project, the Legacy Survey of Space and Time (LSST), will collect about 60 petabytes of raw image data and produce the largest catalog of celestial objects in history – it will include about 20 billion galaxies and a similar number of stars: for the first time, astronomers will have cataloged more objects than there are living people on Earth!
2. The Case of Rubin Observatory
The primary goal of Rubin Observatory is to conduct LSST and deliver a 500-petabyte set of data products that will address some of the most pressing questions about the structure and evolution of the universe and the objects in it. The Rubin Observatory’s LSST is designed to address four science areas (for more details, see Ivezić et al. 2019b):
- Probing dark energy and dark matter.
- Taking an inventory of the solar system.
- Exploring the transient optical sky.
- Mapping the Milky Way.
The scientific questions that the Rubin Observatory will address are profound, and yet the concept behind the design of the Rubin Observatory is remarkably simple: conduct a deep survey over an enormous area of sky; do it with a frequency that enables images of every part of the visible sky to be obtained every few nights; and continue in this mode for ten years. The working paradigm is that all scientific investigations will utilize a common database constructed from an optimized observing program.
The Rubin Observatory Summit Facility is located on the Cerro Pachón ridge in north-central Chile, and its construction will be completed in 2025.
2.1 The Simonyi Survey Telescope and LSST Camera
The Rubin Observatory takes advantage of new technologies to provide a qualitatively new capability for astronomy. The Rubin Observatory houses the Simonyi Survey Telescope, an 8.4-meter telescope with a novel, three-mirror design. The telescope’s compact shape allows it to move quickly from one point in the sky to the next. It will image the sky continuously each night, on an automated cadence, and over the course of the ten-year survey will collect about 800 images of each location over half the sky.
The Rubin Observatory LSST Camera is the largest digital camera ever constructed for the field of astronomy. The size of a small car and weighing more than 3 tons, the 3200-megapixel camera has a field of view of about 10 square degrees.
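These numbers can be checked with a rough back-of-envelope calculation. All inputs in the sketch below are assumed round values chosen for illustration, not official survey parameters.

```python
# Back-of-envelope check of the "about 800 images of each location" figure.
# All inputs are rough, assumed round numbers, not official survey parameters.
field_of_view_deg2 = 9.6        # usable camera field of view (~10 deg^2)
survey_area_deg2 = 18_000       # main survey footprint, roughly half the sky
images_per_night = 1_000        # exposures captured on a clear night
clear_nights_per_year = 160     # after weather and maintenance losses (assumed)
survey_years = 10

total_images = images_per_night * clear_nights_per_year * survey_years
# Each image covers field_of_view_deg2, so the mean number of visits per
# location is the total sky area imaged divided by the footprint area.
visits_per_location = total_images * field_of_view_deg2 / survey_area_deg2
print(f"~{visits_per_location:.0f} images of each location")  # roughly 800
```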
In addition, Rubin Observatory will include a complex data management system, described below.
2.2 Rubin Observatory Software Tools
As with all large modern surveys, the large data volume, the real-time aspects, and the complexity of the processing involved require that the survey itself take on the task of fully reducing the data. The data collected by Rubin Observatory will be automatically reduced to scientifically useful catalogs and calibrated images, and delivered to users through a custom access platform and tools.
The entire software framework developed for the needs of the Rubin Observatory comprises several million lines of code, written mainly in Python and C++. The required software can be divided into three main groups: (i) control of the observatory itself, i.e. the telescope, camera, dome, etc.; (ii) correction of imperfections in images, calibration of images, and the detection and measurement of astronomical objects in images; and (iii) tools for accessing the data via the Internet. I will briefly illustrate their main features.
Practically all processes at the Rubin Observatory are controlled by software. The telescope and dome are large, complex, and very sensitive mechanical systems (each has a mass of about 300 tons). For example, deformations of the surface of the primary mirror caused by gravity and thermal stresses are corrected about 10 times per second using 156 actuators that, from the underside of the mirror, (gently!) push or pull it with an accuracy of a few nanometers. The position and orientation of the secondary mirror and the camera are managed with similar accuracy. Figure 1 shows part of the control interface in the control room during preparations for observing. The Rubin control room is very similar to the control rooms of large industrial systems, for example, nuclear power plants.
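The sketch below illustrates only the standard idea behind such figure corrections, not the actual Rubin control software: measure the mirror surface error, solve for the actuator forces that best cancel it through an (assumed) influence matrix, and apply a fraction of that correction on every loop iteration. The influence matrix, sensor model, and loop gain are all toy stand-ins.

```python
# A minimal sketch of the idea behind active-optics figure control: measure the
# mirror surface error, then solve for actuator forces that best cancel it.
# The influence matrix, sensor model, and loop gain here are toy stand-ins;
# the real Rubin control system is far more sophisticated.
import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_actuators = 300, 156

# influence[i, j]: surface displacement at sensor i per unit force on actuator j
influence = rng.normal(scale=1e-3, size=(n_sensors, n_actuators))

def correction_step(surface_error_nm, gain=0.5):
    """One ~10 Hz update: least-squares forces that reduce the measured error."""
    forces, *_ = np.linalg.lstsq(influence, -surface_error_nm, rcond=None)
    return gain * forces  # apply only part of the correction for stability

# Simulate a few loop iterations starting from a random surface error.
error = rng.normal(scale=50.0, size=n_sensors)   # nanometers
for step in range(5):
    forces = correction_step(error)
    error = error + influence @ forces           # mirror responds to the forces
    print(f"step {step}: rms error = {np.std(error):.1f} nm")
```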
Each observing night, Rubin will capture about a thousand 3,200-megapixel LSST images. Deciding exactly where to point the telescope next, and with which filter to capture the next image, is a complex optimization problem that requires a detailed consideration of the observing conditions (the time of night, the position and phase of the Moon, the stability of the atmosphere, how many images have already been collected, and so on). Such a decision needs to be made every forty seconds, so its speed and complexity make a computer program (the “LSST Scheduler”) necessary. In other words, Rubin’s observing system of telescope and camera is essentially an AI-powered robot that will observe all night on its own, with minimal supervision from astronomers (see Figure 2).
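The real LSST Scheduler is a sophisticated optimization system; the toy sketch below only conveys the flavor of the decision it makes every forty seconds: score each candidate pointing on a few weighted criteria and pick the best one. The fields, weights, and scoring terms are invented for illustration and are not the actual Scheduler algorithm.

```python
# A toy sketch of the kind of greedy decision a scheduler makes every ~40 s:
# score each candidate field and pick the best one. The fields, weights, and
# scoring terms are illustrative, not the actual LSST Scheduler algorithm.

candidate_fields = [
    # (name, airmass, Moon separation in degrees, visits already collected)
    ("field_A", 1.1, 90.0, 750),
    ("field_B", 1.6, 35.0, 400),
    ("field_C", 1.3, 120.0, 620),
]

def score(airmass, moon_sep_deg, n_visits, slew_cost=0.0):
    """Higher is better: prefer low airmass, large Moon separation,
    under-observed fields, and short slews."""
    return (
        -2.0 * (airmass - 1.0)          # image quality degrades with airmass
        + 0.01 * moon_sep_deg           # stay away from the bright Moon
        - 0.001 * n_visits              # balance coverage across the sky
        - slew_cost
    )

best = max(candidate_fields, key=lambda f: score(f[1], f[2], f[3]))
print("next pointing:", best[0])
```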
The software pipelines for processing LSST images are the most complex part of Rubin’s software, because of the large amount of data (about 20 TB each observing night) and the need for fast processing: data on objects that have changed brightness or position relative to previous observations will be available to everyone via the Internet within just 60 seconds of a new image being taken. The reason for the large data volume is Rubin’s huge field of view and 3,200-megapixel camera (Figure 3, left). In the past, astronomers studied images of the sky visually (Figure 3, right), but that way of working is no longer possible with LSST data. In addition, sensitive LSST images look very complex because of the large number of partially overlapping objects (see Figure 4, right). Over about ten years, roughly a hundred Rubin developers specializing in astronomical algorithms have built software that “knows” how to “recognize” objects in images and measure their features, such as position, brightness, angular size, and shape (Bosch et al. 2018). Figure 5 shows a small patch of the sky in which the computer recognized six objects and computed an excellent image model that is almost identical to the observed scene. Without the help of these advanced algorithms, it would be impossible for astronomers to figure out what the expected tens of billions of objects in LSST images will teach us about the Universe!
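The fast 60-second alerts rest on difference imaging: subtract a deep template of the same patch of sky from the new exposure and flag anything that stands out. The minimal sketch below, run on synthetic data, shows only that core idea; the production pipelines (built on heritage such as Bosch et al. 2018) also match point-spread functions, model the noise, deblend overlapping sources, and much more.

```python
# A minimal sketch of the idea behind the 60-second alerts: subtract a deep
# template from the new exposure and flag anything that stands out. The images
# and the detection threshold here are synthetic toys for illustration.
import numpy as np

rng = np.random.default_rng(1)
shape = (200, 200)

template = rng.normal(loc=100.0, scale=5.0, size=shape)   # deep reference image
new_image = template + rng.normal(scale=5.0, size=shape)  # fresh exposure

# Inject a fake transient so there is something to detect.
new_image[120, 80] += 200.0

difference = new_image - template
noise = np.std(difference)

# Flag pixels that deviate by more than 5 sigma from the template.
detections = np.argwhere(np.abs(difference) > 5.0 * noise)
for y, x in detections:
    print(f"candidate transient at pixel (x={x}, y={y}), "
          f"{difference[y, x] / noise:.1f} sigma")
```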
It is anticipated that several thousand astronomers and physicists will regularly use Rubin’s LSST data, while the public education and outreach interface, available to everyone in the world, is expected to receive millions of visitors. The software for accessing scientific data is organized in the Rubin Science Platform (RSP; O’Mullane et al. 2021). Figure 6 shows the RSP interface with its three main data access modes. The Portal is a web interface for interactive analysis of a relatively small number of objects. The Jupyter notebook option enables searches of the entire database and analysis of the selected data using programs that users can develop and run without having to “download” the data to their local computer (see Figure 7). This mode of data access and analysis makes it possible to work with LSST data from any part of the world, from Chile to Rome.
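For a flavor of what a catalog query from an RSP notebook can look like, the sketch below uses the generic IVOA Table Access Protocol (TAP) via the pyvo package; the service URL, table name, and column names are placeholders for illustration, not the actual Rubin data-access endpoint or schema.

```python
# A minimal sketch of a catalog query of the kind run from an RSP notebook,
# using the generic IVOA TAP protocol via pyvo. The service URL, table name,
# and column names below are illustrative placeholders, not the actual Rubin
# data-access endpoint or schema.
import pyvo

service = pyvo.dal.TAPService("https://example.org/tap")  # placeholder URL

# ADQL: select bright objects within 0.1 degrees of a chosen sky position.
query = """
SELECT objectId, ra, dec, g_mag
FROM   object_catalog
WHERE  CONTAINS(POINT('ICRS', ra, dec),
                CIRCLE('ICRS', 150.0, 2.2, 0.1)) = 1
  AND  g_mag < 22.0
"""

results = service.search(query)
print(results.to_table())
```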
3. Discussion and Conclusions
The LSST survey will open a movie-like window on objects that change brightness, or move, on timescales ranging from 10 seconds to 10 years. The survey will have a raw data rate of about 20 TB per night and will collect about 60 PB of data over its lifetime, resulting in an incredibly rich and extensive public archive that will be a treasure trove for breakthroughs in many areas of astronomy and physics. This archive will represent the largest catalog of celestial objects in history – it will include about 20 billion galaxies and a similar number of stars, with a total of about 32 trillion observations (roughly 40 billion objects, each observed about 800 times). With Rubin data we will all understand our Universe better, chronicle its evolution, delve into the mysteries of dark energy and dark matter, and reveal answers to questions we have yet to imagine.
Modern computational technologies, such as machine learning and AI methods, are necessary tools to reduce, analyze and comprehend these data. The LSST archive, available through Rubin Science Platform, will be mined for the unexpected and used for precision experiments in astrophysics. Rubin Observatory’s LSST will be in some sense an internet telescope: “the ultimate network peripheral device to explore the universe, and a shared resource for all humanity” (B. Gates, priv. comm.).
In summary, modern astronomy is critically dependent on modern computational technologies. By and large, this is a positive development that helps us improve our knowledge of the origin and evolution of the universe. Nevertheless, it is not without shortcomings. I will conclude by referring to Hogg and Villar, who in their position paper “Is machine learning good or bad for the natural sciences?” (Hogg & Villar, 2024) answer this question with “Both!”.
Acknowledgments
This material is based on work supported in part by the National Science Foundation through Cooperative Agreement 1258333 managed by the Association of Universities for Research in Astronomy (AURA), and the Department of Energy under Contract No. DE-AC02-76SF00515 with the SLAC National Accelerator Laboratory. Additional LSST funding comes from private donations, grants to universities, and in-kind support from LSSTC Institutional Members.
References
Bosch, J., Armstrong, R., Bickerton, S., Furusawa, H., Ikeda, H. et al. 2018, “The Hyper Suprime-Cam software pipeline”, PASJ, Vol. 70, Issue SP1.
Grezes, F., Allen, T., Blanco-Cuaresma, S., Accomazzi, A., Kurtz, M.J., et al. 2022, “Improving astroBERT using Semantic Textual Similarity”. arXiv preprint arXiv:2212.00744.
Hogg, D.W. & Villar, S. 2024, “Position: Is machine learning good or bad for the natural sciences?”, arXiv preprint arXiv:2405.18095.
Ivezić, Ž., Connolly, A.J., Vanderplas, J. & Gray, A. 2019a, Statistics, Data Mining, and Machine Learning in Astronomy, Princeton University Press.
Ivezić, Ž., Kahn, S.M., Tyson, J.A., Abel, B., Acosta, E., et al. 2019b, “LSST: from science drivers to reference design and anticipated data products”, The Astrophysical Journal, 873(2):111.
O’Mullane, W., Economou, F., Huang, F., Speck, D., Chiang, H-F., et al. 2021, “Rubin Science Platform on Google: the story so far”, arXiv preprint arXiv:2111.15030.