Amina Helmi | Full Professor, Kapteyn Astronomical Institute, University of Groningen, The Netherlands

How was our Milky Way formed?

Having grown up in Argentina, not too far from rural areas, I had the chance to enjoy the magnificent night sky in its full splendor. The vast majority of the stars that we can see with the naked eye are in our own galaxy, the Milky Way. The Galaxy is a system made up of planets, stars, gas, dust and the mysterious dark matter, all held together by gravity. Its flattened, disk-like shape projected on the sky (seen in Figure 1) is what has given rise to its name. Understanding how the Milky Way has formed is thus closely related to “understanding how Nature put together the night sky”.[1]

The Milky Way is one of many billions of galaxies in the Universe, and it is the one we know best. We can measure the properties of its stars in exquisite detail, something that is currently not possible for other (more distant) galaxies. The Milky Way turns out to be fairly average in terms of its mass, size, brightness and shape (2/3 of the large galaxies in the Universe are disks and are star forming). Because it is so typical, it can be used to understand in general terms how galaxies form and evolve, which physical processes are at play in the Universe, and what the Universe is made of, for example the nature of dark matter.[2] But the Milky Way is also our home, and hence a big motivation to unravel how it formed stems from the human curiosity to understand our origin.

The current paradigm of structure formation in the Universe rather successfully describes the global properties of the galaxy population. It predicts that structure grows hierarchically from tiny density fluctuations present in the early Universe, the seeds of all the structures we see today. The first galaxies to form were thus small and, through the action of gravity, they merged, assembling larger galaxies like our Milky Way. The big question is, of course, whether this is really how the Galaxy formed. To disentangle Galactic history we need measurements of the positions of stars (informing us of their current location), of their motions (which tell us where the stars came from), of their chemical properties (as their atmospheres reflect the conditions of the environment in which they formed), and of their ages (because these inform us about when they formed).

Galactic Archaeology

Stars, therefore, retain a memory of their origin and can be used as fossils, as it were. The subfield of Astronomy that exploits this approach has taken off dramatically in the past 15 years and is now known as Galactic Archaeology. This was a consequence of the maturity of the models and the availability of new datasets, particularly from large surveys.

The Milky Way has various components, and the stars associated with each of these components have different characteristics. The majority of the stars in our Galaxy are in a thin disk-like structure, and they move in an orderly fashion on nearly circular orbits around the centre of the Galaxy. Most stars in the disk (including our Sun) formed there, and new stars are being born at a rate of roughly one solar mass per year. On the other hand, the most pristine stars, i.e. the oldest and those with the lowest chemical abundances,[3] are located in the Galactic halo. They have rather elongated trajectories and formed very early on; some stars appear to be as old as the Universe itself, as far as we can measure. These ancient stars have therefore effectively witnessed how the Galaxy was put together and can help us reconstruct its history.

The Galactic halo is in fact the natural repository of merger debris. According to theories of galaxy formation, halo stars must have formed in other small galaxies and were accreted a long time ago. Establishing the relative importance of accretion and mergers as a formation channel is one of the goals of Galactic archaeology. Another important goal is to reconstruct the family tree of our Galaxy, that is, to find the progenitors of our Galaxy, i.e. the galaxies that merged together and shaped the Milky Way. The characterization of their properties directly links to understanding the early Universe, since these are the leftovers or the local counterparts of the small galaxies that are now (barely) accessible with JWST.

How do we find merger debris? To this end, we need access to precise measurements of the motions of millions of stars, preferably in the halo of the Galaxy. Stars with the same origin move together through space, defining stellar streams as shown in Figure 2. This implies that one way to identify merger debris is to find groups of stars with similar velocities. Another useful space in which to identify merger debris is that of “dynamically conserved quantities”. These are physical quantities, such as angular momentum or total energy, which do not change (much) in time. Stars from the same accreted galaxy initially shared the same spatial location and velocity (they were clumped in phase-space), and this implies they also had very similar angular momenta and energy (kinetic and potential, i.e., that associated with their motions and with the gravitational potential of the Milky Way respectively). Since these quantities are, under certain conditions, conserved in time as stars orbit around the Milky Way, the initial clumping should still be present today. Models predict that if the Milky Way halo formed via mergers, there should be 500 stellar streams in the halo crossing our immediate galactic vicinity. If the model predictions are correct, these streams would originate in a handful of large galaxies and very many small ones. Testing these predictions requires samples of at least 5,000 stars, but preferably 50,000 stars, in the halo near the Sun with full velocity information measured with exquisite precision.
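As a rough illustration of the quantities described above, the sketch below computes the total energy and the vertical component of the angular momentum for a few stars. Everything specific in it is an assumption made for the example and not taken from the text: the toy spherical logarithmic potential, the circular speed of 220 km/s, the function name `integrals_of_motion` and the three invented stars. Real analyses use far more realistic models of the Milky Way's gravitational potential.

```python
import numpy as np

V0_KMS = 220.0  # assumed circular speed of the toy potential (illustrative value)


def integrals_of_motion(pos_kpc, vel_kms, v0=V0_KMS):
    """Return (energy, L_z) for each star.

    pos_kpc, vel_kms: arrays of shape (N, 3) in Galactocentric Cartesian
    coordinates (kpc and km/s). The potential is a toy spherical logarithmic
    one, Phi(r) = v0^2 * ln(r / 1 kpc), used only for illustration.
    Energy is in (km/s)^2 and L_z in kpc km/s.
    """
    pos = np.asarray(pos_kpc, dtype=float)
    vel = np.asarray(vel_kms, dtype=float)
    r = np.linalg.norm(pos, axis=1)
    kinetic = 0.5 * np.sum(vel**2, axis=1)
    potential = v0**2 * np.log(r)
    energy = kinetic + potential
    # z-component of the angular momentum: L_z = x * v_y - y * v_x
    l_z = pos[:, 0] * vel[:, 1] - pos[:, 1] * vel[:, 0]
    return energy, l_z


# Hypothetical stars: the first two mimic debris from one progenitor,
# the third is an unrelated star on a retrograde circular orbit.
positions = np.array([[8.0, 0.0, 0.5], [8.1, 0.2, 0.4], [8.0, 0.0, 0.0]])
velocities = np.array([[50.0, 120.0, 80.0], [55.0, 118.0, 78.0], [0.0, -220.0, 0.0]])

energy, l_z = integrals_of_motion(positions, velocities)
for e, l in zip(energy, l_z):
    print(f"E = {e:10.1f} (km/s)^2   L_z = {l:8.1f} kpc km/s")
```

The first two stars end up close together in the (energy, L_z) plane while the third lies far away, which is the kind of clumping one searches for in this space.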

The Gaia space mission

Assembling such a dataset is not trivial, particularly because access to the full velocity vector of a star implies the ability to measure both its motion along the line of sight (from spectra, using the Doppler shift) and its motion transverse to it, i.e., on the sky. The projected motion of a star on the sky is inversely proportional to its distance (we know from daily experience that the farther away an object is, the more difficult it is to establish if it moves), and is typically very small. For example, a halo star in the vicinity of the Sun might move with a velocity of a few hundred km/s, but its projected motion can be as small as 1 milliarcsecond per year. Motions this small cannot be measured from the ground because of atmospheric blurring, and hence require a space mission. This is Gaia.
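The inverse dependence on distance described above can be made concrete with the standard relation between transverse velocity, distance and proper motion, v_t [km/s] = 4.74047 × μ [mas/yr] × d [kpc]. The short sketch below simply evaluates this relation; the function names and the example velocity and distances are chosen for illustration and are not taken from the text.

```python
K_KMS = 4.74047  # km/s per (mas/yr * kpc); equals 1 au/yr expressed in km/s


def proper_motion_mas_per_yr(v_tan_kms, distance_kpc):
    """Proper motion (mas/yr) of a star with transverse velocity v_tan_kms (km/s)."""
    return v_tan_kms / (K_KMS * distance_kpc)


def tangential_velocity_kms(pm_mas_per_yr, distance_kpc):
    """Transverse velocity (km/s) implied by a proper motion at a given distance."""
    return K_KMS * pm_mas_per_yr * distance_kpc


# Illustrative case: the same transverse velocity seen at increasing distances
for d_kpc in (1.0, 10.0, 50.0):
    mu = proper_motion_mas_per_yr(100.0, d_kpc)
    print(f"v_t = 100 km/s at {d_kpc:5.1f} kpc -> {mu:6.2f} mas/yr")
```

Only the component of the velocity perpendicular to the line of sight enters here; the component along the line of sight has to come from spectra.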

The Gaia satellite[4] was built by the European Space Agency, adopted in the year 2000 and launched in December 2013. Since then, it has been scanning the sky to measure very accurately the positions of all objects on the sky brighter than a given magnitude and their fluxes at different wavelengths; for a subset of the brightest objects it also obtains spectra. By visiting the same sky location multiple times, the satellite determines the variation in the position of an object in time, from which it is possible to infer both its distance and its proper motion. Information on the intrinsic properties of the stars, such as their temperature, gravity or even their metallicity,[3] can be derived from the measured fluxes. Similarly, the repeated measurements reveal whether a star’s brightness varies in time, which could be due to it being eclipsed by another star or a planet, or simply due to intrinsic oscillations, which are useful to derive its internal structure. The stellar spectra allow us to measure the velocity away from or towards us, as well as the chemical composition.

Thus far there have been four Gaia data releases (DR1, DR2, EDR3, DR3), with the latest one in June 2022. The vast amount of high-quality data, of measurements never done before, has triggered a revolution in Astronomy. A nice summary of the contents of Gaia DR3 is given in Figure 3.

New views on the Galaxy

The Gaia mission has enabled many discoveries, from our immediate cosmic neighborhood, i.e., the Solar System, where asteroids are found in families of similar composition, to the distant Universe, where a hitherto unknown population of very distant double quasars has been uncovered.[5] In terms of Galactic evolution, some highlights include:

1 – The discovery of the last big merger that the Milky Way experienced some 10-11 billion years ago;[6] this was a true milestone in Galactic history. Several other smaller accretion events have been uncovered thanks to Gaia data.

2 – The discovery that the Milky Way is still evolving dynamically,[7] implying that traditional (static) models to infer its mass distribution are fundamentally flawed. This is particularly important for inferences on its dark matter content.

3 – Many narrow streams left over from accretion events unexpectedly show a rather complex morphology.[8] The cause of this could be the presence of dark-matter clumps in the Milky Way halo, which would be expected for certain types of dark matter.

I will elaborate here on the first of these highlights and refer the interested reader to the reference list. Using Gaia DR2 data in combination with the APOGEE survey, we analyzed the motions, chemistry, age and spatial distribution of stars in a relatively large volume around the Sun. We discovered that almost half of the stars in the inner halo are part of a large kinematic structure whose average motion is in the opposite sense to that of the vast majority of stars in our Galaxy (including the Sun). These stars also have their own characteristic age distribution, and they follow a separate sequence in chemical abundance space, indicating that they formed elsewhere, in an accreted galaxy. We thus demonstrated that the inner halo is dominated by debris from a relatively large object (similar in size to the Magellanic Clouds). We named this long-gone galaxy Gaia-Enceladus. We estimated that at the time of accretion, roughly 10 billion years ago (for reference, the Sun was born 4.5 billion years ago), Gaia-Enceladus had a mass of approximately 25% of that of the Milky Way at the time. As a result of the violence of the merger, the disk present at the time was shaken and heated dynamically, which explains the presence of large numbers of stars with very elongated trajectories but thick-disk kinematics. The merger with Gaia-Enceladus not only led to the assembly of the inner halo, but also contributed to the formation of the Galactic thick disk component. Later work has also shown that significant star formation was triggered during the event. We may thus confidently state that the last big merger experienced by the Milky Way was a true milestone in Galactic history.

Debris from slightly more than a handful of small galaxies has also been uncovered in the Galactic halo near the Sun using the Gaia dataset through the application of clustering algorithms and statistical analyses. All in all, these findings are in line with the predictions from galaxy formation models.

The next steps in the field of Galactic Archaeology are the identification of merger debris beyond the Solar vicinity as well as the characterization of the accreted systems (their masses, star formation and chemical evolution history, time of accretion, etc.). This will be possible with new Gaia data releases in combination with data from ground-based spectroscopic surveys such as WEAVE and 4MOST which provide complementary information, for example, very detailed chemical abundance patterns that track the “DNA” of a star. This could be particularly useful for the identification of merger events that took place even earlier on in the history of our Galaxy.

Some general considerations

These wonderful discoveries were made possible by technological advances. For example, the ability to measure the tiny projected motions of stars on the sky translates into a requirement equivalent to measuring the diameter of a human hair at a distance of 1000 km. This puts strong constraints on basically all parts of the satellite, such as requiring ultra-high stability of the platform over long periods of time and across different temperatures, the ability to correct its position with a precise micropropulsion system (now also used for Euclid, the next large space mission of ESA), and very precise monitoring devices.
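The human-hair comparison can be checked with the small-angle approximation, angle ≈ size / distance. The sketch below does this arithmetic; the hair diameter of 0.1 mm is an assumed typical value that is not given in the text, and the function name is likewise chosen for the example.

```python
import math

HAIR_DIAMETER_M = 1.0e-4  # assumed typical hair diameter of 0.1 mm (not from the text)
DISTANCE_M = 1.0e6        # 1000 km, as quoted in the text


def angular_size_microarcsec(size_m, distance_m):
    """Angle subtended by an object, in microarcseconds (small-angle approximation)."""
    angle_rad = size_m / distance_m
    return math.degrees(angle_rad) * 3600.0 * 1.0e6


hair_angle_uas = angular_size_microarcsec(HAIR_DIAMETER_M, DISTANCE_M)
print(f"A hair at 1000 km subtends about {hair_angle_uas:.1f} microarcseconds")
```

Under this assumption the angle comes out at a few tens of microarcseconds, far below the 1 milliarcsecond per year quoted earlier for the projected motion of a halo star.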

Gaia is undoubtedly a Big Data project. Working with such large datasets, exploring them efficiently and identifying the features one is after required the use of machine learning tools, which we typically validated on numerical simulations to understand their limitations. Clustering algorithms were used to identify groups of stars with a similar origin, but it was equally important to develop tools to establish whether a given cluster is of high statistical significance.

For the analysis of the Gaia data, it was key to have software that could run fast and perform quick data inspections through visualization tools without having to load all of the data. Postdocs in my research group developed a program named Vaex,[9] a tool that can plot a dataset with a billion entries on a standard laptop in less than 1 second! This tool, as well as the 3D explorer ipyvolume, is now also being used for non-astronomical applications, even for the restoration of Rembrandt’s painting The Night Watch.

The night sky has always been a source of inspiration and awe for humanity. Astronomy will undoubtedly continue to inspire and attract new generations to science. However, a particular point of concern is light pollution, both from the ground and from the satellites that are being launched in large constellations to provide internet access across the whole world. As the motto of the 100 years of the International Astronomical Union states, “We are all under one sky”. The sky belongs to humanity, to each one of us. Let’s make sure this continues to be the case and protect together the heritage of dark and quiet skies.


Acknowledgements: Special thanks to Emma Dodd for proofreading the text and to NWO for financial support through the Spinoza Prize.



[1] Paraphrasing Prof. G. Gilmore

[2] There is approximately 6 times more dark matter than normal matter (i.e., that which interacts electromagnetically and emits light; this is also known as baryonic matter). The presence of large amounts of such non-luminous matter (hence its name) is revealed by the motions of stars in galaxies and of galaxies in the Universe, which move much faster than expected from the mass associated with the luminous matter. It is generally believed that dark matter is made up of elementary particles yet to be detected on Earth. However, an alternative is that our description of gravitational interactions is not fully correct, and that a modification of Gravity would be necessary.

[3] During the Big Bang only hydrogen and helium, as well as a small amount of lithium, were produced. All other chemical elements have been synthesized in stars, and astronomers refer to their total abundance with respect to hydrogen as the metallicity. The abundance of chemical elements other than hydrogen has thus increased with time, as successive generations of stars enrich the interstellar medium around them (through explosions or stellar winds), and new stars are born from this enriched material. This means that, for example, stars with very low metallicity were born in very pristine environments, possibly very early on in the history of the Universe.


[5] Gaia Collaboration: Galluccio and 446 colleagues 2022. Gaia Data Release 3: Reflectance spectra of Solar System small bodies. arXiv220612174G (A&A in press); Shen, Y. and 8 colleagues 2021. A hidden population of high-redshift double quasars unveiled by astrometry. Nature Astronomy 5, 569-574. doi:10.1038/s41550-021-01323-1

[6] Helmi, A., Babusiaux, C., Koppelman, H.H., Massari, D., Veljanoski, J., Brown, A.G.A. 2018. The merger that led to the formation of the Milky Way’s inner stellar halo and thick disk. Nature 563, 85-88. doi:10.1038/s41586-018-0625-x; Belokurov, V., Erkal, D., Evans, N.W., Koposov, S.E., Deason, A.J. 2018. Co-formation of the disc and the stellar halo. Monthly Notices of the Royal Astronomical Society 478, 611-619. doi:10.1093/mnras/sty982; Gallart, C. and 6 colleagues 2019. Uncovering the birth of the Milky Way through accurate stellar ages with Gaia. Nature Astronomy 3, 932-939. doi:10.1038/s41550-019-0829-5; Xiang, M., Rix, H.-W. 2022. A time-resolved picture of our Milky Way’s early formation history. Nature 603, 599-603. doi:10.1038/s41586-022-04496-5; Ruiz-Lara, T., Matsuno, T., Lövdal, S.S., Helmi, A., Dodd, E., Koppelman, H.H. 2022. Substructure in the stellar halo near the Sun. II. Characterisation of independent structures. arXiv220102405 (A&A in press)

[7] Antoja, T. and 12 colleagues 2018. A dynamically young and perturbed Milky Way disk. Nature 561, 360-362. doi:10.1038/s41586-018-0510-7

[8] Bonaca, A., Hogg, D.W., Price-Whelan, A.M., Conroy, C. 2019. The Spur and the Gap in GD-1: Dynamical Evidence for a Dark Substructure in the Milky Way Halo. The Astrophysical Journal 880. doi:10.3847/1538-4357/ab2873