Extract Multi-Wavelength Spectroscopy from Archival Data#

Learning Goals#

By the end of this tutorial, you will be able to:

• automatically load a catalog of sources

• search NASA and non-NASA resources for fully reduced spectra and load them using specutils

• store the spectra in a Pandas multiindex dataframe

• plot all the spectra of a given source

Introduction:#

Motivation#

A user has a source (or a sample of sources) for which they want to obtain spectra covering ranges of wavelengths from the UV to the far-IR. The large amount of spectra available enables multi-wavelength spectroscopic studies, which is crucial to understand the physics of stars, galaxies, and AGN. However, gathering and analysing spectra is a difficult endeavor as the spectra are distributed over different archives and in addition they have different formats which complicates their handling. This notebook showcases a tool for the user to conveniently query the spectral archives and collect the spectra for a set of objects in a format that can be read in using common software such as the Python specutils package. For simplicity, we limit the tool to query already reduced and calibrated spectra. The notebook may focus on the COSMOS field for now, which has a large overlap of spectroscopic surveys such as with SDSS, DESI, Keck, HST, JWST, Spitzer, and Herschel. In addition, the tool enables the capability to search and ingest spectra from Euclid and SPHEREx in the feature. For this to work, the specutils functions may have to be update or a wrapper has to be implemented.

List of Spectroscopic Archives and Status#

Archive

Spectra

Description

Access point

Status

IRSA

Keck

About 10,000 spectra on the COSMOS field from Hasinger et al. (2018)

IRSA Archive

Implemented with astroquery.ipac.irsa. (Table gives URLs to spectrum FITS files.) Note: only implemented for absolute calibrated spectra.

IRSA

Spitzer IRS

~17,000 merged low-resolution IRS spectra

IRS Enhanced Product

Implemented with astroquery.ipac.irsa. (Table gives URLs to spectrum IPAC tables.)

IRSA

IRTF*

Large library of stellar spectra

does astroquery.ipac.irsa work??

ESA

Herschel*

Some spectra

astroquery.esa.hsa

implemented with astroquery

IRSA

Euclid

Spectra hosted at IRSA in FY25

Using Q1 Spectra in the red grism (RGS) ingested in FY25, to be updated to DR1

IRSA

SPHEREx

Spectra/cubes will be hosted at IRSA, first release in FY25 -> preparation for ingestion

Will use mock spectra with correct format for testing

MAST

HST*

Slitless spectra would need reduction and extraction. There are some reduced slit spectra from COS in the Hubble Archive

astroquery.mast

Implemented using astroquery.mast

MAST

JWST*

Reduced slit MSA and Slit spectra that can be queried

astroquery.mast

Implemented using astroquery.mast

SDSS

SDSS optical

Optical spectra that are reduced

Sky Server or astroquery.sdss (preferred)

Implemented using astroquery.sdss.

DESI

DESI*

Optical spectra

DESI public data release

Implemented with SPARCL library. Currently commented out because SPARCL library is incompatible with numpy > 2 which we need for other modules

BOSS

BOSS*

Optical spectra

BOSS webpage (part of SDSS)

Implemented with SPARCL library together with DESI

HEASARC

None

Could link to Chandra observations to check AGN occurrence.

astroquery.heasarc

More thoughts on how to include scientifically.

The ones with an asterisk (*) are the challenging ones.

Input:#

• Coordinates for a single source or a sample on the COSMOS field

Output:#

• A Pandas data frame including the spectra from different facilities

• A plot comparing the different spectra extracted for each source

Non-standard Imports:#

• …

Runtime#

As of 2025 July, this notebook takes about 18 minutes to run to completion on Fornax using a server with 16GB RAM/4 CPU’ and Environment: ‘Default Astrophysics’ (image).

Authors:#

Andreas Faisst, Jessica Krick, Shoubaneh Hemmati, Troy Raen, Brigitta Sipőcz, David Shupe

Acknowledgements:#

AI: This notebook was created with assistance from OpenAI’s ChatGPT o4-mini-high model.

Datasets that were considered but didn’t end up being used:#

IRTF:#

  • https://irsa.ipac.caltech.edu/Missions/irtf.html

  • The IRTF is a 3.2 meter telescope, optimized for infrared observations, and located at the summit of Mauna Kea, Hawaiʻi.

  • large library of stellar spectra

  • Not included here because the data are not currently available in an easily accessible, searchable format

Imports#

This cell will install them if needed:

# Uncomment the next line to install dependencies if needed.
# %pip install -r requirements_spectra_collector.txt
import os
import sys

import astropy.units as u
from astropy.coordinates import SkyCoord
from astropy.table import Table

sys.path.append('code_src/')
from data_structures_spec import MultiIndexDFObject
#from desi_functions import DESIBOSS_get_spec
from herschel_functions import Herschel_get_spec
from keck_functions import KeckDEIMOS_get_spec
from mast_functions import HST_get_spec, JWST_get_spec
from plot_functions import create_figures
from sample_selection import clean_sample
from sdss_functions import SDSS_get_spec
from spitzer_functions import SpitzerIRS_get_spec
from euclid_functions import euclid_get_spec

1. Define the sample#

Here we will define the sample of galaxies. For now, we just enter some “random” coordinates to test the code.

coords = []
labels = []

coords.append(SkyCoord("{} {}".format("09 54 49.40", "+09 16 15.9"), unit=(u.hourangle, u.deg)))
labels.append("NGC3049")

coords.append(SkyCoord("{} {}".format("12 45 17.44 ", "27 07 31.8"), unit=(u.hourangle, u.deg)))
labels.append("NGC4670")

coords.append(SkyCoord("{} {}".format("14 01 19.92", "−33 04 10.7"), unit=(u.hourangle, u.deg)))
labels.append("Tol_89")

coords.append(SkyCoord(233.73856, 23.50321, unit=u.deg))
labels.append("Arp220")

coords.append(SkyCoord(150.091, 2.2745833, unit=u.deg))
labels.append("COSMOS1")

coords.append(SkyCoord(150.1024475, 2.2815559, unit=u.deg))
labels.append("COSMOS2")

coords.append(SkyCoord("{} {}".format("150.000", "+2.00"), unit=(u.deg, u.deg)))
labels.append("COSMOS3")

coords.append(SkyCoord("{} {}".format("+53.15508", "-27.80178"), unit=(u.deg, u.deg)))
labels.append("JADESGS-z7-01-QU")

coords.append(SkyCoord("{} {}".format("+53.15398", "-27.80095"), unit=(u.deg, u.deg)))
labels.append("TestJWST")

coords.append(SkyCoord("{} {}".format("268.48058743", "64.78064676"), unit=(u.deg, u.deg)))
labels.append("TestEuclid")

coords.append(SkyCoord("{} {}".format("+150.33622", "+55.89878"), unit=(u.deg, u.deg)))
labels.append("Twin Quasar")

sample_table = clean_sample(coords, labels, precision=2.0 * u.arcsecond, verbose=1)
after duplicates removal, sample size: 11

1.2 Write out your sample to disk#

At this point you may wish to write out your sample to disk and reuse that in future work sessions, instead of creating it from scratch again. Note that we first check if the data directory exists and if not, we will create one.

For the format of the save file, we would suggest to choose from various formats that fully support astropy objects(eg., SkyCoord). One example that works is Enhanced Character-Separated Values or ‘ecsv’

if not os.path.exists("./data"):
    os.mkdir("./data")
sample_table.write('data/input_sample.ecsv', format='ascii.ecsv', overwrite=True)

1.3 Load the sample table from disk#

Do only this step from this section when you have a previously generated sample table

sample_table = Table.read('data/input_sample.ecsv', format='ascii.ecsv')

1.4 Initialize data structure to hold the spectra#

Here, we initialize the MultiIndex data structure that will hold the spectra.

df_spec = MultiIndexDFObject()

2. Find spectra for these targets in NASA and other ancillary catalogs#

We search a curated list of NASA astrophysics archives. Because each archive is different, and in many cases each catalog is different, each function to access a catalog is necesarily specialized to the location and format of that particular catalog.

2.1 IRSA Archive#

This archive includes spectra taken by

• Keck

• Spitzer/IRS

• Euclid/NISP

%%time
# Get Keck Spectra (COSMOS only)
df_spec_DEIMOS = KeckDEIMOS_get_spec(sample_table=sample_table, search_radius_arcsec=1)
df_spec.append(df_spec_DEIMOS)
Number of entries found: 0
Number of entries found: 0
Number of entries found: 0
Number of entries found: 0
Number of entries found: 1
no calibration is available
Number of entries found: 1
Number of entries found: 0
Number of entries found: 0
Number of entries found: 0
Number of entries found: 0
Number of entries found: 0
CPU times: user 290 ms, sys: 11.4 ms, total: 302 ms
Wall time: 33.8 s
%%time
# Get Spitzer IRS Spectra
df_spec_IRS = SpitzerIRS_get_spec(sample_table, search_radius_arcsec=1, COMBINESPEC=False)
df_spec.append(df_spec_IRS)
Processing source NGC3049
Number of entries found: 0
Processing source NGC4670
Number of entries found: 0
Processing source Tol_89
Number of entries found: 0
Processing source Arp220
Number of entries found: 8
More than 1 entry found - pick the closest
Processing source COSMOS1
Number of entries found: 0
Processing source COSMOS2
Number of entries found: 0
Processing source COSMOS3
Number of entries found: 0
Processing source JADESGS-z7-01-QU
Number of entries found: 0
Processing source TestJWST
Number of entries found: 0
Processing source TestEuclid
Number of entries found: 0
Processing source Twin Quasar
Number of entries found: 0
CPU times: user 227 ms, sys: 9.2 ms, total: 236 ms
Wall time: 32.5 s
%%time
# Get Euclid Spectra
df_spec_Euclid = euclid_get_spec(sample_table=sample_table, search_radius_arcsec=1)
df_spec.append(df_spec_Euclid)
Processing source NGC3049
No match found in Euclid MER catalog for NGC3049.
Processing source NGC4670
No match found in Euclid MER catalog for NGC4670.
Processing source Tol_89
No match found in Euclid MER catalog for Tol_89.
Processing source Arp220
No match found in Euclid MER catalog for Arp220.
Processing source COSMOS1
No match found in Euclid MER catalog for COSMOS1.
Processing source COSMOS2
No match found in Euclid MER catalog for COSMOS2.
Processing source COSMOS3
No match found in Euclid MER catalog for COSMOS3.
Processing source JADESGS-z7-01-QU
No match found in Euclid MER catalog for JADESGS-z7-01-QU.
Processing source TestJWST
No match found in Euclid MER catalog for TestJWST.
Processing source TestEuclid
Found Euclid object_id: 2684805874647806467
Processing source Twin Quasar
No match found in Euclid MER catalog for Twin Quasar.
CPU times: user 4.12 s, sys: 364 ms, total: 4.48 s
Wall time: 39.9 s

2.2 MAST Archive#

This archive includes spectra taken by

• HST (including slit spectroscopy)

• JWST (including MSA and slit spectroscopy)

%%time
# Get Spectra for HST
df_spec_HST = HST_get_spec(
    sample_table,
    search_radius_arcsec=0.5,
    datadir="./data/",
    verbose=False,
    delete_downloaded_data=True
)
df_spec.append(df_spec_HST)
Processing source NGC3049
Number of search results: 8
Number of files to download: 4
Processing source NGC4670
Number of search results: 5
Number of files to download: 2
Processing source Tol_89
Number of search results: 8
Number of files to download: 3
Processing source Arp220
Number of search results: 0
Source Arp220 could not be found
Processing source COSMOS1
Number of search results: 4
Number of files to download: 0
Nothing to download for source COSMOS1.
Processing source COSMOS2
Number of search results: 3
Number of files to download: 0
Nothing to download for source COSMOS2.
Processing source COSMOS3
Number of search results: 1
Number of files to download: 0
Nothing to download for source COSMOS3.
Processing source JADESGS-z7-01-QU
Number of search results: 97
Number of files to download: 0
Nothing to download for source JADESGS-z7-01-QU.
Processing source TestJWST
Number of search results: 93
Number of files to download: 0
Nothing to download for source TestJWST.
Processing source TestEuclid
Number of search results: 0
Source TestEuclid could not be found
Processing source Twin Quasar
Number of search results: 5
Number of files to download: 2
CPU times: user 1.66 s, sys: 51.2 ms, total: 1.71 s
Wall time: 2min 55s
%%time
# Get Spectra for JWST
df_jwst = JWST_get_spec(
    sample_table,
    search_radius_arcsec=0.5,
    verbose=False,
    max_spectra_per_source = 5
)
df_spec.append(df_jwst)
Searching Spectra in the cloud... 
INFO: Using the S3 STScI public dataset [astroquery.mast.cloud]
Processing source NGC3049
Number of search results: 0
Source NGC3049 could not be found
Processing source NGC4670
Number of search results: 0
Source NGC4670 could not be found
Processing source Tol_89
Number of search results: 0
Source Tol_89 could not be found
Processing source Arp220
Number of search results: 0
Source Arp220 could not be found
Processing source COSMOS1
Number of search results: 0
Source COSMOS1 could not be found
Processing source COSMOS2
Number of search results: 2
Number of files found: 12
Limiting to first 5 spectra for source COSMOS2.
Skipping jw02565001001_03101_00003_nrs1_x1d.fits: not available in cloud.
Skipping jw02565001001_03101_00002_nrs1_x1d.fits: not available in cloud.
Processing source COSMOS3
Number of search results: 0
Source COSMOS3 could not be found
Processing source JADESGS-z7-01-QU
Number of search results: 112
Number of files found: 2304
Limiting to first 5 spectra for source JADESGS-z7-01-QU.
Processing source TestJWST
Number of search results: 102
Number of files found: 2358
Limiting to first 5 spectra for source TestJWST.
Processing source TestEuclid
Number of search results: 0
Source TestEuclid could not be found
Processing source Twin Quasar
Number of search results: 0
Source Twin Quasar could not be found
done
Grouping Spectra... 
Grouping object COSMOS2
Grouping object JADESGS-z7-01-QU
Grouping object TestJWST
done
CPU times: user 27.6 s, sys: 2.08 s, total: 29.6 s
Wall time: 6min 30s
WARNING: NoResultsWarning: Unable to locate file jwst/public/jw02565/jw02565001001/jw02565001001_03101_00003_nrs1_x1d.fits. [astroquery.mast.cloud]
WARNING: NoResultsWarning: Unable to locate file jwst/public/jw02565/jw02565001001/jw02565001001_03101_00002_nrs1_x1d.fits. [astroquery.mast.cloud]

2.3 ESA Archive#

# Herschel PACS & SPIRE from ESA TAP using astroquery
# This search is fully functional, but is commented out because it takes
# ~4 hours to run to completion
herschel_radius = 1.1
herschel_download_directory = 'data/herschel'

# if not os.path.exists(herschel_download_directory):
#    os.makedirs(herschel_download_directory, exist_ok=True)
# df_spec_herschel =  Herschel_get_spec(sample_table, herschel_radius, herschel_download_directory, delete_downloaded_data=True)
# df_spec.append(df_spec_herschel)

2.4 SDSS Archive#

%%time
# Get SDSS Spectra
df_spec_SDSS = SDSS_get_spec(sample_table, search_radius_arcsec=5, data_release=17)
df_spec.append(df_spec_SDSS)
Source Tol_89 returned no results.
Source COSMOS1 returned no results.
Source COSMOS2 returned no results.
Source COSMOS3 returned no results.
Source JADESGS-z7-01-QU returned no results.
Source TestJWST returned no results.
Source TestEuclid returned no results.
CPU times: user 148 ms, sys: 16.3 ms, total: 164 ms
Wall time: 5.04 s

2.5 DESI Archive#

This includes DESI spectra. Here, we use the SPARCL query. Note that this can also be used for SDSS searches, however, according to the SPARCL webpage, only up to DR16 is included. Therefore, we will not include SDSS DR16 here (this is treated in the SDSS search above).

The DESI search is currently commented out because SPARCL is not compatible with numpy > 2 which we require for the other modules to run.

#%%time
## Get DESI and BOSS spectra with SPARCL
#df_spec_DESIBOSS = DESIBOSS_get_spec(sample_table, search_radius_arcsec=5)
#df_spec.append(df_spec_DESIBOSS)

3. Make plots of luminosity as a function of time#

We show flux in mJy as a function of time for all available bands for each object. show_nbr_figures controls how many plots are actually generated and returned to the screen. If you choose to save the plots with save_output, they will be put in the output directory and labelled by sample number.

### Plotting ####
create_figures(df_spec=df_spec,
               bin_factor=1,
               show_nbr_figures=10,
               save_output=False,
               )
../_images/916a6259d8087cdff79ee8ce2f71ca8c1997f312eb88db11fe6ba554381ed35e.png ../_images/7c5600b3ac87fb101e6f5c4b2aaaf561e93a4c490103e4c3634887357a50200d.png ../_images/f9e530eca14e15b4d9c1a5f2d73bc0991a7b59deb15072f42bed9b3a288a2f73.png ../_images/850568e4ddf3c2f4858279a23e4d48b5a23878abb1346f913c483afa7fb2b58d.png ../_images/c54f28fcbba90f943934806041e21549f039a5797aabb8f2360883faa857d646.png ../_images/39393c1dd37000580367788e36e12728caa49ec06e05cf8a5bfb97cb6cb1333a.png ../_images/a74186b62d8a67cbbf9ca4ca9c75b2c33e33d6f357f1641b5807b1ed68a80806.png ../_images/2adb6fa8e4a34f8eb3208bca4d2e0bff81f0cdde7e9fe863c7cd12a72c40b4ca.png ../_images/d7f748b3bae453781dbb67535a69a150839b55209490e95f87313cb72ef9729d.png