Skip to article frontmatterSkip to article content
Fornax Demo Notebooks

Extract Multi-Wavelength Spectroscopy from Archival Data

Learning Goals

By the end of this tutorial, you will be able to:

• automatically load a catalog of sources

• search NASA and non-NASA resources for fully reduced spectra and load them using specutils

• store the spectra in a Pandas multiindex dataframe

• plot all the spectra of a given source

Introduction

Motivation

A user has a source (or a sample of sources) for which they want to obtain spectra covering ranges of wavelengths from the UV to the far-IR. The large amount of spectra available enables multi-wavelength spectroscopic studies, which is crucial to understand the physics of stars, galaxies, and AGN. However, gathering and analysing spectra is a difficult endeavor as the spectra are distributed over different archives and in addition they have different formats which complicates their handling. This notebook showcases a tool for the user to conveniently query the spectral archives and collect the spectra for a set of objects in a format that can be read in using common software such as the Python specutils package. For simplicity, we limit the tool to query already reduced and calibrated spectra. The notebook may focus on the COSMOS field for now, which has a large overlap of spectroscopic surveys such as with SDSS, DESI, Keck, HST, JWST, Spitzer, and Herschel. In addition, the tool enables the capability to search and ingest spectra from Euclid and SPHEREx in the feature. For this to work, the specutils functions may have to be update or a wrapper has to be implemented.

List of Spectroscopic Archives and Status

ArchiveSpectraDescriptionAccess pointStatus
IRSAKeckAbout 10,000 spectra on the COSMOS field from Hasinger et al. (2018)IRSA ArchiveImplemented with astroquery.ipac.irsa. (Table gives URLs to spectrum FITS files.) Note: only implemented for absolute calibrated spectra.
IRSASpitzer IRS~17,000 merged low-resolution IRS spectraIRS Enhanced ProductImplemented with astroquery.ipac.irsa. (Table gives URLs to spectrum IPAC tables.)
IRSAIRTF*Large library of stellar spectradoes astroquery.ipac.irsa work??
ESAHerschel*Some spectraastroquery.esa.hsaimplemented with astroquery
IRSAEuclidSpectra hosted at IRSA in FY25Using Q1 Spectra in the red grism (RGS) ingested in FY25, to be updated to DR1
IRSASPHERExSpectra/cubes will be hosted at IRSA, first release in FY25 -> preparation for ingestionWill use mock spectra with correct format for testing
MASTHST*Slitless spectra would need reduction and extraction. There are some reduced slit spectra from COS in the Hubble Archiveastroquery.mastImplemented using astroquery.mast
MASTJWST*Reduced slit MSA and Slit spectra that can be queriedastroquery.mastImplemented using astroquery.mast
SDSSSDSS opticalOptical spectra that are reducedSky Server or astroquery.sdss (preferred)Implemented using astroquery.sdss.
DESIDESI*Optical spectraDESI public data releaseImplemented with SPARCL library. Currently commented out because SPARCL library is incompatible with numpy > 2 which we need for other modules
BOSSBOSS*Optical spectraBOSS webpage (part of SDSS)Implemented with SPARCL library together with DESI
HEASARCNoneCould link to Chandra observations to check AGN occurrence.astroquery.heasarcMore thoughts on how to include scientifically.

The ones with an asterisk (*) are the challenging ones.

Input

• Coordinates for a single source or a sample on the COSMOS field

Output

• A Pandas data frame including the spectra from different facilities

• A plot comparing the different spectra extracted for each source

Runtime

As of 2025 July, this notebook takes about 18 minutes to run to completion on Fornax using a server with 16GB RAM/4 CPU’ and Environment: ‘Default Astrophysics’ (image).

Datasets that were considered but didn’t end up being used

IRTF:

Imports

This cell will install them if needed:

# Uncomment the next line to install dependencies if needed.
# %pip install -r requirements_spectra_collector.txt
import os
import sys

import astropy.units as u
from astropy.coordinates import SkyCoord
from astropy.table import Table
from astropy.utils.data import conf

sys.path.append('code_src/')
from data_structures_spec import MultiIndexDFObject
#from desi_functions import DESIBOSS_get_spec
from herschel_functions import Herschel_get_spec
from keck_functions import KeckDEIMOS_get_spec
from mast_functions import HST_get_spec, JWST_get_spec
from plot_functions import create_figures
from sample_selection import clean_sample
from sdss_functions import SDSS_get_spec
from spitzer_functions import SpitzerIRS_get_spec
from euclid_functions import euclid_get_spec

# The Euclid spectrum files are large and the time it takes to read
# them can exceed astropy's default timeout limit. Increase it.
conf.remote_timeout = 120

1. Define the sample

Here we will define the sample of galaxies. For now, we just enter some “random” coordinates to test the code.

coords = []
labels = []

coords.append(SkyCoord("{} {}".format("09 54 49.40", "+09 16 15.9"), unit=(u.hourangle, u.deg)))
labels.append("NGC3049")

coords.append(SkyCoord("{} {}".format("12 45 17.44 ", "27 07 31.8"), unit=(u.hourangle, u.deg)))
labels.append("NGC4670")

coords.append(SkyCoord("{} {}".format("14 01 19.92", "−33 04 10.7"), unit=(u.hourangle, u.deg)))
labels.append("Tol_89")

coords.append(SkyCoord(233.73856, 23.50321, unit=u.deg))
labels.append("Arp220")

coords.append(SkyCoord(150.091, 2.2745833, unit=u.deg))
labels.append("COSMOS1")

coords.append(SkyCoord(150.1024475, 2.2815559, unit=u.deg))
labels.append("COSMOS2")

coords.append(SkyCoord("{} {}".format("150.000", "+2.00"), unit=(u.deg, u.deg)))
labels.append("COSMOS3")

coords.append(SkyCoord("{} {}".format("+53.15508", "-27.80178"), unit=(u.deg, u.deg)))
labels.append("JADESGS-z7-01-QU")

coords.append(SkyCoord("{} {}".format("+53.15398", "-27.80095"), unit=(u.deg, u.deg)))
labels.append("TestJWST")

coords.append(SkyCoord("{} {}".format("268.48058743", "64.78064676"), unit=(u.deg, u.deg)))
labels.append("TestEuclid")

coords.append(SkyCoord("{} {}".format("+150.33622", "+55.89878"), unit=(u.deg, u.deg)))
labels.append("Twin Quasar")

sample_table = clean_sample(coords, labels, precision=2.0 * u.arcsecond, verbose=1)
after duplicates removal, sample size: 11

1.2 Write out your sample to disk

At this point you may wish to write out your sample to disk and reuse that in future work sessions, instead of creating it from scratch again. Note that we first check if the data directory exists and if not, we will create one.

For the format of the save file, we would suggest to choose from various formats that fully support astropy objects(eg., SkyCoord). One example that works is Enhanced Character-Separated Values or ‘ecsv’

if not os.path.exists("./data"):
    os.mkdir("./data")
sample_table.write('data/input_sample.ecsv', format='ascii.ecsv', overwrite=True)

1.3 Load the sample table from disk

Do only this step from this section when you have a previously generated sample table

sample_table = Table.read('data/input_sample.ecsv', format='ascii.ecsv')

1.4 Initialize data structure to hold the spectra

Here, we initialize the MultiIndex data structure that will hold the spectra.

df_spec = MultiIndexDFObject()

2. Find spectra for these targets in NASA and other ancillary catalogs

We search a curated list of NASA astrophysics archives. Because each archive is different, and in many cases each catalog is different, each function to access a catalog is necesarily specialized to the location and format of that particular catalog.

2.1 IRSA Archive

This archive includes spectra taken by

• Keck

• Spitzer/IRS

• Euclid/NISP

%%time
# Get Keck Spectra (COSMOS only)
df_spec_DEIMOS = KeckDEIMOS_get_spec(sample_table=sample_table, search_radius_arcsec=1)
df_spec.append(df_spec_DEIMOS)
Number of entries found: 0
Number of entries found: 0
Number of entries found: 0
Number of entries found: 0
Number of entries found: 1
no calibration is available
Number of entries found: 1
Number of entries found: 0
Number of entries found: 0
Number of entries found: 0
Number of entries found: 0
Number of entries found: 0
CPU times: user 337 ms, sys: 10 ms, total: 347 ms
Wall time: 26.1 s
%%time
# Get Spitzer IRS Spectra
df_spec_IRS = SpitzerIRS_get_spec(sample_table, search_radius_arcsec=1, COMBINESPEC=False)
df_spec.append(df_spec_IRS)
Processing source NGC3049
Number of entries found: 0
Processing source NGC4670
Number of entries found: 0
Processing source Tol_89
Number of entries found: 0
Processing source Arp220
Number of entries found: 8
More than 1 entry found - pick the closest
Processing source COSMOS1
Number of entries found: 0
Processing source COSMOS2
Number of entries found: 0
Processing source COSMOS3
Number of entries found: 0
Processing source JADESGS-z7-01-QU
Number of entries found: 0
Processing source TestJWST
Number of entries found: 0
Processing source TestEuclid
Number of entries found: 0
Processing source Twin Quasar
Number of entries found: 0
CPU times: user 420 ms, sys: 7.62 ms, total: 428 ms
Wall time: 14.8 s
%%time
# Get Euclid Spectra
df_spec_Euclid = euclid_get_spec(sample_table=sample_table, search_radius_arcsec=1)
df_spec.append(df_spec_Euclid)
Processing source NGC3049
No match found in Euclid MER catalog for NGC3049.
Processing source NGC4670
No match found in Euclid MER catalog for NGC4670.
Processing source Tol_89
No match found in Euclid MER catalog for Tol_89.
Processing source Arp220
No match found in Euclid MER catalog for Arp220.
Processing source COSMOS1
No match found in Euclid MER catalog for COSMOS1.
Processing source COSMOS2
No match found in Euclid MER catalog for COSMOS2.
Processing source COSMOS3
No match found in Euclid MER catalog for COSMOS3.
Processing source JADESGS-z7-01-QU
No match found in Euclid MER catalog for JADESGS-z7-01-QU.
Processing source TestJWST
No match found in Euclid MER catalog for TestJWST.
Processing source TestEuclid
Found Euclid object_id: 2684805874647806467
Processing source Twin Quasar
No match found in Euclid MER catalog for Twin Quasar.
CPU times: user 2.09 s, sys: 29.4 ms, total: 2.12 s
Wall time: 39.5 s

2.2 MAST Archive

This archive includes spectra taken by

• HST (including slit spectroscopy)

• JWST (including MSA and slit spectroscopy)

%%time
# Get Spectra for HST
df_spec_HST = HST_get_spec(
    sample_table,
    search_radius_arcsec=0.5,
    datadir="./data/",
    verbose=False,
    delete_downloaded_data=True
)
df_spec.append(df_spec_HST)
Processing source NGC3049
Number of search results: 9
Number of files to download: 4
Processing source NGC4670
Number of search results: 6
Number of files to download: 2
Processing source Tol_89
Number of search results: 9
Number of files to download: 3
Processing source Arp220
Number of search results: 0
Source Arp220 could not be found
Processing source COSMOS1
Number of search results: 4
Number of files to download: 0
Nothing to download for source COSMOS1.
Processing source COSMOS2
Number of search results: 3
Number of files to download: 0
Nothing to download for source COSMOS2.
Processing source COSMOS3
Number of search results: 1
Number of files to download: 0
Nothing to download for source COSMOS3.
Processing source JADESGS-z7-01-QU
Number of search results: 97
Number of files to download: 0
Nothing to download for source JADESGS-z7-01-QU.
Processing source TestJWST
Number of search results: 93
Number of files to download: 0
Nothing to download for source TestJWST.
Processing source TestEuclid
Number of search results: 0
Source TestEuclid could not be found
Processing source Twin Quasar
Number of search results: 6
Number of files to download: 2
CPU times: user 2.5 s, sys: 59.6 ms, total: 2.56 s
Wall time: 3min 38s
%%time
# Get Spectra for JWST
df_jwst = JWST_get_spec(
    sample_table,
    search_radius_arcsec=0.5,
    verbose=False,
    max_spectra_per_source = 5
)
df_spec.append(df_jwst)
Searching Spectra in the cloud... 
INFO: Using the S3 STScI public dataset [astroquery.mast.cloud]
Processing source NGC3049
Number of search results: 0
Source NGC3049 could not be found
Processing source NGC4670
Number of search results: 0
Source NGC4670 could not be found
Processing source Tol_89
Number of search results: 0
Source Tol_89 could not be found
Processing source Arp220
Number of search results: 0
Source Arp220 could not be found
Processing source COSMOS1
Number of search results: 0
Source COSMOS1 could not be found
Processing source COSMOS2
Number of search results: 2
Number of files found: 12
Limiting to first 5 spectra for source COSMOS2.
WARNING: NoResultsWarning: Unable to locate file jwst/public/jw02565/jw02565001001/jw02565001001_03101_00002_nrs2_x1d.fits. [astroquery.mast.cloud]
WARNING: NoResultsWarning: Unable to locate file jwst/public/jw02565/jw02565001001/jw02565001001_03101_00004_nrs2_x1d.fits. [astroquery.mast.cloud]
WARNING: NoResultsWarning: Unable to locate file jwst/public/jw02565/jw02565001001/jw02565001001_03101_00003_nrs2_x1d.fits. [astroquery.mast.cloud]
Skipping jw02565001001_03101_00002_nrs2_x1d.fits: not available in cloud.
Skipping jw02565001001_03101_00004_nrs2_x1d.fits: not available in cloud.
Skipping jw02565001001_03101_00003_nrs2_x1d.fits: not available in cloud.
Processing source COSMOS3
Number of search results: 0
Source COSMOS3 could not be found
Processing source JADESGS-z7-01-QU
Number of search results: 18
Number of files found: 612
Limiting to first 5 spectra for source JADESGS-z7-01-QU.
Processing source TestJWST
Number of search results: 6
Number of files found: 630
Limiting to first 5 spectra for source TestJWST.
Processing source TestEuclid
Number of search results: 0
Source TestEuclid could not be found
Processing source Twin Quasar
Number of search results: 0
Source Twin Quasar could not be found
done
Grouping Spectra... 
Grouping object COSMOS2
Grouping object JADESGS-z7-01-QU
Grouping object TestJWST
done
CPU times: user 10.6 s, sys: 831 ms, total: 11.4 s
Wall time: 6min 16s

2.3 ESA Archive

# Herschel PACS & SPIRE from ESA TAP using astroquery
# This search is fully functional, but is commented out because it takes
# ~4 hours to run to completion
herschel_radius = 1.1
herschel_download_directory = 'data/herschel'

# if not os.path.exists(herschel_download_directory):
#    os.makedirs(herschel_download_directory, exist_ok=True)
# df_spec_herschel =  Herschel_get_spec(sample_table, herschel_radius, herschel_download_directory, delete_downloaded_data=True)
# df_spec.append(df_spec_herschel)

2.4 SDSS Archive

%%time
# Get SDSS Spectra
df_spec_SDSS = SDSS_get_spec(sample_table, search_radius_arcsec=5, data_release=17)
df_spec.append(df_spec_SDSS)
Source Tol_89 could not be found
Source COSMOS1 could not be found
Source COSMOS2 could not be found
Source COSMOS3 could not be found
Source JADESGS-z7-01-QU could not be found
Source TestJWST could not be found
Source TestEuclid could not be found
CPU times: user 152 ms, sys: 14.7 ms, total: 167 ms
Wall time: 10.5 s

2.5 DESI Archive

This includes DESI spectra. Here, we use the SPARCL query. Note that this can also be used for SDSS searches, however, according to the SPARCL webpage, only up to DR16 is included. Therefore, we will not include SDSS DR16 here (this is treated in the SDSS search above).

The DESI search is currently commented out because SPARCL is not compatible with numpy > 2 which we require for the other modules to run.

#%%time
## Get DESI and BOSS spectra with SPARCL
#df_spec_DESIBOSS = DESIBOSS_get_spec(sample_table, search_radius_arcsec=5)
#df_spec.append(df_spec_DESIBOSS)

3. Make plots of luminosity as a function of time

We show flux in mJy as a function of time for all available bands for each object. show_nbr_figures controls how many plots are actually generated and returned to the screen. If you choose to save the plots with save_output, they will be put in the output directory and labelled by sample number.

### Plotting ####
create_figures(df_spec=df_spec,
               bin_factor=1,
               show_nbr_figures=10,
               save_output=False,
               )
<Figure size 900x600 with 1 Axes><Figure size 900x600 with 1 Axes><Figure size 900x600 with 1 Axes><Figure size 900x600 with 1 Axes><Figure size 900x600 with 1 Axes><Figure size 900x600 with 1 Axes><Figure size 900x600 with 1 Axes><Figure size 900x600 with 1 Axes><Figure size 900x600 with 1 Axes>

About this notebook

Acknowledgements