Tutorial

SWOT High Resolution data search and access


Overview

What you’ll do (quick preview)
  • Sign in with Earthdata Login
  • Search for SWOT_L2_HR_Raster_100m_D
  • Open results with earthaccess.open(...) and inspect a few variables
  • Visualize multiple SWOT rasters across time

Introduction to SWOT

In this tutorial, we will open and visualize data from NASA/CNES’s Surface Water and Ocean Topography (SWOT) mission in the cloud. In the next tutorial, we’ll compare SWOT elevations to NASA’s Ice, Cloud, and Land Elevation Satellite-2 (ICESat-2) over the Bach Ice Shelf (Antarctic Peninsula).

Figure: SWOT mission overview. Credit: PO.DAAC cookbook

We use the SWOT High Resolution (HR) Level-2 Water Mask Raster Image Data Product, Version D (SWOT_L2_HR_Raster_D), which includes improved processing for ice-shelf environments. See PO.DAAC’s SWOT mission and data overview for product details, processing notes, and documentation.

Libraries needed to get started

%matplotlib widget

# For searching and accessing NASA data
import earthaccess

# For reading data, analysis and plotting
import xarray as xr
import hvplot.xarray  # registers the .hvplot accessor on xarray objects

# For parsing the time dimension from filenames
from datetime import datetime
import re

import pprint  # For nice printing of Python objects

Earthdata Login

An Earthdata Login account is required to access (and in many cases stream) NASA data. If you don’t have one yet, register at https://urs.earthdata.nasa.gov. It’s free and quick to set up. We’ll use the earthaccess library to authenticate.

Login requires your Earthdata Login username and password. The login method automatically looks for these credentials in environment variables or in a .netrc file; if neither is available, it prompts you to enter your username and password. We use the prompt (interactive) strategy here.
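If you prefer not to be prompted, earthaccess can also read your credentials from the EARTHDATA_USERNAME and EARTHDATA_PASSWORD environment variables. A minimal sketch (the values below are placeholders, not real credentials):

import os

# Set these before calling login, e.g. in your shell profile or job script
os.environ["EARTHDATA_USERNAME"] = "your_username"  # placeholder
os.environ["EARTHDATA_PASSWORD"] = "your_password"  # placeholder

auth = earthaccess.login(strategy="environment")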

Saving your credentials

A .netrc file is a text file located in your home directory that contains login information for remote machines. If you don’t have a .netrc file, login will create one for you when you pass persist=True.

earthaccess.login(strategy='interactive', persist=True)

BUT make sure you do not commit your .netrc file to your GitHub repo. This is easy to do accidentally via git add -A and would be a major security risk.

auth = earthaccess.login()

# Sanity check so you know that your credentials worked.
assert auth.authenticated, "Earthdata Login failed — please re-try."

Search for SWOT cloud-native collections

earthaccess leverages the Common Metadata Repository (CMR) API to search for collections and granules. Earthdata Search also uses the CMR API.

We can use the search_datasets method to search for SWOT collections by setting keyword="SWOT".

Advanced search options

The argument passed to keyword can be any string and can include the wildcard characters ? or *. To see the full list of search parameters, type earthaccess.search_datasets?. Appending ? to a Python object displays its docstring.
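For example, a quick sketch of a wildcard search (the number of matches depends on current CMR holdings):

# '*' matches any characters, so this should find the SWOT raster collections
raster_query = earthaccess.search_datasets(keyword="SWOT*Raster*", cloud_hosted=True)
print(f'{len(raster_query)} datasets found.')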

A count of the number of data collections (Datasets) found is given.

query = earthaccess.search_datasets(
    keyword="SWOT",
    cloud_hosted=True,
    version="D",
)
print(f'{len(query)} datasets found.')
35 datasets found.

We can get a summary of each dataset, which includes links for where to find lengthier descriptions of the data. We look at the first five in the query here.

for collection in query[:5]:
    pprint.pprint(collection.summary(), sort_dicts=True, indent=4)
    print('')  # Add a space between collections for readability
{   'cloud-info': {   'Region': 'us-west-2',
                      'S3BucketAndObjectPrefixNames': [   'podaac-swot-ops-cumulus-protected/SWOT_L2_LR_SSH_D/',
                                                          'podaac-swot-ops-cumulus-public/SWOT_L2_LR_SSH_D/'],
                      'S3CredentialsAPIDocumentationURL': 'https://archive.swot.podaac.earthdata.nasa.gov/s3credentialsREADME',
                      'S3CredentialsAPIEndpoint': 'https://archive.swot.podaac.earthdata.nasa.gov/s3credentials'},
    'concept-id': 'C3233945000-POCLOUD',
    'file-type': "[{'FormatType': 'Native', 'Format': 'netCDF-4'}]",
    'get-data': [   'https://cmr.earthdata.nasa.gov/virtual-directory/collections/C3233945000-POCLOUD',
                    'https://search.earthdata.nasa.gov/search/granules?p=C3233945000-POCLOUD'],
    'short-name': 'SWOT_L2_LR_SSH_D',
    'version': 'D'}

{   'cloud-info': {   'Region': 'us-west-2',
                      'S3BucketAndObjectPrefixNames': [   'podaac-swot-ops-cumulus-protected/SWOT_L2_HR_RiverSP_D/',
                                                          'podaac-swot-ops-cumulus-public/SWOT_L2_HR_RiverSP_D/'],
                      'S3CredentialsAPIDocumentationURL': 'https://archive.swot.podaac.earthdata.nasa.gov/s3credentialsREADME',
                      'S3CredentialsAPIEndpoint': 'https://archive.swot.podaac.earthdata.nasa.gov/s3credentials'},
    'concept-id': 'C3233944997-POCLOUD',
    'file-type': "[{'FormatType': 'Native', 'Format': 'Shapefile'}]",
    'get-data': [   'https://cmr.earthdata.nasa.gov/virtual-directory/collections/C3233944997-POCLOUD',
                    'https://search.earthdata.nasa.gov/search/granules?p=C3233944997-POCLOUD'],
    'short-name': 'SWOT_L2_HR_RiverSP_D',
    'version': 'D'}

{   'cloud-info': {   'Region': 'us-west-2',
                      'S3BucketAndObjectPrefixNames': [   'podaac-swot-ops-cumulus-protected/SWOT_L2_HR_LakeAvg_D/',
                                                          'podaac-swot-ops-cumulus-public/SWOT_L2_HR_LakeAvg_D/'],
                      'S3CredentialsAPIDocumentationURL': 'https://archive.swot.podaac.earthdata.nasa.gov/s3credentialsREADME',
                      'S3CredentialsAPIEndpoint': 'https://archive.swot.podaac.earthdata.nasa.gov/s3credentials'},
    'concept-id': 'C3233944980-POCLOUD',
    'file-type': "[{'FormatType': 'Native', 'Format': 'Shapefile'}]",
    'get-data': [   'https://cmr.earthdata.nasa.gov/virtual-directory/collections/C3233944980-POCLOUD',
                    'https://search.earthdata.nasa.gov/search/granules?p=C3233944980-POCLOUD'],
    'short-name': 'SWOT_L2_HR_LakeAvg_D',
    'version': 'D'}

{   'cloud-info': {   'Region': 'us-west-2',
                      'S3BucketAndObjectPrefixNames': [   'podaac-swot-ops-cumulus-protected/SWOT_L2_HR_LakeSP_D/',
                                                          'podaac-swot-ops-cumulus-public/SWOT_L2_HR_LakeSP_D/'],
                      'S3CredentialsAPIDocumentationURL': 'https://archive.swot.podaac.earthdata.nasa.gov/s3credentialsREADME',
                      'S3CredentialsAPIEndpoint': 'https://archive.swot.podaac.earthdata.nasa.gov/s3credentials'},
    'concept-id': 'C3233944983-POCLOUD',
    'file-type': "[{'FormatType': 'Native', 'Format': 'Shapefile'}]",
    'get-data': [   'https://cmr.earthdata.nasa.gov/virtual-directory/collections/C3233944983-POCLOUD',
                    'https://search.earthdata.nasa.gov/search/granules?p=C3233944983-POCLOUD'],
    'short-name': 'SWOT_L2_HR_LakeSP_D',
    'version': 'D'}

{   'cloud-info': {   'Region': 'us-west-2',
                      'S3BucketAndObjectPrefixNames': [   'podaac-swot-ops-cumulus-protected/SWOT_L2_HR_LakeSP_obs_D/',
                                                          'podaac-swot-ops-cumulus-public/SWOT_L2_HR_LakeSP_obs_D/'],
                      'S3CredentialsAPIDocumentationURL': 'https://archive.swot.podaac.earthdata.nasa.gov/s3credentialsREADME',
                      'S3CredentialsAPIEndpoint': 'https://archive.swot.podaac.earthdata.nasa.gov/s3credentials'},
    'concept-id': 'C3233942286-POCLOUD',
    'file-type': "[{'FormatType': 'Native', 'Format': 'Shapefile'}]",
    'get-data': [   'https://cmr.earthdata.nasa.gov/virtual-directory/collections/C2799438239-POCLOUD',
                    'https://search.earthdata.nasa.gov/search/granules?p=C2799438239-POCLOUD'],
    'short-name': 'SWOT_L2_HR_LakeSP_obs_D',
    'version': 'D'}

For each collection, summary returns a subset of fields from the collection metadata and Unified Metadata Model (UMM) entry.

  • concept-id is a unique identifier for the collection, composed of an alphanumeric code and the provider-id of the DAAC.
  • file-type gives information about the file format of the collection files.
  • get-data is a collection of URLs that can be used to access data, dataset landing pages, and tools.
  • short-name is the name of the dataset that appears on the dataset landing page. For SWOT, ShortNames are generally how different products are referred to.
  • version is the version of each collection.

For cloud-hosted data, there is additional information about the location of the S3 bucket that holds the data and where to get credentials to access the S3 buckets. In general, you don’t need to worry about this information because earthaccess handles S3 credentials for you. Nevertheless, it may be useful for troubleshooting.
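If you do need to troubleshoot, earthaccess can fetch those temporary S3 credentials for you. A minimal sketch (the exact keys in the returned dictionary are an assumption based on the standard s3credentials endpoints):

# Request temporary AWS credentials for PO.DAAC's S3 buckets;
# they expire after about an hour
creds = earthaccess.get_s3_credentials(daac="PODAAC")
print(creds.keys())  # expect accessKeyId, secretAccessKey, sessionToken, expiration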

For these SWOT search results, the concept-id ends in POCLOUD, which means the data are hosted in the PO.DAAC cloud.

For SWOT, short-name refers to the following products. The D at the end denotes version D; replace it with 2.0 to get version C.

ShortName                   Product Description (with linked tutorials if available)
SWOT_L2_HR_Raster_D         SWOT Level 2 Water Mask Raster Image
SWOT_L2_HR_Raster_100m_D    100 m spatial resolution
SWOT_L2_HR_Raster_250m_D    250 m spatial resolution
SWOT_L2_LR_SSH_D            SWOT Level 2 KaRIn Low Rate Sea Surface Height
SWOT_L2_LR_SSH_BASIC_D      A limited set of variables, aimed at the general user
SWOT_L2_LR_SSH_EXPERT_D     All related variables, intended for expert users
SWOT_L2_LR_SSH_UNSMOOTH_D   All related variables, on the finer-resolution "native" grid, with minimal smoothing applied
SWOT_L2_LR_SSH_WINDWAVE_D   Wind and wave height data

Some others:
SWOT_L2_HR_PIXC_D           SWOT Level 2 Water Mask Pixel Cloud
SWOT_L1B_HR_SLC_D           SWOT Level 1B High-Rate Single-look Complex
SWOT_L2_HR_RiverSP_D        SWOT Level 2 River Single-Pass Vector
SWOT_L2_HR_LakeSP_D         SWOT Level 2 Lake Single-Pass Vector
SWOT_L2_NALT_GDR_2.0        SWOT Level 2 Nadir Altimeter Geophysical Data Record with Waveforms

  • Hydrology products tutorial
  • Oceanography products tutorial

If you want to see just the short names, so you can paste one into the earthaccess data access calls below, you can use this method:

for collection in query[:5]:
    pprint.pprint(collection.summary()['short-name'], sort_dicts=True, indent=4)
'SWOT_L2_LR_SSH_D'
'SWOT_L2_HR_RiverSP_D'
'SWOT_L2_HR_LakeAvg_D'
'SWOT_L2_HR_LakeSP_D'
'SWOT_L2_HR_LakeSP_obs_D'

Search SWOT data using spatial and temporal filters

Once you have identified the dataset you want to work with, you can use the search_data method to search within it using spatial and temporal filters. Since we are using the SWOT HR Raster 100m product for this tutorial, we’ll search for those rasters over the Bach Ice Shelf in Antarctica, from May 1 through June 30, 2025.

Either concept-id or short-name can be used to search for granules from a particular dataset. With short-name you generally also need to set version (for SWOT, the version is already encoded in the short name’s suffix, so it can be omitted). With concept-id nothing else is required, because each concept-id is unique.

The temporal range is specified with standard date strings. A bounding box is given as latitude-longitude corners, lower left then upper right. Polygons, points, and shapefiles can also be used.
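For example, here is an equivalent granule search by concept-id, a hedged sketch using the SWOT_L2_LR_SSH_D identifier from the collection summaries above; no version argument is needed:

# concept-id alone is enough to pin down a collection
ssh_results = earthaccess.search_data(
    concept_id="C3233945000-POCLOUD",
    temporal=("2025-05-01", "2025-06-30"),
)
print(f'{len(ssh_results)} granules')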

This will display the number of granules that match our search.

# Search for SWOT rasters over the Bach Ice Shelf
latmin, latmax = -72.5, -71.5
lonmin, lonmax = -73.4, -70.5
sbox = (lonmin, latmin, lonmax, latmax)

results = earthaccess.search_data(
    short_name="SWOT_L2_HR_Raster_100m_D",
    temporal=("2025-05-01", "2025-06-30"),
    bounding_box=sbox,
)

print(f'{len(results)} total')
4 total

We’ll get metadata for these 4 granules and display it. The rendered metadata shows a download link, the granule size, and two images of the data.

[display(r) for r in results]

Open, load and display data stored on S3

Direct access to data in an S3 bucket is a two-step process. First, the files are opened using the open method. This first step creates Python file-like objects that are used to load the data in the second step.

Authentication is required for this step. The auth object created at the start of the notebook provides Earthdata Login authentication and AWS credentials behind the scenes. These credentials expire after one hour, so earthaccess.login() must have been run within that window before the next steps.

rasters = earthaccess.open(results)

After opening the files, we can load them one at a time. In this example, the data are loaded into an xarray.Dataset. The data could instead be read into numpy arrays or a pandas.DataFrame, but then each granule would have to be read with a package that understands HDF5 granules, such as h5py. xarray does all of this under the hood in a single line.
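Before handing the files to xarray, you could peek at one granule’s layout with h5py directly. A minimal sketch (h5py accepts the file-like objects that earthaccess.open returns; treating wse as a root-level variable is an assumption about this product’s layout):

import h5py

# List the top-level groups/variables of the first granule
with h5py.File(rasters[0], "r") as f:
    print(list(f.keys()))
    print(f["wse"].shape)  # assumes 'wse' sits at the file's root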

d1 = xr.open_dataset(rasters[0])
d1

We can open just that one file, but if we want to work with a long time series, we more likely want all 4 granules in a single xarray.Dataset. We can do this with one command, xarray.open_mfdataset, but to concatenate the datasets along a new time dimension we use its preprocess hook. We first write a function that parses the acquisition time from each file name and attaches it as an extra time dimension for each SWOT pass we have collected.

# Inspect the direct-access (S3) link for the first granule
results[0].data_links(access='direct')
['s3://podaac-swot-ops-cumulus-protected/SWOT_L2_HR_Raster_2.0/SWOT_L2_HR_Raster_100m_UTM18C_N_x_x_x_032_115_011F_20250502T023949_20250502T023955_PIC2_01.nc']
# Preprocess helper to add a time coordinate from the filename
#    Looks for YYYYMMDDTHHMMSS anywhere in the source path
_TIME_RE = re.compile(r"(\d{8}T\d{6})")

def add_time_from_source(ds: xr.Dataset) -> xr.Dataset:
    src = str(ds.encoding.get("source", ""))  # xarray keeps this
    m = _TIME_RE.search(src)
    if m:
        ts = datetime.strptime(m.group(1), "%Y%m%dT%H%M%S")
        # attach as a proper dimension so open_mfdataset can concat
        ds = ds.expand_dims(time=[ts])
    else:
        # fallback: leave unmodified if no timestamp can be found
        pass
    return ds

Then we can run xarray.open_mfdataset with that preprocess function included. This only lazily loads the data, meaning we can operate on the data and metadata, but the values aren’t actually read into memory until we need them. ds is only about 1 GB right now, but if we ran ds.compute() to read in every variable, ds would be ~25 GB and could crash our memory.

# Open as a multi-file dataset concatenated by time - 30s runtime
ds = xr.open_mfdataset(
    rasters,
    engine="h5netcdf",           # recommended for streamed HDF5/NetCDF via fsspec
    preprocess=add_time_from_source,
    combine="nested",            # adding time during preprocess
    concat_dim="time",
    decode_cf=True,
)
ds

Notice that under the dimensions we now have time, with 4 time steps, in addition to the x and y dimensions.

ds.time.values
array(['2025-05-02T02:39:49.000000000', '2025-05-02T02:39:53.000000000', '2025-05-02T02:40:14.000000000', '2025-05-02T22:25:10.000000000'], dtype='datetime64[ns]')
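Because the load is lazy, we can reduce the data before computing so that only the needed chunks are streamed. A small sketch (the printed value depends on the data):

# Mean water surface elevation for the first time step;
# only the chunks required for this reduction are actually read
mean_wse = ds.wse.isel(time=0).mean().compute()
print(float(mean_wse))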


Now we can plot all of these time steps with hvplot, a really nice visualization package. It gives you a small time widget on the right that lets you scroll from one time step to the next. We are plotting the wse (water surface elevation) variable, but you can swap in any other variable name. The toolbar at the upper right of the image lets you pan, zoom, wheel-zoom, save the image, reset it, and toggle hover.

timeplot = ds.wse.hvplot.image(y='y', x='x')
timeplot.opts(width=700, height=500, colorbar=True)
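The same pattern works for other fields. As a hedged variant (water_frac as a variable name is an assumption about this product; substitute any variable listed in ds):

# Same interactive time slider, for the water fraction variable;
# 'water_frac' is an assumed variable name for this product
fracplot = ds.water_frac.hvplot.image(x='x', y='y', cmap='viridis')
fracplot.opts(width=700, height=500, colorbar=True)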