Overview¶
What you’ll do (quick preview)
Sign in with Earthdata Login
Search for a dataset, in this case MERRA-2
Open results with earthaccess.open(...) and inspect a few variables
Visualize multiple rasters across time
Introduction to MERRA-2¶
In this tutorial, we will open and visualize data from NASA’s MERRA-2 datasets in the cloud. Keep in mind that we show this for one dataset, but the same approach can be used for all other NASA datasets in the cloud.
Libraries needed to get started¶
%matplotlib widget
# For searching and accessing NASA data
import earthaccess
# For reading data, analysis and plotting
import xarray as xr
import pandas as pd
# For accessing the time dimension from filenames
from datetime import datetime
import re
# For plotting found datasets
import ipywidgets as widgets
import matplotlib.pyplot as plt
import pprint # For nice printing of python objects
import warnings
warnings.filterwarnings("ignore")
Earthdata Login¶
An Earthdata Login account is required to access (and in many cases stream) NASA data. If you don’t have one yet, register for one on the Earthdata Login site. We use the earthaccess library to authenticate.
Login requires your Earthdata Login username and password. The login method will automatically search for these credentials as environment variables or in a .netrc file, and if those aren’t available it will prompt you to enter your username and password. We use the prompt strategy here.
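The lookup order described above can be pictured with a small stand-alone sketch. This is not earthaccess's internal code: pick_login_strategy is a hypothetical helper, and the environment variable names and the Earthdata Login host used for the .netrc entry are assumptions about how the credentials are usually stored.

```python
import netrc
import os
import tempfile

# Assumed names (not taken from this tutorial): the environment variables and
# the Earthdata Login host that a .netrc entry would be keyed on.
ENV_USER = "EARTHDATA_USERNAME"
ENV_PASS = "EARTHDATA_PASSWORD"
EDL_HOST = "urs.earthdata.nasa.gov"


def pick_login_strategy(env, netrc_path):
    """Return which source would supply credentials: environment, netrc or prompt."""
    if env.get(ENV_USER) and env.get(ENV_PASS):
        return "environment"
    if netrc_path and os.path.exists(netrc_path):
        entry = netrc.netrc(netrc_path).authenticators(EDL_HOST)
        if entry is not None:
            return "netrc"
    return "interactive"


# Demonstrate with a throwaway .netrc file so no real credentials are touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_netrc = os.path.join(tmp, "netrc")
    with open(demo_netrc, "w") as fh:
        fh.write(f"machine {EDL_HOST} login demo_user password demo_pass\n")

    from_env = pick_login_strategy({ENV_USER: "u", ENV_PASS: "p"}, demo_netrc)
    from_netrc = pick_login_strategy({}, demo_netrc)
    from_prompt = pick_login_strategy({}, os.path.join(tmp, "missing"))

print(from_env, from_netrc, from_prompt)
```

The helper only reports where credentials would come from; the actual authentication is still done by earthaccess.login().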
auth = earthaccess.login()
# Sanity check so you know that your credentials worked.
assert auth.authenticated, "Earthdata Login failed; please retry."
Search for MERRA-2 cloud-hosted collections¶
earthaccess leverages the Common Metadata Repository (CMR) API to search for collections and granules. Earthdata Search also uses the CMR API.
We can use the search_datasets method to search for MERRA-2 collections by setting keyword="MERRA-2".
Advanced search options
The argument passed to keyword can be any string and can include the wildcard characters ? and *. To see the full list of search parameters, type earthaccess.search_datasets? in a notebook cell. Appending ? to a Python object displays the docstring for that object.
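To get a feel for what ? and * mean, the sketch below applies shell-style wildcards to a few MERRA-2 short names that appear in this tutorial, using Python's fnmatch module. It only illustrates the wildcard characters themselves; it is an assumption that the CMR keyword search behaves comparably, and its matching rules may differ.

```python
from fnmatch import fnmatchcase

short_names = ["M2T1NXAER", "M2T1NXSLV", "M2I3NPASM", "M2T1NXFLX", "M2I3NVASM"]

# "*" matches any run of characters (including none).
hourly_2d = [n for n in short_names if fnmatchcase(n, "M2T1NX*")]

# "?" matches exactly one character, here the letter in position six.
asm_any_grid = [n for n in short_names if fnmatchcase(n, "M2I3N?ASM")]

print(hourly_2d)
print(asm_any_grid)
```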
The query below prints a count of the data collections (datasets) found.
query = earthaccess.search_datasets(
    keyword="MERRA-2",
    cloud_hosted=True,
)
print(f'{len(query)} datasets found.')
219 datasets found.
We can get a summary of each dataset, which includes links to lengthier descriptions of the data. Here we look at five of the collections in the query (indices 10 through 14).
for collection in query[10:15]:
    print(collection['umm']['EntryTitle'])
    pprint.pprint(collection.summary(), sort_dicts=True, indent=4)
    print('')  # Add a space between collections for readability
MERRA-2 tavgM_2d_slv_Nx: 2d,Monthly mean,Time-Averaged,Single-Level,Assimilation,Single-Level Diagnostics 0.625 x 0.5 degree V5.12.4 (M2TMNXSLV) at GES DISC
{ 'cloud-info': { 'Region': 'us-west-2',
'S3BucketAndObjectPrefixNames': [ 's3://gesdisc-cumulus-prod-protected/MERRA2_MONTHLY/M2TMNXSLV.5.12.4/'],
'S3CredentialsAPIDocumentationURL': 'https://data.gesdisc.earthdata.nasa.gov/s3credentialsREADME',
'S3CredentialsAPIEndpoint': 'https://data.gesdisc.earthdata.nasa.gov/s3credentials'},
'concept-id': 'C1276812859-GES_DISC',
'file-type': '',
'get-data': [ 'https://goldsmr4.gesdisc.eosdis.nasa.gov/data/MERRA2_MONTHLY/M2TMNXSLV.5.12.4/',
'https://search.earthdata.nasa.gov/search/granules?p=C1276812859-GES_DISC'],
'short-name': 'M2TMNXSLV',
'version': '5.12.4'}
MERRA-2 tavg1_2d_adg_Nx: 2d,1-Hourly,Time-averaged,Single-Level,Assimilation,Aerosol Diagnostics (extended) 0.625 x 0.5 degree V5.12.4 (M2T1NXADG) at GES DISC
{ 'cloud-info': { 'Region': 'us-west-2',
'S3BucketAndObjectPrefixNames': [ 's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXADG.5.12.4/'],
'S3CredentialsAPIDocumentationURL': 'https://data.gesdisc.earthdata.nasa.gov/s3credentialsREADME',
'S3CredentialsAPIEndpoint': 'https://data.gesdisc.earthdata.nasa.gov/s3credentials'},
'concept-id': 'C1276812829-GES_DISC',
'file-type': '',
'get-data': [ 'https://goldsmr4.gesdisc.eosdis.nasa.gov/data/MERRA2/M2T1NXADG.5.12.4/',
'https://search.earthdata.nasa.gov/search/granules?p=C1276812829-GES_DISC'],
'short-name': 'M2T1NXADG',
'version': '5.12.4'}
MERRA-2 tavgM_2d_flx_Nx: 2d,Monthly mean,Time-Averaged,Single-Level,Assimilation,Surface Flux Diagnostics 0.625 x 0.5 degree V5.12.4 (M2TMNXFLX) at GES DISC
{ 'cloud-info': { 'Region': 'us-west-2',
'S3BucketAndObjectPrefixNames': [ 's3://gesdisc-cumulus-prod-protected/MERRA2_MONTHLY/M2TMNXFLX.5.12.4/'],
'S3CredentialsAPIDocumentationURL': 'https://data.gesdisc.earthdata.nasa.gov/s3credentialsREADME',
'S3CredentialsAPIEndpoint': 'https://data.gesdisc.earthdata.nasa.gov/s3credentials'},
'concept-id': 'C1276812868-GES_DISC',
'file-type': '',
'get-data': [ 'https://goldsmr4.gesdisc.eosdis.nasa.gov/data/MERRA2_MONTHLY/M2TMNXFLX.5.12.4/',
'https://search.earthdata.nasa.gov/search/granules?p=C1276812868-GES_DISC'],
'short-name': 'M2TMNXFLX',
'version': '5.12.4'}
MERRA-2 inst3_3d_asm_Nv: 3d,3-Hourly,Instantaneous,Model-Level,Assimilation,Assimilated Meteorological Fields 0.625 x 0.5 degree V5.12.4 (M2I3NVASM) at GES DISC
{ 'cloud-info': { 'Region': 'us-west-2',
'S3BucketAndObjectPrefixNames': [ 's3://gesdisc-cumulus-prod-protected/MERRA2/M2I3NVASM.5.12.4/'],
'S3CredentialsAPIDocumentationURL': 'https://data.gesdisc.earthdata.nasa.gov/s3credentialsREADME',
'S3CredentialsAPIEndpoint': 'https://data.gesdisc.earthdata.nasa.gov/s3credentials'},
'concept-id': 'C1276812900-GES_DISC',
'file-type': '',
'get-data': [ 'https://goldsmr5.gesdisc.eosdis.nasa.gov/data/MERRA2/M2I3NVASM.5.12.4/',
'https://search.earthdata.nasa.gov/search/granules?p=C1276812900-GES_DISC'],
'short-name': 'M2I3NVASM',
'version': '5.12.4'}
MERRA-2 inst6_3d_ana_Np: 3d,6-Hourly,Instantaneous,Pressure-Level,Analysis,Analyzed Meteorological Fields 0.625 x 0.5 degree V5.12.4 (M2I6NPANA) at GES DISC
{ 'cloud-info': { 'Region': 'us-west-2',
'S3BucketAndObjectPrefixNames': [ 's3://gesdisc-cumulus-prod-protected/MERRA2/M2I6NPANA.5.12.4/'],
'S3CredentialsAPIDocumentationURL': 'https://data.gesdisc.earthdata.nasa.gov/s3credentialsREADME',
'S3CredentialsAPIEndpoint': 'https://data.gesdisc.earthdata.nasa.gov/s3credentials'},
'concept-id': 'C1276812884-GES_DISC',
'file-type': '',
'get-data': [ 'https://goldsmr5.gesdisc.eosdis.nasa.gov/data/MERRA2/M2I6NPANA.5.12.4/',
'https://search.earthdata.nasa.gov/search/granules?p=C1276812884-GES_DISC'],
'short-name': 'M2I6NPANA',
'version': '5.12.4'}
For each collection, summary returns a subset of fields from the collection metadata and Unified Metadata Model (UMM) entry.
concept-id is a unique identifier for the collection, composed of an alphanumeric code and the provider-id for the DAAC.
file-type gives information about the file format of the collection files.
get-data is a collection of URLs that can be used to access data, dataset landing pages, and tools.
short-name is the name of the dataset that appears on the dataset landing page. Short names are generally how different products are referred to.
version is the version of each collection.
For cloud-hosted data, there is additional information about the location of the S3 bucket that holds the data and where to get credentials to access the S3 buckets. In general, you don’t need to worry about this information because earthaccess handles S3 credentials for you. Nevertheless it may be useful for troubleshooting.
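As a small illustration of how these fields are structured, the sketch below takes a trimmed copy of the M2I3NVASM summary printed above and pulls apart the concept-id and the S3 location. The variable names are hypothetical, and only the standard library is used.

```python
from urllib.parse import urlparse

# Trimmed copy of one summary printed above (M2I3NVASM).
summary = {
    "concept-id": "C1276812900-GES_DISC",
    "short-name": "M2I3NVASM",
    "version": "5.12.4",
    "cloud-info": {
        "Region": "us-west-2",
        "S3BucketAndObjectPrefixNames": [
            "s3://gesdisc-cumulus-prod-protected/MERRA2/M2I3NVASM.5.12.4/"
        ],
    },
}

# concept-id = alphanumeric code + "-" + provider id
code, provider = summary["concept-id"].split("-", 1)

# An s3:// URL parses into a bucket (netloc) and an object prefix (path).
s3_url = urlparse(summary["cloud-info"]["S3BucketAndObjectPrefixNames"][0])
bucket, prefix = s3_url.netloc, s3_url.path.lstrip("/")

print(code, provider)
print(bucket, prefix)
```

Knowing the bucket and prefix can help when checking which AWS region and credentials endpoint a failing request should be using.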
If you want to see just the short names so you can paste one into the earthaccess data search below, you can use this loop:
for collection in query[:5]:
    pprint.pprint(collection.summary()['short-name'], sort_dicts=True, indent=4)
'M2T1NXAER'
'M2T1NXSLV'
'M2I3NPASM'
'M2T1NXFLX'
'M2T1NXRAD'
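The short names look cryptic, but the collection titles printed earlier suggest a pattern. The sketch below decodes a short name using letter meanings inferred only from those titles (for example, M2I3NVASM is listed as 3-hourly, instantaneous, model-level). describe_short_name and its lookup tables are hypothetical and deliberately incomplete; the official MERRA-2 file specification is the authority on naming.

```python
# Letter meanings inferred from the collection titles printed in this tutorial;
# treat the tables as illustrative, not as the official MERRA-2 naming spec.
TIME_KIND = {"T": "time-averaged", "I": "instantaneous"}
FREQUENCY = {"1": "1-hourly", "3": "3-hourly", "6": "6-hourly", "M": "monthly mean"}
VERTICAL = {"X": "single-level (2d)", "V": "model-level (3d)", "P": "pressure-level (3d)"}


def describe_short_name(short_name):
    """Split a 9-character MERRA-2 short name such as 'M2I3NVASM' into parts."""
    if len(short_name) != 9 or not short_name.startswith("M2"):
        raise ValueError(f"unexpected short name: {short_name!r}")
    kind, freq, vert, group = (
        short_name[2], short_name[3], short_name[5], short_name[6:],
    )
    return {
        "time": TIME_KIND.get(kind, "unknown"),
        "frequency": FREQUENCY.get(freq, "unknown"),
        "vertical": VERTICAL.get(vert, "unknown"),
        "group": group,
    }


print(describe_short_name("M2I3NVASM"))
print(describe_short_name("M2TMNXSLV"))
```

Letters that are not in the tables come back as "unknown" rather than raising, since the tables cover only the collections shown here.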
Search MERRA-2 data using spatial and temporal filters¶
Once you have identified the dataset you want to work with, you can use the search_data method to search it with spatial and temporal filters. Since we are using the M2I3NVASM (Assimilated Meteorological Fields) product for this tutorial, we’ll search for granules that cover the Bach Ice Shelf in Antarctica on May 1 and May 2, 2025.
Either concept-id or short-name can be used to search for granules from a particular dataset. If you use short-name, you should also set version, because several versions of a dataset can share one short name. If you use concept-id, nothing else is required because concept-id is unique.
The temporal range is given as a pair of standard date strings. A bounding box is given by its lower-left and upper-right corners, with longitude before latitude: (lower-left longitude, lower-left latitude, upper-right longitude, upper-right latitude). Polygons, points, and shapefiles can also be specified.
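Because the corner order is easy to get wrong, a small validation step can catch swapped values before a search is sent. The sketch below is a hypothetical helper (make_bounding_box and make_temporal are not part of earthaccess) that checks ranges and returns tuples in the order described above.

```python
from datetime import date


def make_bounding_box(lonmin, latmin, lonmax, latmax):
    """Return (lower-left lon, lower-left lat, upper-right lon, upper-right lat)."""
    if not (-180 <= lonmin <= 180 and -180 <= lonmax <= 180):
        raise ValueError("longitudes must lie in [-180, 180]")
    if not (-90 <= latmin <= 90 and -90 <= latmax <= 90):
        raise ValueError("latitudes must lie in [-90, 90]")
    if latmin > latmax:
        raise ValueError("latmin must not exceed latmax")
    return (lonmin, latmin, lonmax, latmax)


def make_temporal(start, end):
    """Validate two YYYY-MM-DD strings and return them as a (start, end) pair."""
    if date.fromisoformat(start) > date.fromisoformat(end):
        raise ValueError("start date must not be after end date")
    return (start, end)


# Same corner values and dates as used in this tutorial.
demo_box = make_bounding_box(-73.4, -72.5, -70.5, -71.5)
demo_temporal = make_temporal("2025-05-01", "2025-05-02")
print(demo_box, demo_temporal)
```

The longitudes are not compared with each other, since a box that crosses the antimeridian can legitimately have a western edge larger than its eastern edge.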
The search below prints the number of granules that match.
# Search for MERRA-2 granules over the Bach Ice Shelf
latmin, latmax = -72.5, -71.5
lonmin, lonmax = -73.4, -70.5
sbox = (lonmin, latmin, lonmax, latmax)
results = earthaccess.search_data(
    short_name="M2I3NVASM",
    temporal=("2025-05-01", "2025-05-02"),
    bounding_box=sbox,
)
print(f'{len(results)} total')
2 total
We’ll get metadata for these 2 granules and display it. The rendered metadata shows a download link, granule size and two images of the data.
[display(r) for r in results]
[None, None]
Open, load and display data stored on S3¶
Direct access to data in an S3 bucket is a two-step process. First, the files are opened using the open method, which creates Python file-like objects. Those objects are then used to load the data in the second step.
Authentication is required for this step. The auth object created at the start of the notebook provides Earthdata Login authentication and AWS credentials “behind the scenes”. These credentials expire after one hour, so the login cell must have been run within that time window before these next steps.
rasters = earthaccess.open(results)
After opening the granules, we can read one file at a time. In this example, the first granule is loaded into an xarray object with xr.open_datatree. The data could instead be read into numpy arrays or a pandas.DataFrame, but each granule would then have to be read with a package that handles HDF5 files, such as h5py. xarray does all of this under the hood in a single line.
d1 = xr.open_datatree(rasters[0], engine="h5netcdf")
We can open just that one file, but to work with a longer time series we more likely want all of the granules in one xarray.Dataset. We can do this in one command, xarray.open_mfdataset, which concatenates the files along time. When files do not carry their own time coordinate, the preprocess argument of open_mfdataset can add one: we build a function that reads the timestamp from the file name and attaches it as a new dimension to each granule. The direct-access link below shows what a MERRA-2 file name looks like.
earthaccess.results.DataGranule.data_links(results[0], access='direct')
['s3://gesdisc-cumulus-prod-protected/MERRA2/M2I3NVASM.5.12.4/2025/05/MERRA2_400.inst3_3d_asm_Nv.20250501.nc4']
# Preprocess helper to add a time coordinate from the filename
# Looks for YYYYMMDDTHHMMSS anywhere in the source path
_TIME_RE = re.compile(r"(\d{8}T\d{6})")
def add_time_from_source(ds: xr.Dataset) -> xr.Dataset:
    src = str(ds.encoding.get("source", ""))  # source path, when xarray recorded one
    m = _TIME_RE.search(src)
    if m:
        ts = datetime.strptime(m.group(1), "%Y%m%dT%H%M%S")
        # attach as a proper dimension so open_mfdataset can concat
        ds = ds.expand_dims(time=[ts])
    # fallback: leave unmodified if no timestamp can be found
    return ds
Then we can run xarray.open_mfdataset. The MERRA-2 file name above carries only a YYYYMMDD date and each granule already has its own time dimension, so the helper would leave these files unchanged and the call below simply concatenates along time. This only lazily loads the data, meaning we can work with the data and metadata, but the values aren’t read into memory until we need them. ds is only about 1 GB right now, but if we ran ds.compute() to read all of the variables in, ds would be ~25 GB and could exhaust our memory.
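A rough back-of-the-envelope calculation shows why loading everything is expensive while loading one level of one variable is cheap. The sketch below derives the grid size from the 0.625 x 0.5 degree resolution in the collection title; the 72 model levels, the 4-byte float storage, and the eight 3-hourly steps per daily granule are assumptions rather than values stated in this tutorial.

```python
# Assumptions (not stated in this tutorial): global grid, 72 model levels,
# values stored as 4-byte floats, eight 3-hourly steps per daily granule.
LON_STEP, LAT_STEP = 0.625, 0.5   # degrees, from the collection title
N_LEV = 72                        # assumed number of model levels
N_TIME = 2 * 8                    # two daily granules x eight 3-hourly steps
BYTES_PER_VALUE = 4               # assumed float32

n_lon = round(360 / LON_STEP)       # longitudes wrap, so no duplicate endpoint
n_lat = round(180 / LAT_STEP) + 1   # latitudes include both poles

one_level_bytes = N_TIME * n_lat * n_lon * BYTES_PER_VALUE
all_levels_bytes = one_level_bytes * N_LEV

print(f"grid: {n_lat} x {n_lon}")
print(f"one variable, one level:  {one_level_bytes / 1e6:.1f} MB")
print(f"one variable, all levels: {all_levels_bytes / 1e9:.2f} GB")
```

Under these assumptions a single 3-D variable is close to a gigabyte for two days, which is why selecting a level before calling compute() keeps memory use small.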
# Open as a multi-file dataset concatenated by time - 30s runtime
ds = xr.open_mfdataset(
    rasters,
    engine="h5netcdf",  # recommended for streamed HDF5/NetCDF via fsspec
    combine="nested",
    concat_dim="time",
)
ds
Notice that under dimensions we have time, and it shows 16 time steps (eight 3-hourly steps from each of the two daily granules), alongside the vertical and horizontal dimensions.
ds.time.values
array(['2025-05-01T00:00:00.000000000', '2025-05-01T03:00:00.000000000',
'2025-05-01T06:00:00.000000000', '2025-05-01T09:00:00.000000000',
'2025-05-01T12:00:00.000000000', '2025-05-01T15:00:00.000000000',
'2025-05-01T18:00:00.000000000', '2025-05-01T21:00:00.000000000',
'2025-05-02T00:00:00.000000000', '2025-05-02T03:00:00.000000000',
'2025-05-02T06:00:00.000000000', '2025-05-02T09:00:00.000000000',
'2025-05-02T12:00:00.000000000', '2025-05-02T15:00:00.000000000',
'2025-05-02T18:00:00.000000000', '2025-05-02T21:00:00.000000000'],
dtype='datetime64[ns]')
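The 16 values follow from two daily granules sampled every three hours. As a quick sanity check, the sketch below rebuilds the expected timestamps with the standard library; the variable names are hypothetical.

```python
from datetime import datetime, timedelta

start = datetime(2025, 5, 1)   # first time step of the search window
n_days = 2                     # May 1 and May 2
step = timedelta(hours=3)      # 3-hourly product

expected = [start + i * step for i in range(n_days * 24 // 3)]
print(len(expected), expected[0].isoformat(), expected[-1].isoformat())
```

A list like this could be compared against ds.time.values to spot a missing granule or an unexpected gap before plotting.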
Now we can plot all of these time steps with a slider to scroll through. We’ll use ipywidgets to add an interactive time slider. This requires
three steps: computing the data, defining a plot function, and wiring
them together.
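First, we select a single vertical level and load the data into memory so the slider will respond instantly. Because method="nearest" is used, lev=500 resolves to whichever level in the file is closest to 500. M2I3NVASM is a model-level product, so check ds.lev to see whether its values are pressures or model-layer indices: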
T_computed = ds.T.sel(lev=500, method="nearest").compute()
times = pd.to_datetime(T_computed.time.values).to_pydatetime()
Next, we define a function that plots one time step. time_idx is a number (0, 1, 2…) that the slider will control:
def plot_T(time_idx):
    fig, ax = plt.subplots(figsize=(10, 5))
    T_computed.isel(time=time_idx).plot.pcolormesh(ax=ax, cmap="RdBu_r")
    ax.set_title(str(times[time_idx])[:10])
    plt.show()
Finally, widgets.interact connects the function to a slider. SelectionSlider lets us display human-readable dates while still passing the index number to plot_T:
widgets.interact(
    plot_T,
    time_idx=widgets.SelectionSlider(
        options=[(t.strftime("%Y-%m-%d %HH"), i) for i, t in enumerate(times)],
        description="Time",
        style={"description_width": "initial"},
    ),
)
<function __main__.plot_T(time_idx)>
Summary¶
In this tutorial, you used earthaccess to find, access, and visualize
NASA MERRA-2 data directly from the cloud without downloading any files.
Here’s what you did:
Authenticated with NASA Earthdata Login using earthaccess.login()
Searched for cloud-hosted MERRA-2 collections and granules using keywords, spatial bounding boxes, and date ranges
Streamed data directly from S3 into memory using earthaccess.open()
Opened multi-file datasets with xarray.open_mfdataset() for efficient lazy loading
Visualized temperature at a single vertical level interactively across time using ipywidgets and matplotlib
The same workflow — search, open, load, visualize — applies to any of the thousands of NASA datasets available on Earthdata Cloud.