2. Global Workflow Components

The global-workflow is a combination of several components working together to prepare, analyze, produce, and post-process forecast data.

The major components of the system are:

  • Workflow

  • Pre-processing

  • Analysis

  • Forecast

  • Post-processing

  • Verification

The Global Workflow repository contains the workflow and script layers. External components are checked out as git submodules, each of which resides in its own repository on GitHub.

2.1. Component repositories

Components included as submodules:

Note

When running the system in forecast-only mode, the Data Assimilation components are not needed and are therefore not built.

2.2. External dependencies

2.2.1. Libraries

All libraries needed to run the end-to-end Global Workflow are built with a package manager and served via spack-stack. They are already available on supported NOAA HPC platforms.

Find information on official installations of spack-stack here:

https://github.com/JCSDA/spack-stack/wiki/Porting-spack-stack-to-a-new-system

2.2.2. Observation data (OBSPROC/prep)

2.2.2.1. Data

Observation data, also known as dump data, is prepared in production and then archived in a global dump archive (GDA) for users running cycled experiments. The GDA (identified as $DMPDIR in the workflow) is available on supported platforms at the locations listed below, and the workflow system knows where to find the data.

  • Hera: /scratch1/NCEPDEV/global/glopara/dump

  • Orion/Hercules: /work/noaa/rstprod/dump

  • Jet: /mnt/lfs4/HFIP/hfv3gfs/glopara/dump

  • WCOSS2: /lfs/h2/emc/global/noscrub/emc.global/dump

  • S4: /data/prod/glopara/dump
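The platform-to-path mapping above can be sketched as a small lookup table. This is only an illustration, not how the workflow itself resolves $DMPDIR: the names `GDA_ROOTS` and `gda_root` are hypothetical, and lowercasing the platform name is an assumption of this sketch.

```python
# Hypothetical lookup of the GDA root ($DMPDIR) per supported platform.
# Paths are copied from the list above; the lowercase keys are an
# assumption of this sketch, not a workflow convention.
GDA_ROOTS = {
    "hera": "/scratch1/NCEPDEV/global/glopara/dump",
    "orion": "/work/noaa/rstprod/dump",
    "hercules": "/work/noaa/rstprod/dump",  # shares the Orion location
    "jet": "/mnt/lfs4/HFIP/hfv3gfs/glopara/dump",
    "wcoss2": "/lfs/h2/emc/global/noscrub/emc.global/dump",
    "s4": "/data/prod/glopara/dump",
}


def gda_root(platform: str) -> str:
    """Return the GDA root directory for a platform name (case-insensitive)."""
    key = platform.strip().lower()
    try:
        return GDA_ROOTS[key]
    except KeyError:
        known = ", ".join(sorted(GDA_ROOTS))
        raise ValueError(
            f"No GDA location known for {platform!r}; expected one of: {known}"
        ) from None
```

Calling `gda_root("Hera")` returns the Hera path from the list, and an unknown platform raises a `ValueError` rather than silently falling back to a default.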

2.2.2.1.1. Global Dump Archive Structure

The global dump archive (GDA) mimics the structure of its production source:

  • GDAS/GFS: DMPDIR/gdas[gfs].PDY/CC/atmos/FILES

  • RTOFS: DMPDIR/rtofs.PDY/FILES
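The directory layout above can be expressed as a pair of path builders. This is a minimal sketch under stated assumptions: PDY is taken to be the date as YYYYMMDD and CC the two-digit cycle hour, which the text above does not define, and the function names `gda_atmos_dir` and `gda_rtofs_dir` are hypothetical.

```python
from datetime import datetime
from pathlib import PurePosixPath


def gda_atmos_dir(dmpdir: str, system: str, cycle: datetime) -> PurePosixPath:
    """Build DMPDIR/<system>.PDY/CC/atmos for a GDAS or GFS cycle.

    Assumes PDY = YYYYMMDD and CC = two-digit cycle hour.
    """
    if system not in ("gdas", "gfs"):
        raise ValueError(f"system must be 'gdas' or 'gfs', got {system!r}")
    pdy = cycle.strftime("%Y%m%d")
    cc = cycle.strftime("%H")
    return PurePosixPath(dmpdir) / f"{system}.{pdy}" / cc / "atmos"


def gda_rtofs_dir(dmpdir: str, day: datetime) -> PurePosixPath:
    """Build DMPDIR/rtofs.PDY (RTOFS has no cycle or atmos level)."""
    return PurePosixPath(dmpdir) / f"rtofs.{day.strftime('%Y%m%d')}"
```

For example, `gda_atmos_dir("/data/prod/glopara/dump", "gdas", datetime(2021, 8, 1, 6))` yields `/data/prod/glopara/dump/gdas.20210801/06/atmos`. `PurePosixPath` is used so the sketch only composes path strings and never touches the filesystem.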

The GDA also contains special versions of some datasets and experimental data that is being evaluated ahead of implementation into production. The following subfolder suffixes exist:

SUFFIX  WHAT

nr      Non-restricted versions of restricted files in production. Produced in production. Restricted data is fully stripped from files. These files remain as is.

ur      Un-restricted versions of restricted files in production. Produced and archived on a 48-hour delay. Some restricted datasets are unrestricted. Data amounts: restricted > un-restricted > non-restricted. Limited availability. Production was discontinued in mid-2023.

x       Experimental global datasets being evaluated for production. Dates and types vary depending on upcoming global upgrades.

y       Similar to “x” but only used when there is a duplicate experimental file in the x subfolder with the same name. These files will be different from both the production versions (if those already exist) and the x versions. This suffix is rarely used.

p       Pre-production copy of the full dump dataset, as produced by NCO during the final 30-day parallel ahead of implementation. Not always archived.

2.2.2.2. Data processing

Upstream of the global-workflow is the collection, quality control, and packaging of observed weather data. That data is handled by the OBSPROC group's codes and scripts. The global-workflow uses two packages from OBSPROC to run its prep step, which prepares observation (dump) data for use by the analysis system:

  1. https://github.com/NOAA-EMC/obsproc

  2. https://github.com/NOAA-EMC/prepobs

Package versions and locations on supported platforms are set in the global-workflow system configs, modulefiles, and version files.