Benchmarking Celeritas

Peter Heywood, Research Software Engineer

The University of Sheffield

2025-04-11

Celeritas

The Celeritas project implements HEP detector physics on GPU accelerator hardware with the ultimate goal of supporting the massive computational requirements of the HL-LHC upgrade.

Celeritas project Logo

  • NVIDIA GPUs via CUDA
  • AMD GPUs via HIP
  • 2 geometry implementations: ORANGE and VecGeom
  • Standalone executables
  • Software library

More Information

“Accelerating detector simulations with Celeritas: performance improvements and new capabilities”

celeritas-project/regression

A suite of test problems in Celeritas to track:

  • whether the code is able to run to completion without hitting an assertion,
  • how the code input options (and processed output) change over time,
  • how the kernel occupancy requirements change in response to growing code complexity.

Benchmark setup:

  • CPU and GPU runs
  • Standalone: celer-g4, celer-sim
  • Library: geant4
  • GPU power usage monitoring (sketched below)
  • Node-level benchmarking
  • ~22 simulation inputs
    • 7 geometries
    • simulation options (MSC, field)
    • ORANGE vs VecGeom
celeritas-project/regression reference plots

Figure 1: Per-node (a) throughput and (b) efficiency for Celeritas v0.5.1 on Frontier & Perlmutter.
Generated using update-plots.py from commit d5b5c03.
Credit: celeritas-project/regression contributors
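
For reading these plots, the throughput axis is essentially events processed per wall-clock second by one full node. The snippet below illustrates that arithmetic with made-up numbers; the figures, field names, and the choice to normalise efficiency against a reference machine are assumptions for illustration, not the definitions used by update-plots.py.

    # Illustrative per-node throughput/efficiency arithmetic with made-up numbers.
    # The normalisation chosen for "efficiency" here is an assumption, not
    # necessarily the definition used by update-plots.py.
    runs = {
        # machine name: (events simulated, wall-clock seconds for one full-node run)
        "perlmutter": (10_000, 125.0),
        "frontier": (10_000, 160.0),
    }

    throughput = {name: events / seconds for name, (events, seconds) in runs.items()}

    # One plausible normalisation: throughput relative to a chosen reference system.
    reference = throughput["perlmutter"]
    efficiency = {name: t / reference for name, t in throughput.items()}

    for name in runs:
        print(f"{name}: {throughput[name]:.1f} events/s per node, "
              f"efficiency {efficiency[name]:.2f} vs reference")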

Hardware

Machine      | CPU per node                     | GPU per node
-------------|----------------------------------|----------------------
Frontier     | 1x AMD “Optimized 3rd Gen EPYC”  | 8x AMD MI250X
Perlmutter   | 1x AMD EPYC 7763                 | 4x NVIDIA A100
JADE 2.5     | 2x AMD EPYC 9534                 | 8x AMD MI300X
Bede GH200   | 1x NVIDIA Grace                  | 1x NVIDIA GH200 480GB
3090 (TUoS)  | 1x Intel i7-5930K                | 2x NVIDIA RTX 3090

JADE 2.5 / JADE@ARC

  • Joint Academic Data Science Endeavour 2.5
  • UK Tier-2 technology pilot resource
    • Funded by EPSRC
    • Hosted by the University of Oxford
  • 3 Lenovo ThinkSystem SR685a V3 Nodes
    • 2x AMD EPYC 9534 64-core CPUs @ 280W
    • 8x AMD MI300X GPUs @ 750W
  • Currently in early access / beta phase

Bede GH200 Pilot

  • N8 CIR Bede Grace-Hopper Pilot
  • UK Tier-2 HPC resource
    • Originally funded by EPSRC
    • Hosted by Durham University
    • Extended by N8 partners for 1 year
  • 6x NVIDIA GH200 480GB nodes
    • 1x NVIDIA Grace 72-core Arm CPU @ 100W
    • 1x NVIDIA Hopper GPU (96GB) @ 900W
    • NVLink-C2C host-device interconnect

N8 CIR Bede Logo