
GPU Selection Guidelines for COMSOL Computing


GPU acceleration, supported exclusively on NVIDIA® hardware, is integrated into the COMSOL Multiphysics® software for:

  • Direct solvers
  • Deep-neural-network (DNN) surrogate model training
  • Time-explicit pressure acoustics analysis

Beginning with version 6.4, the NVIDIA CUDA® direct sparse solver (NVIDIA cuDSS) is available, providing substantial speedups across a wide range of applications. See the release highlights page GPU Acceleration in COMSOL Multiphysics® for more information.

This guide outlines the different types of NVIDIA® GPUs available for use with COMSOL Multiphysics® and provides guidelines to help you select the optimal GPU based on your intended application in the software.

When selecting a GPU, the key factors to consider include:

  • Video RAM (VRAM) capacity — the GPU memory capacity that determines the maximum model size the GPU can handle
  • Memory bandwidth — throughput often limits performance, especially in memory-bound workloads:
    • High Bandwidth Memory (HBM) — a stacked memory type with extremely high bandwidth, used in data center GPUs; beneficial for bandwidth-bound workloads such as cuDSS, whose performance is often limited by memory access speed
    • Graphics Double Data Rate (GDDR) memory — a high-throughput memory type used in workstation and consumer GPUs; provides strong bandwidth (though generally lower than HBM) at a lower cost
  • Error correction code (ECC) memory — detects and corrects memory errors; found on many workstation and data center GPUs, with a possible small performance cost
  • Enterprise drivers — provide enhanced stability, predictability, and certification for professional workloads, also referred to as production-branch drivers
  • Compute performance — how quickly the GPU can complete numerical operations:
    • Double-precision floating-point format (FP64) — important for high-accuracy scientific computations
    • Single-precision floating-point format (FP32) — important for workloads dominated by single-precision arithmetic, such as DNN surrogate model training and certain time-explicit simulations
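You can check most of these factors for an installed card with the nvidia-smi utility. The sketch below wraps the query in Python; the query-field names (name, memory.total, ecc.mode.current, driver_version) are standard nvidia-smi fields, though their availability can vary by driver version, and the sample output line is hypothetical.

```python
import csv
import subprocess

# nvidia-smi query fields that map to the selection factors above.
QUERY = "name,memory.total,ecc.mode.current,driver_version"

def parse_row(line):
    """Parse one CSV row of nvidia-smi output into a dict."""
    name, vram, ecc, driver = [f.strip() for f in next(csv.reader([line]))]
    return {"name": name, "vram": vram, "ecc": ecc, "driver": driver}

def query_gpus():
    """Return one dict per installed NVIDIA GPU (requires nvidia-smi)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_row(line) for line in out.strip().splitlines()]

# Hypothetical sample line, as nvidia-smi would print it:
sample = "NVIDIA RTX PRO 6000, 98304 MiB, Enabled, 580.65.06"
print(parse_row(sample))
```

On a machine with an NVIDIA driver installed, calling query_gpus() lists every card with its VRAM capacity, current ECC mode, and driver version.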

Features of NVIDIA's GPU Categories

1. High-End Data Center GPUs (E.g., NVIDIA® H or B Series)

These high-end PCI Express (PCIe) and Server PCI Express Module (SXM) GPUs are designed for dense server environments and support fast GPU-to-GPU interconnects such as NVLink® in multi-GPU configurations. Their VRAM capacity is the highest available, making them essential for holding very large models entirely in memory and avoiding severe performance penalties. They use HBM3E with very high bandwidth (for example, 2 × 4 TB/s on B100-class GPUs), which is important for sparse solvers like cuDSS, where performance is often limited by memory access speed. Their memory supports ECC, and these GPUs use enterprise drivers for stability. Finally, high-end data center GPUs provide strong FP64 performance with a high FP64-to-FP32 ratio (often ~1:2), which benefits accuracy-sensitive workloads.

2. Professional Workstation GPUs (E.g., NVIDIA® RTX PRO™ Series)

NVIDIA® workstation GPUs are engineered to deliver enterprise-class performance and reliability directly to the desktop, balancing performance and cost for desktop environments. They offer large VRAM capacities and very good memory bandwidth through fast GDDR memory; for example, the RTX PRO™ 6000 Blackwell Workstation Edition has a memory bandwidth of 1.8 TB/s.

GDDR7 memory delivers strong bandwidth for most workloads, though it is lower than that of the HBM used in data center GPUs. FP64 performance is also generally lower on workstation cards. NVIDIA RTX PRO™ GPUs offer enterprise drivers and ECC memory support, providing the stability and precision critical for professional workflows.

3. Consumer (Gaming) GPUs (E.g., NVIDIA® GeForce RTX® Series)

Consumer GPUs offer strong performance at a low cost, particularly for gaming workloads. However, there are tradeoffs that affect their suitability for scientific computing. Their VRAM varies and is generally smaller than that of professional or data center GPUs, which can limit the maximum model size. They provide good memory bandwidth using fast GDDR memory, but typically do not include ECC memory support. Note that GeForce RTX® 50 Series GPUs do support ECC, but it is limited to single-bit error correction and does not include reporting or mitigation of uncorrectable errors.

The architecture of consumer GPUs is optimized primarily for gaming and rendering rather than FP64-heavy high-performance computing (HPC) workloads, and their FP64 performance is significantly lower than that of data center GPUs. Although NVIDIA GeForce RTX® series cards have both Game Ready® and Studio drivers available, they do not support the enterprise (production-branch) drivers used by workstation and data center GPUs, which offer additional stability and reliability features.

GPU Category Feature Overview

| GPU Category             | Max VRAM Capacity               | Memory Bandwidth | ECC Support       | Enterprise Drivers | FP64 Performance | Multi-GPU Support                   |
| ------------------------ | ------------------------------- | ---------------- | ----------------- | ------------------ | ---------------- | ----------------------------------- |
| High-End Data Center     | Best (exceeds workstation GPUs) | Best             | Typically present | Yes                | Best             | Yes, optimized for resource pooling |
| Professional Workstation | Good (exceeds consumer GPUs)    | Very good        | Typically present | Yes                | Fair             | Yes (not optimized)                 |
| Consumer (Gaming)        | Fair (up to ~32 GB)             | Good             | Typically absent  | No                 | Fair             | Yes (not optimized)                 |

GPU Selection Guidelines by Application in COMSOL®

The optimal GPU choice depends heavily on your specific application. Note that the speedup also depends on the model size, with larger models often, but not always, seeing the greatest benefit.

For Use with cuDSS

  • Check whether your model truly needs a direct solver; robust iterative solvers with suitable preconditioners may be available and can often be faster.
  • Prioritize high memory bandwidth, sufficient VRAM, and strong FP64 performance when double-precision arithmetic is required for an accurate solution.
  • Some simulations may run with sufficient accuracy in FP32 (single precision) instead of FP64, yielding substantial speedups — especially on consumer or workstation GPUs, where FP32 is far faster than FP64.
  • Estimate memory requirements by solving once with a CPU direct solver (e.g., PARDISO or MUMPS) and then checking the reported peak memory usage.
  • cuDSS supports multi-GPU systems and memory sharing, allowing problems that exceed the VRAM of a single card to be solved in exchange for a performance penalty.
    • Sharing memory between GPUs with NVLink® has a minimal performance impact, while sharing over PCIe has a moderate impact.
    • Using the Hybrid memory mode incurs a more significant impact due to sharing data between GPU VRAM and CPU RAM.
  • Data center GPUs (NVIDIA® H and B series) are typically the ideal choice and offer the best performance for large GPU-accelerated models. NVIDIA RTX PRO™ series GPUs also work well, though performance may be lower due to reduced memory bandwidth, FP64 capability, and/or VRAM compared with data center GPUs.
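The memory-estimation advice above can be turned into a rough fit check: solve once with a CPU direct solver, note the reported peak memory, and compare it against available VRAM. A minimal sketch, in which the 1.2× headroom factor is an illustrative assumption (not a COMSOL-documented figure) and the VRAM values are examples:

```python
# Rough fit check for cuDSS: compare the peak memory reported by a CPU
# direct solve (e.g., PARDISO or MUMPS) against available VRAM.
# The headroom factor is an illustrative safety margin, not an exact rule.

def fits_on_gpu(cpu_peak_gb, vram_per_gpu_gb, n_gpus=1, headroom=1.2):
    """True if the factorization is likely to fit in (pooled) VRAM."""
    return cpu_peak_gb * headroom <= vram_per_gpu_gb * n_gpus

# Suppose a PARDISO solve reported 180 GB peak memory usage:
print(fits_on_gpu(180, 141))            # one 141 GB card -> False
print(fits_on_gpu(180, 141, n_gpus=2))  # pooled across two cards -> True
```

When the check fails for a single card, the multi-GPU memory sharing and Hybrid memory modes described above still allow the solve to proceed, at the cost of the performance penalties noted.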

For Use with the Pressure Acoustics, Time Explicit Interface

The Pressure Acoustics, Time Explicit interface is based on the discontinuous Galerkin method (dG-FEM) and uses a time-explicit solver. These solvers behave differently than the implicit solvers used elsewhere in the COMSOL Multiphysics® software and have their own GPU-related considerations:

  • When using GPU acceleration for the Pressure Acoustics, Time Explicit interface, single-precision arithmetic (FP32) is selected by default, but the solver can also be run in double precision (FP64).
  • Multiple GPUs can be used (on a single machine or with a cluster). For pressure acoustics, multi-GPU acceleration always requires a floating network license (FNL). The use of NVLink® is not supported.

A professional workstation GPU (NVIDIA RTX PRO™ series) with sufficient VRAM is an excellent option for use with the Pressure Acoustics, Time Explicit interface. A data center GPU (H or B series) can serve as an upgrade choice, but it is not strictly necessary if you are running only this type of model. A consumer GPU (NVIDIA GeForce RTX® series) provides a cost-effective option that will also work as long as it has enough VRAM.

For Training of DNN Surrogate Models

COMSOL Multiphysics® trains DNN surrogate models on GPUs using industry-standard libraries. These computations rely entirely on single-precision (FP32) arithmetic.

  • DNN training in COMSOL Multiphysics® (currently for dense networks only) is typically compute-bound; even mid-range GPUs usually deliver significant speedups over CPUs.
  • High, sustained FP32 throughput is important for dense network training.
  • Data center GPUs deliver the highest performance and capacity for large, dense networks.
  • NVIDIA RTX PRO™ workstation GPUs yield excellent workstation performance.
  • Consumer NVIDIA GeForce RTX® GPUs are cost effective and fast for the tutorial-scale dense networks used in the Application Libraries.
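A back-of-envelope estimate illustrates why dense-network training is compute-bound and why even mid-range GPUs keep up. The sketch below uses the common ~6 FLOPs per parameter per sample rule of thumb for a training step (forward plus backward pass); the network size and throughput figures are illustrative assumptions, and the estimate ignores memory traffic and kernel-launch overhead.

```python
# Rough arithmetic cost of one training step for a dense DNN, using the
# common ~6 FLOPs per parameter per sample rule of thumb (forward +
# backward pass). Ignores memory traffic and kernel-launch overhead.

def training_step_flops(n_params, batch_size):
    return 6 * n_params * batch_size

def step_time_s(n_params, batch_size, fp32_tflops):
    """Idealized step time at a given sustained FP32 throughput."""
    return training_step_flops(n_params, batch_size) / (fp32_tflops * 1e12)

# Hypothetical surrogate: ~200k parameters, batch of 1024 samples,
# on a GPU sustaining 50 TFLOPS of FP32 throughput.
flops = training_step_flops(200_000, 1024)  # ~1.2e9 FLOPs per step
print(f"{flops:.3g} FLOPs, ideal step time {step_time_s(200_000, 1024, 50):.2e} s")
```

At this scale a single step costs on the order of a billion FLOPs, which a modern GPU dispatches in well under a millisecond, consistent with the observation above that sustained FP32 throughput is the figure of merit.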

GPU Computing System Requirements »

Regardless of the category, any GPU used must meet the minimum system requirements for the version of COMSOL Multiphysics® you are using.


Next Steps

Setting Up GPU-Accelerated Computing Within COMSOL Multiphysics® »

This guide details the system requirements and step-by-step installation process for enabling GPU-accelerated computing in COMSOL Multiphysics®.


NVIDIA, CUDA, Game Ready, GeForce, GeForce RTX, NVLink, and NVIDIA RTX PRO are trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and/or other countries.

