Managing OpenFOAM Physical Simulations with DVC, CML, and Studio (Part 2)

Time to celebrate our achievements and let automation take care of the details! We run OpenFOAM simulations in the cloud with our CI/CD tool CML and GitLab, using AWS computational resources. We can then easily visualize and share the simulation results with colleagues in Iterative Studio.

  • Mikhail Rozhkov
    Petr Zikán
  • May 10, 2023 · 6 min read

In the previous post, we discussed how DVC simplifies physical simulation pipelines and data management. This post discusses how to run simulations in the cloud, run new experiments, and visualize simulation results with Iterative Studio and other tools.

In this post, you will learn how to:

  1. Manage computational resources on AWS and start and shut down EC2 instances for simulation experiments.

  2. Run new OpenFOAM simulations in the cloud using Iterative Studio and CML.

  3. Use Iterative Studio to view simulation results and DVC plots online.

This post is the result of a collaboration between the Iterative and PlasmaSolve teams. PlasmaSolve was founded in 2016 by plasma physicists and software engineers to provide a platform for cutting-edge physics simulation services and research. The PlasmaSolve team strives to deliver top-notch solutions and well-designed physics simulations to speed up research and reduce development costs using various open-source and commercial simulation tools.

Run simulations in the cloud with GitLab and CML

For this part of the post, we follow the main branch in the demo repository. Please follow the README to prepare your environment and install dependencies.

OpenFOAM simulations can be computationally intensive, requiring access to high-performance computing resources or a cluster of computers to solve large or complex problems.

To run the demo simulation on AWS, we can use CML (Continuous Machine Learning). CML can start a new AWS EC2 instance to run a new simulation experiment and shut it down when the experiment is done.

The full configuration for the demo CI pipeline can be found in the .gitlab-ci.yml file.

The demo project shows how to integrate CML into a GitLab CI configuration. The pipeline has two stages: build and run. The build stage has a single job that logs in to Amazon Elastic Container Registry (ECR), builds a Docker image from the specified Dockerfile, and pushes the image to the registry. The run stage has three jobs: launch, run, and report. The launch job launches an EC2 instance on Amazon Web Services (AWS), the run job runs a simulation on that instance, and the report job generates a report on the simulation results. The diagram below shows the CI pipeline and the AWS services it uses.

CML with GitLab CI configuration

Using AWS computational resources

When a workflow requires computational resources (such as GPUs), CML can automatically allocate cloud instances using cml runner. You can spin up instances on AWS, Azure, GCP, or Kubernetes. Alternatively, you can connect to any other computing provider or an on-premise (local) machine.

Below is an example of the GitLab CI launch job configuration that allocates an AWS instance using the cml runner command. Users can define the region, instance type, and storage size they need:

  launch:
    stage: run
    rules:
      - changes: [dvc.yaml, params.yaml, .gitlab-ci.yml]
    image: iterativeai/cml:0-dvc2-base1
    # Provision an EC2 instance and register it as a GitLab runner labeled 'cml'
    script: >
      cml runner launch
      --cloud=aws
      --cloud-region=$AWS_DEFAULT_REGION --cloud-type=m5.2xlarge
      --cloud-hdd-size=32 --labels=cml

Set up CI jobs to run a simulation

To run a new simulation experiment on the instance started by cml runner, we specify the cml tag in the run job and run the dvc exp run command.

  run:
    stage: run
    tags: [cml]  # Must match the --labels value used by 'cml runner launch'
    rules:
      - changes: [params.yaml, .gitlab-ci.yml]
    script:
      # Run an experiment
      - dvc pull || echo "Pull failed"  # Pull outputs of previous simulation if any
      - dvc exp run -f
      - dvc push # Save results
      - rsync -r ./ /home/.cml/cache/run # Share results with 'report' job

The dvc pull command downloads the results of previous experiments from remote storage. By comparing the versions of previous results with the dependencies of each DVC pipeline stage, DVC can skip stages that do not need to run again, which saves time and computational resources. Note that the -f (--force) flag in the job above makes dvc exp run reproduce every stage regardless of changes; omit it to benefit from stage skipping. After the simulation completes, dvc push uploads the new results back to remote storage.
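
The same sequence can be rehearsed by hand before committing it to CI. The sketch below is only an illustration: the DRY_RUN switch and the step helper are hypothetical names introduced here, and the extra dvc status call (which lists out-of-date stages) is not part of the demo job. By default the commands are only printed, so the script runs even on a machine without DVC installed.

```shell
#!/bin/sh
# Sketch: replay the steps of the 'run' job locally.
# DRY_RUN is a hypothetical switch for this example: it defaults to 1, so the
# commands are printed instead of executed. Set DRY_RUN=0 to really run them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
step() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run_experiment() {
    # Fetch outputs of previous simulations; a failed pull is not fatal.
    step dvc pull || echo "Pull failed"
    # List the stages whose dependencies changed since the last run.
    step dvc status
    # Run the pipeline as an experiment; without -f, up-to-date stages are skipped.
    step dvc exp run
    # Upload the new results to remote storage.
    step dvc push
}

run_experiment
```

Running the script with DRY_RUN=0 inside the prepared demo environment executes the four commands in order.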

After the run job completes, the report job prepares the CML report and publishes it as a comment on the associated Git commit. For this, we build a Markdown file with all text and plots, publish it with the cml comment create command, and then open a pull request with the cml pr create command.

  report:
    stage: run
    image: iterativeai/cml:0-dvc2-base1 # Python, DVC, & CML pre-installed
    script:
      # Create CML report
      - |
        cat <<EOF >
      - cml comment create --publish-native
      - cml pr create .
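
As a rough illustration of the "build a Markdown file" step, the following sketch assembles a report from a list of plot images. The file names report.md, plots/pressure.png, and plots/velocity.png and the build_report helper are placeholders invented for this example; they are not taken from the demo repository.

```shell
#!/bin/sh
# Sketch: assemble a Markdown report that embeds simulation plots.
# All file names below are placeholders, not the demo's actual paths.

# build_report OUTPUT IMAGE...
# Writes a heading followed by one Markdown image link per IMAGE.
build_report() {
    out="$1"
    shift
    {
        printf '# Simulation report\n\n'
        for img in "$@"; do
            # Use the file name without extension as the alt text.
            printf '![%s](%s)\n\n' "$(basename "$img" .png)" "$img"
        done
    } > "$out"
}

build_report report.md plots/pressure.png plots/velocity.png

# In the CI job, the report would then be published with the commands shown
# in the job above (this needs the cml CLI and the REPO_TOKEN variable):
#   cml comment create --publish-native report.md
```

Keeping the report generation in a small function makes it easy to preview the Markdown locally before the CI job posts it.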

These reports let teammates review and discuss simulation results within their usual Git workflow.

A report posted in the pull request after the simulation runs

Set up GitLab CI variables

To run simulations in AWS with GitLab CI & CML, it's recommended to use provider-managed policies/roles and then explicitly limit the permissions further if possible. Here is a set of common permissions required by CML.

In this demo we used the following CI variables in the project Settings → CI/CD → Variables:

  • AWS_SESSION_TOKEN - it is optional and depends on the AWS organization settings.
  • REPO_TOKEN - a personal access token with the api, read_repository and write_repository scopes. Find more details in CML docs on Personal Access Token

Examples of CI variables in GitLab

Note: AWS_SESSION_TOKEN is not required for most users. It’s specific to Iterative's sandbox account.
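
A missing variable usually surfaces only as a failed cloud request, so it can help to check the environment at the start of a job. The sketch below is a suggestion, not part of the demo: check_ci_variables is a hypothetical helper, and AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the standard AWS credential variable names, assumed here because they are not listed above.

```shell
#!/bin/sh
# Sketch: fail early when a required CI variable is missing.
# The variable names passed below are assumptions for illustration.

# check_ci_variables NAME...
# Returns 0 if every named variable is set and non-empty, 1 otherwise.
check_ci_variables() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (POSIX sh compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

if check_ci_variables AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION REPO_TOKEN; then
    echo "All required CI variables are set"
else
    echo "Define the missing variables in Settings -> CI/CD -> Variables"
fi
```

AWS_SESSION_TOKEN is left out of the check on purpose, since it is optional.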

Experimenting and visualizing simulation results in Iterative Studio

Iterative Studio is a web application that you can access online or host on-prem. Using the power of the leading open-source tools DVC, CML, and Git, it enables you to seamlessly manage data, run and track experiments, and visualize and share results.

Run a new simulation

Using Iterative Studio, we can run new simulation experiments in the cloud and visualize the results in the Studio UI.

Example of running a new simulation experiment via Iterative Studio

Visualize simulation results

Iterative Studio displays simulation result images and DVC plots as soon as the simulation is complete. Studio lets you plot images and metrics and compare them with previous simulations.

Example of visualization of simulation results in Iterative Studio

Visualize the simulation outputs with ParaView

OpenFOAM results can be inspected with several visualization tools, including ParaView, a popular open-source visualization application. Users can use these tools to generate line plots, contour plots, and volume renderings of simulation results.

DVC can also download the simulation outputs so that you can visualize them locally. A single command fetches the data generated by a simulation experiment (the Git remote is assumed to be named origin, and <experiment-name> is a placeholder for the experiment to fetch):

$ dvc exp pull origin <experiment-name>

Downloaded data can be visualized with third-party tools like ParaView.
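
ParaView's built-in OpenFOAM reader opens a case through an empty marker file with the .foam extension placed in the case directory. The sketch below creates such a marker; make_foam_marker is a hypothetical helper and the case path in the usage comment is a placeholder, since the location of the case directory depends on the project layout.

```shell
#!/bin/sh
# Sketch: create the empty .foam marker file that ParaView's built-in
# OpenFOAM reader uses to open a case directory.

# make_foam_marker CASE_DIR
# Creates CASE_DIR/<dirname>.foam and prints its path.
make_foam_marker() {
    case_dir="$1"
    marker="$case_dir/$(basename "$case_dir").foam"
    touch "$marker"
    echo "$marker"
}

# Usage (the case path is a placeholder):
#   marker="$(make_foam_marker path/to/case)"
#   paraview "$marker"
```

Alternatively, OpenFOAM's own paraFoam wrapper can be run from inside the case directory when OpenFOAM is installed locally.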

Example for sonicFoam simulation results visualized in ParaView


This post details how Iterative tools help in physical and computational simulations. For this purpose, we created a demo project built with OpenFOAM. The demo shows how to set up DVC for simulation experiments and data management. CML is used in the GitLab CI pipeline to manage computational resources on AWS. Iterative Studio is then used as a UI to visualize simulation results and run new simulations in a few clicks.

Overall, DVC, CML, and Iterative Studio can help OpenFOAM users:

  1. Reduce the complexity of simulation pipelines and automate tasks such as running simulations, post-processing results, and generating reports.

  2. Manage and track the data and code associated with your OpenFOAM simulations, and make it easier to reproduce simulation results. Store simulation data on-premises or in the cloud using a variety of storage types, such as S3.

  3. Manage simulation experiments with simple YAML config files.

  4. Manage computational resources on AWS and start and shut down EC2 instances for simulation experiments.

  5. Use Iterative Studio as a user-friendly interface for viewing simulation results and quickly running new simulations.

  6. View and share simulation results and DVC plots online in Iterative Studio, without the need to download and visualize results locally.

