Concurrency & Multithreading

EntropyPortal uses a Hybrid Job-Based concurrency model. While high-level phases (Shadows, Probes) are sequenced explicitly, the massive workload of camera rendering is parallelized using a dynamic Frame Graph (a strictly transient WorkGraph).

The engine does not rely on a static dependency graph for its main render loop. Instead, the RenderService constructs a fresh WorkGraph (or reuses a cached one) every frame based on the active view configuration.

At the start of the camera phase, the main thread iterates the RenderWorld (data extracted from the ECS) to generate Jobs:

  • Auxiliary Jobs: Rendering faces for portals, mirrors, or security cameras.
  • Main Jobs: Rendering the user’s primary view.

Within the camera phase, execution order is driven purely by the dependencies between jobs, not by rigid phases.

  • No Dependencies: 2D Aux cameras and Cubemap Face 0s run immediately.
  • Data Dependencies: Cubemap faces 1-5 run as soon as their Face 0 completes.
  • View Dependencies: The Main Camera runs as soon as all Aux cameras are complete.

```mermaid
graph TD
    %% Independent Nodes (Run Immediately)
    Aux2D_A[2D Mirror: Cam A]
    Aux2D_B[2D Portal: Cam B]
    Cube_Face0[Cubemap: Face 0]

    %% Dependent Nodes (Run when ready)
    Cube_Face1[Cube Face 1]
    Cube_Face2[Cube Face 2]
    Cube_Face3[Cube Face 3]
    Cube_Face4[Cube Face 4]
    Cube_Face5[Cube Face 5]

    %% Intra-Camera Dependency
    Cube_Face0 --> Cube_Face1
    Cube_Face0 --> Cube_Face2
    Cube_Face0 --> Cube_Face3
    Cube_Face0 --> Cube_Face4
    Cube_Face0 --> Cube_Face5

    %% Inter-Camera Dependency (Main waits for all)
    Aux2D_A --> MainJob[Main Camera]
    Aux2D_B --> MainJob
    Cube_Face1 --> MainJob
    Cube_Face2 --> MainJob
    Cube_Face3 --> MainJob
    Cube_Face4 --> MainJob
    Cube_Face5 --> MainJob
```
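
The dependency rules above can be modelled in a few lines. The following is a minimal Python sketch, not the engine's code: `build_work_graph`, `run_work_graph`, and the job names are hypothetical, and a thread pool stands in for the engine's job system. It builds the per-frame graph and starts each job as soon as its prerequisites have finished.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def build_work_graph(cameras_2d, cubemaps):
    """Return {job_name: set of prerequisite job names} for one frame."""
    deps = {}
    for cam in cameras_2d:
        deps[cam] = set()                      # no dependencies
    for cube in cubemaps:
        face0 = f"{cube}:face0"
        deps[face0] = set()                    # no dependencies
        for i in range(1, 6):
            deps[f"{cube}:face{i}"] = {face0}  # data dependency on face 0
    # View dependency: the main job waits for every auxiliary job.
    # (Listing face 0 here is redundant but harmless.)
    deps["main"] = set(deps)
    return deps

def run_work_graph(deps, record, workers=4):
    """Call record(job) for every job; return the order in which jobs finished."""
    remaining = {job: set(pre) for job, pre in deps.items()}
    in_flight = {}        # future -> job name
    finished_order = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while remaining or in_flight:
            ready = [job for job, pre in remaining.items() if not pre]
            for job in ready:
                del remaining[job]
                in_flight[pool.submit(record, job)] = job
            if not in_flight:
                raise ValueError("dependency cycle in work graph")
            done, _ = wait(list(in_flight), return_when=FIRST_COMPLETED)
            for fut in done:
                fut.result()                   # re-raise worker exceptions
                job = in_flight.pop(fut)
                finished_order.append(job)
                for pre in remaining.values():
                    pre.discard(job)           # unblock dependents
    return finished_order

deps = build_work_graph(["mirror_a", "portal_b"], ["probe"])
order = run_work_graph(deps, lambda job: None)
```

In this model the 2D cameras and `probe:face0` are submitted on the first pass, faces 1-5 follow once face 0 finishes, and `main` always finishes last.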

Each Job corresponds to a GpuCommandBuffer.

  • Thread-Local Recording: Worker threads record draw commands independently.
  • Lock-Free Data: Jobs read from the Front Buffer of the RenderWorld, which is immutable during the render pass.
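
As an illustration of the two points above, here is a hedged Python sketch in which each job records into a list that only that job touches (a stand-in for one GpuCommandBuffer) while reading a shared, read-only snapshot. The names `record_job`, `record_all`, and the `visible_meshes` field are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from types import MappingProxyType

def record_job(job_name, front_buffer):
    """Record draw commands for one job into a list owned by this call only."""
    commands = []  # stand-in for one GpuCommandBuffer; never shared while recording
    for mesh in front_buffer["visible_meshes"]:
        commands.append(("draw", job_name, mesh))
    return commands

def record_all(job_names, front_buffer, workers=4):
    """Record every job in parallel; return {job_name: commands} in job order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order regardless of completion order.
        results = pool.map(lambda name: record_job(name, front_buffer), job_names)
        return dict(zip(job_names, results))

# Read-only view: workers cannot add or replace entries, so no lock is needed.
front = MappingProxyType({"visible_meshes": ("floor", "wall", "lamp")})
buffers = record_all(["mirror_a", "portal_b", "main"], front)
```

Because no two jobs write to the same list and nothing writes to the snapshot, the workers need no synchronization until the recorded buffers are collected.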

To decouple simulation from rendering without locks:

  1. Simulation Thread: Writes to Back Buffer (Extract).
  2. Render Thread: Reads from Front Buffer.
  3. Swap: Atomic pointer swap occurs only at the start of the frame.
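
The three steps can be sketched as follows. This is a simplified, single-threaded Python model with hypothetical names (`DoubleBufferedWorld`, `extract`). In a native engine the swap would be an atomic pointer exchange; here it is a single index flip performed only at the frame boundary.

```python
class DoubleBufferedWorld:
    """Two copies of the render data: one being written, one being read."""

    def __init__(self):
        self._buffers = [{}, {}]
        self._front = 0  # index of the buffer the renderer reads

    def front(self):
        return self._buffers[self._front]

    def back(self):
        return self._buffers[1 - self._front]

    def swap(self):
        # One index flip; called only at the start of a frame, when neither
        # side is in the middle of reading or writing.
        self._front = 1 - self._front

def extract(world, sim_state):
    """Simulation side: overwrite the back buffer with this frame's data."""
    back = world.back()
    back.clear()          # the back buffer still holds data from two frames ago
    back.update(sim_state)

world = DoubleBufferedWorld()
seen = []
for frame in range(3):
    world.swap()                      # start of frame: publish last frame's extract
    snapshot = world.front()          # renderer reads this, untouched until next swap
    extract(world, {"frame": frame})  # simulation fills the other buffer
    seen.append(snapshot.get("frame"))
```

In this model `seen` ends up as `[None, 0, 1]`: the renderer always works on the data extracted during the previous frame.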

GPU resources (Uniform Buffers, Command Allocators) are N-buffered (typically N = 3), so the CPU and GPU work on different copies. For frame number f:

  • Frame f: CPU writes to index f mod N.
  • Frame f-1: GPU reads from index (f-1) mod N.
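
A minimal sketch of the indexing, assuming the slot is chosen by taking the frame number modulo the buffer count (the `FrameRing` class is hypothetical, not an engine type):

```python
FRAMES_IN_FLIGHT = 3  # "typically 3" per the text above

class FrameRing:
    """Holds one copy of a per-frame resource for each frame in flight."""

    def __init__(self, make_resource, count=FRAMES_IN_FLIGHT):
        self.slots = [make_resource() for _ in range(count)]

    def cpu_slot(self, frame):
        # Slot the CPU writes while preparing `frame`.
        return self.slots[frame % len(self.slots)]

    def gpu_slot(self, frame):
        # Slot the GPU is still reading: the one written for the previous frame.
        return self.slots[(frame - 1) % len(self.slots)]

ring = FrameRing(dict)
ring.cpu_slot(7)["view_matrix"] = "frame 7 data"
```

A slot is reused only after `FRAMES_IN_FLIGHT` frames have passed, which is why the CPU must not run further ahead of the GPU than that.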

We use the swapchain’s back-pressure (vkAcquireNextImageKHR) as the primary throttle. This avoids explicit fence waits at the start of the frame, so the CPU can execute extract() and prepare() for the next frame while the GPU finishes the previous one.
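
The throttling effect can be illustrated with a toy model. The sketch below makes no Vulkan calls: a semaphore stands in for the pool of swapchain images, a second thread stands in for the GPU, and `run_frames` is a hypothetical name. It only shows that blocking on image acquisition bounds how far the CPU can run ahead.

```python
import queue
import threading

def run_frames(frame_count, swapchain_images=3):
    """Simulate a CPU loop throttled only by image acquisition."""
    free_images = threading.Semaphore(swapchain_images)
    submitted = queue.Queue()
    stats = {"completed": 0, "max_lead": 0}

    def gpu():
        while True:
            frame = submitted.get()
            if frame is None:          # sentinel: no more frames
                return
            stats["completed"] += 1    # frame finished on the "GPU"
            free_images.release()      # its image becomes available again

    gpu_thread = threading.Thread(target=gpu)
    gpu_thread.start()
    for frame in range(frame_count):
        # CPU-side work for this frame (extract, prepare) would happen here,
        # overlapping with whatever the "GPU" thread is still processing.
        free_images.acquire()          # blocks when the CPU is too far ahead
        lead = (frame + 1) - stats["completed"]
        stats["max_lead"] = max(stats["max_lead"], lead)
        submitted.put(frame)
    submitted.put(None)
    gpu_thread.join()
    return stats

stats = run_frames(20, swapchain_images=3)
```

In this model the CPU never gets more than `swapchain_images` frames ahead of the simulated GPU, and no other wait appears in the frame loop.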