Performance

This page covers both how the library is optimized and how performance is measured.

Performance model

The repository has two distinct performance layers:

  • kernel performance: analytical source-family implementations,

  • high-level orchestration performance: source/sensor preparation, path handling, batching, and field assembly.

Both matter. A fast kernel can still result in a slow user-facing getB call when host-side preparation dominates the total time.

High-level getB optimizations

The JIT-safe getB/getH/getJ/getM path includes:

  • generalized source preparation caches keyed by object cache tokens,

  • sensor preparation caches keyed by identity, path, pixel layout, and handedness,

  • cached orientation matrices on BaseGeo,

  • cached Collection flatten/source/sensor lists with dirty propagation,

  • cached TriangularMesh oriented faces and face geometry reuse,

  • precomputed CylinderSegment face geometry inside the high-level JIT path,

  • circle-heavy collection fast paths,

  • singleton-path caching for tiny observer batches.
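The token-keyed caches and dirty propagation listed above can be pictured with a small, self-contained sketch. Everything in it (Node, Group, cache_token, mark_dirty, prepare) is a hypothetical stand-in chosen for illustration, not the library's actual classes or attribute names.

```python
import itertools

_token_counter = itertools.count()


class Node:
    """Hypothetical stand-in for a source-like object with a cache token."""

    def __init__(self, position=(0.0, 0.0, 0.0)):
        self._position = tuple(position)
        self.parent = None
        self.cache_token = next(_token_counter)

    @property
    def position(self):
        return self._position

    @position.setter
    def position(self, value):
        self._position = tuple(value)
        self.mark_dirty()

    def mark_dirty(self):
        # A fresh token invalidates every cache entry keyed on the old one.
        self.cache_token = next(_token_counter)
        # Dirty propagation: parents must rebuild their derived lists too.
        if self.parent is not None:
            self.parent.mark_dirty()


class Group(Node):
    """Hypothetical stand-in for a collection with a cached flat child list."""

    def __init__(self, *children):
        super().__init__()
        self.children = list(children)
        for child in self.children:
            child.parent = self
        self._flat = None
        self._flat_token = None

    def flatten(self):
        # Rebuild only when the token changed since the last flatten.
        if self._flat_token != self.cache_token:
            flat = []
            for child in self.children:
                flat.extend(child.flatten() if isinstance(child, Group) else [child])
            self._flat, self._flat_token = flat, self.cache_token
        return self._flat


_prepared = {}


def prepare(node):
    """Return cached preparation data, keyed by object identity and token."""
    key = (id(node), node.cache_token)
    if key not in _prepared:
        # Stand-in for expensive host-side preparation work.
        _prepared[key] = {"position": node.position}
    return _prepared[key]
```

In this sketch, repeated calls with unchanged objects return the cached result, while assigning a new position to any leaf changes its token and the token of every enclosing group. A real implementation would also need to evict stale entries, which the sketch omits.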

Kernel-level profiling pipeline

Scripts:

Artifacts produced per source family:

  • JAX trace (jax.profiler.trace)

  • HLO dump (compiler_ir(..., dialect="hlo"))

  • device memory profile snapshot (jax.profiler.save_device_memory_profile)
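As a rough illustration of how these three artifacts can be captured for one kernel, the sketch below uses only public JAX calls. The toy_kernel function, the collect_artifacts helper, and the output file names are assumptions made for the example and are not taken from the repository's profiling scripts.

```python
import hashlib
import os
import tempfile

import jax
import jax.numpy as jnp


def toy_kernel(observers):
    # Hypothetical stand-in for a source-family kernel (1/r^2 radial falloff).
    r = jnp.linalg.norm(observers, axis=-1, keepdims=True)
    return observers / (r**3 + 1e-12)


def collect_artifacts(fn, example_input, out_dir):
    """Write an HLO dump, a profiler trace, and a device memory snapshot."""
    os.makedirs(out_dir, exist_ok=True)
    jitted = jax.jit(fn)

    # 1. HLO dump of the lowered computation, plus size and hash for reports.
    hlo_text = jitted.lower(example_input).compiler_ir(dialect="hlo").as_hlo_text()
    hlo_path = os.path.join(out_dir, "kernel.hlo.txt")  # hypothetical file name
    with open(hlo_path, "w") as f:
        f.write(hlo_text)

    # 2. JAX trace around one blocking execution of the kernel.
    trace_dir = os.path.join(out_dir, "trace")
    with jax.profiler.trace(trace_dir):
        jitted(example_input).block_until_ready()

    # 3. Device memory profile snapshot (pprof-format file).
    memory_path = os.path.join(out_dir, "memory.prof")  # hypothetical file name
    jax.profiler.save_device_memory_profile(memory_path)

    return {
        "hlo_path": hlo_path,
        "hlo_size": len(hlo_text),
        "hlo_sha256": hashlib.sha256(hlo_text.encode("utf-8")).hexdigest(),
        "trace_dir": trace_dir,
        "memory_path": memory_path,
    }


out = collect_artifacts(toy_kernel, jnp.ones((16, 3)), tempfile.mkdtemp())
```

The trace directory can be opened with a profile viewer such as TensorBoard or Perfetto, and the memory snapshot with pprof.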

Fixed-observer-count JIT entrypoints

Hotspot wrappers live in core/kernels_extended.py and cache compilation by observer count.

Representative examples:

  • current_circle_bfield_jit

  • current_polyline_bfield_jit

  • triangle_bfield_jit

  • current_trisheet_bfield_jit

  • current_tristrip_bfield_jit

  • tetrahedron_bfield_jit

  • magnet_trimesh_bfield_jit_faces_precomp

  • magnet_cylinder_segment_bfield_jit

These are mainly for profiling and specialized high-throughput workloads. They are not the default user entry point.
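A minimal sketch of an entry point that caches compilation by observer count is shown below. It is not the code in core/kernels_extended.py: toy_bfield, compiled_for_count, and toy_bfield_jit are hypothetical names, and the sketch assumes float32 observers of shape (n, 3). Note that jax.jit already caches per input shape; the explicit cache here only makes the observer-count key visible and easy to inspect.

```python
from functools import lru_cache

import jax
import jax.numpy as jnp


def toy_bfield(observers):
    # Hypothetical stand-in for an analytical kernel.
    r = jnp.linalg.norm(observers, axis=-1, keepdims=True)
    return observers / (r**3 + 1e-12)


@lru_cache(maxsize=None)
def compiled_for_count(n_observers):
    """Compile ahead of time for one observer count; reuse on later calls."""
    spec = jax.ShapeDtypeStruct((n_observers, 3), jnp.float32)
    return jax.jit(toy_bfield).lower(spec).compile()


def toy_bfield_jit(observers):
    """Dispatch to the executable compiled for this observer count."""
    observers = jnp.asarray(observers, dtype=jnp.float32)
    return compiled_for_count(observers.shape[0])(observers)
```

Calling toy_bfield_jit repeatedly with the same number of observers reuses one compiled executable, while each new observer count triggers one additional compilation. That trade-off suits profiling and fixed-shape, high-throughput workloads better than general use.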

What is measured

  • JAX compile time per source type

  • steady-state runtime (median over repeated runs)

  • max absolute parity error

  • peak process memory

  • HLO size and hash for inspection
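The timing and parity measurements can be sketched as follows. This is an illustrative harness under stated assumptions, not the repository's profiling script: measure, its arguments, and the result keys are hypothetical, and peak process memory is left out because measuring it is platform-specific.

```python
import statistics
import time

import jax
import jax.numpy as jnp


def measure(fn, reference_fn, example_input, repeats=20):
    """Return compile time, median steady-state runtime, and parity error."""
    jitted = jax.jit(fn)

    # Compile time: here it includes tracing and lowering as well as XLA compilation.
    t0 = time.perf_counter()
    compiled = jitted.lower(example_input).compile()
    compile_s = time.perf_counter() - t0

    # Warm-up call so one-time setup costs stay out of the samples.
    compiled(example_input).block_until_ready()

    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        # block_until_ready forces completion of asynchronously dispatched work.
        compiled(example_input).block_until_ready()
        samples.append(time.perf_counter() - t0)

    # Parity: largest absolute deviation from the reference implementation.
    diff = compiled(example_input) - reference_fn(example_input)
    return {
        "compile_s": compile_s,
        "runtime_median_s": statistics.median(samples),
        "max_abs_parity_error": float(jnp.max(jnp.abs(diff))),
    }
```

The median is used rather than the mean so that occasional slow runs caused by scheduling noise do not skew the steady-state figure.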

HLO hash checks

Exact HLO hashes are useful for inspection and trend tracking, but CI and nightly runs intentionally treat them as report-only: unpinned JAX/XLA versions can change the structure of the compiler output without changing correctness.

The hard gating remains on:

  • parity error,

  • compile/runtime thresholds,

  • memory thresholds,

  • benchmark thresholds,

  • tests and docs.
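The split between report-only HLO hashes and hard thresholds can be expressed as a small check. The function name, metric keys, and threshold layout below are hypothetical and do not reflect the repository's actual threshold file format.

```python
def evaluate_gate(measured, thresholds, hlo_hash, baseline_hlo_hash):
    """Fail on threshold violations; only report a changed HLO hash."""
    failures = []
    for name, limit in thresholds.items():
        value = measured[name]
        if value > limit:
            failures.append(f"{name}: {value} exceeds limit {limit}")

    notes = []
    if hlo_hash != baseline_hlo_hash:
        # Report-only: a compiler upgrade can change the HLO without a regression.
        notes.append("HLO hash differs from baseline (report-only)")

    return {"passed": not failures, "failures": failures, "notes": notes}
```

For example, a run whose parity error and runtime stay under their limits passes even when the HLO hash differs from the baseline, and the difference is surfaced in the notes for inspection.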

Threshold files

Typical workflow after a kernel change

  1. Run profile_kernels.py and inspect compile/runtime/memory deltas.

  2. Run profile_getB_jit.py if the change can affect the high-level path.

  3. Compare parity outputs.

  4. Update thresholds only when the change is intentional and justified.

  5. Keep HLO baselines as observability aids, not as the only regression signal.