Performance¶
This page covers both how the library is optimized and how performance is measured.
Performance model¶
The repository has two distinct performance layers:
kernel performance: analytical source-family implementations,
high-level orchestration performance: source/sensor preparation, path handling, batching, and field assembly.
Both matter. A fast kernel can still produce a slow user-facing getB if host-side preparation dominates.
High-level getB optimizations¶
The JIT-safe getB/getH/getJ/getM path includes:
generalized source preparation caches keyed by object cache tokens,
sensor preparation caches keyed by identity, path, pixel layout, and handedness,
cached orientation matrices on BaseGeo,
cached Collection flatten/source/sensor lists with dirty propagation,
cached TriangularMesh oriented faces and face geometry reuse,
precomputed CylinderSegment face geometry inside the high-level JIT path,
circle-heavy collection fast paths,
singleton-path caching for tiny observer batches.
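The token-keyed caches and dirty propagation listed above can be illustrated with a minimal sketch. Everything here is hypothetical: `Node`, `cache_token`, `mark_dirty`, and `prepare` are illustrative names, not the library's API. The idea is that each object carries a token that changes whenever it or any descendant is modified, so a cache keyed by that token never returns stale preparation results.

```python
from itertools import count

_tokens = count()          # monotonically increasing token source
_prep_cache = {}           # (object id, token) -> prepared data
stats = {"prep_calls": 0}  # counts real (uncached) preparations


class Node:
    """Hypothetical stand-in for a source or a collection of sources."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.parent = None
        self.cache_token = next(_tokens)

    def add(self, child):
        child.parent = self
        self.children.append(child)
        self.mark_dirty()

    def mark_dirty(self):
        # Assign fresh tokens up the parent chain so that every enclosing
        # collection also misses its cache on the next lookup.
        node = self
        while node is not None:
            node.cache_token = next(_tokens)
            node = node.parent


def flatten(node):
    """Return the leaf nodes below `node` in insertion order."""
    if not node.children:
        return [node]
    leaves = []
    for child in node.children:
        leaves.extend(flatten(child))
    return leaves


def prepare(node):
    """Return the flattened leaf names, recomputing only after a change."""
    key = (id(node), node.cache_token)
    if key not in _prep_cache:
        stats["prep_calls"] += 1
        _prep_cache[key] = [leaf.name for leaf in flatten(node)]
    return _prep_cache[key]
```

Calling `prepare` twice on an unchanged collection performs one real preparation. Calling `mark_dirty()` on any child forces the next call to recompute. A production cache would also evict entries whose tokens are no longer reachable, which this sketch omits.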
Kernel-level profiling pipeline¶
Scripts:
Artifacts produced per source family:
JAX trace (jax.profiler.trace)
HLO dump (compiler_ir(..., dialect="hlo"))
device memory profile snapshot (jax.profiler.save_device_memory_profile)
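As a rough illustration of how these three artifacts can be produced with public JAX calls, the sketch below uses a hypothetical `toy_kernel` in place of a real source-family kernel. The helper names, the output directory layout, and the `memory.prof` file name are assumptions, not the repository's actual scripts.

```python
import os

import jax
import jax.numpy as jnp


def toy_kernel(observers):
    # Hypothetical stand-in for a source-family kernel: r / |r|^3.
    r = jnp.linalg.norm(observers, axis=-1, keepdims=True)
    return observers / r**3


def dump_hlo(fn, *args):
    # Lower without executing, then render the HLO module as text.
    lowered = jax.jit(fn).lower(*args)
    return lowered.compiler_ir(dialect="hlo").as_hlo_text()


def capture_trace_and_memory(fn, args, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    jitted = jax.jit(fn)
    # Warm up first so the trace shows steady-state execution, not compilation.
    jax.block_until_ready(jitted(*args))
    with jax.profiler.trace(out_dir):
        jax.block_until_ready(jitted(*args))
    # Writes a pprof-format snapshot of current device memory.
    jax.profiler.save_device_memory_profile(os.path.join(out_dir, "memory.prof"))
```

For example, `dump_hlo(toy_kernel, jnp.ones((4, 3)))` returns the HLO module text for a four-observer input, which can then be written to disk next to the trace directory.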
Fixed-observer-count JIT entrypoints¶
Hotspot wrappers live in core/kernels_extended.py and cache compilation by observer count.
Representative examples:
current_circle_bfield_jit
current_polyline_bfield_jit
triangle_bfield_jit
current_trisheet_bfield_jit
current_tristrip_bfield_jit
tetrahedron_bfield_jit
magnet_trimesh_bfield_jit_faces_precomp
magnet_cylinder_segment_bfield_jit
These are mainly for profiling and specialized high-throughput workloads. They are not the default user entry point.
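One way to cache compilation by observer count is to compile ahead of time for a fixed `(n_obs, 3)` shape and memoize the compiled executable per `n_obs`. The sketch below assumes exactly that. `_toy_bfield`, `compiled_for_count`, and `toy_bfield_jit` are hypothetical names, and the kernel is a prefactor-free point-dipole formula used only as a stand-in, not one of the wrappers in core/kernels_extended.py.

```python
import functools

import jax
import jax.numpy as jnp


def _toy_bfield(observers, moment):
    # Stand-in kernel: (3 r_hat (m . r_hat) - m) / |r|^3, prefactor omitted.
    r = jnp.linalg.norm(observers, axis=-1, keepdims=True)
    r_hat = observers / r
    m_dot_r = jnp.sum(r_hat * moment, axis=-1, keepdims=True)
    return (3.0 * r_hat * m_dot_r - moment) / r**3


@functools.lru_cache(maxsize=None)
def compiled_for_count(n_obs):
    # Compile once per observer count; later calls reuse the executable.
    obs_spec = jax.ShapeDtypeStruct((n_obs, 3), jnp.float32)
    mom_spec = jax.ShapeDtypeStruct((3,), jnp.float32)
    return jax.jit(_toy_bfield).lower(obs_spec, mom_spec).compile()


def toy_bfield_jit(observers, moment):
    # Cast to the dtype the executable was compiled for, then dispatch.
    observers = jnp.asarray(observers, dtype=jnp.float32)
    moment = jnp.asarray(moment, dtype=jnp.float32)
    return compiled_for_count(observers.shape[0])(observers, moment)
```

Repeated calls with the same number of observers hit the `lru_cache` and skip tracing and compilation entirely. A new observer count triggers exactly one additional compile, which is why this pattern suits workloads with a small set of recurring batch sizes.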
What is measured¶
JAX compile time per source type
steady-state runtime (median over repeated runs)
max absolute parity error
peak process memory
HLO size and hash for inspection
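A minimal sketch of how the first three metrics might be collected for one kernel follows. The function `measure`, its return keys, and the toy kernel and reference are assumptions for illustration. Peak process memory and HLO size are left out to keep the example short.

```python
import statistics
import time

import jax
import jax.numpy as jnp
import numpy as np


def measure(fn, reference, *args, repeats=20):
    # Compile time: lowering plus XLA compilation, measured separately
    # from execution by compiling ahead of time.
    t0 = time.perf_counter()
    compiled = jax.jit(fn).lower(*args).compile()
    compile_s = time.perf_counter() - t0

    # Steady-state runtime: warm up once, then take the median of
    # repeated runs, blocking so asynchronous dispatch is included.
    jax.block_until_ready(compiled(*args))
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        jax.block_until_ready(compiled(*args))
        samples.append(time.perf_counter() - t0)

    # Parity: largest absolute deviation from a NumPy reference.
    out = np.asarray(compiled(*args))
    ref = reference(*[np.asarray(a) for a in args])
    return {
        "compile_s": compile_s,
        "runtime_median_s": statistics.median(samples),
        "max_abs_parity_error": float(np.max(np.abs(out - ref))),
    }


def toy_kernel(x):
    # Hypothetical kernel under test.
    return 2.0 * jnp.sin(x)


def toy_reference(x):
    # Hypothetical NumPy reference implementation.
    return 2.0 * np.sin(x)
```

The median is used instead of the mean because occasional scheduler or allocator stalls would otherwise skew the steady-state figure.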
HLO hash checks¶
Exact HLO hashes are useful for inspection and trend tracking, but they are intentionally treated as report-only in CI/nightly because unpinned JAX/XLA versions can change compiler output structure without changing correctness.
The hard gating remains on:
parity error,
compile/runtime thresholds,
memory thresholds,
benchmark thresholds,
tests and docs.
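The split between report-only HLO hashes and hard gates can be sketched as follows. The metric names, the threshold dictionary, and both function names are hypothetical. The point is only that a threshold violation produces a failure while a hash mismatch produces a warning.

```python
import hashlib


def hlo_fingerprint(hlo_text):
    # Size and hash of the HLO text, for inspection and trend tracking.
    data = hlo_text.encode("utf-8")
    return {
        "hlo_bytes": len(data),
        "hlo_sha256": hashlib.sha256(data).hexdigest(),
    }


def evaluate(report, thresholds, baseline_hlo_sha256):
    # Hard gates: any metric above its threshold fails the run.
    failures = [
        f"{name}: {report[name]} > {limit}"
        for name, limit in thresholds.items()
        if report[name] > limit
    ]
    # Report-only: a changed hash is surfaced but never fails the run.
    warnings = []
    if report["hlo_sha256"] != baseline_hlo_sha256:
        warnings.append("HLO hash changed (report-only)")
    return failures, warnings
```

A CI job built on this would exit non-zero only when `failures` is non-empty and would print `warnings` for reviewers to inspect.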
Threshold files¶
Typical workflow after a kernel change¶
Run profile_kernels.py and inspect compile/runtime/memory deltas.
Run profile_getB_jit.py if the change can affect the high-level path.
Compare parity outputs.
Update thresholds only when the change is intentional and justified.
Keep HLO baselines as observability aids, not as the only regression signal.