# Performance

This page covers both how the library is optimized and how performance is measured.

## Performance model

The repository has two distinct performance layers:

- kernel performance: analytical source-family implementations,
- high-level orchestration performance: source/sensor preparation, path handling, batching, and field assembly.

Both matter. A fast kernel can still produce a slow user-facing `getB` if host-side preparation dominates.

## High-level `getB` optimizations

The JIT-safe `getB/getH/getJ/getM` path includes:

- generalized source preparation caches keyed by object cache tokens,
- sensor preparation caches keyed by identity, path, pixel layout, and handedness,
- cached orientation matrices on `BaseGeo`,
- cached `Collection` flatten/source/sensor lists with dirty propagation,
- cached `TriangularMesh` oriented faces and face geometry reuse,
- precomputed `CylinderSegment` face geometry inside the high-level JIT path,
- circle-heavy collection fast paths,
- singleton-path caching for tiny observer batches.
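
To make "cached flatten lists with dirty propagation" concrete, here is a minimal sketch of the general pattern in plain Python. The `Node` and `Group` classes and the `rebuilds` counter are hypothetical stand-ins invented for illustration; they are not the library's `Collection` implementation, which may differ in detail.

```python
from __future__ import annotations


class Node:
    """Hypothetical stand-in for a source, sensor, or nested collection."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.parent: Group | None = None


class Group(Node):
    """Hypothetical collection that caches its flattened leaf list."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.children: list[Node] = []
        self._flat_cache: list[Node] | None = None
        self.rebuilds = 0  # counts cache misses, for demonstration only

    def add(self, child: Node) -> None:
        child.parent = self
        self.children.append(child)
        self._mark_dirty()

    def _mark_dirty(self) -> None:
        # Invalidate this group's cache and the cache of every ancestor.
        node: Group | None = self
        while node is not None:
            node._flat_cache = None
            node = node.parent

    def flatten(self) -> list[Node]:
        # Rebuild only when something below this group has changed.
        if self._flat_cache is None:
            self.rebuilds += 1
            flat: list[Node] = []
            for child in self.children:
                if isinstance(child, Group):
                    flat.extend(child.flatten())
                else:
                    flat.append(child)
            self._flat_cache = flat
        return self._flat_cache


root, inner = Group("root"), Group("inner")
root.add(inner)
inner.add(Node("loop_a"))

root.flatten()
root.flatten()              # cache hit: no rebuild
inner.add(Node("loop_b"))   # dirties both inner and root
names = [n.name for n in root.flatten()]
print(names, root.rebuilds)
```

The point of the pattern is that repeated field evaluations on an unchanged hierarchy skip the host-side traversal entirely, and a mutation deep in the tree invalidates only the caches on the path to the root.
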
## Kernel-level profiling pipeline

Scripts:

- [`scripts/profile_kernels.py`](https://github.com/uwplasma/magpylib_jax/blob/main/scripts/profile_kernels.py)
- [`scripts/check_profiling_thresholds.py`](https://github.com/uwplasma/magpylib_jax/blob/main/scripts/check_profiling_thresholds.py)
- [`scripts/check_hlo_diffs.py`](https://github.com/uwplasma/magpylib_jax/blob/main/scripts/check_hlo_diffs.py)
- [`scripts/profile_getB_jit.py`](https://github.com/uwplasma/magpylib_jax/blob/main/scripts/profile_getB_jit.py)
- [`scripts/profile_wham_workload.py`](https://github.com/uwplasma/magpylib_jax/blob/main/scripts/profile_wham_workload.py)

Artifacts produced per source family:

- JAX trace (`jax.profiler.trace`)
- HLO dump (`compiler_ir(..., dialect="hlo")`)
- device memory profile snapshot (`jax.profiler.save_device_memory_profile`)

## Fixed-observer-count JIT entrypoints

Hotspot wrappers live in [`core/kernels_extended.py`](https://github.com/uwplasma/magpylib_jax/blob/main/src/magpylib_jax/core/kernels_extended.py) and cache compilation by observer count.

Representative examples:

- `current_circle_bfield_jit`
- `current_polyline_bfield_jit`
- `triangle_bfield_jit`
- `current_trisheet_bfield_jit`
- `current_tristrip_bfield_jit`
- `tetrahedron_bfield_jit`
- `magnet_trimesh_bfield_jit_faces_precomp`
- `magnet_cylinder_segment_bfield_jit`

These are mainly for profiling and specialized high-throughput workloads. They are not the default user entry point.

## What is measured

- JAX compile time per source type
- steady-state runtime (median over repeated runs)
- max absolute parity error
- peak process memory
- HLO size and hash for inspection

## HLO hash checks

Exact HLO hashes are useful for inspection and trend tracking, but they are intentionally treated as report-only in CI/nightly because unpinned JAX/XLA versions can change compiler output structure without changing correctness.
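
As a rough illustration of what an "HLO size and hash" record can look like, the sketch below lowers a jitted function and fingerprints the resulting IR text. It makes several assumptions: `toy_kernel` and `ir_fingerprint` are hypothetical names, and the sketch uses `Lowered.as_text()` (whatever dialect the installed JAX emits by default) rather than the `compiler_ir(..., dialect="hlo")` dump mentioned above, so the numbers will not match the repository's own baselines.

```python
import hashlib

import jax
import jax.numpy as jnp


def toy_kernel(observers):
    # Hypothetical stand-in for a field kernel: maps (n, 3) positions to (n, 3) values.
    r2 = jnp.sum(observers**2, axis=-1, keepdims=True)
    return observers / (r2 + 1.0) ** 1.5


def ir_fingerprint(fn, *example_args):
    """Return (size_in_bytes, sha256_hex) of the lowered IR text for fn."""
    text = jax.jit(fn).lower(*example_args).as_text()
    data = text.encode("utf-8")
    return len(data), hashlib.sha256(data).hexdigest()


size_4, hash_4 = ir_fingerprint(toy_kernel, jnp.zeros((4, 3)))
size_8, hash_8 = ir_fingerprint(toy_kernel, jnp.zeros((8, 3)))

# Array shapes are baked into the lowered module, so a different observer
# count yields a different fingerprint for the same Python function.
print(size_4, hash_4[:12])
print(size_8, hash_8[:12])
```

Because the fingerprint depends on the exact text the compiler stack emits, a JAX or XLA upgrade can change it even when the numerical results are identical, which is the reason for keeping such hashes report-only.
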
The hard gating remains on:

- parity error,
- compile/runtime thresholds,
- memory thresholds,
- benchmark thresholds,
- tests and docs.

## Threshold files

- [`benchmarks/thresholds.json`](https://github.com/uwplasma/magpylib_jax/blob/main/benchmarks/thresholds.json)
- [`profiling/thresholds.json`](https://github.com/uwplasma/magpylib_jax/blob/main/profiling/thresholds.json)
- [`profiling/thresholds_getB_jit.json`](https://github.com/uwplasma/magpylib_jax/blob/main/profiling/thresholds_getB_jit.json)

## Typical workflow after a kernel change

1. Run `profile_kernels.py` and inspect compile/runtime/memory deltas.
2. Run `profile_getB_jit.py` if the change can affect the high-level path.
3. Compare parity outputs.
4. Update thresholds only when the change is intentional and justified.
5. Keep HLO baselines as observability aids, not as the only regression signal.
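
The gating step in this workflow amounts to comparing measured metrics against upper bounds. The sketch below shows that idea in isolation. The JSON layout, the metric names (`compile_s`, `runtime_ms`, `max_abs_error`), and the `check_thresholds` helper are all assumptions made for illustration; the repository's threshold files and `check_profiling_thresholds.py` may be organized differently.

```python
import json

# Hypothetical threshold layout; the real JSON files may be organized differently.
THRESHOLDS_JSON = """
{
  "toy_kernel": {
    "compile_s": 2.0,
    "runtime_ms": 0.5,
    "max_abs_error": 1e-10
  }
}
"""


def check_thresholds(measured, thresholds):
    """Return a list of human-readable violations; an empty list means the gate passes."""
    violations = []
    for family, limits in thresholds.items():
        metrics = measured.get(family, {})
        for metric, limit in limits.items():
            value = metrics.get(metric)
            if value is None:
                # A missing measurement is treated as a failure, not a silent pass.
                violations.append(f"{family}.{metric}: missing measurement")
            elif value > limit:
                violations.append(f"{family}.{metric}: {value} > {limit}")
    return violations


thresholds = json.loads(THRESHOLDS_JSON)
measured = {
    "toy_kernel": {"compile_s": 1.4, "runtime_ms": 0.7, "max_abs_error": 3e-12}
}
violations = check_thresholds(measured, thresholds)
print(violations)
```

Treating a missing metric as a violation is a deliberate choice in this sketch: it prevents a renamed or dropped measurement from quietly disabling a gate.
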