
ASV Benchmark suite for PairwiseDistancesReductions

Context

PairwiseDistancesReductions are Cython-based implementations of computationally expensive patterns used by many of scikit-learn's algorithms.

To keep them maintainable over the long term, maintainers, as well as authors and reviewers of Pull Requests that change them, must be able to easily and confidently assess performance regressions between revisions.

This independent asv benchmark suite is meant to help in this regard.

For more context, see:

Quick-start

This suite can be installed with:

git clone git@github.com:jjerphan/pairwise-distances-reductions-asv-suite.git 
cd pairwise-distances-reductions-asv-suite
pip install git+https://github.com/airspeed-velocity/asv

This suite can be run with:

# This might take a while (i.e several hours up to a day)
# if all combinations are benchmarked.
asv run

For more precise runs, see the documentation of asv's commands.
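As a sketch of more targeted invocations (the benchmark name pattern below is a placeholder — adjust it to the actual benchmark classes defined in this suite):

```shell
# Benchmark only the last commit, restricted to benchmarks whose
# name matches a regular expression (placeholder pattern).
asv run --bench "PairwiseDistances.*" "HEAD^!"

# Benchmark two revisions and report significant changes between them.
asv continuous --bench "PairwiseDistances.*" main HEAD

# Print a sorted comparison of results already collected for two commits.
asv compare main HEAD --sort ratio
```

`asv continuous` is the command of interest for Pull Request workflows, since it benchmarks both revisions in one go and only reports significant differences.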

Workflow plan

Needs

Get feedback on performance improvements or regressions in a timely manner when needed for a scikit-learn Pull Request.

In particular:

  • have a GitHub Actions workflow that can be triggered by a comment
  • specify revisions to compare (forwarded to asv continuous)
  • be able to indicate the configuration to run benchmarks for, in particular the values of the following parameters:
    • PairwiseDistancesReductions
    • metric
    • format of (X, Y) (in {sparse, dense}²)
  • have the full, verbose, sorted asv textual report
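A minimal sketch of such a comment-triggered workflow (the workflow name, trigger phrase, and compared revisions are assumptions for illustration, not an existing workflow):

```yaml
name: benchmark-on-comment
on:
  issue_comment:
    types: [created]

jobs:
  benchmarks:
    # Only react to comments on pull requests that start with a trigger
    # phrase (placeholder: /run-benchmarks).
    if: github.event.issue.pull_request && startsWith(github.event.comment.body, '/run-benchmarks')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # Full history so asv can check out both revisions.
          fetch-depth: 0
      - run: pip install git+https://github.com/airspeed-velocity/asv
      # The revisions to compare could be parsed from the comment body and
      # forwarded to `asv continuous`; `main HEAD` is a placeholder.
      - run: asv continuous --verbose main HEAD
```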

Have an overview of performance with respect to the theoretical ideal limit

In particular:

  • output graphs of hardware scalability
  • report an estimate of the proportion of sequential code using Amdahl's law
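Such an estimate can be obtained by inverting Amdahl's law: if a benchmark shows a speedup S on n threads, the serial fraction is s = (n/S − 1)/(n − 1). The helper below is an illustrative sketch (these function names are not part of this suite):

```python
def amdahl_speedup(n_threads, serial_fraction):
    """Speedup predicted by Amdahl's law: S(n) = 1 / (s + (1 - s) / n)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_threads)


def estimate_serial_fraction(n_threads, measured_speedup):
    """Invert Amdahl's law to estimate the serial fraction s from a
    measured speedup S on n threads: s = (n / S - 1) / (n - 1)."""
    if n_threads < 2:
        raise ValueError("need at least 2 threads to estimate the serial fraction")
    return (n_threads / measured_speedup - 1.0) / (n_threads - 1.0)


# Example: a benchmark that runs 6x faster on 8 threads suggests that
# roughly 5% of the work is sequential.
s = estimate_serial_fraction(8, 6.0)
print(f"estimated serial fraction: {s:.3f}")  # ~0.048
```

In practice the speedup would come from the same benchmark timed with different numbers of threads, and the estimate only holds to the extent that the workload actually fits Amdahl's model.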

Trace results over time

Important notes

Benchmarks are correctly and entirely reproducible, traceable and reportable only when the following constraining requirements are met:

  • the same machine is used over time: in practice, we can't expect CI providers to allocate the same machine over time, nor to dispatch to machines with identical specifications at a given time.
  • no processes other than the benchmarks run on the machine: in practice, we can't expect CI providers to use process isolation.
  • benchmark definitions aren't changed between revisions: this requires not reformatting the benchmarks' Python code, because asv hashes the content of the file to trace benchmarks over time.