Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: add support for querying whether an array library explicitly supports strides #764

Open
kgryte opened this issue Mar 21, 2024 · 5 comments
Labels
API extension Adds new functions or objects to the API. Needs Discussion Needs further discussion. RFC Request for comments. Feature requests and proposed changes. topic: Inspection Array API inspection.

Comments

@kgryte
Copy link
Contributor

kgryte commented Mar 21, 2024

Overview

The concept of strides has been raised several times since the beginning of the Consortium. This history was summarized in #641. In short, because several array libraries don't have an explicit concept of "strides" and strides can be considered an implementation detail, the Consortium has been intentional about avoiding any formal standardization of stride APIs.

Nevertheless, for a subset of array consuming libraries (such as SciPy and scikit-learn), being able to leverage (and manipulate) strides can confer performance advantages for the performance inclined. As a result, downstream libraries wanting to squeeze out additional performance have a higher likelihood of special casing the arrays of recognized libraries. This special casing is fine so long as you have a small number of supported array libraries; however, it presents a material disadvantage to those libraries also supporting strides which are not one of the privileged few.

Proposal

Now that the array API standard has an inspection API, this RFC seeks to propose adding support for querying whether a conforming array library has explicit support for array strides. Specifically, the following field should be added to the object returned by info.capabilities():

{
  ...
  "max rank": Optional[int],
+ "strides": bool
}

With this API in place, downstream libraries could query whether a conforming array library supports strides and special case accordingly.

Questions

  1. Does the proposed inspection API provide enough value to downstream libraries to justify inclusion? Notably, this proposal does not seek to standardize any keywords or stride-specific APIs (e.g., stride_tricks), and, thus, downstream libraries would still be left to check for specific API support. But, this stated, this may be enough of a hook to foster de facto community standardization, of which, among stride supporting libraries, there seems to be.

  2. As discussed in Adding order argument to asarray #571, DLPack allows access to strides; however, this API is a bit cumbersome for downstream libraries to use just to access strides. Nevertheless, maybe this is enough?

  3. A related question is, if we decide to move forward with this proposal, does it make sense to also add support for querying whether an array API supports an array "order" (e.g., "row-major" or "column-major")? This is another implementation detail which downstream array libraries hoping to adopt the standard have raised with particular concern for performance implications. Especially for CPU-bound dense array libraries which commonly interface with libraries written in other languages (C/C++/Fortran) and which assume a specific memory layout (row or column major), ensuring that created arrays have a particular layout can confer performance advantages.

Related

@kgryte kgryte added the API extension Adds new functions or objects to the API. label Mar 21, 2024
@rgommers
Copy link
Member

I'm not sure how this will help, given:

Does the proposed inspection API provide enough value to downstream libraries to justify inclusion? Notably, this proposal does not seek to standardize any keywords or stride-specific APIs [...]

If anything that has to do with strides has to be queried for with hasattr anyway, then doing has_strides = hasattr(x, 'strides') should be fine too.

The discussion in gh-641 points at defining APIs like sliding_window_view for the operations that people want to manipulate strides for. That seems like an important exercise, and is probably needed before deciding on this proposal. A good start would be to try using sliding_window_view for everything in SciPy and scikit-learn that manipulates strides, and seeing how far that gets us.

@leofang
Copy link
Contributor

leofang commented Mar 21, 2024

Does the proposed inspection API provide enough value to downstream libraries to justify inclusion? Notably, this proposal does not seek to standardize any keywords or stride-specific APIs [...]

If anything that has to do with strides has to be queried for with hasattr anyway, then doing has_strides = hasattr(x, 'strides') should be fine too.

From Python library perspective I am supportive for this RFC. hasattr(x, 'strides') assumes there's an attribute named strides, which may or may not be the case. For example, PyTorch makes it a function method without "s": torch.Tensor.stride(). While this RFC does not answer the question "if a library has strides, how do I query it?", sometimes a yes/no is enough (and if yes, we could then proceed to get the strides via .strides, from a DLPack capsule, etc). Though, this RFC doesn't answer this question either: "is the returned strides in unit of bytes or counts?", but DLPack has it defined.

@asmeurer
Copy link
Member

If DLPack already exposes strides then is it possible to do whatever stride tricks you would want to do with DLPack?

@leofang
Copy link
Contributor

leofang commented Mar 22, 2024

If we want to say "having __dlpack__ and __dlpack_device__ implemented is the proxy to check if strides exist," I am fine with it too, but we should document this. But then, it seems to be unintuitive for general users. Thus having a yes/no answer queryable to me is a good compromise.

@kgryte kgryte added RFC Request for comments. Feature requests and proposed changes. Needs Discussion Needs further discussion. topic: Inspection Array API inspection. labels Apr 4, 2024
@rgommers
Copy link
Member

rgommers commented Apr 4, 2024

hasattr(x, 'strides') assumes there's an attribute named strides, which may or may not be the case. For example, PyTorch makes it a function method without "s": torch.Tensor.stride(). While this RFC does not answer the question "if a library has strides, how do I query it?", sometimes a yes/no is enough

That's not really a good reason. A simple utility function like this will do:

def has_strides(x):
    return hasattr(x, 'strides') or hasattr(x, 'stride')

There are significant downsides to introducing strides as a concept in the standard, and zero upsides as far as I can tell - aside from avoiding having to carry a simple utility function like this. Also, this utility function is going to work for older versions of numpy and other array libraries (which everyone needs), while a new inspection API won't.

If we want to say "having __dlpack__ and __dlpack_device__ implemented is the proxy to check if strides exist,

That doesn't work I think. These methods are implementable by most libraries, and strides will then always be defined (not accessible from Python) even if the library does not offer non-contiguous arrays. Concrete example: JAX arrays have a __dlpack__ method, but x[::2] still copies. For the things folks want, it's the "can I do things with strided views efficiently without triggering copies".

Or maybe it's more like there are several different capabilities here that may require determining:

  1. Are non-contiguous ("strided") arrays supported
  2. Is it possible to do memory-efficient data read operations (a la numpy.lib.stride_tricks.as_strided)
  3. ... ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API extension Adds new functions or objects to the API. Needs Discussion Needs further discussion. RFC Request for comments. Feature requests and proposed changes. topic: Inspection Array API inspection.
Projects
None yet
Development

No branches or pull requests

4 participants