Handling materialization of lazy arrays #748

Open
hameerabbasi opened this issue Feb 13, 2024 · 4 comments
Labels
topic: Lazy/Graph Lazy and graph-based array implementations.

Comments

@hameerabbasi
Contributor

Background

Some colleagues and I were working on sparse when we ran into a limitation of the current Array API Standard. @kgryte was kind enough to point out that it might have wider implications than just sparse, so it would be prudent to discuss it with other relevant parties in the community before settling on an API design, to avoid fragmentation.

Problem Statement

Two notable things are missing from the Array API standard today that sparse, and potentially Dask, JAX, and other relevant libraries, might need.

  • Support for storage formats.
    • In Dask, this might be the array metadata, such as the type of the inner array.
    • In sparse, this would be the format of the sparse array (CRS, CCS, COO, ...).
  • Support for lazy arrays/materialization
    • sparse/JAX might use this to build up kernels before running a computation
    • Dask might use this for un-computed arrays stored as a task graph.
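To make the second point concrete, here is a minimal sketch of what "un-computed" means. The `LazyArray` class and its `compute` method are hypothetical names chosen for illustration; they are not part of the standard or of any library mentioned here. Operations only record a graph, and nothing is evaluated until materialization is requested explicitly.

```python
import operator


class LazyArray:
    """Hypothetical lazy 1-D array: either concrete data or a deferred operation."""

    def __init__(self, data=None, op=None, inputs=()):
        self._data = data      # concrete values, or None while un-computed
        self._op = op          # callable combining the inputs' values
        self._inputs = inputs  # upstream LazyArray nodes (the task graph)

    @property
    def is_materialized(self):
        return self._data is not None

    def _binary(self, other, fn):
        # Record the operation instead of running it.
        return LazyArray(
            op=lambda a, b: [fn(x, y) for x, y in zip(a, b)],
            inputs=(self, other),
        )

    def __add__(self, other):
        return self._binary(other, operator.add)

    def __mul__(self, other):
        return self._binary(other, operator.mul)

    def compute(self):
        # Materialization: walk the graph, evaluate, and cache the result.
        if self._data is None:
            args = [inp.compute() for inp in self._inputs]
            self._data = self._op(*args)
        return self._data


a = LazyArray([1.0, 2.0])
b = LazyArray([3.0, 4.0])
c = (a + b) * b          # builds a graph, computes nothing yet
result = c.compute()     # triggers evaluation of the whole graph
```

The open question in this issue is what the standard equivalent of `compute()` should be, since the name and mechanism currently differ per library.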

Potential solutions

Overload the Array.device attribute and the Array.to_device method.

One option is to overload the objects returned/accepted by these to contain a device + storage object. Something like the following:

class Storage:
    @property
    def device(self) -> Device:
        ...

    @property
    def format(self) -> Format:
        ...

    def __eq__(self, other: "Storage") -> bool:
        """ Compatible if combined? """

    def __ne__(self, other: "Storage") -> bool:
        """ Incompatible if combined? """

class Array:
    @property
    def device(self) -> Storage:
        ...

    def to_device(self, device: Storage, ...) -> "Array":
        ...

To materialize an array, one could use to_device(default_device()) (possible after #689 is merged).
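A rough, runnable sketch of how that could behave, under several assumptions: the string-valued `device` and `format` fields, the `"lazy"` and `"dense"` format names, and this stand-in `default_device()` are all made up for illustration and do not reflect what #689 actually specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Storage:
    device: str  # assumed: a plain string such as "cpu"
    format: str  # assumed: "lazy" or "dense"


def default_device() -> Storage:
    # Hypothetical stand-in; here the default is an eager, dense CPU array.
    return Storage(device="cpu", format="dense")


class Array:
    def __init__(self, thunk, storage: Storage):
        self._thunk = thunk      # zero-argument callable producing the values
        self._storage = storage

    @property
    def device(self) -> Storage:
        return self._storage

    def to_device(self, device: Storage, /) -> "Array":
        if device.format == "lazy":
            # Stay un-computed; only the storage description changes.
            return Array(self._thunk, device)
        # Any non-lazy target forces evaluation exactly once.
        values = self._thunk()
        return Array(lambda: values, device)


lazy = Array(lambda: [1, 2, 3], Storage(device="cpu", format="lazy"))
eager = lazy.to_device(default_device())  # materializes the array
```

The sketch also shows the concern raised under Disadvantages: the same call both moves data and decides when computation happens.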

Advantages

As far as I can see, it's compatible with how the Array API standard works today.

Disadvantages

We're mixing the concepts of an execution context and storage format, and in particular overloading operators in a rather weird way.

Introduce an Array.format attribute and Array.to_format method.

Advantages

We can get the API right, maybe even introduce xp.can_mix_formats(...).

Disadvantages

Would need to wait till the 2024 revision of the standard at least.
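For comparison, a minimal sketch of what this second option might look like. Everything here is hypothetical: the `"dense"` and `"coo"` format names, the 1-D payload layout, and the rule used by `can_mix_formats` (operands mix only when they share a format) are assumptions made to keep the example small.

```python
from __future__ import annotations


class Array:
    """Hypothetical 1-D array stored densely or as COO (index, value) pairs."""

    def __init__(self, payload, size: int, format: str):
        self._payload = payload
        self._size = size
        self._format = format

    @property
    def format(self) -> str:
        return self._format

    def to_format(self, format: str) -> Array:
        if format == self._format:
            return self
        if format == "dense":
            # COO -> dense: scatter the stored values into a zero-filled list.
            dense = [0.0] * self._size
            for i, v in self._payload:
                dense[i] = v
            return Array(dense, self._size, "dense")
        if format == "coo":
            # Dense -> COO: keep only the non-zero entries.
            pairs = [(i, v) for i, v in enumerate(self._payload) if v != 0]
            return Array(pairs, self._size, "coo")
        raise ValueError(f"unknown format: {format!r}")


def can_mix_formats(*formats: str) -> bool:
    # Assumed rule for illustration: all operands must share one format.
    return len(set(formats)) <= 1


x = Array([0.0, 2.0, 0.0], 3, "dense")
y = x.to_format("coo")                      # stores only (1, 2.0)
ok = can_mix_formats(x.format, y.format)    # False under the assumed rule
```

Keeping `format` separate from `device` avoids overloading equality on a combined object, at the cost of a new attribute and method in the standard.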

Tagging potentially interested parties:

@leofang
Contributor

leofang commented Feb 13, 2024

I think this topic will have to be addressed in v2024, as it's too big to be squeezed in v2023 which we're trying very hard to wrap up 😅

@rgommers
Member

A few quick comments:

  • Storage formats are going to be specific to individual libraries, so I don't see any reasonable way to standardize anything. That shouldn't be a problem: it's not forbidden to have them, to have different array types for them, or to add new constructors or extra keywords to existing APIs (please do make them keyword-only to prevent future compat issues).
  • Lazy arrays are fine by themselves, they're supported. There are previous discussions on this; see for example "Eager functions in API appear in conflict with lazy implementation" (#642), the topic: lazy/graph label, and https://data-apis.org/array-api/draft/design_topics/lazy_eager.html
  • Materialization via some function/method in the API that triggers compute would be the one thing that is possibly actionable. However, that is quite tricky. The page I linked above has a few things to say about it.
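As an illustration of the keyword-only advice in the first bullet, a library could extend a standard creation function roughly like this. The `format` keyword and its `"coo"` default are hypothetical, and the dictionary return value is only a placeholder for a real array object.

```python
def zeros(shape, *, dtype=None, device=None, format="coo"):
    """Sketch of a standard-style `zeros` with a library-specific extra keyword.

    `shape`, `dtype`, and `device` follow the standard's signature; `format`
    is a hypothetical extension. Because it sits after the bare `*`, it can
    only be passed by name, so a later revision of the standard can add
    parameters without clashing with positional use.
    """
    return {"shape": tuple(shape), "dtype": dtype, "device": device, "format": format}


a = zeros((2, 3), format="csr")   # fine: extension passed by keyword
# zeros((2, 3), None, None, "csr") would raise TypeError
```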

rgommers added the "topic: Lazy/Graph" label Feb 13, 2024
hameerabbasi changed the title from "Handling of lazy computation and storage formats" to "Handling materialization of lazy arrays" Feb 15, 2024
@hameerabbasi
Contributor Author

> I think this topic will have to be addressed in v2024, as it's too big to be squeezed in v2023 which we're trying very hard to wrap up 😅

No pressure. 😉

> Materialization via some function/method in the API that triggers compute would be the one thing that is possibly actionable. However, that is quite tricky. The page I linked above has a few things to say about it.

Thanks Ralf -- that'd be a big help indeed. Materializing an entire array, as opposed to one element, is something that should be a common API across libraries, IMHO. I changed the title to reflect that.

@kgryte
Contributor

kgryte commented Mar 21, 2024

Cross linking #728 as it may be relevant to this discussion.


4 participants