RFC: add count_nonzero for counting the number of "non-zero" values #794

Open
kgryte opened this issue Apr 18, 2024 · 7 comments · May be fixed by #803
Labels
API extension Adds new functions or objects to the API. Needs Discussion Needs further discussion. RFC Request for comments. Feature requests and proposed changes. topic: Statistics Statistics.

Comments

@kgryte
Contributor

kgryte commented Apr 18, 2024

This RFC proposes a new addition to the array API specification for counting the number of "non-zero" (i.e., truthy) values in an array.

Overview

Based on array comparison data, the API is available across all major array libraries in the PyData ecosystem.

count_nonzero was originally identified in #187 as a potential standardization candidate and has usage within downstream libraries (e.g., sklearn, SciPy).

Prior art

Proposal

def count_nonzero(x: array, /, *, axis: Optional[Union[int, Tuple[int, ...]]] = None, keepdims: bool = False) -> array
  • When axis is None, the function should count the number of non-zero elements across the entire (flattened) array.
  • The function should return an array having the default index data type.
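For illustration only, the proposed semantics can be sketched with NumPy's existing `np.count_nonzero`, which already accepts `axis` and `keepdims`. This is an assumption-laden sketch rather than the specified behavior: NumPy returns a Python `int` when `axis` is None, whereas the proposal calls for an array having the default index data type.

```python
import numpy as np

x = np.asarray([[0, 1, 2],
                [3, 0, 0]])

# axis=None: count over the whole array.
# (NumPy returns a plain int here; the proposal would return a 0-D array.)
total = np.count_nonzero(x)

# axis as an int: reduce over that axis only.
per_column = np.count_nonzero(x, axis=0)
per_row = np.count_nonzero(x, axis=1)

# keepdims=True keeps each reduced axis as a size-1 dimension.
kept = np.count_nonzero(x, axis=1, keepdims=True)

print(total, per_column, per_row, kept.shape)
```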

Questions

  • In contrast to sum and other reductions, support for keepdims is less common among array libraries. Why this is the case is not clear. Are there any reasons why keepdims should not be standardized?
@kgryte kgryte added RFC Request for comments. Feature requests and proposed changes. API extension Adds new functions or objects to the API. topic: Statistics Statistics. Needs Discussion Needs further discussion. labels Apr 18, 2024
@kgryte kgryte added this to the v2024 milestone Apr 18, 2024
@rgommers
Member

@asmeurer can you tell us if it's easy to work around a missing keepdims keyword in array-api-compat?

@rgommers
Member

One other nice thing is that unlike nonzero, this function does not have a data-dependent output shape. So aside from performance, it can be supported by implementations that may not support nonzero.
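A small NumPy sketch of that difference, using made-up input arrays: two inputs of the same shape give `nonzero` results of different lengths, while the `count_nonzero` result shape depends only on the input shape and `axis`.

```python
import numpy as np

# Two arrays with the same shape but different values (illustrative data).
a = np.asarray([[0, 1], [0, 2]])
b = np.asarray([[5, 6], [7, 8]])

# nonzero: the length of each index array depends on the values.
rows_a = np.nonzero(a)[0]
rows_b = np.nonzero(b)[0]

# count_nonzero: the output shape is fixed by the input shape and axis.
counts_a = np.count_nonzero(a, axis=0)
counts_b = np.count_nonzero(b, axis=0)

print(rows_a.shape, rows_b.shape)      # differ
print(counts_a.shape, counts_b.shape)  # identical
```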

@rgommers
Member

The keepdims argument was added fairly late (2020) in numpy: numpy/numpy#15870. So it may have simply been overlooked by other libraries. Probably just a low-prio feature (also no usages in scipy at all).

@asmeurer
Member

I think so. Isn't it just a matter of calling expand_dims? Maybe #760 would help.
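A minimal sketch of that workaround, written against NumPy for concreteness. The helper name `count_nonzero_keepdims` is hypothetical and is not part of array-api-compat; it assumes the underlying `count_nonzero` supports `axis` but not `keepdims`, and it inserts one axis at a time because a single-axis `expand_dims` is the conservative assumption.

```python
import numpy as np

def count_nonzero_keepdims(x, axis=None):
    """Hypothetical helper: emulate keepdims=True with expand_dims."""
    counts = np.asarray(np.count_nonzero(x, axis=axis))
    if axis is None:
        # Every axis was reduced, so restore a size-1 dimension for each.
        return np.reshape(counts, (1,) * x.ndim)
    axes = (axis,) if isinstance(axis, int) else tuple(axis)
    # Normalize negative axes, then re-insert them in ascending order so
    # each insertion position refers to the final output layout.
    for ax in sorted(a % x.ndim for a in axes):
        counts = np.expand_dims(counts, ax)
    return counts

x = np.arange(24).reshape(2, 3, 4) % 3
print(count_nonzero_keepdims(x, axis=(0, 2)).shape)  # (1, 3, 1)
```

Reductions over multiple axes need one insertion per reduced axis, which is why the loop sorts the normalized axes first.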

@asmeurer
Member

To reiterate what I said at the meeting today, count_nonzero is nice because the standard doesn't support calling sum() on a boolean array, so count_nonzero is the idiomatic way to get the number of True elements in a bool array.
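As a rough illustration with NumPy (which, unlike the standard, does accept `sum` on boolean arrays), counting the True elements of a mask might look like the sketch below. The explicit cast shows what portable code would otherwise need; the choice of `np.int64` for the cast is an assumption made for this example.

```python
import numpy as np

x = np.asarray([-1.0, 0.5, 2.0, -3.0, 4.0])
mask = x > 0  # boolean array

# Idiomatic: count the True elements directly.
n_true = np.count_nonzero(mask)

# Portable alternative without count_nonzero: cast to a numeric dtype
# first, since the standard only requires sum() for numeric dtypes.
n_true_via_sum = int(np.sum(mask.astype(np.int64)))

print(n_true, n_true_via_sum)
```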

@rgommers
Member

Thanks! SGTM then to add count_nonzero. And add keepdims for design consistency with other reductions.

@kgryte kgryte added this to Stage 1 in Proposals Apr 30, 2024
@kgryte kgryte linked a pull request May 2, 2024 that will close this issue
@kgryte
Contributor Author

kgryte commented May 2, 2024

PR is up: #803
