Consider changing default axes for map / reduce / filter #66

Open
freeman-lab opened this issue Jul 14, 2015 · 0 comments

The current default for these operations on Spark arrays is axis=(0,), which may incur a swap to distribute along that axis (if the array isn't already distributed along it). The default could instead be axis=None, which would mean applying over the distributed axes (whatever they are) and would never incur a swap.

Suggested by @shoyer, thanks!

This generally seems like a friendlier default; the only issue arises not with map but with reduce, when considering sequences of mixed operations. For example, in the following two cases, where the map is a no-op,

from operator import add
from bolt import ones

data = ones((2, 3, 4), sc)
data.map(lambda x: x, axis=(0,)).reduce(add)
data.map(lambda x: x, axis=(0, 1)).reduce(add)

if the default for reduce is over the partitioned axes, the answer will be different in the two cases, whereas if the default is over axis=(0,) it will be the same.
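
For concreteness, here's a rough sketch of what the two chains would return under each default. This assumes bolt's top-level ones constructor, an active SparkContext sc, and that reduce collapses the reduced axes; the explicit axis arguments below are just a stand-in for what the proposed default would do implicitly.

from operator import add
from bolt import ones

data = ones((2, 3, 4), sc)

# Proposed default: reduce over whichever axes are currently distributed,
# i.e. equivalent to passing those axes explicitly:
data.map(lambda x: x, axis=(0,)).reduce(add, axis=(0,))      # shape (3, 4)
data.map(lambda x: x, axis=(0, 1)).reduce(add, axis=(0, 1))  # shape (4,)

# Current default: reduce over axis=(0,) regardless of how the preceding
# map was keyed, so both chains return shape (3, 4).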

I can see an argument that these really should be the same with the default parameters, but I'm curious to get other opinions. Another option is to use different defaults for map/filter and reduce.

cc @andrewosh

@freeman-lab freeman-lab self-assigned this Jul 14, 2015
@freeman-lab freeman-lab added this to the 0.2.0 milestone Jul 14, 2015