Consider changing default axes for map / reduce / filter #66

Open
freeman-lab opened this issue Jul 14, 2015 · 0 comments

The current default for these operations on Spark arrays is axis=(0,), which may incur a swap to distribute along that axis (if the array isn't already distributed along it). The default could instead be axis=None, which would mean applying over the distributed axes (whatever they are) and would never incur a swap.

Suggested by @shoyer, thanks!

This generally seems like a friendlier default; the only issue arises not with map but with reduce, when considering sequences of mixed operations. For example, in the following two cases, where the map is a no-op,

from operator import add
from bolt import ones

data = ones((2, 3, 4), sc)
data.map(lambda x: x, axis=(0,)).reduce(add)
data.map(lambda x: x, axis=(0, 1)).reduce(add)

if the default for reduce is over the partitioned axes, the answer will be different in the two cases, whereas if the default is over axis=(0,) it will be the same.
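
For concreteness, here's a rough sketch of what the two chains would return under each default. This assumes bolt's top-level ones constructor, an active SparkContext sc, and that reduce collapses the reduced axes; the explicit axis arguments below are just a stand-in for what the proposed default would do implicitly.

from operator import add
from bolt import ones

data = ones((2, 3, 4), sc)

# Proposed default: reduce over whichever axes are currently distributed,
# i.e. equivalent to passing those axes explicitly:
data.map(lambda x: x, axis=(0,)).reduce(add, axis=(0,))      # shape (3, 4)
data.map(lambda x: x, axis=(0, 1)).reduce(add, axis=(0, 1))  # shape (4,)

# Current default: reduce over axis=(0,) regardless of how the preceding
# map was keyed, so both chains return shape (3, 4).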

I can see an argument that these really should be the same with the default parameters, but I'm curious to get other opinions. Another option is to use different defaults for map/filter and reduce.

cc @andrewosh

@freeman-lab freeman-lab self-assigned this Jul 14, 2015
@freeman-lab freeman-lab added this to the 0.2.0 milestone Jul 14, 2015