tordd() bypasses the lazy sorting after repartition #325

Open
boazmohar opened this issue Jun 2, 2016 · 3 comments

Comments

@boazmohar
Contributor

An edge case after #319 and PR #320.
If I call tordd() after repartition() but before any other command that forces a task, I get an unsorted RDD.
@jwittenbach, @freeman-lab What do you guys think about this? We could:

  1. Add a warning in tordd() if the sorted property in bolt is False
  2. Force a sort in tordd()

Any other suggestions?
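
To make the edge case concrete, here is a minimal pure-Python sketch of a deferred sort being skipped at export time, along with option 2 (forcing the sort). It does not use Spark, thunder, or bolt: the `LazyRecords` class, its `_ordered` flag, and the `force_sort` parameter are all hypothetical names chosen for illustration, and the "shuffle" is only a stand-in for what a real repartition might do to key order.

```python
# Pure-Python stand-in; no Spark required. All names here are hypothetical.

class LazyRecords:
    """Keyed records whose sort is deferred until something asks for it."""

    def __init__(self, records, ordered=True):
        self._records = list(records)  # list of (key, value) pairs
        self._ordered = ordered

    def repartition(self, npartitions):
        # Mimic a shuffle: group records by key modulo the partition count,
        # which scrambles key order. The sort is deferred, so only flag it.
        shuffled = sorted(self._records, key=lambda kv: kv[0] % npartitions)
        return LazyRecords(shuffled, ordered=False)

    def tordd(self, force_sort=False):
        # Option 2 from the list above: sort before handing out the records.
        if force_sort and not self._ordered:
            self._records = sorted(self._records)
            self._ordered = True
        return list(self._records)


data = LazyRecords([(k, k * 10) for k in range(6)])
moved = data.repartition(3)
unsorted_keys = [k for k, _ in moved.tordd()]               # sort was skipped
sorted_keys = [k for k, _ in moved.tordd(force_sort=True)]  # sort was forced
```

Forcing the sort keeps callers safe but makes every export pay for a sort it may not need, which is presumably why a warning is listed as the alternative.
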
@boazmohar
Contributor Author

@freeman-lab @jwittenbach I suggest we add a few methods to base related to this issue:

  1. is_sorted() would return self.values._ordered in spark mode and True in local mode.
  2. sort() would call self._rdd.sortByKey() and set self.values._ordered to True; it would not be available in local mode.
  3. tordd() would issue a warning if self.values._ordered is False.

What do you think?
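
The three proposed methods could look roughly like the sketch below. It is a pure-Python stand-in rather than thunder's actual base class: the `Base` constructor, the `mode` argument, and the `_records` list (standing in for the underlying RDD) are assumptions, and `list.sort` replaces the `sortByKey()` call that the real code would make on Spark.

```python
import warnings


class Base:
    """Hypothetical stand-in for the thunder base class."""

    def __init__(self, records, ordered=True, mode="spark"):
        self.mode = mode
        self._records = list(records)  # stands in for the underlying RDD
        self._ordered = ordered        # stands in for self.values._ordered

    def is_sorted(self):
        # Local data is always in order; distributed data reports its flag.
        return True if self.mode == "local" else self._ordered

    def sort(self):
        if self.mode == "local":
            raise NotImplementedError("sort() applies only in spark mode")
        self._records.sort(key=lambda kv: kv[0])  # sortByKey() on a real RDD
        self._ordered = True
        return self

    def tordd(self):
        # Warn instead of sorting, so the caller decides whether to pay for it.
        if not self.is_sorted():
            warnings.warn("records are not sorted; call sort() first", UserWarning)
        return self._records


pending = Base([(2, "c"), (0, "a"), (1, "b")], ordered=False)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pending.tordd()
warned = len(caught) == 1
keys_after_sort = [k for k, _ in pending.sort().tordd()]
```
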

@freeman-lab
Member

Nice @boazmohar, great ideas!

I think I'd rather put a sort() method in bolt; as you say, it's really only relevant to the distributed case, which bolt is there to handle.

We might also want to call it order(), or at least make the naming consistent, e.g. order() and _ordered, or sort() and _sorted. And instead of isordered() you can just add an @property that returns the value; check out how we do shape in bolt (see the example here).
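
As a rough illustration of the property-based approach with consistent naming, the sketch below pairs an `order()` method with a read-only `ordered` property backed by `_ordered`. The `DistributedValues` class is hypothetical and is not bolt's actual implementation; only the use of Python's built-in `@property` decorator is standard.

```python
class DistributedValues:
    """Hypothetical container; not bolt's actual class."""

    def __init__(self, records, ordered=True):
        self._records = list(records)
        self._ordered = ordered

    @property
    def ordered(self):
        # Read-only view of the private flag, accessed without parentheses.
        return self._ordered

    def order(self):
        # Sort by key and record that the data is now in order.
        self._records.sort(key=lambda kv: kv[0])
        self._ordered = True
        return self


values = DistributedValues([(1, "b"), (0, "a")], ordered=False)
before = values.ordered
after = values.order().ordered
```

Because the property has no setter, callers cannot flip the flag by accident; it only changes when `order()` actually runs.
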

But we can definitely add the warning mentioned in (3) to thunder now if you want to put in a PR for that.

@boazmohar
Contributor Author

@freeman-lab Once we know the bolt API for sorting, I will open a PR with the warning.
Thanks for the comments!
