Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: curio ffiselect: Isolate gpu calls in a subprocess #11994

Merged
merged 22 commits into from
May 23, 2024
Merged

Conversation

snadrus
Copy link
Contributor

@snadrus snadrus commented May 15, 2024

Related Issues

The GPU picker library "ffiselect" will run ffi calls on an assigned GPU using a subprocess. It does not respect deadlines, but will survive fatal behavior of the subprocess.

This PR also removes localWorker (tech debt) from curio and copies code from there and the sealer into Curiosrc near the usage.

Proposed Changes

Additional Info

Checklist

Before you mark the PR ready for review, please make sure that:

  • Commits have a clear commit message.
  • PR title is in the form of of <PR type>: <area>: <change being made>
    • example: fix: mempool: Introduce a cache for valid signatures
    • PR type: fix, feat, build, chore, ci, docs, perf, refactor, revert, style, test
    • area, e.g. api, chain, state, market, mempool, multisig, networking, paych, proving, sealing, wallet, deps
  • If the PR affects users (e.g., new feature, bug fix, system requirements change), update the CHANGELOG.md and add details to the UNRELEASED section.
  • New features have usage guidelines and / or documentation updates in
  • Tests exist for new functionality or change in behavior
  • CI is green

@snadrus snadrus requested a review from magik6k May 15, 2024 03:38
@snadrus
Copy link
Contributor Author

snadrus commented May 16, 2024

@magik6k please review my approach.

  1. how do I avoid calling WindowPoSt in a loop?
  2. WInningPoSt:
  • Should I use GenerateSingleVanillaProof from the FFI Select wrapper?
  • Is there a better winningPoSt low-level call?
  1. Parallelism: where should I have it? I removed it from WindowPoSt.

Copy link
Contributor

@magik6k magik6k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel quite strongly that we're passing those calls one layer of abstraction above where we should be passing them.

cmd/curio/ffi.go Outdated Show resolved Hide resolved
Copy link

github-actions bot commented May 17, 2024

All checks have completed

❌ Failed Test / Test (itest-curio) (pull_request)
❌ Failed Test / Test (itest-net) (pull_request)

@snadrus snadrus marked this pull request as ready for review May 17, 2024 22:49
@snadrus snadrus requested a review from magik6k May 17, 2024 22:49
Copy link
Contributor

@magik6k magik6k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linter does appear to be correctly complainy about how we return errors from the ffiselect wrapper.

Some minor things to address, overall looks really good, can't wait to give this a try!

curiosrc/winning/winning_task.go Outdated Show resolved Hide resolved
curiosrc/winning/winning_task.go Outdated Show resolved Hide resolved
curiosrc/window/compute_do.go Outdated Show resolved Hide resolved
lib/ffiselect/ffiselect.go Show resolved Hide resolved
lib/ffiselect/ffiselect.go Show resolved Hide resolved
lib/ffiselect/ffiselect.go Outdated Show resolved Hide resolved
@snadrus
Copy link
Contributor Author

snadrus commented May 22, 2024

Your changes look good @magik6k. I also added an always-ran "test" to verify everything looks sharp. I realize that part of that could be a unit test instead, but I'm not sure if that's working well at the moment.

@magik6k magik6k changed the title Feat/gpu pick feat: curio ffiselect: Isolate gpu calls in a subprocess May 23, 2024
Copy link
Contributor

@magik6k magik6k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now, and works on my machine. Flaky curio test is not from this PR, will fix in a followup

@magik6k magik6k merged commit c1f99c5 into master May 23, 2024
94 of 96 checks passed
@magik6k magik6k deleted the feat/gpu-pick branch May 23, 2024 11:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants