Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EBADF from Lwt_process.with_* #956

Open
mroch opened this issue Jun 24, 2022 · 4 comments
Open

EBADF from Lwt_process.with_* #956

mroch opened this issue Jun 24, 2022 · 4 comments

Comments

@mroch
Copy link

mroch commented Jun 24, 2022

I've received reports of Unix.Unix_error(Unix.EBADF, "close", "") exceptions that stem from this process#close:

process#close >>= fun _ ->

Unfortunately I don't have a repro yet to understand what the process we're calling is doing, but we're calling Lwt_process.with_process_full here:

https://github.com/facebook/flow/blob/38b41863de17211ae70c8bbc57854fbe7fed1ce3/src/common/lwt/lwtSysUtils.ml#L54

According to close(2), EBADF in this case means that the file descriptor that Lwt created internally is invalid. Would it make sense to catch and suppress this exception?

Whatever the underlying exception is is being swallowed. Even if close's exception is ignored, it'd be nice to rethrow the original exception with its original backtrace if possible.

@raphael-proust
Copy link
Collaborator

Hi @mroch and thanks for the report,

Turns out @MisterDA quickly fixed a bug that's probably the root cause of the issue you encountered. Is the issue still ongoing with the #957 merged and now released?

@mroch
Copy link
Author

mroch commented Jun 29, 2022

i don't think it was the root cause because (according to that PR) it broke in 9de01c9 which was in 5.6.0, but we're using 5.5.0.

i'm going to use a workaround in Flow to see if I can figure out what the underlying issue that causes the invalid FD is, so that we can hopefully find a repro/write a test.

@raphael-proust
Copy link
Collaborator

Ok, thanks for the info. I'll try to have a look soon to check whether we could forward the original exception and what other issue might be the root cause.

facebook-github-bot pushed a commit to facebook/flow that referenced this issue Jun 29, 2022
Summary:
we're seeing exceptions like this:

```
Failed to initialize watchman: Unix.Unix_error(Unix.EBADF, "close", "")
Raised by primitive operation at Lwt_unix.self_result in file "src/unix/lwt_unix.cppo.ml", line 237, characters 14-31
Re-raised at Watchman.get_sockname.(fun) in file "flow/src/hack_forked/watchman/watchman.ml", line 381, characters 2-934
Re-raised at FileWatcher.WatchmanFileWatcher.watchman#wait_for_init.go_exn.(fun) in file "flow/src/monitor/fileWatcher.ml", line 534, characters 10-1023
```

`Watchman.get_sockname` calls `LwtSysUtils.exec` which calls `Lwt_process.with_process_full`. `with_process_full` does an `Lwt.finally` to make sure it calls `process#close`. `close` is throwing because the process's fd is invalid (perhaps already closed).

The stack traces aren't much to go on, but I suspect this is happening on exceptions. The file descriptor shouldn't be invalid otherwise. That means there's an underlying exception that's getting hidden by the `finally` also throwing.

So, here we ignore `EBADF` exceptions from the `close` and rethrow the original exception.

I also filed ocsigen/lwt#956 to discuss doing this upstream.

Changelog: [internal]

Reviewed By: samwgoldman

Differential Revision: D37420401

fbshipit-source-id: 0e7ddf41a3f4d13290e49c692be66723eda1f227
facebook-github-bot pushed a commit to facebook/flow that referenced this issue Jun 29, 2022
Summary:
we're seeing exceptions like this:

```
Failed to initialize watchman: Unix.Unix_error(Unix.EBADF, "close", "")
Raised by primitive operation at Lwt_unix.self_result in file "src/unix/lwt_unix.cppo.ml", line 237, characters 14-31
Re-raised at Watchman.get_sockname.(fun) in file "flow/src/hack_forked/watchman/watchman.ml", line 381, characters 2-934
Re-raised at FileWatcher.WatchmanFileWatcher.watchman#wait_for_init.go_exn.(fun) in file "flow/src/monitor/fileWatcher.ml", line 534, characters 10-1023
```

`Watchman.get_sockname` calls `LwtSysUtils.exec` which calls `Lwt_process.with_process_full`. `with_process_full` does an `Lwt.finally` to make sure it calls `process#close`. `close` is throwing because the process's fd is invalid (perhaps already closed).

The stack traces aren't much to go on, but I suspect this is happening on exceptions. The file descriptor shouldn't be invalid otherwise. That means there's an underlying exception that's getting hidden by the `finally` also throwing.

So, here we ignore `EBADF` exceptions from the `close` and rethrow the original exception.

I also filed ocsigen/lwt#956 to discuss doing this upstream.

Changelog: [internal]

Reviewed By: samwgoldman

Differential Revision: D37420401

fbshipit-source-id: de63ab6ed2dd4a43ffe190d31da39bd98633a4fb
@mroch
Copy link
Author

mroch commented Jul 20, 2022

circling back with results from our workaround. these are the two underlying exceptions we've gotten reports of so far (still no actual repro unfortunately):

Unix.Unix_error(Unix.EBADF, "read", "")
Raised by primitive operation at Lwt_unix.read_bigarray.(fun) in file "src/unix/lwt_unix.cppo.ml", line 681, characters 8-59
Called from Lwt_unix.retry_syscall in file "src/unix/lwt_unix.cppo.ml", line 500, characters 13-24
Re-raised at LwtSysUtils.command_result_of_process.(fun) in file "flow/src/common/lwt/lwtSysUtils.ml", line 36, characters 2-160
Re-raised at Exception.reraise in file "flow/src/hack_forked/utils/core/exception.ml", line 36, characters 33-76
Called from Lwt.Sequential_composition.backtrace_bind.create_result_promise_and_callback_if_deferred.callback in file "src/core/lwt.ml", line 1927, characters 23-26
Re-raised at LwtSysUtils.with_process_full.(fun) in file "flow/src/common/lwt/lwtSysUtils.ml", line 73, characters 6-70
Re-raised at LwtSysUtils.with_process_full.(fun) in file "flow/src/common/lwt/lwtSysUtils.ml", line 69, characters 2-227
Re-raised at Watchman.get_sockname.(fun) in file "flow/src/hack_forked/watchman/watchman.ml", line 384, characters 2-934
Re-raised at FileWatcher.WatchmanFileWatcher.watchman#wait_for_init.go_exn.(fun) in file "flow/src/monitor/fileWatcher.ml", line 534, characters 10-1023
...
Unix.Unix_error(Unix.EBADF, "set_nonblock", "")
Raised at LwtSysUtils.command_result_of_process.(fun) in file "flow/src/common/lwt/lwtSysUtils.ml", line 36, characters 2-160
Re-raised at Exception.reraise in file "flow/src/hack_forked/utils/core/exception.ml", line 36, characters 33-76
Called from Lwt.Sequential_composition.backtrace_bind.create_result_promise_and_callback_if_deferred.callback in file "src/core/lwt.ml", line 1927, characters 23-26
Re-raised at LwtSysUtils.with_process_full.(fun) in file "flow/src/common/lwt/lwtSysUtils.ml", line 73, characters 6-70
Re-raised at LwtSysUtils.with_process_full.(fun) in file "flow/src/common/lwt/lwtSysUtils.ml", line 69, characters 2-227
Re-raised at Watchman.get_sockname.(fun) in file "flow/src/hack_forked/watchman/watchman.ml", line 384, characters 2-934
Re-raised at FileWatcher.WatchmanFileWatcher.watchman#wait_for_init.go_exn.(fun) in file "flow/src/monitor/fileWatcher.ml", line 534, characters 10-1023
...

both are coming from this code:

let command_result_of_process process =
  (* Wait for it to finish *)
  let%lwt status = process#status
  and stdout = Lwt_io.read process#stdout
  and stderr = Lwt_io.read process#stderr in
  ...

this is the f to with_process_full, so there's been no exception that would cause make_with to call process#close yet. I can't figure how process#stdout or process#stderr could already be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants