-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix some memory bugs in runtime_events_consumer.c
#13091
base: trunk
Are you sure you want to change the base?
Conversation
Test cases for some recent bugs in lib-runtime-events
It looks like, perhaps unsurprisingly, not all platforms work with the new test (though MSVC just had a quite disappointing network error), so I intend to remove it later, since I can't think of a better way to force failure. |
@@ -100,28 +100,31 @@ caml_runtime_events_create_cursor(const char_os* runtime_events_path, int pid, | |||
|
|||
struct caml_runtime_events_cursor *cursor = | |||
caml_stat_alloc_noexc(sizeof(struct caml_runtime_events_cursor)); | |||
int should_free_events_loc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather see that variable initialized to zero at the declaration point, so that future changes in this routine do not risk using an uninitialized value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following in that style should int ret;
get initialised to something since it is being returned at the end of the function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That wouldn't hurt either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made ret
uninitialised specifically so the compiler would emit a warning if it isn't assigned before jumping; and should_free_events_loc
uninitialised because it's to be initialised where runtime_events_loc
is.
A change that would accidentally use these with the default initialisation is already wrong, so my reasoning is that it's better to have an obviously invalid value that the compiler would yell at us about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is relying upon the C compiler being always correct in figuring out whether a variable is initialized or not. It turns out they aren't always, with both false positives and false negatives. I wouldn't put too much trust in the compiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive-by comment: do we need a boolean to tell us to free runtime_events_loc
, or could we just free exactly when the variable is not NULL
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive-by comment: do we need a boolean to tell us to free runtime_events_loc, or could we just free exactly when the variable is not NULL?
It's not the NULL
case we're worried about, but the case where it's initialised to caml_runtime_events_current_location()
, which isn't a copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I looked out of curiosity, wondering if just making a copy in that case would make things nicer. My quick skim suggests that the function is sprawling and somewhat of a mess, and I think that this is going to bite us again in the future. So I would encourage you to refactor it as you see fit -- for example maybe the system-dependent bits could be factored out into their own helper functions -- to make the code nicer, and not just safer. Your current change is okay but it does not make the code nicer to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've refactored caml_runtime_events_create_cursor
and made runtime_events_loc
be a new string every time to make reasoning easier.
Runtime_events.pause () | ||
|
||
(* workaround for finding the events file even on Windows, where | ||
[Unix.getpid] doesn't match the one used to open the file *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain what the difference is on Windows? The header file says:
[pid] is the process id (or equivalent) of the startup OCaml process.
If there is a known difference it would be useful to state that in the header file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is not with Windows or Runtime_events
, but with the way the Unix
library handles PIDs on Windows (as Windows HANDLE
s casted to int
). Unix.getpid()
doesn't actually return the PID of the process (see #4034), whereas the ring buffer file does use the actual PID. I mention possible fixes in this footnote on #13089.
- Introduce `format_runtime_events_loc`, allocating a new string each time - Introduce `cursor_map_ring_file`, which now also closes `ring_fd`, and closes handles on Windows on failure - Use `memset` to zero-initialise cursor callbacks
I've spotted and fixed another bug, notably On trunk this crashes after opening too many file descriptors, causing let () =
Runtime_events.start ();
try
for _ = 1 to 1024 (* or whatever your [ulimit -n] is *) do
Runtime_events.(create_cursor None |> free_cursor)
done
with _ ->
Runtime_events.(create_cursor None |> free_cursor) I also perform cleanup on the Windows handles if the mapping fails. |
I'll review this. |
Should the ring file be also marked non-inheritable on Windows and close-on-exec on Unix? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clearly an improvement on the previous code. A few stylistic quibbles.
struct caml_runtime_events_cursor *cursor = | ||
caml_stat_alloc_noexc(sizeof(struct caml_runtime_events_cursor)); | ||
/** Return a new string with the path of the ring file */ | ||
static runtime_events_error format_runtime_events_loc( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that (because C is antediluvian) enumeration constants such as E_PATH_FAILURE
are of type int
, so here you are forcing an implicit conversion into the enum type, and then at the call site there's an implicit conversion back to int
. So maybe this return type should be int
? For this reason, I never name enum types (so I never have variables or slots of enum types).
} | ||
failed2: | ||
CloseHandle(cursor->ring_handle); | ||
failed1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest using fail labels which reflect the failure, e.g. fail_file_mapping
and fail_map_view
here.
if (cursor->ring_file_handle == INVALID_HANDLE_VALUE) { | ||
caml_stat_free(cursor); | ||
caml_stat_free(runtime_events_loc); | ||
return E_OPEN_FAILURE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency, please set ret
and then goto fail_create_file
here (putting that label right before the return ret
).
static runtime_events_error | ||
cursor_map_ring_file(struct caml_runtime_events_cursor *cursor, | ||
char_os *runtime_events_loc) { | ||
int ret = 0; | ||
#ifdef _WIN32 | ||
cursor->ring_file_handle = CreateFile( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strong preference for not setting any fields in cursor
until we know we're succeeding. Use local variables and then copy them in right before return E_SUCCESS
.
failed1: | ||
CloseHandle(cursor->ring_file_handle); | ||
return ret; | ||
#else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stylistic distinction between the Windows and non-Windows sides of this function is quite jarring.
} | ||
|
||
ret = E_SUCCESS; | ||
/* fallthrough */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no fall-through.
*cursor_res = cursor; | ||
ret = E_SUCCESS; | ||
/* fallthrough */ | ||
failed2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same remarks as for previous functions:
- send all failure cases to the failure section;
- use distinct failure labels for each failure case (even if there is no undo action between a pair of failure labels);
- name each failure label according to the action which has failed;
- only initialize local variables until you know you have succeeded.
if (ret != E_SUCCESS) goto failed1; | ||
|
||
ret = cursor_map_ring_file(cursor, runtime_events_loc); | ||
if (ret != E_SUCCESS) goto failed2; | ||
|
||
cursor->current_positions = | ||
caml_stat_alloc(cursor->metadata->max_domains * sizeof(uint64_t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this allocation fails, you get an exception and all this careful failure-path code is stymied. Use caml_stat_alloc_noexc
and then handle the failure case in the same way as all the others.
failed2: | ||
caml_stat_free(runtime_events_loc); | ||
failed1: | ||
if (ret != E_SUCCESS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ret
is E_SUCCESS
here then surely something has gone badly wrong?
*cursor_res = cursor; | ||
ret = E_SUCCESS; | ||
/* fallthrough */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't, though.
This PR fixes a few memory bugs surrounding
runtime_events_loc
inruntime_events_consumer.c
, as mentioned in #13089, specifically incaml_runtime_events_create_cursor
.These bugs are:
Runtime_events.create_cursor None
(thepid < 0
branch in the C code) allocatesruntime_events_loc
but reassigns it, making the old pointer unreachable without deallocating it:ocaml/otherlibs/runtime_events/runtime_events_consumer.c
Lines 108 to 124 in 4c6a384
runtime_events_loc
after it is introduced if and only if the function returns with an error.1 This is incorrect, both because:pid < 0
code path, it shouldn't be freed at all, sincecaml_runtime_events_current_location()
does not return a new stringpid >= 0
code path, it should always be freed, sinceruntime_events_loc
does not escape the functionThis PR:
runtime_events_loc
's allocation into the relevant branch, rewriting the function to be single-exit after cursor allocation succeeds, and by having both frees state their explicit guards:ocaml/otherlibs/runtime_events/runtime_events_consumer.c
Lines 237 to 252 in 1c337f6
path
-freeing if-blocks in thecaml_ml_runtime_events_create_cursor
wrapper which calls it, for clarity. This does not change the behaviour of the program.chmod 000
s the ring buffer file in order to cause subsequent read attempts to fail..events
file, so it works on Windows (rather than skipping), because I wanted to actually test #13089.Footnotes
This is the correct behaviour for
cursor
, since the caller takes ownership of the returned cursor iff the function returns successfully ↩