Fix some memory bugs in `runtime_events_consumer.c` #13091

eutro · 2024-04-10T16:03:01Z

This PR fixes a few memory bugs surrounding runtime_events_loc in runtime_events_consumer.c, as mentioned in #13089, specifically in caml_runtime_events_create_cursor.

These bugs are:

The code path for Runtime_events.create_cursor None (the pid < 0 branch in the C code) allocates runtime_events_loc but reassigns it, making the old pointer unreachable without deallocating it:

ocaml/otherlibs/runtime_events/runtime_events_consumer.c

Lines 108 to 124 in 4c6a384

    
    runtime_events_loc = caml_stat_alloc_noexc(RING_FILE_NAME_MAX_LEN);
    if (runtime_events_loc == NULL) {
      caml_stat_free(cursor);
      return E_ALLOC_FAIL;
    }
    /* If pid < 0 then we create a cursor for the current process */
    if (pid < 0) {
      runtime_events_loc = caml_runtime_events_current_location();
      if( runtime_events_loc == NULL ) {
        caml_stat_free(cursor);
        return E_NO_CURRENT_RING;
      }
    } else {

The function frees runtime_events_loc after it is introduced if and only if the function returns with an error.¹ This is incorrect for two reasons:
- in the pid < 0 code path, it shouldn't be freed at all, since caml_runtime_events_current_location() does not return a new string
- in the pid >= 0 code path, it should always be freed, since runtime_events_loc does not escape the function
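The ownership rule described in the two points above can be illustrated with a minimal, self-contained sketch. It is not the runtime's code: `open_ring`, `tracked_alloc`, `tracked_free`, `current_loc`, and the `SK_*` codes are hypothetical stand-ins, and plain `malloc`/`free` replace `caml_stat_alloc_noexc`/`caml_stat_free`. The point is only that a borrowed pointer is never freed, while an owned one is freed on every path.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from the OCaml runtime. */
enum { SK_OK = 0, SK_ALLOC_FAIL = -1, SK_OPEN_FAIL = -2 };

static int live_allocs = 0;                    /* outstanding owned buffers */
static char current_loc[] = "current.events";  /* owned elsewhere, only borrowed here */

static char *tracked_alloc(size_t n) {
  char *p = malloc(n);
  if (p != NULL) live_allocs++;
  return p;
}

static void tracked_free(char *p) {
  free(p);
  live_allocs--;
}

/* pid < 0 borrows the current location; pid >= 0 formats a new string.
   fail_open simulates a later step (opening the file) failing. */
static int open_ring(int pid, int fail_open) {
  char *loc;
  int should_free_loc;
  int ret;

  assert(fail_open == 0 || fail_open == 1);
  if (pid < 0) {
    loc = current_loc;        /* borrowed: must never be freed here */
    should_free_loc = 0;
  } else {
    loc = tracked_alloc(64);  /* allocated only in the branch that needs it */
    if (loc == NULL) return SK_ALLOC_FAIL;
    snprintf(loc, 64, "%d.events", pid);
    should_free_loc = 1;
  }

  ret = fail_open ? SK_OPEN_FAIL : SK_OK;  /* stand-in for actually using loc */

  if (should_free_loc) tracked_free(loc);  /* same decision on success and failure */
  return ret;
}
```

Calling `open_ring` with any combination of `pid` and `fail_open` should leave `live_allocs` at zero, which is the invariant the original code violated in both branches.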

This PR:

Fixes these by moving runtime_events_loc's allocation into the relevant branch, rewriting the function to be single-exit after cursor allocation succeeds, and by having both frees state their explicit guards:

ocaml/otherlibs/runtime_events/runtime_events_consumer.c

Lines 237 to 252 in 1c337f6

    
      *cursor_res = cursor;
      ret = E_SUCCESS;
     free_events_loc:
      if (should_free_events_loc) {
        caml_stat_free(runtime_events_loc);
      }
     free_cursor:
      if (ret != E_SUCCESS) {
        caml_stat_free(cursor);
      }
      return ret;
    }

Deduplicates the two identical path-freeing if-blocks in the caml_ml_runtime_events_create_cursor wrapper which calls it, for clarity. This does not change the behaviour of the program.
Adds a test which tries to reach these cases, though only the double-free (and Windows on the commit before #13089) actually causes a failure of the test.
- This chmod 000s the ring buffer file in order to cause subsequent read attempts to fail.
- It uses a slight workaround to find the correct PID and .events file, so it works on Windows (rather than skipping), because I wanted to actually test #13089.
- The testsuite does fail on this commit.

¹ This is the correct behaviour for cursor, since the caller takes ownership of the returned cursor iff the function returns successfully.


eutro · 2024-04-10T16:13:58Z

It looks like, perhaps unsurprisingly, not all platforms work with the new test (though MSVC just had a quite disappointing network error), so I intend to remove it later, since I can't think of a better way to force failure.

dustanddreams · 2024-04-11T05:38:47Z

otherlibs/runtime_events/runtime_events_consumer.c

@@ -100,28 +100,31 @@ caml_runtime_events_create_cursor(const char_os* runtime_events_path, int pid,

  struct caml_runtime_events_cursor *cursor =
      caml_stat_alloc_noexc(sizeof(struct caml_runtime_events_cursor));
+  int should_free_events_loc;


I'd rather see that variable initialized to zero at the declaration point, so that future changes in this routine do not risk using an uninitialized value.

tmcgilchrist · 2024-04-11

Following that style, should `int ret;` also be initialised to something, since it is returned at the end of the function?

dustanddreams · 2024-04-11

That wouldn't hurt either.

eutro · 2024-04-11

I made ret uninitialised specifically so the compiler would emit a warning if it isn't assigned before jumping, and should_free_events_loc uninitialised because it's to be initialised where runtime_events_loc is.

A change that would accidentally use these with the default initialisation is already wrong, so my reasoning is that it's better to have an obviously invalid value that the compiler would yell at us about.

dustanddreams · 2024-04-11

This relies on the C compiler always being correct in figuring out whether a variable is initialized or not. Compilers aren't always, with both false positives and false negatives. I wouldn't put too much trust in the compiler.

gasche · 2024-04-11

Drive-by comment: do we need a boolean to tell us to free runtime_events_loc, or could we just free exactly when the variable is not NULL?

eutro · 2024-04-11

> Drive-by comment: do we need a boolean to tell us to free runtime_events_loc, or could we just free exactly when the variable is not NULL?

It's not the NULL case we're worried about, but the case where it's initialised to caml_runtime_events_current_location(), which isn't a copy.

gasche · 2024-04-11

I see. I looked out of curiosity, wondering if just making a copy in that case would make things nicer. My quick skim suggests that the function is sprawling and somewhat of a mess, and I think that this is going to bite us again in the future. So I would encourage you to refactor it as you see fit (for example, maybe the system-dependent bits could be factored out into their own helper functions) to make the code nicer, and not just safer. Your current change is okay but it does not make the code nicer to read.

eutro · 2024-04-16

I've refactored caml_runtime_events_create_cursor and made runtime_events_loc be a new string every time to make reasoning easier.
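The "new string every time" idea can be sketched as follows. This is an illustration under stated assumptions, not the PR's `format_runtime_events_loc`: the helper name `fresh_ring_loc`, its parameters, and the `<dir>/<pid>.events` naming are all hypothetical, and `malloc` stands in for the runtime's allocator. Because both branches return a freshly allocated string, the caller can free the result unconditionally and no ownership flag is needed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: always returns a freshly malloc'd string, or NULL
   on failure, so every caller frees the result unconditionally.
   pid < 0 copies `current` (the current process's ring location);
   pid >= 0 formats "<dir>/<pid>.events" (an assumed naming scheme). */
static char *fresh_ring_loc(const char *current, const char *dir, int pid) {
  if (pid < 0) {
    size_t n;
    char *copy;
    if (current == NULL) return NULL;  /* no current ring */
    n = strlen(current) + 1;
    copy = malloc(n);
    if (copy != NULL) memcpy(copy, current, n);
    return copy;
  } else {
    int n;
    char *buf;
    assert(dir != NULL);
    n = snprintf(NULL, 0, "%s/%d.events", dir, pid);  /* measure first */
    if (n < 0) return NULL;
    buf = malloc((size_t)n + 1);
    if (buf != NULL) snprintf(buf, (size_t)n + 1, "%s/%d.events", dir, pid);
    return buf;
  }
}
```

The cost is one extra copy in the `pid < 0` case, in exchange for a single rule at every call site: whatever comes back non-NULL is owned by the caller.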

tmcgilchrist · 2024-04-11T07:12:43Z

testsuite/tests/lib-runtime-events/test_create_cursor_failures.ml

+  Runtime_events.pause ()
+
+(* workaround for finding the events file even on Windows, where
+   [Unix.getpid] doesn't match the one used to open the file *)


Can you explain what the difference is on Windows? The header file says:

[pid] is the process id (or equivalent) of the startup OCaml process.

https://github.com/ocaml/ocaml/blob/trunk/otherlibs/runtime_events/caml/runtime_events_consumer.h#L31

If there is a known difference it would be useful to state that in the header file.

eutro · 2024-04-11

The issue is not with Windows or Runtime_events, but with the way the Unix library handles PIDs on Windows (as Windows HANDLEs cast to int). Unix.getpid() doesn't actually return the PID of the process (see #4034), whereas the ring buffer file does use the actual PID. I mention possible fixes in this footnote on #13089.


eutro · 2024-04-16T11:30:51Z

I've spotted and fixed another bug, notably Runtime_events.create_cursor None never closing the file descriptor on Unix.

On trunk this crashes after opening too many file descriptors, causing create_cursor None to fail and trigger the existing double-free bug:

let () =
  Runtime_events.start ();
  try
    for _ = 1 to 1024 (* or whatever your [ulimit -n] is *) do
      Runtime_events.(create_cursor None |> free_cursor)
    done
  with _ ->
    Runtime_events.(create_cursor None |> free_cursor)

I also perform cleanup on the Windows handles if the mapping fails.
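For the Unix side of the descriptor leak, a minimal POSIX sketch of the usual remedy is below. It is not the PR's `cursor_map_ring_file`; the helper name `map_file_readonly` is hypothetical. It relies on the POSIX guarantee that closing a descriptor does not unmap a region created from it, so the descriptor can be closed on every path once `mmap` has been attempted.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only and return the mapping,
   or NULL on failure. The descriptor is closed on every path; the mapping
   stays valid until munmap. */
static void *map_file_readonly(const char *path, size_t *len_out) {
  struct stat st;
  void *map;
  int fd;

  assert(path != NULL && len_out != NULL);
  fd = open(path, O_RDONLY);
  if (fd == -1) return NULL;
  if (fstat(fd, &st) == -1 || st.st_size == 0) {
    close(fd);
    return NULL;
  }
  map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
  close(fd);  /* the mapping does not depend on the descriptor staying open */
  if (map == MAP_FAILED) return NULL;
  *len_out = (size_t)st.st_size;
  return map;
}
```

With this shape, repeatedly creating and releasing mappings consumes no descriptors, so a loop like the OCaml reproduction above would not run into the `ulimit -n` limit.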

NickBarnes · 2024-04-17T13:30:02Z

I'll review this.

MisterDA · 2024-04-23T16:28:48Z

Should the ring file be also marked non-inheritable on Windows and close-on-exec on Unix?

NickBarnes

Clearly an improvement on the previous code. A few stylistic quibbles.

NickBarnes · 2024-04-24T15:28:42Z

otherlibs/runtime_events/runtime_events_consumer.c

-  struct caml_runtime_events_cursor *cursor =
-      caml_stat_alloc_noexc(sizeof(struct caml_runtime_events_cursor));
+/** Return a new string with the path of the ring file */
+static runtime_events_error format_runtime_events_loc(


Note that (because C is antediluvian) enumeration constants such as E_PATH_FAILURE are of type int, so here you are forcing an implicit conversion into the enum type, and then at the call site there's an implicit conversion back to int. So maybe this return type should be int? For this reason, I never name enum types (so I never have variables or slots of enum types).
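The point about enumeration constants can be checked directly with C11 `_Generic`. The enum `sketch_error` and its constants below are hypothetical, chosen only to mirror the shape of an error enum; the compile-time assertion holds because enumeration constants like these have type `int`, regardless of the type the compiler picks for the enum itself.

```c
#include <assert.h>

/* Hypothetical error enum, standing in for a named enum type. */
typedef enum { SKE_OK = 0, SKE_FAIL = -1 } sketch_error;

/* 1 when the expression has type int, 0 otherwise (C11 _Generic). */
#define IS_INT(x) _Generic((x), int: 1, default: 0)

/* The constant itself is an int, not a sketch_error. */
static_assert(IS_INT(SKE_FAIL), "enumeration constants have type int");

static int constant_is_int(void) { return IS_INT(SKE_FAIL); }

/* int constant -> enum return type: first implicit conversion. */
static sketch_error returns_enum(void) { return SKE_FAIL; }

/* enum result -> int variable: second implicit conversion. */
static int caller(void) {
  int ret = returns_enum();
  return ret;
}
```

Declaring the helper's return type as `int` would remove both conversions, which is the alternative the comment suggests.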

NickBarnes · 2024-04-24T15:31:53Z

otherlibs/runtime_events/runtime_events_consumer.c

-  }
+ failed2:
+  CloseHandle(cursor->ring_handle);
+ failed1:


I suggest using fail labels which reflect the failure, e.g. fail_file_mapping and fail_map_view here.

NickBarnes · 2024-04-24T15:32:57Z

otherlibs/runtime_events/runtime_events_consumer.c

  if (cursor->ring_file_handle == INVALID_HANDLE_VALUE) {
-    caml_stat_free(cursor);
-    caml_stat_free(runtime_events_loc);
    return E_OPEN_FAILURE;


For consistency, please set ret and then goto fail_create_file here (putting that label right before the return ret).

NickBarnes · 2024-04-24T15:33:59Z

otherlibs/runtime_events/runtime_events_consumer.c

+static runtime_events_error
+cursor_map_ring_file(struct caml_runtime_events_cursor *cursor,
+                     char_os *runtime_events_loc) {
+  int ret = 0;
 #ifdef _WIN32
  cursor->ring_file_handle = CreateFile(


Strong preference for not setting any fields in cursor until we know we're succeeding. Use local variables and then copy them in right before return E_SUCCESS.

NickBarnes · 2024-04-24T15:35:15Z

otherlibs/runtime_events/runtime_events_consumer.c

+ failed1:
+  CloseHandle(cursor->ring_file_handle);
+  return ret;
+#else


The stylistic distinction between the Windows and non-Windows sides of this function is quite jarring.

NickBarnes · 2024-04-24T15:37:51Z

otherlibs/runtime_events/runtime_events_consumer.c

  }
+
+  ret = E_SUCCESS;
+  /* fallthrough */


There is no fall-through.

NickBarnes · 2024-04-24T15:43:03Z

otherlibs/runtime_events/runtime_events_consumer.c

  *cursor_res = cursor;
+  ret = E_SUCCESS;
+  /* fallthrough */
+ failed2:


Same remarks as for previous functions:

send all failure cases to the failure section;

use distinct failure labels for each failure case (even if there is no undo action between a pair of failure labels);

name each failure label according to the action which has failed;

only initialize local variables until you know you have succeeded.
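The four remarks above describe a common C cleanup idiom, which might look like the sketch below. Everything here is hypothetical (`make_cursor`, `struct sketch_cursor`, the `SC_*` codes, the `fail_at` failure injection, and `malloc` in place of the runtime allocator); it is meant only to show one label per failing action, unwinding in reverse order, with the caller-visible structure written only on success.

```c
#include <assert.h>
#include <stdlib.h>

enum { SC_OK = 0, SC_ALLOC_FAIL = -1, SC_MAP_FAIL = -2 };  /* hypothetical codes */

struct sketch_cursor { char *loc; int *positions; };  /* hypothetical cursor */

static int live = 0;  /* outstanding allocations, to check the unwinding */

static void *track_alloc(size_t n) {
  void *p = malloc(n);
  if (p != NULL) live++;
  return p;
}

static void track_free(void *p) {
  free(p);
  live--;
}

/* fail_at: 0 = succeed, 1 = loc allocation fails, 2 = positions
   allocation fails, 3 = the (simulated) mapping step fails. */
static int make_cursor(struct sketch_cursor *out, int fail_at) {
  char *loc;
  int *positions;
  int ret;

  assert(out != NULL);
  loc = (fail_at == 1) ? NULL : track_alloc(32);
  if (loc == NULL) { ret = SC_ALLOC_FAIL; goto fail_alloc_loc; }

  positions = (fail_at == 2) ? NULL : track_alloc(4 * sizeof(int));
  if (positions == NULL) { ret = SC_ALLOC_FAIL; goto fail_alloc_positions; }

  if (fail_at == 3) { ret = SC_MAP_FAIL; goto fail_map; }

  /* Success: only now is the caller-visible structure touched. */
  out->loc = loc;
  out->positions = positions;
  return SC_OK;

  /* Each label is named after the action that failed and undoes
     everything acquired before that action. */
 fail_map:
  track_free(positions);
 fail_alloc_positions:
  track_free(loc);
 fail_alloc_loc:
  return ret;
}
```

Because the success path returns before the labels, there is no fall-through into the failure section and no need to test `ret` there.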

NickBarnes · 2024-04-24T15:44:11Z

otherlibs/runtime_events/runtime_events_consumer.c

+  if (ret != E_SUCCESS) goto failed1;
+
+  ret = cursor_map_ring_file(cursor, runtime_events_loc);
+  if (ret != E_SUCCESS) goto failed2;

  cursor->current_positions =
      caml_stat_alloc(cursor->metadata->max_domains * sizeof(uint64_t));


If this allocation fails, you get an exception and all this careful failure-path code is stymied. Use caml_stat_alloc_noexc and then handle the failure case in the same way as all the others.
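The difference matters because `caml_stat_alloc` raises on failure while `caml_stat_alloc_noexc` returns NULL, and only the latter lets the function reach its own cleanup labels. A rough analogue using the standard allocator is sketched below; `alloc_positions` and the `P_*` codes are hypothetical, and `calloc` stands in for the non-raising runtime allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { P_OK = 0, P_ALLOC_FAIL = -1 };  /* hypothetical codes */

/* Hypothetical non-raising allocation: failure is reported through the
   return value, so the caller can jump to its cleanup labels instead of
   losing control to an exception. */
static int alloc_positions(size_t max_domains, uint64_t **out) {
  uint64_t *p;

  assert(out != NULL);
  p = calloc(max_domains, sizeof(uint64_t));  /* NULL on failure or overflow */
  if (p == NULL) return P_ALLOC_FAIL;
  *out = p;  /* written only on success */
  return P_OK;
}
```

A caller would treat `P_ALLOC_FAIL` like any other failure code and unwind whatever it had already acquired.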

NickBarnes · 2024-04-24T15:45:07Z

otherlibs/runtime_events/runtime_events_consumer.c

+ failed2:
+  caml_stat_free(runtime_events_loc);
+ failed1:
+  if (ret != E_SUCCESS) {


If ret is E_SUCCESS here then surely something has gone badly wrong?

NickBarnes · 2024-04-24T15:45:14Z

otherlibs/runtime_events/runtime_events_consumer.c

  *cursor_res = cursor;
+  ret = E_SUCCESS;
+  /* fallthrough */


It doesn't, though.

eutro added 2 commits April 10, 2024 15:12

Add tests/lib-runtime-events/test_create_cursor_failures.ml

78a3ba6

Test cases for some recent bugs in lib-runtime-events

Clarify and fix caml_stat memory management in `runtime_events_cons…

1c337f6

…umer.c`

nojb closed this Apr 10, 2024

nojb reopened this Apr 10, 2024

dustanddreams reviewed Apr 11, 2024


tmcgilchrist reviewed Apr 11, 2024


eutro added 2 commits April 16, 2024 12:02

Refactor caml_runtime_events_create_cursor

0e5472f

- Introduce `format_runtime_events_loc`, allocating a new string each time
- Introduce `cursor_map_ring_file`, which now also closes `ring_fd`, and closes handles on Windows on failure
- Use `memset` to zero-initialise cursor callbacks

Swap ring_handle and ring_file_handle frees in Windows cursor map…

ea67b20

…ping

damiendoligez assigned NickBarnes Apr 17, 2024

NickBarnes suggested changes Apr 24, 2024




