Regression with default GC settings between 4.14.2 and 5.1.1 #13123

Open
toots opened this issue Apr 25, 2024 · 7 comments

Comments

@toots
Contributor

toots commented Apr 25, 2024

Hi!

We're in the process of switching liquidsoap to OCaml 5.1.1 and we've noticed some pretty severe regressions with the garbage collector.

I'm still testing and need to confirm whether we can get memory/CPU usage comparable to 4.14, but I'd like to report a first, very simple case.

This liquidsoap script:

output.dummy(blank())

Is basically a runtime loop creating a float array array (native OCaml values) holding 0.04s of blank PCM audio every 0.04s and discarding it. No C code, only OCaml.

With 4.14, the memory footprint looks like this:
Screenshot 2024-04-25 at 8 16 31 AM

(A big chunk of the memory usage here is due to the language's library)

With OCaml 5.1.1 default GC params, however:
Screenshot 2024-04-25 at 8 37 08 AM

Memory peaks at a pretty high value, then keeps oscillating around a level roughly 4x higher.

Setting space_overhead=40 actually achieves a better memory footprint than with 4.14.2:
Screenshot 2024-04-25 at 8 52 47 AM

CPU usage is very low on my machine for each case so it's hard to see a pattern there.

I am still testing with more sophisticated scripts. It's not yet clear whether setting space_overhead=40 achieves comparable runtime CPU/memory performance, but it looks possible.
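
For reference, space_overhead can be set through the environment (OCAMLRUNPARAM='o=40') or programmatically at startup. Below is a minimal sketch of the programmatic route using only the standard Gc module. The helper name set_space_overhead is made up for illustration and is not part of liquidsoap.

```ocaml
(* Hypothetical helper: lower the major GC's space_overhead at startup.
   Same intent as running the program with OCAMLRUNPARAM='o=40'. *)
let set_space_overhead n =
  Gc.set { (Gc.get ()) with Gc.space_overhead = n }

let () =
  set_space_overhead 40;
  (* Read the setting back to confirm the runtime accepted it. *)
  Printf.printf "space_overhead = %d\n" (Gc.get ()).Gc.space_overhead
```

Setting it in code keeps the value with the application instead of relying on each deployment's environment.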

@tmcgilchrist
Contributor

Perhaps these changes are related: #12754, #13086 and #12493.

Olly can give you GC statistics on macOS (which you seem to be using, based on those Instruments screenshots): https://github.com/tarides/runtime_events_tools.

If you can reduce this to a small reproduction case, I can add it to sandmark.

@toots
Contributor Author

toots commented Apr 26, 2024

The problem is not linked to bigarray, as the script only uses OCaml values.

I was able to write a reproduction test! It looks like the issue happens when the program has some memory permanently allocated, which in the case of liquidsoap is the standard library.

There also seems to be a threshold effect: a small amount of allocated memory is okay, and a lot seems okay too. However, in the middle, around 40MB, is where the issue seems to be triggered.

Reproduction code:

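(* Needs the unix and threads libraries. *)
let frame_size = 0.04
let pcm_len = int_of_float (44100. *. frame_size)
let channels = 2

(* Permanently allocated memory, never used again. *)
let deadweigth = Array.make (4000 * 1024) 1.

let mk_pcm () = Array.init channels (fun _ -> Array.make pcm_len 0.)

(* Allocate one frame of blank PCM, drop it, sleep, repeat forever. *)
let rec fn () =
  let pcm = mk_pcm () in
  ignore(pcm);
  Unix.sleepf 0.04;
  fn ()

let () =
  let th = Thread.create fn () in
  Thread.join th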

Memory:
Screenshot 2024-04-25 at 9 09 46 PM

BTW, I'm using the macOS memory profiler because it's really good at giving me only the program's private allocations. I'm pretty sure the problem happens on other OSes/platforms too.
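
For checking on other platforms without Instruments, one rough option is to log the runtime's own view of the major heap. This is only a sketch: heap_mib is a hypothetical helper, and Gc.quick_stat reports what the OCaml runtime accounts for, which is not the same thing as the process's private memory as seen by the OS.

```ocaml
(* Hypothetical helper: major heap size in MiB as reported by the runtime.
   heap_words counts machine words, so convert to bytes first. *)
let heap_mib () =
  let words = (Gc.quick_stat ()).Gc.heap_words in
  float_of_int (words * (Sys.word_size / 8)) /. (1024. *. 1024.)

let () =
  let before = heap_mib () in
  (* Same allocation shape as one iteration of the reproduction loop:
     2 channels of 1764 samples (44100 Hz * 0.04 s). *)
  let pcm = Array.init 2 (fun _ -> Array.make 1764 0.) in
  ignore (Sys.opaque_identity pcm);
  Printf.printf "heap: %.2f MiB -> %.2f MiB\n" before (heap_mib ())
```

Calling heap_mib once per loop iteration and printing the result should give a curve that can be compared across 4.14 and 5.x, under the caveat above.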

@toots
Contributor Author

toots commented Apr 26, 2024

Setting space_overhead to 40 also seems to help with the example:
Screenshot 2024-04-25 at 9 17 56 PM

@toots
Contributor Author

toots commented Apr 26, 2024

Ok, I think I've refined the example to be even closer to our use case:

  • First allocate a very large amount of data
  • Deallocate it
  • Run a quick loop

Code:

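(* Needs the unix and threads libraries. *)
let frame_size = 0.04
let pcm_len = int_of_float (44100. *. frame_size)
let channels = 2

let mk_pcm () = Array.init channels (fun _ -> Array.make pcm_len 0.)

(* On the first call only, a is the large array: force a full major
   collection, then keep looping with an empty array. *)
let rec fn a =
  if Array.length a <> 0 then
    Gc.full_major ();
  let pcm = mk_pcm () in
  ignore(pcm);
  Unix.sleepf 0.04;
  fn [||]

let () =
  (* Very large initial allocation, dropped after the first iteration. *)
  let deadweigth = Array.make (40 * 1024 * 1024) 1 in
  Unix.sleepf 0.04;
  let th = Thread.create fn deadweigth in
  Thread.join th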

Memory consumption:
Screenshot 2024-04-25 at 9 41 52 PM

Woof!

Here is what I think I can see:

  • The GC is more or less doing what it is supposed to do in my previous example: keeping memory at a ratio of the live data, here about 65MB for ~40MB of allocated memory. That is the spirit of it, but it is not what 4.14 was effectively doing and, frankly, it is not a great experience when you work with an app that has a large amount of permanently allocated memory and a small amount of transient memory.
  • Underlying this, there seems to be a bug where the GC keeps using the previous ratio in its calculation, hence the second example: memory grows according to the ratio derived from the initial, very large allocation even though that memory is already gone, leading to an incredibly lopsided allocation.

Last, a side question: would it be possible to be more directive with the GC? In my application, I know exactly when I should ask the GC to check for memory to clean up, which is after each media loop. Would it be possible to set the GC params to be very lazy and trigger a check every time a loop terminates?
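
To make the side question concrete, here is roughly what such a scheme could look like with the existing Gc API. It is a sketch under assumptions, not a recommendation: run_loops and media_loop are hypothetical names, 1000 is an arbitrary "lazy" value for space_overhead, and whether the 5.x pacing behaves well under this pattern is exactly what is being asked.

```ocaml
(* Sketch only: make the GC "lazy" with a large space_overhead, then
   force major GC work at a point the application chooses. *)
let run_loops ~iterations ~(media_loop : unit -> unit) =
  (* 1000 is an assumed value, picked only to illustrate "very lazy". *)
  Gc.set { (Gc.get ()) with Gc.space_overhead = 1000 };
  for _ = 1 to iterations do
    media_loop ();
    (* A full major collection is the bluntest option; Gc.major_slice 0
       would instead do a smaller, runtime-sized increment of work. *)
    Gc.full_major ()
  done

let () =
  run_loops ~iterations:3 ~media_loop:(fun () ->
    (* Stand-in for one media loop: allocate a frame and drop it. *)
    let pcm = Array.init 2 (fun _ -> Array.make 1764 0.) in
    ignore (Sys.opaque_identity pcm))
```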

@toots toots closed this as completed Apr 26, 2024
@toots toots reopened this Apr 26, 2024
@gasche
Member

gasche commented Apr 26, 2024

Note: if your application starts by allocating a lot and throwing most of it away, and you know in the code where that initialization phase ends, you can call an explicit compaction to ensure that that memory is given back to the OS, and that the rest of the program starts from a smaller memory footprint. Compaction was re-enabled in 5.x only recently by @sadiqj (it is in the release branch for 5.2), and it may benefit your workload.
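
A minimal sketch of that pattern, assuming the initialization phase can be isolated in one function. The name init_then_compact and the array standing in for the initialization data are illustrative only. On runtimes where compaction is not available, Gc.compact still runs but may amount to a full major collection.

```ocaml
(* Hypothetical example: allocate heavily during initialization, keep only
   a small result, then ask for an explicit compaction. *)
let init_then_compact () =
  (* Stand-in for an allocation-heavy initialization phase. *)
  let init_data = Array.make (1024 * 1024) 1 in
  let summary = Array.length init_data in
  (* init_data is no longer used past this point; compact so the freed
     space can be given back to the OS where the runtime supports it. *)
  Gc.compact ();
  summary

let () =
  let n = init_then_compact () in
  Printf.printf "initialized %d entries, heap_words now %d\n"
    n (Gc.quick_stat ()).Gc.heap_words
```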

@toots
Contributor Author

toots commented Apr 26, 2024

Note: if your application starts by allocating a lot and throwing most of it away, and you know in the code where that initialization phase ends, you can call an explicit compaction to ensure that that memory is given back to the OS, and that the rest of the program starts from a smaller memory footprint. Compaction was re-enabled in 5.x only recently by @sadiqj (it is in the release branch for 5.2), and it may benefit your workload.

Thanks! Gc.compact does not seem to help with the last example.

I should have mentioned that these were all confirmed with the latest ocaml git code as well.

@toots
Contributor Author

toots commented Apr 26, 2024

For reference, this is the memory profile with 4.14.2 on the last example:

Screenshot 2024-04-26 at 8 05 04 AM
