Use '$' instead of '.' to separate module from identifiers in symbols #13050
base: trunk
Note that this will break existing demangling implementations, producing non-ascii characters (due to the use of `$xx` encoding). At least on Linux, I'd suggest keeping the existing |
Could you provide more detail about which demangling implementations you mean? I am aware that Linux perf needs updating. The intention with this change is to have a common demangling scheme across operating systems, |
Here is another one: https://github.com/mstange/samply/blob/bb0dbdf13f10dc22036ef2dc4ead35b9647717d6/samply-symbols/src/demangle_ocaml.rs
While uniformity is nice, this causes a fair amount of breakage and makes demangling more complicated. And |
I see the hot potato was assigned to me... With @let-def's advice and Wikipedia's wisdom (https://en.m.wikipedia.org/wiki/Unambiguous_finite_automaton), I coded an automaton-based check for ambiguities in name mangling schemas: https://gist.github.com/xavierleroy/ace2f70d664d738970aa3034e0751d3d . The verdict is as follows:
So, yes, the change proposed in this PR re-introduces an ambiguity in name mangling that the switch to |
@xavierleroy would you want to see that escaping change made in this PR? It's not clear how large a change that is, I am currently looking at the code. @copy I wasn't aware of that project, thank you for the link. Not having used it did that project break with the change from |
No, the
Yes, that would be appreciated. FWIW, I tested
One thing to keep in mind is that demanglers may want to keep support for older versions of OCaml. I wonder if overloading |
73c979d introduces octal escaping. For this small example program:

```ocaml
(* fib.ml *)
let rec fib n =
  if n = 0 then 0
  else if n = 1 then 1
  else fib (n-1) + fib (n-2)

(*
With octal escaping this becomes "_camlFib$$136$052_272"
With hexadecimal escaping this was "_camlFib$$5e$2a_272"
*)
let (^*) a r = Printf.printf "fib(%d) = %d" a r

let main () =
  let a = 10 in
  let r = fib a in
  a ^* r

let () = main ()
```

Not sure if there are other places that need updating? |
Are you using a 36-bit computer? Because otherwise there is no reason whatsoever to use octal. OCaml uses decimal and hexadecimal for character escapes. |
a36d62399a actually does decimal escaping (leaving the 36-bit computer era behind 😀 ). For the same example program, with hexadecimal escaping this was |
I'm reassured :-) |
Tim McGilchrist (2024/04/07 17:13 -0700):
[a36d623](a36d623)
actually does decimal escaping (leaving the 36bit computer era behind
😀 ). For the same example program, with hexadecimal escaping this was
`_camlFib$$5e$2a_272` this now becomes `_camlFib$$094$042_272`.
Can you please squash the two commits?
|
Fabian (2024/04/02 02:59 -0700):
Note that this will break existing demangling implementations,
producing non-ascii characters (due to the use of `$xx` encoding).
Apologies for the naïve question: are there that many demanglers around?
I also do understand that none of the approaches will be ideal and still
we will have to choose one, or to make the mangling configurable but I
am not sure that's something we want to do.
|
Fabian (2024/04/02 23:17 -0700):
While uniformity is nice, this causes a fair amount of breakage and
makes demangling more complicated. And `.` seems to work fine on Linux
(and maybe future version of macOS too).
If a demangler knows that it is demangling an OCaml program, am I
correct that it can also adapt its demangling algorithm to the version
of the executable? Can we perhaps submit patches that will be able to
cope with different separators we used?
|
Thinking about this further: am I correct that it should be possible to patch the demanglers so that they can (roughly) accommodate the two mangling schemes? Based on that a name can't contain both a |
I'm not aware of any other demanglers except the ones already mentioned in this thread.
My suggestion would be to keep using
4.14 and 5.0 symbols don't contain
But I prefer my suggestion above, for the following reasons:
Anyway, it's not a hill I will die on. |
The Rust mangling system, I think wisely, includes a mangling convention version number in every mangled identifier. If we don't do that now, we should consider it the next time we look at name mangling. Maybe one day we will end up with something like |
In my experience, reporting this kind of bug to Apple is just a waste of time... |
Change mangling of OCaml long identifiers from `camlModule.name_NNN` to `camlModule$name_NNN`. Also changes the encoding of special characters from $xx (two hex digits) to e.g. $ddd (three decimal digits). The previous mangling schema, using `.`, conflicted with LLDB on macOS and MASM in the MSVC port. Mangled names are now consistent across all ports.
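To make the commit message concrete, here is a minimal sketch of the new scheme. It is not the compiler's actual code: `escape_ident` and `mangle` are hypothetical names, and the set of characters that pass through unescaped (ASCII letters, digits and `_`) is an assumption. The platform-specific leading `_` seen in the macOS symbols in this thread is left out.

```ocaml
(* Hypothetical helper, not the compiler's implementation.
   Assumption: only ASCII letters, digits and '_' are emitted unchanged;
   every other byte becomes '$' followed by three decimal digits. *)
let escape_ident (s : string) : string =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      match c with
      | 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9' | '_' -> Buffer.add_char buf c
      | _ -> Printf.bprintf buf "$%03d" (Char.code c))
    s;
  Buffer.contents buf

(* Hypothetical: join the module path and the identifier with '$',
   then append the stamp, giving camlModule$name_NNN. *)
let mangle ~(modules : string list) ~(name : string) ~(stamp : int) : string =
  let path = String.concat "$" (List.map escape_ident modules) in
  Printf.sprintf "caml%s$%s_%d" path (escape_ident name) stamp
```

With these assumptions, `mangle ~modules:["Fib"] ~name:"^*" ~stamp:272` yields `camlFib$$094$042_272`, which matches the decimal-escaped symbol quoted earlier in the conversation apart from the leading underscore.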
@xavierleroy Thank you for the ambiguity code and suggestions for additional improvements. I have taken the opportunity to add you as a contributor to this fix (if you prefer otherwise please let me know).
My perspective on this is, I need to work with the functionality available from lldb and Apple. LLDB doesn't currently support breakpoints with |
Currently, OCaml's native code compiler (ocamlopt) mangles names using the format `camlModule.Submodule.identifier_stamp`. This scheme causes issues with LLDB on macOS (#12933), breaking the ability to set breakpoints, and needs to be worked around in the MSVC port (#12640). Changing to `$` everywhere provides a consistent naming scheme across platforms and supports all targeted assemblers and debuggers (GDB/LLDB). See #12933 (comment).

There are five places that are impacted by this change:
`camlModule$Submodule$identifier_stamp`
`_caml_function` do not change, e.g. `_caml_alloc$`
. This impacts any users of function sections (#8526), who will need to update linker scripts.
`caml_system__code_begin` and `caml_system__code_end` still use double underscores because they are referenced from C.

This change fixes the issue with setting breakpoints in LLDB on macOS (#12933) and keeps the original fixes (#8998, #11321) from #11430.
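For tool authors, a demangler for the `$` scheme might look roughly like the sketch below. This is an illustration, not code from the PR: `demangle` is a hypothetical function, and it assumes that a `$` followed by three decimal digits (value at most 255) is an escaped character while any other `$` is a module separator. It does not strip the stamp or a platform-specific leading underscore.

```ocaml
let is_digit c = c >= '0' && c <= '9'

(* Hypothetical demangler sketch for symbols of the form
   camlModule$Submodule$identifier_stamp.
   Assumption: "$ddd" is an escaped byte, a lone '$' is a separator. *)
let demangle (sym : string) : string option =
  let prefix = "caml" in
  let plen = String.length prefix in
  let n = String.length sym in
  if n < plen || String.sub sym 0 plen <> prefix then None
  else begin
    let buf = Buffer.create n in
    let i = ref plen in
    while !i < n do
      let c = sym.[!i] in
      if c = '$' && !i + 3 < n
         && is_digit sym.[!i + 1]
         && is_digit sym.[!i + 2]
         && is_digit sym.[!i + 3]
         && int_of_string (String.sub sym (!i + 1) 3) <= 255
      then begin
        (* Escaped character: decode the three decimal digits. *)
        let code = int_of_string (String.sub sym (!i + 1) 3) in
        Buffer.add_char buf (Char.chr code);
        i := !i + 4
      end else begin
        (* A plain '$' separates path components; show it as '.'. *)
        Buffer.add_char buf (if c = '$' then '.' else c);
        incr i
      end
    done;
    Some (Buffer.contents buf)
  end
```

Under these assumptions, `demangle "camlFib$$094$042_272"` returns `Some "Fib.^*_272"`, and a symbol without the `caml` prefix returns `None`. The rule relies on OCaml identifiers and module names never starting with a digit, so a digit directly after `$` can only begin an escape.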