Use macros from limits.h to prevent signed integer wrap-around warnings #13083

Open
wants to merge 3 commits into
base: trunk

MisterDA
Contributor

@MisterDA MisterDA commented Apr 8, 2024

The code is currently correct since we use wrap-around semantics for signed integers (-fwrapv), but:

  • it's difficult to communicate that fact to static analyzers, which warn when the minimum integer is computed by left-shifting 1 into the sign bit position (most-significant bit);
  • MSVC doesn't support wrap-around semantics, but historically hasn't optimized for this (so no harm), and might innocuously warn.

Using constants from <limits.h> instead allows for self-documenting code and silences these warnings.

Computing the minimum signed integer

From the standard (which I recall doesn't consider wrap-around semantics for signed integers):

> The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. […] If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

The problem is that the result of 1 << CHAR_BIT * sizeof(int) - 1, used to compute the minimum int, can't be represented in the result type without wrap-around (for a 64-bit type it's 2^63, but the maximum is 2^63-1).

Introduce the INTNAT_MIN macro to avoid independent re-definitions of this value.
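
To make the point above concrete, here is a minimal sketch, not the code from this PR. The `sketch_` names are hypothetical, and it assumes a two's complement `long` without padding bits. It contrasts the shift, done on the unsigned type where it is well defined, with the `<limits.h>` constant.

```c
#include <assert.h>
#include <limits.h>

/* Width of long in bits, assuming no padding bits (an assumption). */
#define SKETCH_LONG_WIDTH (CHAR_BIT * sizeof(long))

/* Shifting 1 into the sign bit of a signed type, as in
     1L << (SKETCH_LONG_WIDTH - 1)
   is what analyzers flag. The same shift on the unsigned type is
   well defined and yields only the sign-bit pattern. */
static unsigned long sketch_sign_bit(void) {
  return 1UL << (SKETCH_LONG_WIDTH - 1);
}

/* The limits macro names the minimum directly, with no shift at all. */
static long sketch_long_min(void) {
  return LONG_MIN;
}

/* On two's complement, the minimum plus the maximum is -1. */
static_assert(LONG_MIN + LONG_MAX == -1, "sketch assumes two's complement");
```

Converting `LONG_MIN` to `unsigned long` is defined modulo 2^N, so under the stated assumptions it gives the same bit pattern as the unsigned shift.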

Is a change entry needed?
This also prevents warnings raised under Windows by clang-cl and improves code quality with MSVC.

(I might have confused undefined behavior with unspecified behavior, oh well)

@MisterDA MisterDA changed the title Limits.h min int Use macros from limits.h to prevent signed integer wrap-around warnings Apr 8, 2024
runtime/bigarray.c (outdated)
@@ -140,16 +140,19 @@ typedef unsigned char uint8_t;
typedef long intnat;
typedef unsigned long uintnat;
#define ARCH_INTNAT_PRINTF_FORMAT "l"
#define INTNAT_MIN LONG_MIN
Contributor

Have you tried moving these defines to runtime/caml/misc.h? runtime/caml/config.h doesn't include <limits.h> but these new macros depend on it, so it would make more sense to define them in a place where <limits.h> is included.

Contributor Author

Hmmm, it's true that config.h is missing limits.h, but adding the following to misc.h also seems like wasted duplication.

#if SIZEOF_PTR == SIZEOF_LONG
/* Standard models: ILP32 or I32LP64 */
#define INTNAT_MIN LONG_MIN
#elif SIZEOF_PTR == SIZEOF_INT
/* Hypothetical IP32L64 model */
#define INTNAT_MIN INT_MIN
#elif SIZEOF_PTR == 8
/* Win64 model: IL32P64 */
#define INTNAT_MIN INT64_MIN
#endif

config.h could include limits.h instead: we've switched to C11, and most of the compatibility code around C99 integer types seems to have been added for old MSVC.

Contributor

The preprocessor logic duplication is unfortunate but probably acceptable (with a comment saying that it must match what's in config.h) if adding <limits.h> to config.h is considered too large a change.

I think a Changes entry will be required if config.h now includes <limits.h>.

Contributor Author

I'm opting to add limits.h to config.h. I think a follow-up PR could switch all the macros and defines of config.h entirely to C99 fixed-width integers.

@MisterDA MisterDA force-pushed the limits.h-min-int branch 3 times, most recently from 68289f9 to d36498c Compare April 11, 2024 18:53
@NickBarnes
Contributor

I'll review this.

Contributor

@NickBarnes NickBarnes left a comment

This is all good, a clear improvement.

@MisterDA
Contributor Author

> This is all good, a clear improvement.

Thanks, I've rebased on trunk and added you as a reviewer.

@xavierleroy
Contributor

What about using INTPTR_MIN, INTPTR_MAX and UINTPTR_MAX unconditionally? OCaml's value type is, morally, intptr_t, even though it is not defined as such for historical reasons.

@NickBarnes
Contributor

NickBarnes commented Apr 29, 2024

> What about using INTPTR_MIN, INTPTR_MAX and UINTPTR_MAX unconditionally? OCaml's value type is, morally, intptr_t, even though it is not defined as such for historical reasons.

This makes sense to me, and could remove the test for SIZEOF_PTR == SIZEOF_LONG etc in config.h. It does need <stdint.h>, but although I see that we use HAS_STDINT_H in config.h, I suspect that parts of the runtime wouldn't compile at all if <stdint.h> were not available.

While we're on the subject, it's surprising to me that we don't seem to have, or use, CAML_INT_MAX and CAML_INT_MIN (or similar names). Maybe this PR would be a reasonable time to introduce them?

@xavierleroy
Contributor

> although I see that we use HAS_STDINT_H in config.h, I suspect that parts of the runtime wouldn't compile at all if <stdint.h> were not available.

Right. <stdint.h> is standard since C99, and OCaml 5 requires C11, so we should use <stdint.h> unconditionally and remove the configure test for it.

@MisterDA
Contributor Author

MisterDA commented Apr 29, 2024

> What about using INTPTR_MIN, INTPTR_MAX and UINTPTR_MAX unconditionally? OCaml's value type is, morally, intptr_t, even though it is not defined as such for historical reasons.

> While we're on the subject, it's surprising to me that we don't seem to have, or use, CAML_INT_MAX and CAML_INT_MIN (or similar names).

Two good suggestions. I've changed the definitions to use the {u,}intptr_t limits, and namespaced the macros with the CAML_ prefix. It's technically a breaking change to move from UINTNAT_MAX to CAML_UINTNAT_MAX, but opam grep UINTNAT_MAX doesn't return anything.
Would you rather use INTPTR_MIN directly and not have CAML_INTNAT_MIN?
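
As a rough illustration of the suggestion discussed above, here is a sketch with hypothetical `SKETCH_`-prefixed names, not the actual contents of config.h. It assumes `intptr_t` and `uintptr_t` are available and pointer-sized. Defining the native-integer limits unconditionally from `<stdint.h>` could look like this:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the runtime's native integer types. */
typedef intptr_t sketch_intnat;
typedef uintptr_t sketch_uintnat;

/* Limits taken straight from <stdint.h>, with no test on
   SIZEOF_PTR, SIZEOF_LONG or SIZEOF_INT. */
#define SKETCH_INTNAT_MIN INTPTR_MIN
#define SKETCH_INTNAT_MAX INTPTR_MAX
#define SKETCH_UINTNAT_MAX UINTPTR_MAX

/* Assumptions made explicit at compile time. */
static_assert(sizeof(sketch_intnat) == sizeof(void *),
              "sketch assumes intptr_t is pointer-sized");
static_assert(SKETCH_INTNAT_MIN + SKETCH_INTNAT_MAX == -1,
              "sketch assumes two's complement");
```

The static assertions only document the assumptions; whether the runtime should carry similar checks is a separate question.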

@NickBarnes
Contributor

What I meant about CAML_INT_MAX and CAML_INT_MIN was the max and min values of OCaml's int type. I find these are in fact currently defined in mlvalues.h as Max_long and Min_long (which I think are confusing names!).

@MisterDA
Contributor Author

I've rebased this PR.

> What I meant about CAML_INT_MAX and CAML_INT_MIN was the max and min values of OCaml's int type. I find these are in fact currently defined in mlvalues.h as Max_long and Min_long (which I think are confusing names!).

I've introduced CAML_LONG_{MAX,MIN} macros replacing {Max,Min}_long. I think that LONG instead of INT is more consistent with the current naming. Should I retain the former names for compatibility? Are we convinced that this is a good idea?

@NickBarnes
Contributor

NickBarnes commented May 14, 2024

On reflection we shouldn't change Max_long or Min_long in this PR, and I regret suggesting it.
Those names have been fixed for decades and there may be a lot of code out there using them (opam grep immediately finds base_bigstring for instance). If we did change them, or offer new alternatives, IMO it should be CAML_INT_MAX and CAML_INT_MIN, because they are the maximum and minimum values of the OCaml type int.

@MisterDA
Contributor Author

MisterDA commented May 14, 2024

> On reflection we shouldn't change Max_long or Min_long in this PR, and I regret suggesting it. Those names have been fixed for decades and there may be a lot of code out there using them (opam grep immediately finds base_bigstring for instance).

My thoughts also, I'll remove that commit.

> If we did change them, or offer new alternatives, IMO it should be CAML_INT_MAX and CAML_INT_MIN, because they are the maximum and minimum values of the OCaml type int.

But on 64-bit architectures, only Val_long maps to a 63-bit integer, right? Not Val_int, which is cast to (int).
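
For readers unfamiliar with where the 63-bit range comes from: the runtime stores an OCaml integer in a machine word whose lowest bit is set to 1, leaving one bit fewer for the value. The sketch below uses hypothetical `sketch_` names and plain arithmetic. It is not how mlvalues.h defines `Max_long`, `Min_long`, `Val_long` or `Long_val`, and only illustrates the resulting range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the range of a tagged integer:
   one bit narrower than intptr_t. */
#define SKETCH_OCAML_INT_MAX (INTPTR_MAX / 2)
#define SKETCH_OCAML_INT_MIN (INTPTR_MIN / 2)

/* Tag: equivalent to shifting left by one and setting the low bit.
   Written as 2*n + 1 so it stays in range for every n between
   SKETCH_OCAML_INT_MIN and SKETCH_OCAML_INT_MAX. */
static intptr_t sketch_tag(intptr_t n) {
  return n * 2 + 1;
}

/* Untag: a tagged word is odd, so (v - 1) is even and the division
   is exact. This avoids right-shifting a negative value. */
static intptr_t sketch_untag(intptr_t v) {
  return (v - 1) / 2;
}
```

Under these assumptions the largest tagged value is exactly `INTPTR_MAX` and the smallest is `INTPTR_MIN + 1`, which is why the usable range is half that of the machine word.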

This fixes the warning from MSVC raised on -0x80000000.

> warning C4146: unary minus operator applied to unsigned type, result
> still unsigned

The other replacements are made for consistency and, hopefully,
legibility.
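
Some background on the C4146 warning quoted above, as a sketch that assumes a 32-bit `int` (the helper names are hypothetical): the hexadecimal literal 0x80000000 does not fit in `int`, so it has type `unsigned int`, and negating it yields an unsigned value rather than the minimum `int`. `INT32_MIN` from `<stdint.h>` does not have this problem.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* 1 if the expression has type unsigned int, 0 otherwise (C11 _Generic).
   The controlling expression is not evaluated. */
#define SKETCH_IS_UNSIGNED_INT(e) _Generic((e), unsigned int: 1, default: 0)

static_assert(sizeof(int) * CHAR_BIT == 32, "sketch assumes a 32-bit int");

/* The negated literal is still unsigned, which is what MSVC reports. */
static_assert(SKETCH_IS_UNSIGNED_INT(-0x80000000),
              "negated hex literal has type unsigned int");

/* As an unsigned value, the negated literal compares greater than zero. */
static int sketch_literal_is_positive(void) {
  return -0x80000000 > 0;
}

/* The <stdint.h> macro is a genuinely negative signed constant. */
static int sketch_macro_is_negative(void) {
  return INT32_MIN < 0;
}
```

Some compilers may warn about the comparison in `sketch_literal_is_positive`; that is the same issue the commit addresses.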