-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Immutable arrays #13097
base: trunk
Are you sure you want to change the base?
Immutable arrays #13097
Conversation
My 2c after a quick look are below. Re-reading it I realize it sounds rather negative, but it is not my intention, please bear that in mind :) The PR seems way too invasive for such a fringe freature. For example, there is a new syntax for immutable arrays, which we all know is a "big change" (eg we have no such syntax for Also, why not do
and call it a day? I fail to see why this needs special support in the typechecker, pattern-matching engine, middle-end, compiler primitives, etc. Sure, once in a while having immutable arrays would be nice, but is it that common enough that it justifies adding this much code to the compiler?
I think very few libraries would actually need immutable arrays as part of their public API. Do you have concrete examples?
This is only a problem if libraries start using this data structure in their public APIs. Do you know of any doing so?
I don't think this alone would be reason enough to merge this PR. Just my view at the moment, would love to hear other opinions! Cheers. |
To me, there are two separable concerns here, which we may want to debate separately.
(*) In anticipation of the debate on this PR (I'm working with @OlivierNicole on the Jane Street side), I've been worried about this consumption of syntax. But in the end, I think it's OK. If we want to revisit this decision later, it seems easy enough to add the ability to bind different array-like syntaxes to different implementations. For example, we could have a I do think the comments such as those from @nojb are very helpful here. We like this design within Jane Street, where I can find 158 jbuild files (our equivalent of |
I understand the concern for unwarranted big changes. Just to give some sense of the contents of the patch here, the few hundreds of lines of the change (I’m not counting bootstrap commits and added tests) consist of the new
Given that arrays and immutable arrays have the same runtime representation, support for immutable arrays does not incur any new runtime primitives (except for unsafe conversion between the two). It’s true that immutable arrays have their own type, which is part of the feature, and thus their have their own module. But unlike float arrays, immutable array support happens at the type level.
It could be argued that a new module |
For what it's worth, we've been tempted to write |
It's a small thing, but |
In fact, that’s what it is, it just happens to be valid to leave no space between the two tokens. I agree that it is less confusing with a space. |
I believe that offering immutable arrays would be a useful feature. Documenting that an array is immutable helps clarify the code (and can help the compiler optimize it). In fact, in Osiris, our ongoing program verification project for OCaml, we do intend to (eventually) distinguish immutable and mutable arrays. If this distinction existed in OCaml already, then we would not need to create it at the level of Osiris. Regarding immutable array literals, I am less interested. Array literals are useful only when the length and content of the array are statically known, which in my experience is fairly rare, but perhaps this depends on the application area. It seems more common to me to 1- allocate a mutable array, 2- initialize it, 3- freeze it. |
It is true that the syntax is technically not necessary to create immutable arrays, given our unsafe primitive
|
I imagine that this primitive operation will have to be exposed, in the same way that |
To clarify, what I meant was that with immutable array literals and functions like |
What are use-cases of immutable arrays in practice, can we discuss some examples? @OlivierNicole mentioned that this could be useful for constant tables, as used in Unicode libraries. What are other applications that would be enticing? (@fpottier, why are you interested in adding specific support for immutable arrays in Osiris?) You mention exposing immutable arrays in APIs to write safe interfaces. I suppose that the ability to do this depends quite a bit on the conversion mechanisms that exist between mutable and immutable arrays. Can you discuss examples of using immutable arrays to write safe interfaces, and detail the conversion mechanisms? Two counterpoints could be:
|
One example of immutable array would be the arrays that are used in the internal representation of Coq/Rocq terms (for reasons of memory compactness, I believe?): https://github.com/coq/coq/blob/d7d392191a367839b0f5a7772e48b8f24e9f1b3e/kernel/constr.ml#L105 |
In graphical programming it can be more natural to have arrays of points rather than lists of points (a lot of algorithms fiddle with indices, you want to quickly find out how much of them you have etc.). But then when you pass things around you don't want people to fiddle with your arrays. If arrays were immutable I would perhaps expose the lists in this interface as arrays (and the underlying rings's point lists aswell). But then arrays are much less flexible to construct, split and merge than lists. I would likely prefer an all round efficient persistent polymorphic vector on the line of RRB vectors in the stdlib (it could be based on immutable arrays, but then it could also be based on mutable arrays) |
Immutable arrays are just one particular implementation of the abstract concept of "immutable sequence", which arguably is fairly useful and common. The upside of immutable arrays is that they are compact and support efficient random access; their downside is that concatenation and extraction of a subsequence are expensive (in terms of time and space). In some applications, these downsides are tolerable. For instance, Menhir uses immutable arrays in many places to represent sequences of symbols. In Osiris (a program verification system that we are developing), immutable values are much easier to reason about than mutable values. In short, immutable values "are" just values, whereas mutable values are addresses in memory, and one must explicitly use Separation Logic assertions to describe who is allowed to read or write these pieces of memory. Osiris will have specific support for reasoning about "pure" expressions (expressions that do not allocate, read, or write mutable memory). A program that manipulates (primitive) immutable arrays can be considered pure, whereas a program that manipulates mutable arrays cannot. |
The library Sek, developed by Arthur Charguéraud (@charguer) and myself, is intended to offer an efficient general-purpose implementation of both persistent and ephemeral sequences (that is, both immutable and mutable sequences). That said, we have received extremely little feedback about it, so I do not know whether it reaches its intended target. Also, it is probably too complex to be comfortably integrated in the standard library. |
I could see how to use dynarrays as buffers to programmatically accumulate elements, and then "freeze" them as an immutable array. This might be a reasonable approach for @dbuenzli's GG example where the sequence data structure is used to represent polygons as sequences of rings. (And rings as sequences of points, etc.) |
There are a few places in Flambda2 where we use either arrays or lists, but where immutable arrays would have been the best match. I think there are other places in the compiler where immutable arrays would make sense (a good proportion of the lists in the various compiler IRs would work as well as immutable arrays, and for some of them I expect it would be a better fit). |
(Thinking out loud: changing the .cmi format (to use arrays) is fairly painful due to bootstrappin, so I wouldn't do this for an experiment, but it might be possible to just traverse the type to count the total size of lists of interest, and deduce potetial space savings from that.) |
Jumping into the discussion with a few comments.
First, you would extend the typechecker to check that |
Another remark: Batteries has long exposed a design of "arrays with read/write capabilities" as the submodule BatArray.Cap (documentation). (As far as I know, there are few users of this feature; a quick search suggests one, two.) This design is polymorphic on the mutability, the type is Having the mutability information as a parameter is strictly more expressive than the current proposal, in particular it makes it possible to express that a function will not mutate an argument that may otherwise be mutable, by having the type |
In my experience these kind of phantom types are also strictly more annoying to work with in practice. |
And yet, if you're working with mutable data structures, this kind of thing is very useful in giving you guarantees that approach those of immutable data structures. This is similar to C++'s |
Not really. Personally I'm more interested in knowing that an array is immutable than if I'm allowed to mutate it. These are different things. |
Indeed, and in particular, CSE: if the array |
I believe that we should separate the discussion of primitive immutable arrays, implemented as a contiguous block of memory, and the discussion of various kinds of "immutable sequence" data structures, which can be packaged as separate libraries. That said, regarding syntax, you are right: we should strive to avoid the proliferation of array access syntaxes, and we should allow libraries to define their own array access syntaxes, if possible. |
Regarding the distinction between "truly immutable arrays" (immutable for everyone) and "temporarily immutable arrays" (immutable for me, but possibly mutable for someone else; or immutable now, but possibly mutable later), I believe it would be a good thing to discuss just the simpler proposal, namely having a primitive type of truly immutable arrays. In my opinion, the example of |
I'm making the distinction because I can easily think of use-cases for "temporarily immutable arrays", but I have a harder time thinking of use-cases for "truly immutable arrays". A few have been mentioned here. (I forgot to make a humorous point. @fpottier mentions that they are considering immutable arrays in Osiris because they are much easier to specify and use in proofs. This reminds me of the joke about searching for keys under the street light.) |
Point taken, but there is something to be said in favor of searching under the street light... |
Just think about your uses cases for lists :-)
One of the problem in the current state of affairs is that there is no good way to turn a Of course an abstraction could be provided for that it just doesn't exist for now1 and I'm slightly uncomfortable each time I allocate a Other than that mismatches between what you have and what an API asks for can be annoying. Something which Footnotes
|
Good suggestion! This seems easy to implement inside |
Yes, OCaml is not as good an imperative language as C++, there's nothing new there 🙂 . I agree with @dbuenzli that phantom types are not worth the annoyance, and I regret their use in As of string vs bytes: there's a lot more string literals than array literals, in my experience. Immutable strings are worth it as it's such a fundamental data type. Immutable arrays are a curiosity. |
I believe that you can also hope to gain some compiler optimizations (e.g., elimination of redundant loads). I agree that literal array expressions and literal array patterns do not seem critical. I would like to have immutable arrays in the language, but I do not really care whether they are primitive or implemented as a library; this probably does not make much difference to the end user. |
Thanks for bringing the conversation back to full immutable arrays; temporarily-immutable arrays are interesting, but indeed separate. I also think that considering grander structures on top of immutable arrays, as suggested by @charguer, is an interesting topic, but orthogonal to this proposal. (I believe all the ideas in that post are equally applicable to mutable arrays.) Although I started out wondering about the necessity of having the syntax, I've become more attached to the syntax as we've been discussing. I tend to see immutable arrays as intending to be an easy replacement for lists, when the extensibility of lists is not needed. That is, if I have a collection of stuff that I do not expect to mutate, I would hardly ever reach for an array, as using an array opens me up to all kinds of questions about mutation. But an immutable array -- especially if the syntax is easy -- is pretty great. It's true that an immutable array won't have the For what its worth, we at Jane Street already have 874 uses of immutable arrays within our codebase. There's been no particular encouragement for adoption -- just some advertising saying that the feature is available. Of course, all of this is future-compatible with syntax to support arbitrary mutable and immutable sequences. I don't think imagining those other features should stop us from taking this one. |
This PR proposes to add immutable arrays to OCaml.
This is a feature originally by @antalsz and has been in use for a bit more than six months at Jane Street, in their fork of the compiler with extensions (below dubbed flambda-backend). Jane Street has asked for help from Tarides to port the feature onto OCaml 5.x.
Motivation
Immutable arrays represent an improvement over regular arrays in several situations:
Benefits on trunk
string
andbytes
.As an example use case, @dbuenzli’s uucp and uunf libraries store static Unicode data in some big arrays (among other data structures), themselves comprising sub-arrays. It has proved difficult in the past to compile this efficiently, and some array initialisation remains at runtime, since the optimizer is unable to verify that all the data is constant. Using immutable arrays, the optimizer can be extended to allocate it all statically in a read-only data section of the executable.
In addition, the presence of a standard module:
Benefits on flambda-backend
In addition to the benefits above, immutable arrays work better with locality modes in use at Jane Street: mutable data structures cannot be safely allocated on the stack as it would allow local variables to escape, but immutable arrays and records can.
Design
The Lambda intermediate language already has immutable arrays; therefore, there is little to add. A mutability flag is added to the AST nodes
Pexp_array
andTexp_array
, and a new predefined type'a iarray
is added.Parser support is added for the form
[: e1; e2; …; en :]
in array literals and patterns:A new module
Iarray
— and the correspondingIarrayLabels
— is added to the Stdlib, containing all the functions ofArray
, excluding the mutating functions. The memory representation of immutable arrays is the same as that of mutable arrays, soIarray
reuses the runtime primitives for regular arrays.Documentation is yet to be written: we thought it made more sense to first see what are the opinions about the design.