Define how to extract the `sourceMappingURL` comment #30

nicolo-ribaudo · 2024-02-01T16:52:59Z

This PR follows up on last week's discussion about tc39/source-map#64. It's marked as a draft because I haven't yet "translated" the JS part to CSS, but please already review the JS part and share your opinion :)

Rendered version: https://nicolo-ribaudo.github.io/source-map-spec/source-map.html#linking-generated-code

This patch explicitly defines how to extract such comments from JavaScript, CSS and WebAssembly sources.

It defines multiple ways to do so: either by actually parsing the code, or by just going through all the lines of the program looking for what "looks like" a comment. This is so that different implementations can choose what's best for them, depending on whether they are already parsing the code or not.

To ensure consistent behavior across implementations that choose different strategies, the specification imposes additional requirements on tools that append a `sourceMappingURL` comment to the generated code: the comment must be placed so that all extraction methods yield the same result. This is not an unreasonable burden: if the program is syntactically valid, simply adding the comment at the end of the file (optionally followed only by other tool-injected comments) is enough. This requirement is lifted if the input code given to the tool is already "maliciously crafted", since we would otherwise require tools to rewrite that code (for example, by splitting strings that contain something that looks like a comment).
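As a rough illustration of the tool-side requirement, here is a minimal TypeScript sketch of a tool appending the comment to output it already knows is syntactically valid. The helper name `appendSourceMappingURL` and the choice to reject URLs containing whitespace are assumptions made for this example, not part of the proposed spec text.

```typescript
// Hypothetical tool-side helper (not from the spec): append a sourceMappingURL
// comment to generated JavaScript that is already known to be syntactically valid.
function appendSourceMappingURL(code: string, url: string): string {
  // Assumption for this sketch: the URL must not contain whitespace, since an
  // extractor would typically stop reading the URL at the first whitespace.
  if (/\s/.test(url)) {
    throw new Error(`sourceMappingURL must not contain whitespace: ${JSON.stringify(url)}`);
  }
  // Put the comment on its own line unless the code already ends with a line
  // terminator. Without this, output ending in `// note` would absorb the new
  // comment into that existing single-line comment.
  const endsWithLineTerminator = /[\n\r\u2028\u2029]$/.test(code);
  const separator = code.length === 0 || endsWithLineTerminator ? "" : "\n";
  // Appending at the very end means no code follows the comment, so both a
  // parser-based and a line-scanning extractor see it as the last comment.
  return `${code}${separator}//# sourceMappingURL=${url}\n`;
}
```

For instance, `appendSourceMappingURL("foo();", "foo.js.map")` returns `"foo();\n//# sourceMappingURL=foo.js.map\n"`. The extraction algorithm itself doesn't require a newline before the comment, but inserting one when missing is cheap and avoids the trailing `//` comment pitfall noted in the code.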

I have left the CSS extraction method as a TODO because I first want to check how you feel about the JS one. It has the following properties:

- It iterates line by line. Implementations can thus optimize it by going through the lines _in reverse order_, and then scanning each line's characters from beginning to end (which is what a regexp would do).
- It expects multi-line comments to actually fit on a single line.
- It returns the last `sourceMappingURL` comment (or well, comment-like) found in the source.
- It only considers comments after the last piece of code (i.e. it discards any comment found so far every time it sees some non-comment, non-whitespace characters).
- It has no requirements about what is _before_ a comment: adding the comment at the end of the file without first ensuring that there is a newline before it is valid.
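The properties listed above can be sketched in TypeScript roughly as follows. This is not the spec text: the exact comment-body pattern (`#` or `@`, a space, `sourceMappingURL=`, then a whitespace-free URL) and the handling of a `/*` that isn't closed on the same line (treated as code here) are assumptions made for the sketch, and the helper names are hypothetical.

```typescript
// Hypothetical approximation of the comment body the spec looks for:
// "# sourceMappingURL=<url>" or the legacy "@ sourceMappingURL=<url>".
const URL_COMMENT_BODY = /^[#@] sourceMappingURL=(\S*)\s*$/;

// ECMAScript WhiteSpace: TAB, VT, FF, ZWNBSP, plus any Space_Separator (Zs).
const JS_WHITESPACE = new RegExp("[\\t\\v\\f\\uFEFF\\p{Zs}]", "u");

// ECMAScript line terminators (CRLF counts as a single terminator).
const JS_LINE_TERMINATOR = /\r\n|[\n\r\u2028\u2029]/;

function matchURLCommentBody(body: string): string | null {
  const m = URL_COMMENT_BODY.exec(body);
  return m?.[1] ?? null;
}

function extractSourceMappingURL(source: string): string | null {
  let lastURL: string | null = null;
  for (const line of source.split(JS_LINE_TERMINATOR)) {
    let pos = 0;
    while (pos < line.length) {
      const ch = line.charAt(pos);
      if (JS_WHITESPACE.test(ch)) {
        pos += 1; // skip white space between tokens
        continue;
      }
      const twoChars = line.slice(pos, pos + 2);
      if (twoChars === "//") {
        // Single-line comment: the rest of the line is its body.
        const url = matchURLCommentBody(line.slice(pos + 2));
        if (url !== null) lastURL = url;
        break;
      }
      if (twoChars === "/*") {
        const end = line.indexOf("*/", pos + 2);
        if (end === -1) {
          // Assumption: a block comment not closed on this line counts as code.
          lastURL = null;
          break;
        }
        // Non-matching comments leave lastURL untouched.
        const url = matchURLCommentBody(line.slice(pos + 2, end));
        if (url !== null) lastURL = url;
        pos = end + 2;
        continue;
      }
      // Any other non-white-space character is code: forget earlier comments.
      lastURL = null;
      pos += 1;
    }
  }
  return lastURL;
}
```

This sketch walks the lines front to back for clarity rather than using the reverse-order optimization. Because it never tokenizes strings, a literal such as `"//# sourceMappingURL=x"` at the end of the code can be mistaken for a comment, so it only agrees with a real parser on input that isn't maliciously crafted, which is exactly the case the tool-side requirement covers.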


gibson042 · 2024-03-20T22:08:07Z

source-map.bs

+        1. [=Collect a sequence of code points=] that are [=white space code points|ECMAScript
+            white space code points=] from |line| given |position|.


Is it a problem that ECMAScript white space is subject to change over time as future Unicode editions change the set of code points in general category "Space_Separator"?
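One way to see the concern: ECMAScript's WhiteSpace is a fixed set (TAB, VT, FF, ZWNBSP) plus whatever the engine's Unicode tables classify as Zs. The TypeScript sketch below, with hypothetical helper names, contrasts that live definition with a frozen list the spec could spell out instead. There is precedent for drift: U+180E MONGOLIAN VOWEL SEPARATOR was Zs before Unicode 6.3 and has been Cf since, so engines built on different Unicode versions disagree about it.

```typescript
// ECMA-262 WhiteSpace = <TAB> <VT> <FF> <ZWNBSP> plus <USP>, i.e. any code point
// whose Unicode General_Category is "Space_Separator" (Zs).
const FIXED_WHITESPACE = "\t\v\f\uFEFF";

// Built with the RegExp constructor so older TypeScript compile targets don't
// reject a `u`-flag literal with a property escape.
const SPACE_SEPARATOR = new RegExp("^\\p{Zs}$", "u");

// Live definition: the Zs half is answered by the engine's Unicode tables,
// so the result for a given code point can change with a newer Unicode edition.
function isECMAScriptWhiteSpace(ch: string): boolean {
  return ch.length === 1 && (FIXED_WHITESPACE.indexOf(ch) !== -1 || SPACE_SEPARATOR.test(ch));
}

// Frozen alternative: the 17 Zs code points at the time of writing, listed
// explicitly so the answer never changes (a spec would need to name the edition).
const FROZEN_ZS =
  " \u00A0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A\u202F\u205F\u3000";

function isFrozenWhiteSpace(ch: string): boolean {
  return ch.length === 1 && (FIXED_WHITESPACE + FROZEN_ZS).indexOf(ch) !== -1;
}
```

Since every current Zs code point is in the BMP, a single UTF-16 code unit is enough for these checks.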

nicolo-ribaudo force-pushed the extract-source-mapping-url branch from a70a804 to 1e39da2 on February 1, 2024 16:54

nicolo-ribaudo force-pushed the extract-source-mapping-url branch from 1e39da2 to d422607 on February 1, 2024 16:58
