Proposal: DOWNLOAD - a new command for cache-aware downloads #3948
Comments
Incidentally, I think that BuildKit has some sort of support for this kind of "source" (the term that BuildKit uses for cache-aware inputs like this), because Dockerfiles used to have the ADD command which could download and unzip archives. So implementation for this shouldn't be too difficult.
Feedback on the questions:
I agree, I think what I meant is that the name/functionality of the flag might be different depending on the behavior.
I would probably use this if it were included. I like how Bazel's HTTP rules handle this. The cache key is effectively a combination of the "canonical ID" and the file checksum. The "canonical ID" may be set explicitly, and defaults to the URL. It is typical and recommended to include the expected checksum for the file to ensure the result is deterministic, and to mitigate the security risk. I would like it if
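The cache-key scheme described in this comment can be sketched roughly as follows. This is an illustrative Python sketch only, not Bazel's or Earthly's implementation: the function names `sha256_hex`, `cache_key`, and `verify` are hypothetical, and using SHA-256 over the identifier and checksum is an assumption made for the example.

```python
import hashlib
from typing import Optional


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()


def cache_key(url: str, expected_sha256: str,
              canonical_id: Optional[str] = None) -> str:
    """Combine the canonical ID (defaulting to the URL) with the expected checksum.

    Hypothetical helper: the key changes if either the identifier or the
    expected checksum changes, and stays stable otherwise.
    """
    ident = canonical_id if canonical_id is not None else url
    # The newline separator keeps ("ab", "c") distinct from ("a", "bc").
    return sha256_hex(f"{ident}\n{expected_sha256.lower()}".encode("utf-8"))


def verify(data: bytes, expected_sha256: str) -> None:
    """Raise ValueError if the downloaded bytes do not match the expected checksum."""
    actual = sha256_hex(data)
    if actual != expected_sha256.lower():
        raise ValueError(
            f"checksum mismatch: expected {expected_sha256}, got {actual}"
        )
```

With a scheme like this, a mirror URL that declares the same canonical ID and checksum maps to the same cache entry, while a changed checksum always produces a new key.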
Here's an example of an Earthly target that implements something like
Use case
As a developer, I often need to download files as part of my build.
This can be done by executing curl or wget (`RUN curl <url>`).

However, since the URL might stay constant while the file on the remote server changes, the `RUN` command will (by default) be cached, meaning the build might not be using the most recent version of the file as the developer intends.

An alternative is to always force downloading the file by using `RUN --no-cache`. However, this is inefficient, since the file can be quite large and because the cache might get busted for subsequent steps in the target.

It would be good to introduce functionality in the Earthfile syntax that supports cache-aware downloads out of the box, similarly to how Docker image pulls are aware of digests in a `FROM <image>` statement, how `GIT CLONE` is aware of commit hashes, and how `COPY` can tell whether a file in the build context has changed.

Expected Behavior
To accomplish the above, we can introduce a new command, `DOWNLOAD`, which, under the hood, can use the `If-Modified-Since` or `If-None-Match` request headers to make a conditional GET request (provided the server maintains the corresponding `Last-Modified` or `ETag` tags).

If the server does not maintain these tags, a warning will be printed to the user and the cache behavior will fall back to "always cached", similarly to how `RUN curl` behaves today, as described above (see the `--no-cache` flag description below on changing this behavior).

Additionally, using the `earthly --verbose` flag should display the tag values in the request and the response.

For example (How to use aws-cli):
When the developer builds `+run-aws`, the `DOWNLOAD` command will ensure the file is only downloaded if it hasn't been downloaded before or if it has changed since the last time it was downloaded. This means that ultimately `RUN aws-cli` will be invalidated only when a new version of the executable is used.

For comparison, the `+run-aws-old` cache will not be invalidated unless `--no-cache` is used, so the execution of `aws-cli` might not get invalidated when a new version is available.

An additional possible benefit of this new feature is that the developer would not need to install a tool like `curl` before attempting to download a file.

Some future/nice to have flags:
- `--token <your-token>` - pass authentication token in the request header for private servers.
- `--output <file-path>` - where the file should be downloaded to (default: working directory)
- `--chmod` - it's common for a downloaded file to get executed. With this flag, no additional `RUN chmod <your-file>` command is needed.
- `--no-cache` - force redownload of the file similarly to how this flag works in a `RUN` command.

Open Questions:
Should the `--no-cache` flag always force redownloading of a file, or only in case the server does not support the relevant tags?
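To make the conditional-request behavior from "Expected Behavior" more concrete, here is a minimal Python sketch of the decision logic, assuming the server uses the standard `ETag` / `Last-Modified` response headers. It is not Earthly code: `Validators`, `conditional_headers`, and `decide` are hypothetical names, the logic is kept free of network calls so it can be read in isolation, and the handling of `--no-cache` is deliberately left out since that is the open question above.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class Validators:
    """Cache validators remembered from a previous download (hypothetical type)."""
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def conditional_headers(cached: Optional[Validators]) -> Dict[str, str]:
    """Build the request headers for a conditional GET."""
    headers: Dict[str, str] = {}
    if cached is None:
        return headers  # nothing cached yet: plain, unconditional GET
    if cached.etag:
        headers["If-None-Match"] = cached.etag
    if cached.last_modified:
        headers["If-Modified-Since"] = cached.last_modified
    return headers


def decide(status: int, response_headers: Mapping[str, str]) -> str:
    """Map a response to a cache decision.

    Returns one of:
      "reuse"        - 304 Not Modified: keep the cached file, cache stays valid
      "refresh"      - 200 with validators: store the new file and its validators
      "refresh-warn" - 200 without validators: warn the user and fall back
                       to "always cached" behavior
    """
    if status == 304:
        return "reuse"
    if status == 200:
        # HTTP header names are case-insensitive.
        names = {name.lower() for name in response_headers}
        if "etag" in names or "last-modified" in names:
            return "refresh"
        return "refresh-warn"
    raise ValueError(f"unexpected HTTP status {status}")
```

A first download sends no conditional headers; later builds send whatever validators were stored, and only a `"refresh"` or `"refresh-warn"` result would invalidate the steps that depend on the downloaded file.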