[Transforms] Update transforms at scale doc to add date rounding #108674

Closed
sophiec20 opened this issue May 15, 2024 · 4 comments · Fixed by #109073
Assignees
Labels
>docs General docs changes :ml/Transform Transform :ml Machine learning Team:Docs Meta label for docs team Team:ML Meta label for the ML team

Comments

@sophiec20
Contributor

sophiec20 commented May 15, 2024

Update Transform docs https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-scale.html#limit-source-query

From
Use an absolute time value as a date range filter in your source query (for example, greater than 2020-01-01T00:00:00) to limit which historical indices are accessed. If you use a relative time value (for example, now-30d) then this date range is re-evaluated at the point of each checkpoint execution.

To

To limit which historical indices are accessed, exclude certain tiers (for example, `"must_not": { "terms": { "_tier": [ "data_frozen", "data_cold" ] } }`) and/or use an absolute time value as a date range filter in your source query (for example, greater than 2024-01-01T00:00:00). If you use a relative time value (for example, `gte now-30d/d`), apply date rounding to take advantage of query caching, and make sure that the relative time is much larger than the largest of `frequency`, `time.sync.delay`, or the date histogram bucket; otherwise data may be missed. Do not use date filters that are less than a date value, as this conflicts with the logic applied at each checkpoint execution and data may be missed.
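The proposed wording can be sketched in Python as a source query plus a rough sanity check on the lookback window. This is only an illustration under stated assumptions: the helper names `build_source_query` and `lookback_is_much_larger`, the `@timestamp` field, and the `factor=10` threshold are all hypothetical (the proposal says "much larger" without giving a number), and `delay` stands for the value the thread calls `time.sync.delay`.

```python
import json
from datetime import timedelta


def build_source_query(timestamp_field="@timestamp", lookback="now-30d/d"):
    """Build a bool query that skips the cold and frozen tiers and applies
    a rounded, lower-bound-only relative date filter."""
    return {
        "bool": {
            # Lower bound only: no lt/lte, per the proposed guidance.
            "filter": [{"range": {timestamp_field: {"gte": lookback}}}],
            "must_not": {"terms": {"_tier": ["data_frozen", "data_cold"]}},
        }
    }


def lookback_is_much_larger(lookback, frequency, delay, bucket, factor=10):
    """Return True if the lookback is at least `factor` times the largest of
    frequency, delay, and histogram bucket. The factor is an assumption."""
    return lookback >= factor * max(frequency, delay, bucket)


query = build_source_query()
print(json.dumps(query, indent=2))

ok = lookback_is_much_larger(
    lookback=timedelta(days=30),
    frequency=timedelta(minutes=1),
    delay=timedelta(seconds=60),
    bucket=timedelta(hours=1),
)
print(ok)
```

The dictionary returned by `build_source_query` is meant to be placed under the `query` key of the transform's `source` section.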

@sophiec20 sophiec20 added >docs General docs changes :ml Machine learning :ml/Transform Transform Team:ML Meta label for the ML team labels May 15, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine elasticsearchmachine added the Team:Docs Meta label for docs team label May 15, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-docs (Team:Docs)

@szabosteve szabosteve self-assigned this May 15, 2024
@StephanErb

Do not use date filters which are less than a date value as this conflicts with logic applied at each checkpoint execution and data may be missed.

As a user, I find this part confusing. Can you elaborate on this a bit more?

@prwhelan
Member

prwhelan commented May 20, 2024

Do not use date filters which are less than a date value as this conflicts with logic applied at each checkpoint execution and data may be missed.

As a user, I find this part confusing. Can you elaborate on this a bit more?

In other words, do not apply an upper date filter using the "less than" or "less than or equal to" operators, for example `lte: now`.

Something like this, I think:

                "range" : {
                    "@timestamp" : {
                        "gte" : "now-30d/d",
                        "lte" : "now"
                    }
                }
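One way to catch this pattern before creating a transform is to scan the source query for `lt`/`lte` keys inside `range` clauses. The following is a minimal sketch, not part of any Elasticsearch client; `find_upper_bounds` is a hypothetical helper, and it flags every upper bound in a `range` clause without checking whether the field is actually the transform's sync field.

```python
def find_upper_bounds(node, path=""):
    """Recursively collect the paths of lt/lte keys found in range clauses."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key == "range" and isinstance(value, dict):
                # A range clause maps field name -> {operator: value}.
                for field, bounds in value.items():
                    if isinstance(bounds, dict):
                        for op in ("lt", "lte"):
                            if op in bounds:
                                found.append(f"{child}.{field}.{op}")
            else:
                found.extend(find_upper_bounds(value, child))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            found.extend(find_upper_bounds(item, f"{path}[{i}]"))
    return found


risky = {"range": {"@timestamp": {"gte": "now-30d/d", "lte": "now"}}}
safe = {"range": {"@timestamp": {"gte": "now-30d/d"}}}

print(find_upper_bounds(risky))  # ['range.@timestamp.lte']
print(find_upper_bounds(safe))   # []
```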

As part of the checkpointing math for continuous transforms, a transform always adds an upper-bound date filter to the search request (it cannot detect whether the user has already set one). That upper bound is `now - time.sync.delay`, which captures the time since the last checkpoint.

Dates are resolved to nanoseconds since the epoch at some point within the search action. If the search request contains two upper bounds, they can resolve to different nanosecond values and conflict with one another, even if both are written as `now - time.sync.delay`. The transform then assumes it captured all the data for that checkpoint when the data was actually filtered out.
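The conflict between the two upper bounds can be illustrated with a toy model in Python. This is not Elasticsearch code: timestamps are plain integers standing in for nanoseconds since the epoch, both bounds are treated as inclusive for simplicity, and `checkpoint_search` is a hypothetical name.

```python
def checkpoint_search(docs, user_upper, transform_upper):
    """Return the timestamps that satisfy both upper bounds.
    With two upper bounds, the smaller one is the effective limit."""
    effective = min(user_upper, transform_upper)
    return [t for t in docs if t <= effective]


# Both bounds were written as the same expression but resolved to
# slightly different instants within the same search.
user_upper = 1_000_000_000
transform_upper = 1_000_000_500

docs = [999_999_000, 1_000_000_200]

seen = checkpoint_search(docs, user_upper, transform_upper)

# The transform records the checkpoint as covering everything up to its
# own bound, so it assumes both documents were processed.
assumed_covered = [t for t in docs if t <= transform_upper]

missed = sorted(set(assumed_covered) - set(seen))
print(missed)  # [1000000200]
```

In this model the document at `1_000_000_200` falls between the two resolved bounds: the search filters it out, yet the checkpoint is recorded as if it had been captured.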

5 participants