Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support synthetic source for aggregate_metric_double when ignore_malformed is used #108746

Merged
merged 12 commits into from
May 23, 2024

Conversation

lkts
Copy link
Contributor

@lkts lkts commented May 16, 2024

This PR adds synthetic source support for aggregate_metric_double field when ignore_malformed is used.

This PR introduces a pattern that will be reused in ignore_malformed support in synthetic source for other (complex object-like) fields. The pattern is to create a "shadow" XContentBuilder that replicates all successfully parsed fields and values. In case of malformed data, everything remaining in the parser (inside the field) is copied over to the builder. As a result we get both successfully parsed pieces, malformed piece, and skipped pieces which is a full representation of user input and can go to synthetic source.

Contributes to #90007.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Copy link
Collaborator

Hi @lkts, I've created a changelog YAML for you.

* Build a {@link BytesRef} wrapping a byte array containing an encoded form
* of the passed XContentBuilder contents.
*/
public static BytesRef encodeXContentBuilder(XContentBuilder builder) throws IOException {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is from #108417.

@lkts lkts requested review from kkrik-es and martijnvg May 16, 2024 23:09
@@ -506,6 +572,9 @@ private Map<String, Object> randomAggregateMetric() {
private void mapping(XContentBuilder b) throws IOException {
String[] metrics = storedMetrics.stream().map(Metric::toString).toArray(String[]::new);
b.field("type", CONTENT_TYPE).array(METRICS_FIELD, metrics).field(DEFAULT_METRIC, metrics[0]);
if (malformedExample) {
b.field(IGNORE_MALFORMED, true);
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some more elaborate use cases, here or in yaml? We want to check for:

  • Nested subfields within aggregate_metric_double
  • Partial aggregate_metric_double (e.g. one metric ok, one invalid, two ok)
  • Nested malformed fields, with other fields at the same level

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do cover partials in malformedValue. Nested is a good suggestion.

@lkts lkts changed the title Support synthetic source for aggregate_metric_double when ignore_malf… Support synthetic source for aggregate_metric_double when ignore_malformed is used May 17, 2024
@lkts
Copy link
Contributor Author

lkts commented May 21, 2024

@elasticmachine update branch

metric:
min: 18.2
max: 100
value_count: 50
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: move value_count to the end, to make sure it's still tracked.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I left one question for verifying my understanding.

// We don't use DrainingXContentParser since we don't want to go beyond current field
malformedContentForSyntheticSource.copyCurrentStructure(context.parser());
}
;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unneeded semicolon?

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@lkts lkts merged commit a7c3bd5 into elastic:main May 23, 2024
15 checks passed
@lkts lkts deleted the feature/ignore_malformed_synthetic_source branch May 23, 2024 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants