
[TSDS Data stream] It is possible to delete the write index of a TSDS data stream #108722

Open
gmarouli opened this issue May 16, 2024 · 11 comments
Labels
>bug :Data Management/Data streams Data streams and their lifecycles Team:Data Management Meta label for data/management team

Comments

@gmarouli
Contributor

Elasticsearch Version

8.13

Installed Plugins

No response

Java Version

bundled

OS Version

not relevant

Problem Description

Expected behaviour

As a user, I should not be able to delete the write index of a data stream, so that I can always write to it.

Current behaviour

In the case of a TSDS data stream, there is a period right after a rollover during which the user can still write to the just-rolled-over index. However, during that period the user can delete this write index, because it is no longer the last backing index; all writes will then fail until the newer index becomes the write index.
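The failure mode can be sketched with a simplified model (illustrative Python, not Elasticsearch internals; the index names and hour-long windows are assumptions): each TSDS backing index owns a time window, documents are routed by `@timestamp`, and deleting the index whose window still covers recent timestamps leaves those documents unroutable.

```python
from datetime import datetime, timezone

def route(timestamp, backing_indices):
    """Return the backing index whose [start, end) window contains
    `timestamp`, or None if it falls outside every window."""
    for name, (start, end) in backing_indices.items():
        if start <= timestamp < end:
            return name
    return None

# Two backing indices with adjacent one-hour windows (illustrative).
indices = {
    ".ds-my-tsds-000001": (
        datetime(2024, 5, 16, 12, 0, tzinfo=timezone.utc),
        datetime(2024, 5, 16, 13, 0, tzinfo=timezone.utc),
    ),
    ".ds-my-tsds-000002": (
        datetime(2024, 5, 16, 13, 0, tzinfo=timezone.utc),
        datetime(2024, 5, 16, 14, 0, tzinfo=timezone.utc),
    ),
}

# Shortly after rollover, a current document still routes to -000001.
doc_ts = datetime(2024, 5, 16, 12, 48, tzinfo=timezone.utc)
print(route(doc_ts, indices))  # .ds-my-tsds-000001

# If -000001 is deleted, that same document has nowhere to go.
del indices[".ds-my-tsds-000001"]
print(route(doc_ts, indices))  # None
```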

Steps to Reproduce

Create a TSDS data stream

PUT _index_template/my-tsds-test
{
  "index_patterns": [
    "my-tsds-test-*"
  ],
  "data_stream": {
  },
  "template": {
    "settings": {
      "index.mode": "time_series",
      "number_of_replicas": 0
    },
    "mappings": {
      "properties": {
        "name": {
          "type": "keyword",
          "time_series_dimension": true
        },
        "@timestamp": {
          "type": "date",
          "format": "strict_date_optional_time"
        }
      }
    }
  }
}

PUT my-tsds-test-index/_bulk
{ "create":{ } }
{ "@timestamp": "2024-05-16T12:47:00.000Z", "name": "something ok" }

Execute a rollover

POST my-tsds-test-index/_rollover

Try to index again

Now we have two indices [xxx-000001, xxx-000002]. The following document will end up in the first index xxx-000001.

PUT my-tsds-test-index/_bulk
{ "create":{ } }
{ "@timestamp": "2024-05-16T12:48:00.000Z", "name": "something ok" }

Try to delete xxx-000001

DELETE .ds-my-tsds-test-index-2024.05.16-000001

# Response
{
  "acknowledged": true
}

Try to index again

PUT my-tsds-test-index/_bulk
{ "create":{ } }
{ "@timestamp": "2024-05-16T12:49:00.000Z", "name": "something ok" }

# Response
{
  "errors": true,
  "took": 0,
  "items": [
    {
      "create": {
        "_index": "my-tsds-test-index",
        "_id": null,
        "status": 400,
        "error": {
          "type": "illegal_argument_exception",
          "reason": "the document timestamp [2024-05-16T12:49:00.000Z] is outside of ranges of currently writable indices [[2024-05-16T12:49:52.000Z,2024-05-16T13:19:52.000Z]]"
        }
      }
    }
  ]
}

This time indexing the document fails because the correct write index has been deleted.

Logs (if relevant)

No response

@gmarouli gmarouli added >bug needs:triage Requires assignment of a team area label :Data Management/Data streams Data streams and their lifecycles and removed needs:triage Requires assignment of a team area label labels May 16, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label May 16, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@dakrone
Member

dakrone commented May 16, 2024

This is an interesting situation. For example, TSDS don't really have a "write index", because documents are routed to the backing index to which their @timestamp corresponds. So even deleting an index that is not the latest-generation index (which would normally be the write index for a non-TSDS) means that some subset of documents can't be indexed (if their @timestamp were to route them to that index).

We probably still want to fix this behavior and disallow deleting the latest-generation index, but we should be clear in our docs that deleting any index can still generate the document timestamp […] is outside of ranges of currently writable indices [[…,…]] exception for a TSDS.

@consulthys
Contributor

...means that some subset of documents can't be indexed (if their @timestamp were to route them to that index).

That's a good enough reason to prevent it IMHO... unless there's an easy way for the user to recreate it

...disallow deleting the latest generation index...

I think that for both classic and time-series data streams, it is already not possible to delete the latest generation index. However, for TSDS, (correct me if I'm wrong) it is possible to delete the previous backing indexes that are still writable (i.e. up to the configured index.look_back_time). This should not be allowed for the same reason it is not allowed to delete the latest generation index.

@gmarouli
Contributor Author

gmarouli commented May 17, 2024

@dakrone you are right. I will rephrase the ticket to not use the write index terminology.

What about:

That the user cannot delete the backing indices whose timeframe includes "now"; this "now" should align with the "now" used when determining the timeframes of the indices.

The idea behind this is that since we expect TSDS to accept current data, we should protect the user from accidentally deleting what we expect to be the most-written index.

Thoughts?
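The proposed guard could be sketched as follows (hypothetical Python, not actual Elasticsearch code): block deletion of any backing index whose time-series window [start_time, end_time) contains "now", using the same "now" that was used to compute the windows.

```python
from datetime import datetime, timezone

def deletion_blocked(start_time, end_time, now):
    """Return True when the index's window contains `now`, i.e. it is
    the index that current data routes to and should not be deletable."""
    return start_time <= now < end_time

now = datetime(2024, 5, 16, 12, 48, tzinfo=timezone.utc)

# Window covering "now": deletion should be rejected.
print(deletion_blocked(
    datetime(2024, 5, 16, 12, 0, tzinfo=timezone.utc),
    datetime(2024, 5, 16, 13, 0, tzinfo=timezone.utc),
    now))  # True

# Fully past window: deletion is allowed.
print(deletion_blocked(
    datetime(2024, 5, 16, 10, 0, tzinfo=timezone.utc),
    datetime(2024, 5, 16, 11, 0, tzinfo=timezone.utc),
    now))  # False
```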

@dakrone
Member

dakrone commented May 17, 2024

I think that for both classic and time-series data streams, it is already not possible to delete the latest generation index. However, for TSDS, (correct me if I'm wrong) it is possible to delete the previous backing indexes that are still writable (i.e. up to the configured index.look_back_time). This should not be allowed for the same reason it is not allowed to delete the latest generation index.

Why not? If a user were to configure a 7d look back time, with a 3-day retention, would we want to prevent them from doing that?

The idea behind this is that since we expect TSDS to accept current data, we should protect the user from accidentally deleting what we expect to be the most-written index.

I agree about preventing deletion of the most-written index, as I think it would lead to a poor user experience. I don't know yet whether we should protect all writeable indices from deletion, given that with a maximum 7d lookback that could be a very large number of indices (for high-volume indices rolling over every hour, for instance).
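The sizing concern above can be made concrete with back-of-the-envelope arithmetic (the 7d look-back and hourly rollover figures come from the comment; the helper is illustrative):

```python
import math
from datetime import timedelta

def writable_index_count(look_back, rollover_interval):
    """Rough upper bound on the number of backing indices whose time
    windows could still accept writes, assuming rollover happens once
    per `rollover_interval`."""
    return math.ceil(look_back / rollover_interval)

# 7-day look-back with hourly rollovers: protecting "all writable
# indices" would cover 168 indices.
print(writable_index_count(timedelta(days=7), timedelta(hours=1)))  # 168

# With daily rollovers the set is far smaller.
print(writable_index_count(timedelta(days=7), timedelta(days=1)))   # 7
```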

@consulthys
Contributor

The thinking with DS (whether classic or TS) is that deletions (should) occur by means of retention settings (either via ILM or DS lifecycle). Even though it "could" make sense for the user to manually delete the N oldest indexes (whether writable or not) in order to free storage (or for whatever other reason), they could achieve the same result by adjusting their retention settings.

However, deleting an index from the middle of the backing-index sequence makes much less sense, and even more so if it is writable. Say you have Index1 (oldest, not writable), Index2 (second oldest, still writable), and Index3 (newest index, writable); it would not be natural for a user to manually delete Index2. That's what I think we should prevent.

@dakrone
Member

dakrone commented May 17, 2024

Say you have Index1 (oldest, not writable), Index2 (second oldest, still writable), and Index3 (newest index, writable); it would not be natural for a user to manually delete Index2. That's what I think we should prevent.

Interestingly though, all backing indices in a data stream are writeable (TSDS or not), and a user may or may not be performing writes/updates/deletes to these backing indices (we tell users who need to do updates to do this). I think it's worth us (the team, I mean) discussing whether we want to allow "donut hole" indices, as you mentioned, and how we would do this technically. For instance, it's still perfectly valid in your scenario for a user to delete Index2 because they intend to restore it from a snapshot.

@consulthys
Contributor

I like your "donut hole" metaphor :-)
Your explanation makes sense indeed (re snapshot restore). I still think this kind of action should be gated so that the user has to "confirm" that they really intend to delete that backing index (maybe via a query string parameter), much like other "costly" APIs where we ask the user to add a specific parameter to confirm their intent.

@consulthys
Contributor

Interestingly though, all backing indices in a data stream are writeable (TSDS or not),

Circling back to this and looking at the documentation for Data stream lifecycle, in step 3 I can read that the write index that's been rolled over is automatically tail merged.

I'm curious to know whether this tail-merging process happens only once after rollover, or whether there is some kind of write-detection mechanism that will rerun the tail merge after "some" write operations?

The thinking is that since those indexes are not supposed to be written to anymore, they are tail-merged for optimization's sake; but if they are written to again after that merge, they are potentially back in a sub-optimal state.

@dakrone
Member

dakrone commented May 24, 2024

I'm curious to know whether this tail-merging process happens only once after rollover, or whether there is some kind of write-detection mechanism that will rerun the tail merge after "some" write operations?

It happens only once, then the index has an internal flag set to avoid it being rerun in the future.

The thinking is that since those indexes are not supposed to be written to anymore, they are tail-merged for optimization's sake; but if they are written to again after that merge, they are potentially back in a sub-optimal state.

Looking at the code, it appears that we do not wait for the index to exit the write "window" before force-merging it, so older documents could still be written during this time. I'll open an issue for us to change this.
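The fix described above could be sketched as a simple eligibility check (hypothetical names, not Elasticsearch internals): only consider a rolled-over TSDS backing index for tail merging once its time-series end_time is in the past, so no further writes can land in it.

```python
from datetime import datetime, timezone

def eligible_for_tail_merge(end_time, now, already_merged):
    """A rolled-over TSDS backing index should only be force-merged once
    its write window has closed (end_time is in the past) and it has not
    already been merged."""
    return not already_merged and end_time <= now

now = datetime(2024, 5, 16, 14, 0, tzinfo=timezone.utc)

# Window closed at 13:19:52 and not yet merged: eligible.
closed = datetime(2024, 5, 16, 13, 19, 52, tzinfo=timezone.utc)
print(eligible_for_tail_merge(closed, now, already_merged=False))  # True

# Window still open until 15:00: defer the merge.
still_open = datetime(2024, 5, 16, 15, 0, tzinfo=timezone.utc)
print(eligible_for_tail_merge(still_open, now, already_merged=False))  # False
```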

@dakrone
Member

dakrone commented May 24, 2024

I opened #109030
