
WATCH_MODE info #288

Open
iferencik opened this issue Dec 7, 2021 · 26 comments · May be fixed by #1179

Comments

@iferencik

Hello there

I'd like to know if setting WATCH_MODE=true makes Martin reload the config file.
I am publishing many PostGIS layers with even more properties/columns, and in order to limit these I need to use a configuration file.
I am thinking of creating a Python script that will generate the config YAML, and it would be really nice if Martin could re-read it in WATCH_MODE.

Thank you for this great piece of software.

@sharkAndshark
Collaborator

If this is on the roadmap, can I try this?

@stepankuzmin
Collaborator

Hi @sharkAndshark,

Sure!

@sharkAndshark
Collaborator

Based on the discussion in #349, will there be no watch mode? @stepankuzmin

@nyurik
Member

nyurik commented Aug 10, 2022

@sharkAndshark I think we may need to rethink the watch mode, and possibly add it back before the next release.

  • Refresh should NOT be initiated on a GET request, as this is a direct path to DDoSing the service, plus it may get cached by some proxy in between, or by the browser
  • Martin should be configurable with regular refresh intervals, e.g. every 10 minutes
  • Martin should be configurable to allow a POST request to refresh -- easier to filter with firewall rules, and it does not get cached by proxies. Configuration should allow restricting this to a subset of IPs, and only to localhost by default.
  • rework refresh using shared mutable state concepts. I think it should be something like "get X without the shared multi-threaded state, unless X does not exist, in which case take the read side of a read/write lock and copy the latest state to the worker's state" (rough sketch after this list). Not certain if this is doable though.
  • Add support for volatile/versioned data sources:
    • tilejson should return a slightly different path for the actual tiles. E.g. if the path is /rpc/getmvt.json, the tile path should be /rpc/getmvt/<version>/<z>/<x>/<y>.pbf, where the <version> is 1, 2, ... This path should be dynamic, and change whenever a new version is detected. Only applies when it is possible to access different versions at the same time. Note that this approach would only work if there is only one instance of Martin, which is not scalable.
    • if the backend is an MBTiles (SQLite) file or a PMTiles file, the file could in theory be renamed while the pooled connections still have it open, and a new file placed in the same location -- so this could work with versions.
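
To make the fourth bullet concrete, here is a rough sketch of a worker-local lookup that only touches the shared read/write lock on a miss. The Source type and the string-keyed map are hypothetical stand-ins, not Martin's actual data model:

    use std::collections::HashMap;
    use std::sync::{Arc, RwLock};

    // Hypothetical stand-in for a resolved tile source.
    #[derive(Clone)]
    struct Source;

    // Per-worker state: lookups hit the local map first; only a miss touches the
    // shared lock, after which the latest shared state is copied into this worker.
    struct WorkerState {
        local: HashMap<String, Source>,
        shared: Arc<RwLock<HashMap<String, Source>>>,
    }

    impl WorkerState {
        fn get(&mut self, id: &str) -> Option<Source> {
            if let Some(src) = self.local.get(id) {
                return Some(src.clone());
            }
            // Miss: take a short read lock and refresh the worker-local copy.
            let shared = self.shared.read().unwrap();
            self.local = shared.clone();
            self.local.get(id).cloned()
        }
    }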

That said, I really feel like the more substantial changes should be handled with Kubernetes/Helm deployments, and that watch mode is a development hack that makes infrastructure too fragile. A better setup would be:

  • set up a Kubernetes+Helm deployment with multiple redundant instances of Martin
  • when new config/data sources are needed, create a new Helm deployment version with the new configuration (these steps could even be automated)
  • initiate a gradual roll-out of the new version, using automated tests to ensure the roll-out is going OK, and rolling back to the last stable configuration if it is not

@sharkAndshark
Collaborator

@nyurik I will give it a shot, and then we can discuss more.

@nyurik
Member

nyurik commented Aug 11, 2022

Sounds good! @stepankuzmin any thoughts?

@stepankuzmin
Collaborator

@sharkAndshark sure, that would be great 👍

@pascal-codetaal

@nyurik We are currently using the latest version of Martin, and there are some real benefits in using this version.
But the one thing that is still a downside in the new version is the removal of watch mode, or of any option to reload the index.
We are running on a Kubernetes cluster and we reload the index by creating a new pod and killing the old one.
But this process can take up to 50 seconds or so, which is the time the user has to wait before being able to see the updated data.
So is there a possibility to add another way to refresh the data?
Perhaps we can try to help with implementing this feature?

@nyurik
Member

nyurik commented Oct 25, 2022

@pascal-codetaal I'm beginning to think we really do need to reimplement it, and some help would be great. So let's have a dedicated POST /reload endpoint.

My only concern is that we would want to avoid a global lock on each request. I'm not yet certain how to do that best (one lock-free possibility is sketched below).
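
One lock-free pattern worth considering (not a settled choice for Martin) is an atomic pointer swap, e.g. via the arc_swap crate. A rough sketch, with a hypothetical Catalog type standing in for the resolved sources:

    use std::sync::Arc;

    use arc_swap::ArcSwap;

    // Hypothetical stand-in for whatever holds the resolved tile sources.
    struct Catalog;

    // Readers load the current Arc without taking any lock; a reload just swaps
    // the pointer, so in-flight requests keep using the snapshot they loaded.
    struct SharedCatalog(ArcSwap<Catalog>);

    impl SharedCatalog {
        fn get(&self) -> Arc<Catalog> {
            self.0.load_full()
        }

        fn reload(&self, new_catalog: Catalog) {
            self.0.store(Arc::new(new_catalog));
        }
    }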

@iferencik
Author

iferencik commented Oct 25, 2022

@nyurik We are currently using the latest version of Martin, and there are some real benefits in using this version. But the one thing that is still a downside in the new version is the removal of watch mode, or of any option to reload the index. We are running on a Kubernetes cluster and we reload the index by creating a new pod and killing the old one. But this process can take up to 50 seconds or so, which is the time the user has to wait before being able to see the updated data. So is there a possibility to add another way to refresh the data? Perhaps we can try to help with implementing this feature?

@pascal-codetaal
if you do a

    kubectl rollout restart deployment <depl_name> 

there is no downtime. Kubernetes will keep the old pod(s) serving while the new one gets deployed.

Just make sure you have enough capacity on the node to spin up a new deployment.

Also, you should ideally have more than one replica.

I have the same issues, and being under time pressure I resorted to something else:
I created a shell script (https://github.com/UNDP-Data/martin-docker/blob/main/reactive_entrypoint.sh) that checks the local config file against a remote one located at a web-accessible URL (an Azure file share).

Env vars specify how often the config is pulled and which URL it is pulled from.

This might not be very elegant, but it works like a charm, especially because I use a tool to create (and optionally upload to Azure) the config files. We actually update the layers on a daily, sometimes hourly basis. In the future I will poll the remote config once a day.

The tool has a quirk: it creates configs only for tables that have "publish=True" in the table description. The same is valid for columns; if you mark a column with publish=False the tool will skip it.
This is not a very good idea, and I think I will scrap it in favor of a more robust technique, like revoking read rights for the user under which Martin connects to the database.

@pascal-codetaal

pascal-codetaal commented Oct 25, 2022

We don't have downtime, it just can take up to 50 seconds before the new pod is up and running...
We let users upload data to our DB, and they don't like to wait an extra 50 seconds before they can see the updated data. I'm going to check your script, thx

@saosebastiao

I would love to be able to reload the schema config without a full restart. The PostgREST project has two cool ways to do this:

https://postgrest.org/en/stable/schema_cache.html#schema-cache-reloading

  • Listening for a SIGUSR1 signal
  • Using Postgres' LISTEN/NOTIFY with a dedicated channel

The second method in particular is interesting because it allows the DB admin to set up custom triggers that would send notifications to the custom channel any time a geospatial table changes.
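
As a rough sketch of the signal-based variant (assuming tokio's Unix signal support; reload_catalog is a hypothetical placeholder for whatever would re-run source discovery in Martin):

    use tokio::signal::unix::{signal, SignalKind};

    async fn watch_for_reload_signal() -> std::io::Result<()> {
        // Wake up on every `kill -USR1 <martin_pid>`, e.g. from a deploy script
        // or a hook fired by a database trigger.
        let mut usr1 = signal(SignalKind::user_defined1())?;
        while usr1.recv().await.is_some() {
            reload_catalog().await;
        }
        Ok(())
    }

    async fn reload_catalog() {
        // Placeholder: re-resolve the configuration and swap in the new sources.
    }

The LISTEN/NOTIFY variant would do the same thing in response to notifications arriving on a dedicated Postgres channel instead of a Unix signal.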

@nyurik
Member

nyurik commented Apr 3, 2023

There are currently two issues I see (I'm sure there are more):

  • Performance - if we introduce a global lock for each request in order to access a singleton configuration, we might not be able to scale efficiently. This might be negligible - we will need to evaluate it with a simple stress test (e.g. a tiny Actix app that returns a static string after accessing some global lock variable, under a massive stress test with the app limited to just a few CPUs).
  • Source ID stability - when reloading, an auto-generated ID could end up different, e.g. in case multiple sources have the same table name. Possible solutions (see the sketch after this list):
    • add an ability to generate a hash param from an internal "long ID": hash(server_name, server_port, db_name, namespace, table, geo_column), and users can set it with id_format: '{table}.{hash}' in the auto_publish settings.
    • add some heuristics to try to match existing IDs to the new IDs based on the internal "long ID"
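
As a toy illustration of the {hash} idea (a proposal, not an existing Martin feature), the hash could be computed over the long-ID components. A real implementation would want a stable hash algorithm, since std's DefaultHasher is not guaranteed to stay stable across Rust releases:

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // Hash the "long ID" components so two same-named tables get distinct IDs.
    fn long_id_hash(
        server_name: &str,
        server_port: u16,
        db_name: &str,
        namespace: &str,
        table: &str,
        geo_column: &str,
    ) -> String {
        let mut h = DefaultHasher::new();
        (server_name, server_port, db_name, namespace, table, geo_column).hash(&mut h);
        // Short hex digest that could be substituted for `{hash}` in id_format.
        format!("{:08x}", h.finish() as u32)
    }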

@tordans

tordans commented Jan 25, 2024

We just ran into this issue after switching from pg_tileserv to Martin. With pg_tileserv all new tables were discovered right away. Now we have to restart our Martin Docker container to get an updated table list.

Our Docker setup https://github.com/FixMyBerlin/atlas-geo/blob/main/docker-compose.yml starts our backend server with db, processing and tiles. It looks like we don't have a good way to restart the tiles container after the app container has finished processing our data…

Is there some API like <martinserver.com>/refresh that we could ping to notify Martin to refresh its tables?

@nyurik
Member

nyurik commented Jan 25, 2024

@tordans we all want that API to happen - I plan to work on it in the future, but there is no certainty on the timeframe -- it is mostly a personal project, so if anyone has time to work on it, I will be happy to guide and assist that effort.

@pascal-codetaal

Hello @nyurik and the Martin contributors,

We've been closely following the discussion about the need for a watch mode or similar feature to enable dynamic config reloading in Martin. We understand the significance of this feature for users like us who need to see updated data without substantial downtime, and we're keen on contributing to its development.

Though we're not currently well-versed in Rust, we're ready to put in the necessary time to become proficient and contribute meaningfully to this project. We're particularly interested in understanding the preferred approach.

Could you provide us with some guidance on where you believe we should start, or if there are any particular areas of the codebase or documentation that we should familiarize ourselves with first?

We're looking forward to collaborating with the community on this feature. Thank you for considering our offer to help.

@nyurik
Member

nyurik commented Jan 25, 2024

@pascal-codetaal I will be happy to help you.

The very first thing I think we should evaluate is the performance aspect outlined in #288 (comment) -- create a tiny Actix app using one of the examples they have as a starting point. Inside that app, create two endpoints, e.g. /lock and /nolock to see the impact of the global lock. You can use the oha tool (used in the justfile) to compare the performance of both endpoints.

This should get you familiarized with the Rust ecosystem, making it much easier to make changes to a bigger project.

use std::collections::HashMap;
use std::sync::{Arc, Mutex};

use actix_web::get;
use actix_web::web::Data;

#[derive(Clone)]
struct State {
    // Plain map read directly without any locking (shared immutably via Data's Arc).
    pub values: HashMap<String, String>,
    // The same map behind a shared mutex, locked on every read.
    pub values_mutex: Arc<Mutex<HashMap<String, String>>>,
}

impl State {
    pub fn new() -> Self {
        let values = vec![(String::from("key"), String::from("value"))]
            .into_iter()
            .collect::<HashMap<String, String>>();
        Self {
            values_mutex: Arc::new(Mutex::new(values.clone())),
            values,
        }
    }
}

#[get("/lock")]
async fn get_lock(state: Data<State>) -> String {
    // Takes the global mutex on every request.
    let vals = state.values_mutex.lock().unwrap();
    vals.get("key").unwrap().clone()
}

#[get("/nolock")]
async fn get_nolock(state: Data<State>) -> String {
    // Reads the unlocked copy directly.
    state.values.get("key").unwrap().clone()
}
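
For completeness, a minimal (hypothetical) main() that wires these two endpoints up for the stress test, assuming the snippet above is in the same file; the worker count and port are arbitrary:

    // Bind both endpoints and limit the worker count so any lock contention
    // shows up under load (e.g. when hammered with oha).
    #[actix_web::main]
    async fn main() -> std::io::Result<()> {
        use actix_web::{App, HttpServer};
        HttpServer::new(|| {
            App::new()
                .app_data(Data::new(State::new()))
                .service(get_lock)
                .service(get_nolock)
        })
        .workers(2)
        .bind(("127.0.0.1", 8080))?
        .run()
        .await
    }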

@pascal-codetaal

@nyurik the development will be done by @mesudBe-Orbit.
He will start with the separate Actix application, and he will push it to a public repository.

@mesudBe-Orbit

@nyurik @pascal-codetaal the development is complete, and you can access the tiny application and view the results by following this link: https://github.com/mesudBe-Orbit/actix-lock-stress-tester?tab=readme-ov-file#test-results
In short, under these conditions, it seems there is no significant difference.

@nyurik
Member

nyurik commented Jan 30, 2024

Awesome news, thanks! So this should make things easier.

The current config system process:

  • In start(), read the config file if given, using current env vars for any variable expansions.
  • Merge the CLI parameters and the config file together (as well as any env vars)
  • Run .resolve() to match the configuration to the actual state of the database/files, and create a ServerState (tile sources, fonts, etc.):
    let env = OsEnv::default();
    let save_config = args.meta.save_config.clone();
    let mut config = if let Some(ref cfg_filename) = args.meta.config {
        info!("Using {}", cfg_filename.display());
        read_config(cfg_filename, &env)?
    } else {
        info!("Config file is not specified, auto-detecting sources");
        Config::default()
    };
    args.merge_into_config(&mut config, &env)?;
    config.finalize()?;
    let sources = config.resolve().await?;

The resolve method creates a cache and an IdResolver, and uses resolve_tile_sources to process all tile sources in parallel.

pub async fn resolve(&mut self) -> MartinResult<ServerState> {

IdResolver maps "unique full source IDs" (like database.namespace.tablename or even longer) to some short name the user may want, like tablename. If the short name already exists, it will append numbers to it, e.g. tablename.1; but if the same full unique name is passed to IdResolver again, it will re-return the same short name as before - so my point above about stable resolution is mostly done: the same table will get the same short name.
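
As a toy illustration of that mapping (not Martin's actual IdResolver implementation):

    use std::collections::HashMap;

    #[derive(Default)]
    struct ToyIdResolver {
        // short name -> the full unique ID it was assigned to
        names: HashMap<String, String>,
    }

    impl ToyIdResolver {
        fn resolve(&mut self, unique_id: &str, preferred: &str) -> String {
            let mut candidate = preferred.to_string();
            let mut suffix = 0;
            // Try tablename, tablename.1, tablename.2, ... until we find our own
            // previous assignment or a free short name.
            while let Some(existing) = self.names.get(&candidate) {
                if existing == unique_id {
                    // Same full ID asked again: re-return the same short name.
                    return candidate;
                }
                // Collision with a different full ID: bump the numeric suffix.
                suffix += 1;
                candidate = format!("{preferred}.{suffix}");
            }
            // Short name is free: claim it for this full ID.
            self.names.insert(candidate.clone(), unique_id.to_string());
            candidate
        }
    }

For example, resolving db1.public.roads and db2.public.roads, both preferring "roads", yields roads and roads.1, and repeated calls with the same full ID keep returning the same short name.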

PostgreSQL resolution is much more involved than PMTiles/MBTiles. It runs these steps in parallel for all configured Postgres connections:

  • create a PgBuilder instance from the PgConfig instance.
  • in parallel, query PG to discover all existing tables and functions that can be used as sources (e.g. query_available_tables)
  • for all configured tables/funcs, match them with the ones discovered in the database, and create a new source instance - in parallel (e.g. instantiate_tables)
  • if auto-discovery is enabled, instantiate the rest of the tables/funcs that have not been matched up with the configuration

pub async fn resolve(&mut self, id_resolver: IdResolver) -> MartinResult<TileInfoSources> {

So one path to implement watch mode (a rough sketch of this shape follows after the list):

  • Change all endpoint functions to accept sources: Data<RwLock<TileSources>> instead of Data<TileSources>, and take a read lock in all of them. Note that RwLock might not be ideal for our use case, but it should get us started.
  • after parsing and resolving the Config struct, store it (as Arc<Mutex<Config>>) inside actix's Data<...>
  • add a /refresh endpoint that gets Data<Arc<Mutex<Config>>> from Actix
  • refresh should call resolve to compute a new set of sources, and once done, lock and update all tile sources using the write half of the RwLock.
  • Config::resolve should be changed to preserve the IdResolver instance between calls - e.g. move it to the constructor?
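
A rough sketch of what that could look like in Actix Web; Config and TileSources here are hypothetical stand-ins for Martin's own types, and Data already wraps its contents in an Arc:

    use actix_web::{post, web::Data, HttpResponse};
    use tokio::sync::{Mutex, RwLock};

    // Hypothetical stand-ins for Martin's real Config and TileSources types.
    struct Config;
    struct TileSources;

    impl Config {
        // Placeholder for Config::resolve(): rebuild the full set of tile sources.
        async fn resolve(&mut self) -> Result<TileSources, String> {
            Ok(TileSources)
        }
    }

    #[post("/refresh")]
    async fn refresh(
        config: Data<Mutex<Config>>,
        sources: Data<RwLock<TileSources>>,
    ) -> HttpResponse {
        // Hold the config lock only while re-resolving, so concurrent refreshes serialize.
        let new_sources = {
            let mut cfg = config.lock().await;
            match cfg.resolve().await {
                Ok(s) => s,
                Err(e) => return HttpResponse::InternalServerError().body(e),
            }
        };
        // Swap the new sources in under a short write lock; tile requests keep
        // serving from the old sources (via read locks) right up to this point.
        *sources.write().await = new_sources;
        HttpResponse::Ok().finish()
    }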

@nyurik
Member

nyurik commented Jan 30, 2024

@mesudBe-Orbit @pascal-codetaal the development is complete, and you can access the tiny application and view the results by following this link: https://github.com/mesudBe-Orbit/actix-lock-stress-tester?tab=readme-ov-file#test-results In short, under these conditions, it seems there is no significant difference.

Please run your code with cargo run --release, and also you should probably increase the number of requests, e.g. aim for about a 5-minute run (?)

@pascal-codetaal

@nyurik the 5 minute run had the same result for both endpoints.
We started with the implementation of changing the endpoint functions.

Change all endpoint functions to accept sources: Data<RwLock<TileSources>> instead of Data<TileSources>, and take a read lock in all of them. Note that RwLock might not be ideal for our use case, but it should get us started.

@nyurik
Member

nyurik commented Feb 2, 2024

@pascal-codetaal sounds good. One other thing - I would recommend creating a Martin fork, creating a branch there, and opening a draft pull request into Martin. Make sure to commit and push your code often (even if it does not compile). This way everyone can see the progress and make suggestions as you go, without waiting for full completion, and mistakes can be corrected early on. You will also get the benefit of constant CI checks.

@mesudBe-Orbit

@nyurik we have created a draft PR into Martin; you can find it here. We have committed and pushed our latest code (the latest commit addressed some mistakenly added repetition).

@nyurik
Member

nyurik commented Feb 7, 2024

@mesudBe-Orbit yes, thanks, I saw it and already commented on that PR. Keep adding things to it (by pushing to the same branch), and once it is ready to be merged, click the Ready to review button. Make sure to either merge the main branch into it, or rebase on main and force push.

@nyurik
Member

nyurik commented May 6, 2024

PMTiles support for this would be good as well. I outlined how it can be done in stadiamaps/pmtiles-rs#40 - and once that is done, we can use it in Martin too.
