[WIP] Migrate Training Update Process to Sidekiq #5754

ujjwalpathaak · 2024-04-05T17:44:42Z

fixes #4712
This PR aims to migrate the training update process to Sidekiq. So that there are free resources available on the main thread.

Move the update process to Sidekiq
Deal with YamlTrainingLoader & InvalidYamlError error?
"Another training update process is already in progress. Try again later."
Rescue ModuleNotFound error?
Changes to allow addition of new content
Edge Cases
Write Tests

ujjwalpathaak · 2024-04-05T17:48:26Z

Hello @ragesoss, I have tried to move the process to Sidekiq it is working fine. LMKWYT if this seems feasible, I'll continue with this approach to add other features.

ujjwalpathaak · 2024-04-09T07:02:06Z

Hey, I tried a few things to identify if there is a Sidekiq job already running? I am not able to figure out a clean way to do it.

You suggested to handle this on the database level but if I add a boolean column to every training content, will it be a nice way to do things?
As needs_update only updates those courses which have this bool set to true, whereas libraries and modules gets updated on each ./reload request?

ujjwalpathaak · 2024-04-09T07:27:43Z

Few of the options I tried-

Adding "redis" as a gem

using redis.set('training_update_in_progress', true) as a check if an older update is already running? This would need cleanup code in case of unexpected server termination as training_update_in_progress is set false after loading of slides is done by sidekiq.

Get all jobs from the default queue and check if "TrainingBaseWorker" job exists.

This is not consistent ad it takes time for TrainingBaseWorker to show up int he default queue.

We are using the gem 'sidekiq-unique-jobs' as sidekiq_options lock: :until_executed, it has a few options Link to SidekiqUniqueJobs documentation to solve these conflicts

but the problem is that it would not give feedback to the user on client that another job is running but will notify this conflict on server side

ragesoss · 2024-04-09T17:25:13Z

I don't know the best approach here, but one that might work would be:

Include a column for update status on TrainingLibrary and TrainingModule records, but make it able to handle more details than just a boolean state. (It might or might not make sense to include more than one column.) The idea would be to be able to keep track of whether it was not scheduled for an update, scheduled for an update that hasn't started yet, or started an update that has not successfully completed yet (with a timestamp for when it started). So the update process would update that status in the database as soon as it starts, and then we could use some logic like 'if it's been more than X amount of time since the start of an update, check whether an update process is running and if not, it probably errored'.

Then, except for that case where an update failed but the status still shows it being updated, we wouldn't need to check Redis to show state to the user.

ujjwalpathaak · 2024-04-11T17:30:33Z

Hi @ragesoss,
So I tried the approach you suggested, let me know if you suggest any changes in it.
I have added a few tasks in the PR desc that i think are still needed to close this PR, I'll add more as i encounter them.

Here is the working ( not written tests for them till now but it should work fine after i write tests and do minor changes accordingly )

there are 2 new cols for each table - libraries, modules & slides.
- update_status -
  - It will have 3 states
    - 0 -> default / not_scheduled state
    - 1 -> scheduled but update not started state
    - 2 -> update started but not completed
- update_error
  - default / no_error - NULL
  - error - e.message ( that is to be rendered to the user )

For errors like InvalidWikiContentError & NoMatchingWikiPagesFound - all the rows in the table will have the error as this is not slug specific.
For errors like DuplicateSlugError - only the rows which have the errored_slug would contain the message.

After hitting on /reload_trainings a timer (currently for 15min) would run up with all the rows in training_libraries changes their update_status to 1 ( scheduled ).. as we will start to fetch content from wikimedia for a slug it will change state to 2 ( stared updating ) then after calling .inflate on that slug if no errors occur it will again default to 0 showing update complete

if errors occurs it is handled by not setting state of 2 to 0 so we could catch if error occured. After 15min the timer would check for all rows if a row has 2 as update_status then it would render the error stored in update_error of that row to the user

we can also check if a thread of this 15min timer is currently running to see if a update is already running?

I still have a few blockers as to how to deal with yaml content should i entierly exclude it from the new workflow or just add conditions in between and make it work & how to deal with ModuleNotFound error as to where to store if this occurs.

I'll work on them and add new commits as I figure out the issues.

ujjwalpathaak · 2024-04-12T21:21:49Z

@ragesoss I had a doubt.
I was getting a DuplicateSlugError error for "editing-basics" slug while loading it from wiki_pages.
what is the expected behaviour here? should the update_process skip it and go ahead or it should show the error to user and terminal the process?

ragesoss · 2024-04-12T21:31:32Z

That typically happens because you have done training updates in both wiki_education true and false modes, and there is one module with that slug that gets pulled from the wiki (in false mode) and another from the .yml files (in true mode). Delete all the TrainingModule and TrainingSlide records and you should be able to proceed.

ujjwalpathaak · 2024-04-12T21:37:51Z

I wanted to ask how should these erros in prod behave? if an error is raised should it stop the workflow or that slug should be skipped and the workflow should continue.

ragesoss · 2024-04-12T21:39:29Z

Hopefully such errors won't occur in production, but skipping that slug and continuing would be nice

ujjwalpathaak · 2024-04-15T12:02:21Z

Hey @ragesoss,
For errors like NoMatchingWikiPageFound, NoModuleFound and more... I cannot target a single row, currenlty I am solving this problem using a tacky solution but I am not finding that implementation nice.

I was wondering if it is feasible to make a new table to record the errors that are raised during the background update process so we dont need to iterate on all the rows in all 3 tables to do any operations.

Table Name - training_update_status
------ 1. row ------
content_class: TrainingLibrary
update_status: 0 // not_scheduled
error_message: NULL

------ 2. row ------
content_class: TrainingModule
update_status: 1 // scheduled
error_message: NULL

------ 3. row ------
content_class: TrainingSlide
update_status: 2 // error_occured
error_message: could not get links from '#{@wiki_base_page}'

This would reduce the complexity of the workflow + reduce iterations over the dashboard

ragesoss · 2024-04-15T19:49:14Z

I'd rather not add a table just to hold this information. If the idea would be to limit the data to just one entry per record type, perhaps a Setting record could be used to store the status.

ujjwalpathaak · 2024-04-15T19:53:12Z

Okay I'll use setting records to store the data, and I'll start to write tests to test the functionality

…ng_content.rb

ujjwalpathaak · 2024-04-19T19:05:55Z

Hey @ragesoss
I have been trying to tweek a lot of things here and there, but still there are a few loose ends to tie up.
If you could can you have a look and maybe guide me what all should I keep in mind ahead from here in terms of testing and expected behaviour?

This reverts commit b31ceec.

…raining_content.rb

ujjwalpathaak · 2024-04-26T09:49:36Z

Hey @ragesoss,
Most of the functionality is now working for me locally, but there is one issue I need some assistance with.

I am adding a new initializer - WikiEduDashboard/config/initializers/training_update_process_cleanup.rb

require_dependency "#{Rails.root}/lib/data_cycle/batch_update_logging"
require_dependency "#{Rails.root}/lib/training/training_base"

include BatchUpdateLogging

delete_pid_file(:training)
TrainingBase.new.update_process_state(0)
TrainingBase.new.clear_error_messages

This is to basically clean any leftovers from the last run ( in development ), this is working locally for me but when I push this to github I get this error -

Run mkdir tmp -m 777
  mkdir tmp -m 777
  cp config/application.example.yml config/application.yml
  cp config/database.example.yml config/database.yml
  bin/rails db:migrate RAILS_ENV=test
  shell: /usr/bin/bash -e {0}
  env:
    RAILS_ENV: test
    DATABASE_PORT: 3306
rails aborted!
ActiveRecord::StatementInvalid: Mysql2::Error: Table 'dashboard_testing.settings' doesn't exist

am I missing a step or doing something wrong?

ragesoss · 2024-04-29T16:42:31Z

I don't see that error in the CI log. Where during the build is it happening?

ujjwalpathaak · 2024-04-29T17:21:14Z

Commit - e00b7e0
Test Logs for the above commit - https://github.com/WikiEducationFoundation/WikiEduDashboard/pull/5754/checks?check_run_id=24057578970

Started to after this commit

initial commit

f3dabf6

moved import modules & slides to sidekiq

9957c1c

adding ability to track background updates

1a94eb4

ujjwalpathaak marked this pull request as draft April 11, 2024 17:11

Ujjwal Pathak added 2 commits April 13, 2024 03:21

seperated yaml loading from background jobs

5fbeb02

added - check for already running update

a5b4ac7

ujjwalpathaak and others added 5 commits April 17, 2024 15:06

Delete db/migrate/20240410171428_add_update_status_and_time_to_traini…

555d5d7

…ng_content.rb

Delete app/services/check_training_update_status.rb

a529a01

passing old specs

b7a7644

one more spec passing

db393e6

added: fixes

9494a07

Ujjwal Pathak added 6 commits April 20, 2024 20:47

training update process clean up code on server startup

e00b7e0

error handling working

561fb81

added ModuleNotFound support

89714db

removing old migrations

b31ceec

Revert "removing old migrations"

970505b

This reverts commit b31ceec.

removing old migrations_2

0910344

ujjwalpathaak and others added 2 commits April 26, 2024 15:08

Delete db/migrate/20240426093454_remove_update_status_and_time_from_t…

f3f51c7

…raining_content.rb

trying to resolve testing

38707b0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Migrate Training Update Process to Sidekiq #5754

[WIP] Migrate Training Update Process to Sidekiq #5754

ujjwalpathaak commented Apr 5, 2024 •

edited

ujjwalpathaak commented Apr 5, 2024 •

edited

ujjwalpathaak commented Apr 9, 2024 •

edited

ujjwalpathaak commented Apr 9, 2024 •

edited

ragesoss commented Apr 9, 2024

ujjwalpathaak commented Apr 11, 2024 •

edited

ujjwalpathaak commented Apr 12, 2024

ragesoss commented Apr 12, 2024

ujjwalpathaak commented Apr 12, 2024

ragesoss commented Apr 12, 2024

ujjwalpathaak commented Apr 15, 2024 •

edited

ragesoss commented Apr 15, 2024

ujjwalpathaak commented Apr 15, 2024 •

edited

ujjwalpathaak commented Apr 19, 2024

ujjwalpathaak commented Apr 26, 2024

ragesoss commented Apr 29, 2024

ujjwalpathaak commented Apr 29, 2024

[WIP] Migrate Training Update Process to Sidekiq #5754

Are you sure you want to change the base?

[WIP] Migrate Training Update Process to Sidekiq #5754

Conversation

ujjwalpathaak commented Apr 5, 2024 • edited

ujjwalpathaak commented Apr 5, 2024 • edited

ujjwalpathaak commented Apr 9, 2024 • edited

ujjwalpathaak commented Apr 9, 2024 • edited

ragesoss commented Apr 9, 2024

ujjwalpathaak commented Apr 11, 2024 • edited

ujjwalpathaak commented Apr 12, 2024

ragesoss commented Apr 12, 2024

ujjwalpathaak commented Apr 12, 2024

ragesoss commented Apr 12, 2024

ujjwalpathaak commented Apr 15, 2024 • edited

ragesoss commented Apr 15, 2024

ujjwalpathaak commented Apr 15, 2024 • edited

ujjwalpathaak commented Apr 19, 2024

ujjwalpathaak commented Apr 26, 2024

ragesoss commented Apr 29, 2024

ujjwalpathaak commented Apr 29, 2024

ujjwalpathaak commented Apr 5, 2024 •

edited

ujjwalpathaak commented Apr 5, 2024 •

edited

ujjwalpathaak commented Apr 9, 2024 •

edited

ujjwalpathaak commented Apr 9, 2024 •

edited

ujjwalpathaak commented Apr 11, 2024 •

edited

ujjwalpathaak commented Apr 15, 2024 •

edited

ujjwalpathaak commented Apr 15, 2024 •

edited