Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement better ipc communication to ghost run process #472

Closed
acburdine opened this issue Sep 4, 2017 · 3 comments · Fixed by #612
Closed

Implement better ipc communication to ghost run process #472

acburdine opened this issue Sep 4, 2017 · 3 comments · Fixed by #612

Comments

@acburdine
Copy link
Member

One thing that can be somewhat confusing currently when using the systemd process manager in the CLI is the fact that ghost start exits quickly with the message that ghost has started, when in fact it can sometimes take 20-30 seconds to actually start (time may vary from server to server based on a number of factors). Conversely, when you run ghost start using the local process manager, it doesn't exit as quickly, but when it does exit, ghost has started.

The reason for this difference is that the local process manager spawns the ghost run subprocess directly, and thus can communicate with it over an IPC tunnel, exiting once ghost has fully started. Systemd, on the other hand, must go through the intermediary process of systemctl, which has no way of letting the ghost start process know once the process has completely started. Other process manager extensions may also have the same issue depending on how they're implemented and how the underlying process manager functions.

To make things more consistent between process managers, the CLI should use a more advanced method of ipc tunneling (using something like https://www.npmjs.com/package/node-ipc).

Note: this would also potentially open up opportunities for improvement in other areas: for example, ghost update could interface with the ghost run process and tell it to run in maintenance mode before modifying the database.

@kirrg001
Copy link
Contributor

kirrg001 commented Jan 30, 2018

To nail down the problem of this issue, i would like to sum this up a little.

  • systemctl start [service] triggers the target systemd service file
  • the systemd service file triggers ghost run
  • ghost run spawns a process to ghost
  • the CLI waits till Ghost is started (IPC communication)
  • but systemd in the meanwhile does not get a notification if the process was started via the CLI or died
  • and that's why we have no real-time result of how long the process needed to start (of if it wasn't even started)

So the CLI must notify systemd.

Could be useful:

From https://www.freedesktop.org/software/systemd/man/systemd.service.html

Behavior of notify is similar to simple; however, it is expected that the daemon sends a notification message via sd_notify(3) or an equivalent call when it has finished starting up. systemd will proceed with starting follow-up units after this notification message has been sent. If this option is used, NotifyAccess= (see below) should be set to open access to the notification socket provided by systemd. If NotifyAccess= is missing or set to none, it will be forcibly set to main. Note that currently Type=notify will not work if used in combination with PrivateNetwork=yes.

We are using type simple at the moment, which is described as following:

If set to simple (the default if neither Type= nor BusName=, but ExecStart= are specified), it is expected that the process configured with ExecStart= is the main process of the service

So using type notify will help us get the real-time start of the Ghost process.

You can proof this by:

  • change the type to notify and run sudo systemctl start ... - the command line will stuck.

I've tested sd-notify, this works, but we have to find a JS implementation. Have not found a working lib.

With notify we can also send status messages, which is great.

Edited:

@acburdine
Copy link
Member Author

So I just tried that node-systemd-daemon library (it wasn't around when I first tried messing around with the notify stuff), unfortunately it's got an optional dependency that requires make to be installed. There was a TODO comment to fallback to the systemd-notify shell command - however this shell command won't work because of systemd/systemd#2739. The gist of that issue is that the systemd-notify command has to be run as root for things to work correctly. (I ran into the same issue when I originally was writing the systemd extension - the type=notify thing was something I wanted to do early-on but ended up abandoning because it was too complicated to get working correctly.)

@kirrg001
Copy link
Contributor

kirrg001 commented Feb 2, 2018

Here is a little summary why we should not use systemd-notify.

C binding sd_notify

Almost all available npm libs use a C binding to sd_notify. As we know C bindings can be very uncomfortable when switching node versions (depending on the ABI) or they require the user to install a compiler. We would like to avoid that.

C binding abstract_socket

Some use the C binding abstract_socket to directly communicate with the systemd notification socket. Node dropped their support for communicating to abstract sockets (unix dgrams). So to communicate with the systemd notify socket directly, we would have to use a C binding. Same reasoning as above.

Execute systemd-notify via shell

This is not reliable, see and see. To sum up: if we trigger the systemd-notify shell cmd, it will send the notification to systemd and the sending process immediately dies then. Systemd does an authentication against the process which has send the event. It won't find it anymore and ignores the message. Adding a sleep to keep the process alive is very hacky. Take a look at the conversation here and a closed PR here.

Furthermore, triggering a shell command requires us to use NotifyAccess=all in any systemd service file, because by default systemd only accepts notifications from the main process (ghost run). With NotifyAccess=all you can send a notification from every cgroup process. That means in theory other processes can shutdown blogs (sd_notify(0, "STATUS=shutdown");)


I looked at how other programming languages or projects handle this problem with systemd.

  • this is a good article which sums up the problems with Node.JS and system notify
  • other programming languages support abstract unix sockets (e.g. python)
  • some use systemd Type:forking in their systemd unit, but this won't work for the CLI - ghost run does not die

Other options:

  1. poll systemctl status {..}. Not reliable, because systemd will turn the status to "active" too early, because Ghost has not even finished booting.
  2. tail and parse Ghost log's - hard and tricky to implement. That would make the feature dependent on Ghost's output.
  3. poll until the instance is awake (check port). Not reliable (enough) because the port will be available before the Ghost bootstrap has finished. To be 100% sure, we also would have to check the Ghost logs (see (2.))
  4. use IPC between ghost start and ghost run (originally suggested in this issue). When Ghost sends the started event back to the CLI, Ghost has fully bootstrapped. So this approach can work. This has the advantage that we can transmit all the data we need (errored? context? etc). Also, this has the advantage that it's independent from the used extension (systemd, supervisor). Furthermore the IPC communication can be in theory useful for other features, because it allows us to open a tunnel between Ghost CLI processes. No security concern, we control the code and as long as the receiver (the CLI) does not actively do (kill process, restart etc) anything dangerous, we don't have to worry about vulnerabilities.

I'll play with (4) now.

Edited: (4) does not work, because of permission problems. ghost run is started with the ghost user and ghost start is started with your ubuntu user. You can't share a unix socket.

Conclusion: Polling to the target Ghost port. If it fails, we grab the logs and show them.

kirrg001 added a commit to kirrg001/Ghost-CLI that referenced this issue Feb 3, 2018
closes TryGhost#472

- has no tests yet
- PR for last dicussions

[ci skip]
kirrg001 added a commit to kirrg001/Ghost-CLI that referenced this issue Feb 3, 2018
closes TryGhost#472

- has no tests yet
- PR for last dicussions

[ci skip]
kirrg001 added a commit to kirrg001/Ghost-CLI that referenced this issue Feb 4, 2018
closes TryGhost#472

- added port polling utility
- general process manager class offers `ensureStarted` function
- systemd extension makes use of `ensureStarted`
kirrg001 added a commit to kirrg001/Ghost-CLI that referenced this issue Feb 4, 2018
closes TryGhost#472

- added port polling utility
- general process manager class offers `ensureStarted` function
- systemd extension makes use of `ensureStarted`
kirrg001 added a commit to kirrg001/Ghost-CLI that referenced this issue Feb 4, 2018
closes TryGhost#472

- added port polling utility
- general process manager class offers `ensureStarted` function
- systemd extension makes use of `ensureStarted`
kirrg001 added a commit to kirrg001/Ghost-CLI that referenced this issue Feb 4, 2018
closes TryGhost#472

- added port polling utility
- general process manager class offers `ensureStarted` function
- systemd extension makes use of `ensureStarted`
acburdine pushed a commit that referenced this issue Feb 4, 2018
closes #472
- added port polling utility
- general process manager class offers `ensureStarted` function
- systemd extension makes use of `ensureStarted`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants