Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass correct OS PID to systemd-notify #664

Closed
michaelklishin opened this issue Feb 29, 2016 · 29 comments
Closed

Pass correct OS PID to systemd-notify #664

michaelklishin opened this issue Feb 29, 2016 · 29 comments

Comments

@michaelklishin
Copy link
Member

See the discussion in #573.

@binarin
Copy link
Contributor

binarin commented Mar 1, 2016

Some testing reveals that there is a problem with guessing MAINPID.
Currently the first clause executes under systemd. But when it's done that way systemd is actually unhappy with:

os:cmd("systemd-notify --ready MAINPID=" ++ os:getpid()),

because it's the top-level shell process PID that should be passed to systemd.

@binarin
Copy link
Contributor

binarin commented Mar 1, 2016

Maybe additional argument for scripts/rabbitmq-server should be introduced - -systemd, in addition to already existing -detached?

binarin added a commit to binarin/rabbitmq-server that referenced this issue Mar 2, 2016
After some investigation I think that even with `sd_notify` MAINPID
detection didn't work 100% percent correctly.

Foreground broker was started through a shell process that was
converting signals into graceful shutdown commands. And it's pid of that
proxy that was to be communicated to systemd. Automatic pid inference
worked in case epmd was already running on machine, otherwise it was epmd
who was chosen as main process.

This commit introduces new environment variable to prevent creation of
such a proxy process. And this var should be specified in unit
file (example added to docs/). New variable is needed due to following
reasons:
- No way to detect whether we are running under systemd or not
- I was thinking about adding command-line switch instead, but there is
  no way to work with "$@" in portable shell script.

Another improvement is writing message to stderr about graceful
shutdown, just to be sure that shutdown was indeed graceful. In case of
systemd it will automatically appear in the journal.

Fixes rabbitmq#664
@dumbbell
Copy link
Member

Hi!

So I'm testing your pull request on Debian Jessie (8.3, systemd), freshly upgraded from Wheezy (sysvinit). As this is the first time I play with systemd, I'm going to post comments which will probably seem obvious to you, @binarin. I just want to do it because it helps me to understand the problem and correctly test the PR. Feel free to correct me what I say doesn't make any sense :-)

  1. First, how our current package (without your patch) behaves with systemd. I installed 3.6.1 using a package downloaded from www.rabbitmq.com. Once the install is complete, the service is automatically started:

    root@rabbit-jpedron-debian1# service rabbitmq-server status
    ● rabbitmq-server.service - LSB: Manages RabbitMQ server
       Loaded: loaded (/etc/init.d/rabbitmq-server)
       Active: active (running) since Thu 2016-03-10 11:04:43 GMT; 58s ago
       CGroup: /system.slice/rabbitmq-server.service
               ├─2693 /usr/lib/erlang/erts-7.2/bin/epmd -daemon
               ├─2723 /bin/sh -e /usr/lib/rabbitmq/bin/rabbitmq-server
               ├─2997 /usr/lib/erlang/erts-7.2/bin/beam.smp ...
               ├─3115 inet_gethost 4
               └─3116 inet_gethost 4
    
    root@rabbit-jpedron-debian1# rabbitmqctl status
    Status of node 'rabbit@rabbit-jpedron-debian1' ...
    [{pid,2997},
     {running_applications,[{rabbit,"RabbitMQ","3.6.1"},
    ...
    

    A unit file was automatically generated by systemd-sysv-generator and written to /run/systemd/generator.late/rabbitmq-server.service:

    # Automatically generated by systemd-sysv-generator
    
    [Unit]
    SourcePath=/etc/init.d/rabbitmq-server
    Description=LSB: Manages RabbitMQ server
    Before=runlevel2.target runlevel3.target runlevel4.target runlevel5.target shutdown.target
    After=remote-fs.target network-online.target
    Wants=network-online.target
    Conflicts=shutdown.target
    
    [Service]
    Type=forking
    Restart=no
    TimeoutSec=5min
    IgnoreSIGPIPE=no
    KillMode=process
    GuessMainPID=no
    RemainAfterExit=yes
    SysVStartPriority=16
    ExecStart=/etc/init.d/rabbitmq-server start
    ExecStop=/etc/init.d/rabbitmq-server stop
    ExecReload=/etc/init.d/rabbitmq-server reload
    

    The service type defaults to forking, which means systemd expects the ExecStart command to fork a process and exit. The main PID could be derived from the PIDFile but this option is not set. If I understand correctly, systemd doesn't care about the notification we send with systemd-notify because the service type is not notify. See https://www.freedesktop.org/software/systemd/man/systemd.service.html#Options.

    I guess that's the reason systemd is unable to track the service processes correctly and therefore, issueing rabbitmctl stop (ie. bypassing systemd) doesn't tell systemd the service is stopped:

    root@rabbit-jpedron-debian1# rabbitmqctl stop
    Stopping and halting node 'rabbit@rabbit-jpedron-debian1' ...
    
    root@rabbit-jpedron-debian1# service rabbitmq-server status
    ● rabbitmq-server.service - LSB: Manages RabbitMQ server
       Loaded: loaded (/etc/init.d/rabbitmq-server)
       Active: active (running) since Thu 2016-03-10 11:04:43 GMT; 16min ago
       CGroup: /system.slice/rabbitmq-server.service
               └─2693 /usr/lib/erlang/erts-7.2/bin/epmd -daemon
    

    In the example above, we see that systemd has no idea which process is the main one. After rabbitmqctl stop, epmd is still running and the service status is active (running). If I kill epmd, the service status becomes active (exited). Any service rabbitmq-server start does nothing. Only service rabbitmq-server stop sets the status correctly, allowing one to restart the service. But as discussed internally, this is acceptable in this situation: the user is not supposed to mess with the service with a mix of official (service(8)) and unofficial (rabbitmqctl) tools.

  2. Installing a package created from @binarin's pull request gives the exact same result out-of-the-box (ie. without installing the example unit file). That's logical because the patch relies on the provided unit file.

Now, I'm going to install the unit file and test further. In the end, the probable conclusion is that our RPM and Debian packages should install this unit file out-of-the-box, but I need to see how to handle both init systems with a single package if possible.

@dumbbell
Copy link
Member

Just a note while I'm on it: Debian official package of erlang-base for Jessie includes a unit file for epmd. However, Erlang Solutions' one doesn't. We may have a service dependency issue here.

@binarin
Copy link
Contributor

binarin commented Mar 10, 2016

epmd dependency isn't a problem - it's implicit because jessie uses socket activation for epmd.

@dumbbell
Copy link
Member

  1. I stopped the service with service rabbitmq-server stop.

  2. I copied rabbitmq-server.service.example to /lib/systemd/system/rabbitmq-server.service. Now, the service is listed when I query all unit files (including disabled ones):

    root@rabbit-jpedron-debian1# systemctl list-unit-files | grep rabbitmq-server
    rabbitmq-server.service                disabled
    

    I double-checked that this entry doesn't come from the one generated by systemd-sysv-generator: before I copied the unit file, there was no rabbitmq-server entry in the list.

  3. Using service rabbitmq-server start directly doesn't improve the situation because the generated unit file is still being used (ie. the Loaded: line in the status still points to the SysV init script).

  4. Doing systemctl daemon-reload is enough to make the real unit file override the generated one (which is removed by the way):

    root@rabbit-jpedron-debian1# systemctl daemon-reload
    
    root@rabbit-jpedron-debian1# service rabbitmq-server status
    ● rabbitmq-server.service - RabbitMQ broker
       Loaded: loaded (/lib/systemd/system/rabbitmq-server.service; disabled)
       Active: inactive (dead)
    

    The service points to the copied unit file.

  5. Let's start it:

    root@rabbit-jpedron-debian1# service rabbitmq-server start
    
    root@rabbit-jpedron-debian1# service rabbitmq-server status
    ● rabbitmq-server.service - RabbitMQ broker
       Loaded: loaded (/lib/systemd/system/rabbitmq-server.service; disabled)
       Active: active (running) since Thu 2016-03-10 12:07:32 GMT; 2s ago
     Main PID: 13532 (beam.smp)
       CGroup: /system.slice/rabbitmq-server.service
               ├─13532 /usr/lib/erlang/erts-7.2/bin/beam.smp ...
               ├─13611 /usr/lib/erlang/erts-7.2/bin/epmd -daemon
               ├─13719 inet_gethost 4
               └─13720 inet_gethost 4
    

    It works and systemd has the correct main PID.

  6. Stopping the service works correctly too. However, starting it again doesn't: service rabbitmq-server start hangs and the service status is activating, even though RabbitMQ finished to start:

    root@rabbit-jpedron-debian1# service rabbitmq-server status
    ● rabbitmq-server.service - RabbitMQ broker
       Loaded: loaded (/lib/systemd/system/rabbitmq-server.service; disabled)
       Active: activating (start) since Thu 2016-03-10 12:10:34 GMT; 1min 6s ago
     Main PID: 13943 (beam.smp)
       CGroup: /system.slice/rabbitmq-server.service
               ├─13943 /usr/lib/erlang/erts-7.2/bin/beam.smp ...
               ├─14025 /usr/lib/erlang/erts-7.2/bin/epmd -daemon
               ├─14133 inet_gethost 4
               └─14134 inet_gethost 4
    
    =INFO REPORT==== 10-Mar-2016::12:10:36 ===
    Server startup complete; 0 plugins started.
    
    Mar 10 12:10:35 rabbit-jpedron-debian1 rabbitmq-server[13943]: ######  ##        /var/log/rabbitmq/rabbit@rabbit-jpedron-debian1-sasl.log
    Mar 10 12:10:35 rabbit-jpedron-debian1 rabbitmq-server[13943]: ##########
    Mar 10 12:10:36 rabbit-jpedron-debian1 systemd[1]: Cannot find unit for notify message of PID 14136.
    Mar 10 12:10:36 rabbit-jpedron-debian1 rabbitmq-server[13943]: Starting broker... completed with 0 plugins.
    Mar 10 12:12:04 rabbit-jpedron-debian1 systemd[1]: rabbitmq-server.service start operation timed out. Terminating.
    Mar 10 12:12:04 rabbit-jpedron-debian1 systemd[1]: Failed to start RabbitMQ broker.
    

    If I understand systemd logs correctly, the READY notification came from PID 14136, which is incorrect. This number looks like a process worked by Erlang though. systemd eventually killed the service.

I will continue to debug this and learn more about systemd.

@dumbbell
Copy link
Member

epmd dependency isn't a problem - it's implicit because jessie uses socket activation for epmd.

Ok, perfect :-) Thanks!

@binarin
Copy link
Contributor

binarin commented Mar 10, 2016

Mar 10 12:10:36 rabbit-jpedron-debian1 systemd[1]: Cannot find unit for notify message of PID 14136.

So looks --pid should be used to prevent this, and it'll set 'MAINPID=%s' automatically. Probably my problem with --pid was that I was testing from a user unit.

@binarin
Copy link
Contributor

binarin commented Mar 10, 2016

It's some sort of permission problems with systemd-notify - looks like sd-notify daemon ignores pid passed. But if I invoke the same command from root shell, startup ends up successfully. I hope there is some permission-related option that'll allow to fix it.

@binarin
Copy link
Contributor

binarin commented Mar 10, 2016

systemd-notify is broken for non-root users - systemd/systemd#2739
So I think NIF dependency should be restored in the master branch for a time being.

@dumbbell
Copy link
Member

Good job finding the reported issue. I tried to debug this with strace(1), but the problem never occurred with. This led me in the direction of a race with the exit of the shell, but I couldn't anything else.

I'm not a fan of the sd_notify NIF because we can't ship it easily in our binary packages, and if we don't ship it, it will be painful to end-users.

Here are two comments:

  1. Instead of introducing $RABBITMQ_RUNNING_UNDER_SYSTEMD, I believe we can rely on the existing $NOTIFY_SOCKET variable which is required by sd_pid_notify(3).

  2. I played with a simple Perl script which does what sd_pid_notify(3) would do: write to the Unix socket pointed to by $NOTIFY_SOCKET. However, in addition, it waits for the service state to not be activating anymore. Here is the script:

    #!/usr/bin/env perl
    # vim:ft=perl
    
    use strict;
    use warnings;
    
    use IO::Socket::UNIX;
    
    my $SERVICE = 'rabbitmq-server';
    my $MAINPID = $ARGV[0];
    my $SOCK_PATH = $ENV{'NOTIFY_SOCKET'};
    
    my $sock = IO::Socket::UNIX->new(
        Type => SOCK_DGRAM(),
        Peer => $SOCK_PATH,
    );
    die "Could not create socket: $!\n" unless $sock;
    
    print $sock "READY=1\nMAINPID=$MAINPID\n";
    
    close($sock);
    
    sub is_activating($) {
        my ($service) = @_;
    
        my $cmd = 'systemctl show --property=ActiveState '.$service;
        my $state = `$cmd`;
        chomp($state);
    
        $state eq 'ActiveState=activating';
    }
    
    exit 0 unless is_activating($SERVICE);
    
    do {
        sleep(1);
    } while (is_activating($SERVICE));
    
    exit 0;

@binarin
Copy link
Contributor

binarin commented Mar 11, 2016

I was thinking about Perl myself, but wasn't sure it would be acceptable solution 😄

Should I amend this PR with your suggestions from above or are you going to do it yourself?

@binarin
Copy link
Contributor

binarin commented Mar 11, 2016

Another alternative to perl is using socat and erlang:open_port.

I'm using following implementation of systemd-notify and running start/stop in a loop works completely fine:

echo -en "MAINPID=$PPID\nREADY=1\n" | socat unix-sendto:$NOTIFY_SOCKET STDIO

@michaelklishin
Copy link
Member Author

@binarin is socat available out of the box on popular systemd-based distros? I assume that's the case in practice with Perl?

@binarin
Copy link
Contributor

binarin commented Mar 11, 2016

socat is readily available on centos7 and on every supported version of debian and ubuntu. At least on centos7 perl is not a mandatory package, and the difference in package size between perl and socat is 30-fold. I think we should stick with socat.

@michaelklishin
Copy link
Member Author

@binarin I agree.

@dumbbell
Copy link
Member

I tried socat(1) too yesterday and it worked fine in my tests, but I guess we could still hit the problem. Here is the paragraph from the systemd issue (systemd/systemd#2737) I'm referring:

This is because the the manager decides which units it applies to based on the cgroup string. And it decides the cgroup string by looking at /proc/${sending_pid}/cgroup, which won't exist anymore if the sending process gets cleaned up before systemd gets to handling the message.

About the availability, socat is obviously available as a package on all distributions, but it's not installed by default. Our RPM/Debian package can depend on it of course.

@dumbbell
Copy link
Member

About the footprint on disk: Perl is already installed on the Debian Jessie VM I used to test.

@dumbbell
Copy link
Member

By the way, we may have another issue waiting for us after this systemd bug: systemd kills the service if it didn't report readyness in time. However, RabbitMQ may take several dozen minutes to load queue data or sync mirrored queues, all this before reporting "I'm ready!" to systemd.

I don't know if we can disable this timeout from systemd for rabbitmq-server (probably not). I saw that the notify protocol allows regular status report in the form of STATUS=<arbitrary text> on the same Unix socket. I have no idea yet if this resets systemd timers, but it's worth investigating.

@binarin
Copy link
Contributor

binarin commented Mar 11, 2016

  1. Here is my glorified systemd-notify 😄 using socat - it avoids problem of early process exit

    #!/bin/bash -eux
    TEMP=$(getopt -o "" --long ready,pid: -n "$0" -- "$@")
    eval set -- "$TEMP"
    
    while true; do
    case "$1" in
        --ready) shift;;
        --pid) PID=$2; shift 2 ;;
        --) shift; break;;
        *) echo "Internal error on arg $1"; exit 1 ;
    esac
    done
    
    (
    echo -en "MAINPID=$PID\nREADY=1\n"
    while [[ $(systemctl show --property=ActiveState rabbitmq-server) == 'ActiveState=activating' ]]; do
        sleep 1
    done
    ) | socat unix-sendto:$NOTIFY_SOCKET STDIO
    

    I'm going to test whether the same approach will work with erlang:open_port.

  2. CentOS 7 doesn't have perl installed by default, and here we have difference between 300k of socat and more than 10M of perl.

  3. Default start timeout is 90 seconds, some reasonable value should be choosen instead and specified in unit file, e.g.TimeoutStartSec=3600

@dumbbell
Copy link
Member

Nice! Your solution is better. I didn't think of making the state check in the pipe itself to maintain socat up.

A few comments:

  • Could you please get rid of all Bash extensions, such as [[ ... ]]?
  • In the same "spirit", please use printf(1) instead of echo where -e and -n are not portable; I know this script is Linux-specific, but it's just to keep all scripts consistent.
  • Instead of putting set flags on the shebang line, I prefer to have an explicit set -eux line in the script itself: if I run the script using /bin/sh <script> instead of ./script, the flags are still set.

About the startup timeout, I saw the TimeoutStartSec and TimeoutStopSec service options. On this Debian, they both defaults to 90 seconds. It's possible to set them to infinity to effectively disable the timeout. It may be the sanest choice for RabbitMQ.

@binarin
Copy link
Contributor

binarin commented Mar 11, 2016

Next funny finding - if epmd was not running initially, it'll send acknowledgement to systemd itself. It will result in marking rabbitmq as running prematurely.
WAT?

binarin added a commit to binarin/rabbitmq-server that referenced this issue Mar 11, 2016
Hopefully this patch will fix all systemd-related problems:
- Proxy shell process (which converts signals to `rabbitmqtl stop`) will
  no longer be started under systemd. There is no need in it, as systemd
  unit already contains instructions for graceful shutdown.
- Ready notification for systemd will be sent with the help of `socat`,
  as `systemd-notify` is currently broken for non-root users. `socat` is
  the most lightweight way to do it (other options are using NIF or some
  external helper in Perl).
- epmd will not be able to interfere by sending it's own ready
  notifications.
- systemd journal will contain some additional messages about startup
  and shutdown sequence, just to be sure that everything is working
  correctly.

Fixes rabbitmq#664
binarin added a commit to binarin/rabbitmq-server that referenced this issue Mar 11, 2016
Hopefully this patch will fix all systemd-related problems:
- Proxy shell process (which converts signals to `rabbitmqtl stop`) will
  no longer be started under systemd. There is no need in it, as systemd
  unit already contains instructions for graceful shutdown.
- Ready notification for systemd will be sent with the help of `socat`,
  as `systemd-notify` is currently broken for non-root users. `socat` is
  the most lightweight way to do it (other options are using NIF or some
  external helper in Perl).
- epmd will not be able to interfere by sending it's own ready
  notifications.
- systemd journal will contain some additional messages about startup
  and shutdown sequence, just to be sure that everything is working
  correctly.

Fixes rabbitmq#664
@dumbbell
Copy link
Member

Nice patch, thank you! I didn't test it yet, but is there any benefit from keeping support for the NIF?

@michaelklishin
Copy link
Member Author

Compatibility. The NIF was never a requirement but lets keep it around for another feature release or so.

On 11 mar 2016, at 18:30, Jean-Sébastien Pédron notifications@github.com wrote:

Nice patch, thank you! I didn't test it yet, but is there any benefit from keeping support for the NIF?


Reply to this email directly or view it on GitHub.

@binarin
Copy link
Contributor

binarin commented Mar 11, 2016

I'm still debugging one more issue with this patch, so please don't merge even if it passes your tests )

Initially idea was to add systemd-notify as a fallback for NIF in stable branch, and to get rid of NIF completely only in master. So as not to cause any pains for users of the current stable version.

binarin added a commit to binarin/rabbitmq-server that referenced this issue Mar 11, 2016
Hopefully this patch will fix all systemd-related problems:
- Proxy shell process (which converts signals to `rabbitmqtl stop`) will
  no longer be started under systemd. There is no need in it, as systemd
  unit already contains instructions for graceful shutdown.
- Ready notification for systemd will be sent with the help of `socat`,
  as `systemd-notify` is currently broken for non-root users. `socat` is
  the most lightweight way to do it (other options are using NIF or some
  external helper in Perl).
- epmd will not be able to interfere by sending it's own ready
  notifications.
- systemd journal will contain some additional messages about startup
  and shutdown sequence, just to be sure that everything is working
  correctly.

Fixes rabbitmq#664
@binarin
Copy link
Contributor

binarin commented Mar 11, 2016

Now the patch is ready. Unless I've missed something this time )

@michaelklishin michaelklishin modified the milestones: 3.6.2, 3.6.x Mar 11, 2016
@binarin
Copy link
Contributor

binarin commented Mar 11, 2016

Added port_close calld and more explanations about usage of socat.

@binarin
Copy link
Contributor

binarin commented Mar 16, 2016

Testing uncovered some more issues, fixed them also.

@dumbbell
Copy link
Member

I merged the pull request. Thank you!

openstack-gerrit pushed a commit to openstack/openstack-ansible-rabbitmq_server that referenced this issue May 25, 2016
Change the default install version of RabbitMQ Server to 3.6.2.
Tests have been updated to upgrade from 3.6.1 to 3.6.2 and an unused
test vars folder has been removed.

socat has been added as a new dependency to work around a known issue
with systemd: rabbitmq/rabbitmq-server#664

Change-Id: I2a4d127a852e7780dd95f905f3e1d5d1df60028a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants