Queue is never restarted after SSL error #14

bloudermilk · 2014-08-01T21:26:21Z

We're seeing intermittent SSL errors that produce the following log output:

2014-08-01T18:31:59.274859+00:00 app[web.2]: apnagent:agent-live [278ms] (gateway) error: 139718328477472:error:14094416:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate unknown:../deps/openssl/openssl/ssl/s3_pkt.c:1275:SSL alert number 46
2014-08-01T18:31:59.275014+00:00 app[web.2]: Gateway error [Error: 139718328477472:error:14094416:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate unknown:../deps/openssl/openssl/ssl/s3_pkt.c:1275:SSL alert number 46

No idea why we're seeing the error, given that the certs work fine 99% of the time. The main issue, though, is that the agent's queue is never restarted and/or the connection to the gateway is never re-established, so our application eventually runs out of memory as the queue backs up.


bloudermilk · 2014-08-01T22:04:05Z

It seems that the agent receives the gateway:close event when this happens (I think; it's hard to tell because we're using Node clustering), so I've just added some code to my server to restart workers when this event is triggered. Is my assumption correct that the agent isn't automatically reconnected after this event?
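For anyone wanting to try the same workaround, here is a rough sketch of the idea, not the exact code from my server. It assumes only that the agent is an EventEmitter that emits gateway:close; the helper names exitWorkerOnGatewayClose and reforkOnExit are made up for illustration.

```javascript
const cluster = require('cluster');

// Hypothetical helper (worker side): when the agent reports that its gateway
// connection closed, exit this worker so the master can fork a fresh one.
// `exit` is injectable so the helper can be exercised without killing the process.
function exitWorkerOnGatewayClose(agent, exit = () => process.exit(1)) {
  // `once` so that a burst of close events triggers a single exit.
  agent.once('gateway:close', () => {
    exit();
  });
}

// Hypothetical helper (master side): fork a replacement whenever a worker exits.
// `clusterModule` defaults to Node's cluster module.
function reforkOnExit(clusterModule = cluster) {
  clusterModule.on('exit', () => {
    clusterModule.fork();
  });
}
```

In the master process you would call reforkOnExit() once after forking the initial workers, and in each worker call exitWorkerOnGatewayClose(agent) after creating the agent. This throws away whatever is still in the worker's queue, so it only works as a stopgap.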

Edit: I can say with reasonable confidence that the gateway:close events I saw were indeed triggered as part of the SSL failure. The logic in the gateway.close handler depends on connected being set to true for it to reconnect, so the only case in which it should trigger the agent's gateway:close event is when the tls.connect handler was never called. I am not seeing any unauthorized events in our logs.
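To make that reading of the handler concrete, here is a simplified model of the behaviour as I understand it. This is not apnagent's source: the class ModelAgent, its methods, and the reconnects counter are all stand-ins invented for illustration; only the connected flag and the gateway:close event name come from the description above.

```javascript
const { EventEmitter } = require('events');

// Simplified model of the described behaviour; NOT apnagent's actual code.
class ModelAgent extends EventEmitter {
  constructor() {
    super();
    this.connected = false; // flips to true only when the TLS connect callback runs
    this.reconnects = 0;    // stands in for "a reconnect was scheduled"
  }

  // Stand-in for the tls.connect callback firing after a successful handshake.
  onTlsConnect() {
    this.connected = true;
  }

  // Stand-in for the gateway close handler.
  onGatewayClose() {
    if (this.connected) {
      // The connection had been established: treat the close as a drop and reconnect.
      this.connected = false;
      this.reconnects += 1;
    } else {
      // The handshake never completed (e.g. the SSL alert): no reconnect,
      // the agent just reports that the gateway closed.
      this.emit('gateway:close');
    }
  }
}
```

In this model, a socket that closes before onTlsConnect has run never reaches the reconnect branch, which matches what we see: the queue keeps filling and nothing drains it.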

bloudermilk · 2014-08-04T22:34:42Z

Restarting the workers as I mentioned above is working for us as a temporary solution, but I have a feeling the logic for the gateway.close handler should be changed so that this case triggers a reconnection instead of the agent being closed.

bloudermilk · 2014-08-08T18:45:39Z

After updating the logic in the gateway.close handler, apnagent can now recover from these SSL errors gracefully. I'm running an updated version on my fork. Let me know if you're interested in a PR.
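The general shape of the change is roughly the following. This is a sketch under assumptions, not the code on the fork: makeCloseHandler, the state object, the connect callback, and the 1000 ms default delay are all hypothetical names and values picked for illustration.

```javascript
// Hypothetical sketch: build a close handler that always schedules a reconnect,
// whether or not the TLS handshake had completed before the socket closed.
// `schedule` defaults to setTimeout and is injectable for testing.
function makeCloseHandler(state, connect, schedule = setTimeout, delayMs = 1000) {
  return function onGatewayClose() {
    // Reset the flag so the next successful handshake can set it again.
    state.connected = false;
    // Retry after a short delay instead of giving up on the gateway.
    schedule(connect, delayMs);
  };
}
```

The point of the design is that a close during the handshake is treated the same as a close on an established connection, so the queue gets a chance to drain once the next connection succeeds. The delay is there to avoid hammering the gateway if the failure repeats.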

logicalparadox · 2014-08-08T19:03:56Z

Looked at your fork. Looks great. Is there any way to add a test to simulate this behavior? Either way, a PR would be greatly appreciated for both #14 and #15.

bloudermilk · 2014-08-08T19:07:04Z

@logicalparadox I'll take a look at the test harness and see if I can simulate both! Thanks for the response.

logicalparadox · 2014-08-08T19:10:24Z

Cool, let me know if you have questions. Also, I like that you exposed debug!

olilavoie · 2014-08-21T20:50:24Z

I can confirm that we're having the same problem! Push notifications work when our Node app is freshly started, but after ~1 hour the console can't send any pushes and we receive a gateway:error with an empty error and msg object.

bloudermilk mentioned this issue Aug 8, 2014

Gateway sporadically stops sending notifications #17

Open
