Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Agents randomly stop reconnecting #15

Open
bloudermilk opened this issue Aug 4, 2014 · 1 comment
Open

Agents randomly stop reconnecting #15

bloudermilk opened this issue Aug 4, 2014 · 1 comment

Comments

@bloudermilk
Copy link

We're having issues with agents not reconnecting sometimes after a message error. Here are the relevant log entries:

2014-08-04T20:03:58.212822+00:00 app[web.1]: apnagent:agent-live:web.1.14 [6ms] (gateway) data: 6 bytes
2014-08-04T20:03:58.214107+00:00 app[web.1]: apnagent:agent-live:web.1.14 [2ms] (gateway) disconnected - gateway.push.apple.com:2195
2014-08-04T20:03:58.214248+00:00 app[web.1]: apnagent:agent-live:web.1.14 [0ms] (gateway) end
2014-08-04T20:03:58.214862+00:00 app[web.1]: apnagent:agent-live:web.1.14 [0ms] (encoder) write: 28273
2014-08-04T20:03:58.215484+00:00 app[web.1]: apnagent:agent-live:web.1.14 [1ms] (queue) pause: not connected
2014-08-04T20:03:58.198696+00:00 app[web.1]: apnagent:agent-live:web.1.14 [1ms] (gateway) write: 28267
2014-08-04T20:03:58.199419+00:00 app[web.1]: apnagent:agent-live:web.1.14 [1ms] (cache) push: 28267
2014-08-04T20:03:58.200158+00:00 app[web.1]: apnagent:agent-live:web.1.14 [1ms] (encoder) write: 28268
2014-08-04T20:03:58.200419+00:00 app[web.1]: apnagent:agent-live:web.1.14 [0ms] (gateway) write: 28268
2014-08-04T20:03:58.200600+00:00 app[web.1]: apnagent:agent-live:web.1.14 [0ms] (cache) push: 28268
2014-08-04T20:03:58.200749+00:00 app[web.1]: apnagent:agent-live:web.1.14 [0ms] (encoder) write: 28269
2014-08-04T20:03:58.201017+00:00 app[web.1]: apnagent:agent-live:web.1.14 [0ms] (gateway) write: 28269
2014-08-04T20:03:58.201155+00:00 app[web.1]: apnagent:agent-live:web.1.14 [1ms] (cache) push: 28269
2014-08-04T20:03:58.201307+00:00 app[web.1]: apnagent:agent-live:web.1.14 [0ms] (encoder) write: 28270
2014-08-04T20:03:58.201578+00:00 app[web.1]: apnagent:agent-live:web.1.14 [0ms] (gateway) write: 28270
2014-08-04T20:03:58.201718+00:00 app[web.1]: apnagent:agent-live:web.1.14 [0ms] (cache) push: 28270
2014-08-04T20:03:58.201865+00:00 app[web.1]: apnagent:agent-live:web.1.14 [0ms] (encoder) write: 28271
2014-08-04T20:03:58.202116+00:00 app[web.1]: apnagent:agent-live:web.1.14 [1ms] (gateway) write: 28271
2014-08-04T20:03:58.202255+00:00 app[web.1]: apnagent:agent-live:web.1.14 [0ms] (cache) push: 28271
2014-08-04T20:03:58.204973+00:00 app[web.1]: apnagent:agent-live:web.1.14 [2ms] (encoder) write: 28272
2014-08-04T20:03:58.205635+00:00 app[web.1]: apnagent:agent-live:web.1.14 [1ms] (gateway) write: 28272
2014-08-04T20:03:58.206080+00:00 app[web.1]: apnagent:agent-live:web.1.14 [1ms] (cache) push: 28272
2014-08-04T20:04:01.207280+00:00 app[web.1]: apnagent:agent-live:web.1.14 [2992ms] (cache) since: 28263
2014-08-04T20:04:01.217101+00:00 app[web.1]: apnagent:agent-live:web.1.14 [0ms] (queue) push: 28271
2014-08-04T20:04:01.215785+00:00 app[web.1]: apnagent:agent-live:web.1.14 [8ms] (queue) push: 28264
2014-08-04T20:04:01.215850+00:00 app[web.1]: apnagent:agent-live:web.1.14 [0ms] (queue) push: 28265
2014-08-04T20:04:01.215921+00:00 app[web.1]: apnagent:agent-live:web.1.14 [0ms] (queue) push: 28266
2014-08-04T20:04:01.215987+00:00 app[web.1]: apnagent:agent-live:web.1.14 [0ms] (queue) push: 28267
2014-08-04T20:04:01.216170+00:00 app[web.1]: apnagent:agent-live:web.1.14 [1ms] (queue) push: 28268
2014-08-04T20:04:01.216226+00:00 app[web.1]: apnagent:agent-live:web.1.14 [0ms] (queue) push: 28269
2014-08-04T20:04:01.216590+00:00 app[web.1]: apnagent:agent-live:web.1.14 [0ms] (queue) push: 28270
2014-08-04T20:04:01.217914+00:00 app[web.1]: apnagent:agent-live:web.1.14 [1ms] (queue) push: 28272
2014-08-04T20:04:01.217969+00:00 app[web.1]: apnagent:agent-live:web.1.14 [0ms] (gateway) reconnecting
2014-08-04T20:04:01.218187+00:00 app[web.1]: apnagent:agent-live:web.1.14 [1ms] (gateway) connecting - gateway.push.apple.com:2195

You can see that the gateway receives 6 bytes, then closes the gateway as it should. Next we see the queue pause, which is also expected. Then, two strange things happen. First, we see that after the queue has been paused, the gateway is still being written to, which shouldn't happen. The next strange thing is that although the reconnect attempt happens 3 seconds later, it never finishes connecting or throws an error. That's the last line in our log for that worker after it continued to run for another hour or so. I'm going to experiment with setTimeout and the timeout event on the tls socket to see if that event gets called in this case. Very strange stuff...

Edit: As you can see I added some additional logging, which you can find on my fork if it's of interest.

@bloudermilk
Copy link
Author

I added a connection timeout to my fork, which fixed this issue. Let me know if you'd like a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant