
Too many sockets of Ubuntu with 'CLOSE_WAIT' #2440

Closed
mike442144 opened this issue Oct 28, 2016 · 33 comments · May be fixed by request/tunnel-agent#21

@mike442144

mike442144 commented Oct 28, 2016

request version: 2.76.x
system: Ubuntu

I'm using request through a proxy with TLS. After a long run I have thousands of socket ports in CLOSE_WAIT status. From searching around, it seems the remote server force-closing my socket can lead to this, but what I want is to detect that event and close the socket gracefully; otherwise the ports will be exhausted and the process will stop working.

Does this relate to #2438?

Thanks.

@mike442144
Author

Can anybody help? The proxy server closes my socket when I request too fast, so I need to close the socket connection gracefully to stop the number of open ports from growing and growing. Right now I have thousands of ports in CLOSE_WAIT status.

@mike442144
Author

I found the code below in request.js, and my error code is ECONNRESET. I'm not sure whether this code closes the request's socket:

Request.prototype.onRequestError = function (error) {
  var self = this
  if (self._aborted) {
    return
  }
  if (self.req && self.req._reusedSocket && error.code === 'ECONNRESET'
      && self.agent.addRequestNoreuse) {
    self.agent = { addRequest: self.agent.addRequestNoreuse.bind(self.agent) }
    self.start()
    self.req.end()
    return
  }
  if (self.timeout && self.timeoutTimer) {
    clearTimeout(self.timeoutTimer)
    self.timeoutTimer = null
  }
  self.emit('error', error)
}

@simov
Member

simov commented Nov 1, 2016

/cc @mscdex

@mscdex
Contributor

mscdex commented Nov 1, 2016

It doesn't sound like this is related to #2438.

@mike442144 Does downgrading to request 2.75.0 fix the issue?

@mike442144
Author

@mscdex I've downgraded to 2.69.0; there are still thousands of CLOSE_WAIT sockets.
I have read the tunnel-agent code and the request code; below are the relevant segments:

tunnel-agent/index.js

  function onConnect(res, socket, head) {
    connectReq.removeAllListeners()
    socket.removeAllListeners()

    if (res.statusCode === 200) {
      assert.equal(head.length, 0)
      debug('tunneling connection has established')
      self.sockets[self.sockets.indexOf(placeholder)] = socket
      cb(socket)
    } else {
      debug('tunneling socket could not be established, statusCode=%d', res.statusCode)
      var error = new Error('tunneling socket could not be established, ' + 'statusCode=' + res.statusCode)
      error.code = 'ECONNRESET'
      options.request.emit('error', error)
      self.removeSocket(placeholder)
    }
  }

  function onError(cause) {
    connectReq.removeAllListeners()

    debug('tunneling socket could not be established, cause=%s\n', cause.message, cause.stack)
    var error = new Error('tunneling socket could not be established, ' + 'cause=' + cause.message)
    error.code = 'ECONNRESET'
    options.request.emit('error', error)
    self.removeSocket(placeholder)
  }

request.js

self.req.on('error', self.onRequestError.bind(self))


Request.prototype.onRequestError = function (error) {
  var self = this
  if (self._aborted) {
    return
  }
  if (self.req && self.req._reusedSocket && error.code === 'ECONNRESET'
      && self.agent.addRequestNoreuse) {
    self.agent = { addRequest: self.agent.addRequestNoreuse.bind(self.agent) }
    self.start()
    self.req.end()
    return
  }
  if (self.timeout && self.timeoutTimer) {
    clearTimeout(self.timeoutTimer)
    self.timeoutTimer = null
  }
  self.emit('error', error)
}

The only questions we need to answer are:
1. For a 'socket hang up' error, what does the underlying socket do? If the socket won't close because there is still data to send, how do we force-close it?
2. When the statusCode is not 200, what does the underlying socket do? Does it close as usual?
Both errors are reported as ECONNRESET by the request module, but the first is a real socket error while the second is more of an HTTP-level error. We can solve this problem once we answer these two questions.

Thanks a lot. It's very nice of you to help.

@mscdex
Contributor

mscdex commented Nov 1, 2016

ECONNRESET is never an http thing, it means the other end abruptly/forcefully closed the TCP connection for whatever reason. I do not know why artificial ECONNRESET errors are being created though, that seems wrong to me.

Since I'm not familiar with the proxying part of the code base, I don't know what to suggest except use an http/https.Agent that has a cap on the maximum number of sockets (the default global Agent instances have no limits in modern versions of node) and/or use keep-alive in said Agent as much as possible?

@mike442144
Author

@mscdex According to the code in tunnel-agent/index.js:

var connectReq = self.request(connectOptions) // this is the same as http.request
connectReq.once('connect', onConnect)   // for v0.7 or later

onConnect is the callback for the req's connect event. If statusCode is not equal to 200, it emits an error on req. In this situation the socket communication has finished and the client has received a complete HTTP packet, so it's an HTTP issue. For instance, it's exactly the same as requesting a website and getting a response with statusCode 500. The difference is that tunnel-agent/index.js manually emits an error with code ECONNRESET on req, so request.js cannot tell which errors are real socket issues and which are not.

Thanks very much.

@mike442144
Author

Can anybody help?

@harshitgupta

harshitgupta commented Nov 4, 2016

I had a similar issue. I have an X server (Tomcat/Java) with some legacy APIs deployed on it, and I needed to route all of those APIs through my Node server. For that purpose I'm using request-promise-native, which is based on request.

Out of nowhere, the X server started taking too long to respond. I thought it was the npm package messing things up, but later I realized the X server was talking to another (db) server and not closing its connections (sockets) properly, which is why it had hundreds of ports active. I restarted that server and it started working fine.

I was able to figure it out by running netstat on the server and checking the foreign address column.

TL;DR: It was the X server that was messing things up, not this package.

Hope it helps.
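For reference, the diagnosis step above can be sketched like this (the `netstat -ant` flags are the common Linux ones; `ss -tan` gives similar output on modern systems, and the helper function name here is made up):

```shell
# Count TCP sockets stuck in CLOSE_WAIT. A steadily growing count means our
# side keeps receiving FINs but never closes its half of the connection.
count_close_wait() {
  "$@" | grep -c 'CLOSE_WAIT'
}

# Typical invocation on the affected host:
#   count_close_wait netstat -ant
# Group by foreign address to see which peer is involved:
#   netstat -ant | awk '$6 == "CLOSE_WAIT" { print $5 }' | sort | uniq -c
```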

@mike442144
Author

mike442144 commented Nov 6, 2016

@harshitgupta Thanks a lot. I'm trying to confirm that request handles the socket close event correctly, because I use the request module as a client connecting to a proxy server and I receive a lot of ECONNRESET errors. I have to know whether this module closes the socket or not; if not, why, and how do I do it?
See the diagram:

Server is ready to close the socket
Server ---> FIN ---> Client
Server <--- ACK <--- Client
Server is in FIN_WAIT_2, and the client is in CLOSE_WAIT
Server <--- FIN <--- Client
After the client sends its FIN to the server, the client moves to LAST_ACK
Server ---> ACK ---> Client

As shown above, our program never sends the FIN packet to the server.

@mike442144
Author

I have captured the packets between the proxy server and my process, renaming the source IP as local:

local -> sdr [SYN]
sdr -> local [SYN,ACK]
local -> sdr [ACK]
sdr -> local [ACK]
local -> sdr [ACK]
local -> sdr [ACK]
(here the server sends a FIN packet to close the socket)
sdr -> local [FIN,ACK]
local -> sdr [ACK]

Obviously, my process never sends a FIN to notify the server. Does anybody know how to close the underlying socket manually?

@mike442144
Author

mike442144 commented Nov 8, 2016

Update: the solution below doesn't work! It seems I cannot get an 'end' event on the underlying socket when the CONNECT succeeds but the statusCode is not 200, yet I do receive an 'end' event in the onError path. Why? Could it be a bug in net? It seems to have something to do with the socket.

I can work around the problem now, but I would like to confirm some code in tunnel-agent. Who is the author of tunnel-agent? It seems nobody is familiar with this repo? @simov @mscdex @mikeal

I have tested tunnel-agent for a few days, and in the end I found that the number of CLOSE_WAIT sockets was equal to the number of errors where statusCode was not 200 in the onConnect function. Here is the code and the solution:

///// in createSocket function:

var connectReq = self.request(connectOptions)
connectReq.useChunkedEncodingByDefault = false // for v0.6
connectReq.once('response', onResponse) // for v0.6
connectReq.once('upgrade', onUpgrade)   // for v0.6
connectReq.once('connect', onConnect)   // for v0.7 or later
connectReq.once('error', onError)
connectReq.once('socket', function (socket) {
  /* The solution: send our FIN packet to the server as soon as we receive
   * its FIN packet, despite the pending data queue.
   */
  socket.on('end', function () {
    socket.end()
  })
})
connectReq.end()

function onConnect(res, socket, head) {
  connectReq.removeAllListeners()
  socket.removeAllListeners()

  if (res.statusCode === 200) {
    assert.equal(head.length, 0)
    debug('tunneling connection has established')
    self.sockets[self.sockets.indexOf(placeholder)] = socket
    cb(socket)
  } else {
    /* In this branch the socket ends up in CLOSE_WAIT: the server sent us
     * a FIN packet, but we have pending data in the queue, so we never send
     * our own FIN packet back.
     */
    debug('tunneling socket could not be established, statusCode=%d', res.statusCode)
    var error = new Error('tunneling socket could not be established, statusCode=' + res.statusCode)
    error.code = 'ECONNRESET'
    options.request.emit('error', error)
    self.removeSocket(placeholder)
  }
}

Node's socket documentation about the end event: net_event_end

But I'm not sure why the socket should remove all its listeners (socket.removeAllListeners()). Are there any pitfalls?

@mike442144
Author

mike442144 commented Nov 9, 2016

@simov @mscdex @mikeal
I have found the problem. Please see these two lines in the onConnect function of tunnel-agent:

function onConnect(res, socket, head) {
  connectReq.removeAllListeners()
  socket.removeAllListeners()
}

Why does it remove all the listeners? Obviously I cannot receive the end event after that! At that moment a FIN packet is arriving, so nobody is left responsible for closing the socket!

@mike442144
Author

@mscdex @simov Please merge the pull request; I have fixed this bug.

@simov
Member

simov commented Dec 21, 2016

I'm not that familiar with tunnel-agent. The original author of that code is maintaining it here: https://github.com/koichik/node-tunnel

I diffed the two code bases and spotted only minor cosmetic differences; in fact, the request test suite passes with either module. So I'm for switching to the original tunnel module instead, and if you still experience the same issue, open a PR there.

You can modify lib/tunnel.js and test with it:

-, tunnel = require('tunnel-agent')
+, tunnel = require('tunnel')

Good job on this issue 👍

@mike442144
Author

mike442144 commented Dec 22, 2016

@simov I've read the source code of tunnel, and it still has this problem, so I'm going to open a PR there. I suggest you switch to the original tunnel after the PR is merged. I'll let you know.

@mike442144
Author

mike442144 commented Dec 23, 2016

@simov The original author @koichik doesn't seem to maintain the repo anymore. I suggest you keep the tunnel-agent dependency in request.js.

@kazaff

kazaff commented Dec 29, 2016

@mike442144 Hi, I have the same environment as you. Can you show the whole code? Thank you.

I mean, I want to see how to use the error callback when upgrading to your version of the code.

@mike442144
Author

@kazaff Please refer to https://github.com/bda-research/tunnel-agent/tree/callback. It's an easy fix, but it runs deep. Until it is merged to master and republished, you can use my repo as a dependency.

@kazaff

kazaff commented Dec 30, 2016

@mike442144 Thanks a lot!

I also wonder whether I need to change my business code to adapt to your repo.

I read the source code myself, and based on your solution I'm confused about the Request error event. I've now added some code; it looks like this:

var socketHold = null;    // add new code
Request
  .get({/*some config*/}, function (err, response) {/*my original code*/})
  .on('error', function (error) {
    // add new code
    console.error('Request on error');
    console.error(error.stack);
    if (socketHold) {
      socketHold.emit('free');
    }
  })
  .on('socket', function (socket) {
    socketHold = socket;
  });

Do I need to do something like that, or do I need to do nothing beyond changing the dependency to your repo?

I need your help, thanks very much.

@mike442144
Author

@kazaff What you should do is use npm-shrinkwrap.json to pin the tunnel-agent version. Nothing needs to change in your code.

{
    "name": "your project name",
    "version": "your project version",
    "dependencies": {
        "tunnel-agent": {
            "version": "0.4.3",
            "from": "tunnel-agent@>=0.4.1 <0.5.0",
            "resolved": "git://github.com/bda-research/tunnel-agent.git#callback"
        }
    }
}

That's it.

@kazaff

kazaff commented Dec 30, 2016

@mike442144 OK, cool. I'll try it.

@mike442144
Author

mike442144 commented Mar 17, 2017

I noticed @mikeal has published a new version of tunnel-agent, which breaks my project, because only versions between 0.4.1 and 0.5.0 will use my own repo.
So, @mikeal, are you going to merge my pull request, which has been tested for about 4 months? Or do you have any suggestions?
As a temporary workaround, change the from field to >=0.4.1 only.

@mikeal
Member

mikeal commented Mar 17, 2017

link to PR?

@mike442144
Author

@mike442144
Author

@mikeal How are you going to deal with this issue? Just asking, since I haven't seen any updates :)

@ghost

ghost commented Jun 30, 2017

+1 I'm running on Ubuntu and this issue can easily cause node to balloon to thousands of sockets. Also, there was a PR that applied options.timeout to the CONNECT request; that would also be useful.

@mattdonnelly

Is anyone looking at this? I'm also affected by it.

@mike442144
Author

mike442144 commented Aug 15, 2017

@mattdonnelly Use my repo as your dependency, git://github.com/bda-research/tunnel-agent.git#timeout, and add an npm-shrinkwrap.json as I described above.

@stale

stale bot commented Nov 23, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Nov 23, 2018
@stale stale bot closed this as completed Dec 4, 2018
@mike442144
Author

Hey @mikeal, have you had a chance to merge my pull request into tunnel-agent? This has a big impact on long-lived programs that use a proxy. Currently we have to write an npm-shrinkwrap to pin the version back. As you can see in this issue, many people have the same problem and my code fixes it well. I want to know how it can get merged, and if you're not in charge of this repo, who else should I contact? Or is there some other step I missed? Please also pay attention to this comment @simov @mscdex. Hope to get it done soon, thank you so much. :)

@mayinghan

I had the same issue. I updated my tunnel-agent/index.js with @mike442144's code from his PR and it fixed the issue. It has been almost 4 years since this bug was exposed and a potential fix appeared; I just wonder, is there any official update from @mikeal on this issue?

@mike442144
Author

Please reopen this issue. Even with npm-shrinkwrap included in the project, it doesn't work now.


8 participants