Build Process Potentially Leaking Memory #118

stevekinney · 2014-07-08T14:24:12Z

@jugglinmike I noticed the site was down today, but the server was up. So I ran the deploy.sh script and got this error that I think we talked about in #65.

I'm going to try to power cycle the server in the meantime.

Tracing dependencies for: socket.io-client
Compressed CSS output to 78%.
Compressed CSS output to 78%.
Compressed CSS output to 78%.
FATAL ERROR: Evacuation Allocation failed - process out of memory
./deploy.sh: line 19:  2475 Aborted                 (core dumped) grunt build


stevekinney · 2014-07-08T14:28:39Z

Update: Power cycling worked, which leads me to believe that we do have a memory leak on our hands.

jugglinmike · 2014-07-08T20:04:24Z

Hi Steve,

A little more information about the system in its failing state will help determine where to look. The next time you experience this failure, could you run the following commands and share the contents of the created files?

$ COLUMNS=512 top -bcn 1 > top.txt
$ free -t -m > free.txt
$ ps -aeF > ps.txt

Also, knowing the time since the last deployment would help @mzgoddard and me estimate severity. Do you know how long it had been since you last deployed prior to the incident you reported here?

stevekinney · 2014-09-18T15:05:26Z

@jugglinmike So, it looks like we're coming across a daily memory leak issue. I rebooted the server yesterday and it was out of memory again today.

Here is the console message:

The last deployment was the last merge into master. But it has run out of memory again since I last rebooted the server, which was yesterday.

Thoughts?

/cc @escoleman3 @kgotchet @jlefeber @mzgoddard

jugglinmike · 2014-09-18T15:17:55Z

@stevekinney The next time this happens (tomorrow morning, by the sound of it) and before rebooting the server, could you grab the stats I mentioned in my previous comment?

stevekinney · 2014-09-18T15:20:12Z

Yup, I couldn't log in because the key on the server was from my CEE iMac, which I don't have anymore. So, I need @escoleman3 to pop in my personal key. I rebooted because someone needed to use it in the next two hours.

stevekinney · 2014-09-22T17:11:41Z

So, @jugglinmike, the server went down twice today. I believe @escoleman3 reset it once this morning. I'm including the information you requested.

https://gist.github.com/stevekinney/be2a2de91aa864306577

jugglinmike · 2014-09-23T20:41:23Z

Thanks, @stevekinney. @mzgoddard and I have run through the data, and we think we understand the problem. This is our theory:

It looks like the "top" server is failing occasionally and leaving its child processes (the activity servers) orphaned. The forever module is correctly restarting the top-level server, and it is spawning new activity servers. This repeats over time, until the environment is filled with zombie servers.
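One way to check this theory on the live box is to look for Node processes whose parent PID is 1: when a parent dies, its children are re-parented to init, so orphaned activity servers should show up that way. The sketch below is a hypothetical helper (`find_orphans` is not part of the project); it assumes the activity servers run as `node` processes and that the host has no subreaper, so orphans are adopted by PID 1 rather than some other process.

```shell
#!/bin/sh
# Hypothetical helper: flag likely orphaned processes.
# Assumption: orphans are re-parented to init (PID 1); on hosts with a
# subreaper they may be adopted by a different PID instead.

# Reads "PID PPID COMMAND..." lines on stdin and prints each line whose
# parent PID is 1 and whose command line contains the given pattern.
find_orphans() {
  pattern=$1
  awk -v pat="$pattern" '
    {
      cmd = $0
      # Strip the leading PID and PPID columns to isolate the command line.
      sub(/^[ \t]*[0-9]+[ \t]+[0-9]+[ \t]+/, "", cmd)
      if ($2 == 1 && index(cmd, pat) > 0) print
    }'
}

# Usage on a live system (procps ps; the trailing "=" suppresses headers):
#   ps -eo pid=,ppid=,args= | find_orphans node
```

A steadily growing list of such processes between reboots would be consistent with the "zombie servers" accumulating as described.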

This highlights two separate problems:

1. The top-level server is failing on a regular basis
2. The children are left running

#1 is likely caused by a memory leak, and resolving it may require additional forensics. #2 can be resolved by maintaining a list of child process IDs on disk and killing those processes on startup.
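The startup half of the #2 approach could look roughly like the sketch below. It is only an illustration under assumptions: the function name `kill_stale_children` is hypothetical, and it assumes the top-level server appends each child's PID, one per line, to a file whose path is passed in (for example, a hypothetical `/tmp/activity-children.pids`). The actual fix may record and clean up processes differently.

```shell
#!/bin/sh
# Hypothetical sketch: before starting the top-level server, kill any
# child PIDs recorded by the previous run, then clear the record.
# Assumption: the server writes one child PID per line to "$1".

kill_stale_children() {
  pidfile=$1
  [ -f "$pidfile" ] || return 0   # nothing recorded, nothing to clean up

  while read -r pid; do
    # Skip blank or non-numeric lines.
    case $pid in
      ''|*[!0-9]*) continue ;;
    esac
    # "kill -0" only tests whether the process exists; it sends no signal.
    if kill -0 "$pid" 2>/dev/null; then
      kill "$pid" 2>/dev/null || true   # default signal is SIGTERM
    fi
  done < "$pidfile"

  # Start the new run with an empty list.
  : > "$pidfile"
}

# Usage (e.g. from a startup script, before launching the server):
#   kill_stale_children /tmp/activity-children.pids
```

One caveat with a plain PID list: after a long outage or a reboot, a recorded PID may have been reused by an unrelated process, so a more careful version would also confirm the command name before signalling it.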

#1 is definitely the trickier problem, but (if we've interpreted all this correctly) resolving #2 will improve the application's behavior: the app will continue to fail intermittently, but it will immediately restart itself cleanly. The site will suffer little downtime (and it will be resolved automatically), but it will kick active users and lose saved activity results.

I'm going to begin work on a fix for #2 tomorrow, as it seems to be the low-hanging fruit here.

Does this make sense to you?

jugglinmike mentioned this issue on Sep 24, 2014 in "Cleanup procs" #122 (Merged)
