Moove It is now Qubika. For more information, visit our new website
Visit Qubika

The Problem

On a recent project running on jRuby 1.7.23, mizuno 0.6.8 and the Rest Service Framework Angus 0.0.13, we were faced with a substantial memory leak. During our initial troubleshooting, we found that restarting mizuno weekly prevented the process from crashing. This meant that while we were debugging, we were also scheduling restarts every Monday night in order to prevent any downtime for the 500+ requests per minute.

While this worked, it was a short term solution, because: 1. it meant a team member would have to be present every Monday evening for the restart (and what’s the fun in that?!) and 2. changes in system demand would affect the schedule, and we wouldn’t know until it was too late. For example, if demand suddenly spiked, it may cause the system to crash; fixed only by more frequent restarts. We needed to find our long term solution.

Troubleshooting jRuby

To begin troubleshooting, we started by dumping the heap right before doing a restart by using the command:

jmap -dump,format=b,file=heap_before_restart.bin [pid]

We waited a full day and dumped the heap again, in order to get a more realistic heap benchmark.

Then we ran both dumps through VisualVM (which BTW is an excellent tool), generating the following results:

VisualVM results: Before mizuno restart on jRuby 1.7.23

Figure 1: VisualVM results: Before mizuno restart on jRuby 1.7.23

VisualVM results: After mizuno restart

Figure 2: VisualVM results: After mizuno restart

We noticed an odd return: org.jruby.RubySymbol had dramatically increased in size from one memory dump to the other. A closer inspection of this class revealed the following:

VisualVM jRuby Symbols

Figure 3: Wow, that’s a lot of jRuby symbols!

We then took a look at some of the latter ones:

 First symbol of the last set

Figure 4: First symbol of the last set

Last symbol of the last set

Figure 5: Last symbol of the last set

They all seemed to follow the pattern: current-[number]. It was very unusual to see a large number of symbols being defined and not garbage collected. We researched garbage collection in Ruby, which revealed that the garbage collection process does not mark symbols for deletion, at least until Ruby 2.2. The jRuby 1.7.23 that we were using is based on Ruby 1.9.3.

It looked like we had found the culprit! Somewhere in the code, symbols were being defined over and over again, using dynamic values. Banking on that idea, we performed a search in the codebase, yielding the following line, from the gem connection_pool:

@key = :"current-#{@available.object_id}"

This pattern matched exactly the symbols that were showing up in the memory dump. We were getting closer to the solution.

Investigating the codebase further resulted in the following:

module Services
 class RedisStore

   .
   .
   .

   def redis
     @redis ||= ConnectionPool.new(@@pool_settings) { Redis.new(@@redis_settings) }
   end
 end
end

The class RedisStore defined a new ConnectionPool upon initialization. And for every new request for a specific resource, this class was initialized. This led to a new symbol, with the dynamic name. Now, we had a strong hypothesis, and it was time to test a countermeasure.

The Solution

To test our theory, we set a trap by changing the code so that the redis variable became a class variable (instead of an instance one). After that, it was time to upload the change into our testing environment and start hammering requests for the specific resource. Memory dumps in between the symbol table started growing larger and larger. Bingo! We had found the solution.

A set of smoke tests were run in order to ensure that everything was still working correctly. After that, it was a matter of scheduling and executing a deploy, then monitoring the memory usage periodically.

During monitoring, we generated dumps every day for about two weeks. Below are some examples of what we found. While monitoring, we found that the symbol table did grow, we also found that the increase was normal and the scores of current-[number] symbols were nowhere to be found. This meant that the problem was solved – and the team was happy!

First jRuby HeapDump post fix

Figure 6: First jRuby HeapDump post fix

Second jRuby HeapDump post fix

Figure 7: Second jRuby HeapDump post fix

Last jRuby HeapDump post fix

Figure 8: Last jRuby HeapDump post fix

Get our stories delivered from us to your inbox weekly.