Chasing down Guava cache slowness

Note: If you are using Guava < 13 and looking to fix a serious cache performance issue, you can directly jump to the conclusion. This article describes how I re-discovered this three year old issue but you probably don’t care.

I have spent the last few months bootstrapping a couple of big data pipelines. I will not go into details, but basically the goal is to crunch almost all the HTTP hits issued from French mobile phones to publish the official French web audience. After having focused on business code for a while, I have eventually been able to start chasing down performance issues.

The first section of theses pipelines is made of several enrichment steps adding various meta-data to the raw HTTP hits. Some of these steps are fast (nanoseconds) some others are quite expensive (milliseconds). When a slow enrichment step is a pure function, an in-process cache is put in front of it to amortize its cost (the distribution of the input data is almost always exponential).

During the initial phase, we decided to use Guava caches without any decent tuning. They are usually good enough and it is smarter to wait to have the full pipeline in place before fiddling with knobs or writing your own cache.

Last week, I started playing with the cache configuration to get a glimpse of the performance model of one of the pipelines. Our caches were configured to be quite small, tens of megabytes, and my goal was to observe what happens when they are made larger (cache statistics, execution time etc.).

I observed that:

  • Doubling the cache size significantly improved the cache hit ratio but execution time did not improved
  • Making the cache ten time vastly improved cache hit ratio but slowed down the job at lot

I did not expect that since caches are designed to have O(1) insertion and lookup and our caches are still small enough to not have major impact on GC.

Troubleshooting

GC logs quickly proved that the GC was not the issue. The next logical step was to check the performance of the caches. Java Flight Recorded shown that a significant amount of time was spent in the caches, so I decided to quickly hack a benchmark using JMH.

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
public abstract class AbstractBenchmark {

  @Param({"10000", "100000", "1000000"})
  private int maxSize;

  MyLoadingCache cache;
  ExponentialDistribution exponentialDistribution;

  abstract MyLoadingCache getCache();

  @Setup
  public void setup() {
    this.cache = getCache();
    this.exponentialDistribution = new ExponentialDistribution(maxSize/2);

    // Pre-fill the cache
    for (int i = 0; i < maxSize; i++) {
      run();
    }
  }

  @Benchmark
  public String run() {
    String key = Long.toString((long)exponentialDistribution.sample());
    return cache.get(key);
  }

  @TearDown
  public void tearDown() {
    System.out.println(cache.getStats());
  }

  protected String load(String key) {
    return key + System.currentTimeMillis();
  }

  protected int getMaxSize() {
    return maxSize;
  }

  public static class GuavaCacheBench extends AbstractBenchmark {

    @Override
    MyLoadingCache getCache() {
      return new GuavaCache(getMaxSize(), this::load);
    }
  }

  public static class LinkedHashMapCacheBench extends AbstractBenchmark {

    @Override
    MyLoadingCache getCache() {
      return new LinkedHashMapCache(getMaxSize(), this::load);
    }
  }
}
interface MyLoadingCache {
    String get(String key);
    String getStats();
}

Basically, we time the get operation of a loading cache using an exponential distribution. The MyLoadingCache interface and the AbstractBenchmark indirection are here to easily evolve the benchmarks allow to compare several cache implementations if needed.

The first MyLoadingCache implementation is the idiomatic use of a Guava cache. We use it for some of our caches:

class GuavaCache implements MyLoadingCache {

    private final LoadingCache<String, String> cache;

    public GuavaCache(int maxSize, final Function<String, String> loader) {
        this.cache = CacheBuilder.newBuilder()
                .maximumSize(maxSize)
                .concurrencyLevel(1)
                .recordStats()
                .build(new CacheLoader<String, String>() {
                    @Override
                    public String load(String key) throws Exception {
                        return loader.apply(key);
                    }
                });
    }

    @Override
    public String get(String key) {
        return cache.getUnchecked(key);
    }

    @Override
    public String getStats() {
        return String.format("size: %s stats: %s", cache.size(), cache.stats());
    }
}
Benchmark                              (maxSize)  Mode  Cnt  Score   Error  Units
AbstractBenchmark.GuavaCacheBench.run      10000  avgt   10  0.384 ± 0.027  us/op
AbstractBenchmark.GuavaCacheBench.run     100000  avgt   10  0.603 ± 0.042  us/op
AbstractBenchmark.GuavaCacheBench.run    1000000  avgt   10  0.929 ± 0.618  us/op

We can observe that things slow down a bit when max size increase. But combining these results with the cache statistics of the pipeline does not make sense:

  • Cache hits have much more than doubled when the cache size doubles
  • Jobs are slower to waaaay slower
  • Benchmarks tell us that the maximum slow down to expect is ~ 2

I was in stalemate, not being able to understand what was going on and short on ideas. But I eventually remembered that our pipelines were still using Guava 11 because of Hadoop (Guava pushed in the classpath by Hadoop MR, so you better align your version with it). Let’s retry with Guava 11:

Benchmark                              (maxSize)  Mode  Cnt    Score    Error  Units
AbstractBenchmark.GuavaCacheBench.run      10000  avgt   10    6.468 ±  0.499  us/op
AbstractBenchmark.GuavaCacheBench.run     100000  avgt   10  160.965 ± 93.156  us/op
AbstractBenchmark.GuavaCacheBench.run    1000000  avgt   10  855.142 ± 20.248  us/op

Wow. This is just awful and could explain our issue. Since the bug has already been solved, bisecting on Guava revisions allows to quickly discover that the fix appeared in Guava 13 and crawling the release notes brings up issue 1055.

Basically, cache are array-backed and the underlying array is never resized when the cache grows leading to O(n) put and get. Using a linked list as a cache is pretty inefficient!

The fix

I am pleased that Guava 13 fixed the issue... but I am still stuck to Guava 11. I need a workaround.

Thanks to issue 1055, it is easy to understand the problem: expand never gets called. But if we set the underlying array to the maximum size right from the beginning we don't need to expand it. Fortunately, like with many array-based collections it is possible to specify an initial capacity. Let’s try that:

    public GuavaCache(int maxSize, final Function<String, String> loader) {
        this.cache = CacheBuilder.newBuilder()
                .initialCapacity(maxSize) // <- HERE
                .maximumSize(maxSize)
                .concurrencyLevel(1)
Benchmark                              (maxSize)  Mode  Cnt  Score   Error  Units
AbstractBenchmark.GuavaCacheBench.run      10000  avgt   10  0.377 ± 0.027  us/op
AbstractBenchmark.GuavaCacheBench.run     100000  avgt   10  0.654 ± 0.060  us/op
AbstractBenchmark.GuavaCacheBench.run    1000000  avgt   10  1.029 ± 0.675  us/op

Problem solved!

Conclusion

  • Even high quality and widely used libraries like Guava have issues (even dumb but disastrous issues like this one).
  • A sound troubleshooting process allows to quickly narrowing down issues
  • If you are using Guava caches and are stuck to a version older than 13 you MUST set the initialCapacity to the same value than maximumSize
Tagged , , ,