PROBLEM: gc Concurrent Mark Sweep in fire fighting mode
Ever since I deployed Zabbix in our local network, I've been sporadically receiving notifications about the Java garbage collector being in fire fighting mode. Unfortunately, Google isn't much help in explaining what that means. Here's how I currently understand it.
Keep in mind that I am by no means a Java expert, but as far as I can tell, the JVM splits its garbage collection work between two collectors: a cheap one that runs frequently and only cleans up recently allocated objects (the young generation), and a more expensive one that cleans up long-lived objects (the old generation). In the setup Zabbix is monitoring here, ParNew is the cheap collector and ConcurrentMarkSweep is the expensive one. The expensive one is more thorough, and it has to run more often when memory is tight and the cheap one alone cannot keep up with new allocations.
What Zabbix is trying to say is that Java ran the more expensive GC algorithm more often than the less expensive one, which usually seems to indicate that you should assign more memory to the JVM in question.
On the other hand, the values in the notification show that this is only because the cheaper algorithm hasn't run at all, while the expensive algorithm ran at a rate of 0.016721 collections per second, that is, roughly once every 60 seconds:
1. gc ParNew number of collections per second (hostname:jmx["java.lang:type=GarbageCollector,name=ParNew",CollectionCount]): 0
2. gc ConcurrentMarkSweep number of collections per second (hostname:jmx["java.lang:type=GarbageCollector,name=ConcurrentMarkSweep",CollectionCount]): 0.016721
So Zabbix seems to be a little trigger-happy here, at least for our environment.
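To double-check the arithmetic behind those two rates, here is a small Python sketch that turns a collections-per-second value into the average time between collections. The function name is made up for illustration and has nothing to do with Zabbix itself; the only real inputs are the two numbers from the notification.

```python
def seconds_between_collections(rate_per_second):
    """Average number of seconds between two collections.

    Returns None when the rate is zero, i.e. the collector did not run at all.
    """
    if rate_per_second <= 0:
        return None
    return 1.0 / rate_per_second


# Values taken from the notification above.
cms_interval = seconds_between_collections(0.016721)
parnew_interval = seconds_between_collections(0)

print(round(cms_interval, 1))  # 59.8
print(parnew_interval)         # None
```

In other words, the expensive collector ran about once a minute, and the cheap one did not run at all during the measured interval.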
To resolve this issue, I modified the trigger to only fire if the cheaper algorithm has run at a rate of at least once every ten seconds (0.1 collections per second), by changing the trigger expression like this:
({Template JMX Generic:jmx["java.lang:type=GarbageCollector,name=ParNew",CollectionCount].last(0)}<{Template JMX Generic:jmx["java.lang:type=GarbageCollector,name=ConcurrentMarkSweep",CollectionCount].last(0)}) and ({Template JMX Generic:jmx["java.lang:type=GarbageCollector,name=ParNew",CollectionCount].last(0)}>=0.1)
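To convince myself that the extra condition does what I want, the logic of the expression can be modelled in a few lines of Python. This is only a sketch of the comparison, not anything Zabbix runs: the function names are invented, and the "busy JVM" rates in the last example are made-up numbers, not measurements.

```python
def original_trigger(parnew_rate, cms_rate):
    """Unmodified condition: the cheap collector ran less often than the expensive one."""
    return parnew_rate < cms_rate


def modified_trigger(parnew_rate, cms_rate):
    """Same comparison, but only when ParNew ran at least once every ten seconds."""
    return parnew_rate < cms_rate and parnew_rate >= 0.1


# Rates from the notification: the original condition fires, the modified one stays quiet.
print(original_trigger(0, 0.016721))  # True
print(modified_trigger(0, 0.016721))  # False

# Hypothetical busy JVM where CMS really does outpace ParNew: still fires.
print(modified_trigger(0.2, 0.5))     # True
```

The trade-off is that a JVM whose ParNew rate stays below 0.1 per second will never raise this trigger, no matter how often ConcurrentMarkSweep runs.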
Let's see how it goes...