
CPU-hungry processes - 100%?


Simple troubleshooting technique

I used this recently when trying to determine why a Java process was intermittently consuming 100% CPU. It's a simple technique that may not be common knowledge, and it is handy when you don't have any fancy tools available and are, say, working on a Linux server with only a terminal at your disposal.

  • Running ‘top -H’ while the CPU is high gives you an ordered view of the Linux threads (rather than the usual process view) that are consuming CPU. The PID column in this case refers to the id of the thread.
  • Executing a thread dump against your Java process at the same time (kill -QUIT <java process id> - this is the traditional process id, NOT the thread id from the first step) will write the status of the executing threads (the thread dump) to the process's console output.
  • The thread dump has the following format (an example thread):
http-8443-2" daemon prio=10 tid=0x00007fa740013800 nid=0x15ba runnable [0x00007fa723cfa000] 
 java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) 
at java.net.SocketInputStream.read(SocketInputStream.java:152) 
at java.net.SocketInputStream.read(SocketInputStream.java:122) 
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) 
at java.io.BufferedInputStream.read(BufferedInputStream.java:254) 
 - locked <0x00000000c537b948> (a java.io.BufferedInputStream)
  • The nid field (0x15ba in the example above) is the same as the thread id from the top command in the first step, except that in the thread dump it is shown in hexadecimal
  • So converting the thread ids retrieved in the first step to hexadecimal lets you zero in on the actual underlying thread that is causing the pain
  • An example conversion command:
echo 'ibase=10;obase=16;insert-thread-id-here' | bc
  • Taking the thread dump multiple times while the process is performing badly, e.g. 20 times over a minute, is useful for building a picture of what the thread is trying to do over time. This is where the stack trace of the bad thread can give you an insight into what is actually going on (a rough end-to-end sketch follows this list).
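
To tie the steps together, here is a rough sketch of the end-to-end workflow. It is illustrative only: it assumes the JDK's jstack tool is on the PATH (kill -QUIT works just as well, but the dump goes to the Java process's own stdout), and JAVA_PID, THREAD_ID and nid=0x15ba are placeholders you would substitute with your own values from ps and top.

# 1. Find the Java process id (substitute for JAVA_PID below)
ps -ef | grep java

# 2. Watch the threads of that process; note the PID of the hottest thread
top -H -p JAVA_PID

# 3. Convert that (decimal) thread id to hexadecimal, e.g. 5562 -> 15ba
printf '0x%x\n' THREAD_ID        # or: echo 'ibase=10;obase=16;THREAD_ID' | bc

# 4. Take a thread dump and pull out the matching thread by its nid
jstack JAVA_PID > dump.txt       # or: kill -QUIT JAVA_PID
grep -i -A 20 'nid=0x15ba' dump.txt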

At the very least, this gives a view of exactly what the process is trying to do when it is struggling; the stack traces of the individual threads show this.
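
For the repeated-dump step, a simple loop is enough. Again this is just a sketch, assuming jstack and the placeholder JAVA_PID; it takes 20 dumps over roughly a minute so you can compare what the suspect thread is doing in each snapshot.

# Take 20 thread dumps, roughly one every 3 seconds
for i in $(seq 1 20); do
    jstack JAVA_PID > dump-$i.txt
    sleep 3
done

# Then compare what the suspect thread (by its nid) was doing in each snapshot
grep -i -A 10 'nid=0x15ba' dump-*.txt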

In my particular problem with a greedy Java process, I was able to see from the thread dump stack traces that the two threads consuming the most CPU were related to RMI calls between a client and server process, so it gave me something to go on (and I eventually found the issue - bad config from me, of course!), rather than just trying different analysis tools.

Thanks!