Wednesday, August 31, 2005

More on File Handles

In response to my "Too many open files in system" problem, I got the following email (posted here with permission, provided it's anonymous):

On our 2.6.9 appliance box we have been suffering the same problem running our application software. I tracked the issue down to the use of ps(1) on a system where /proc is changing quite often. It appears that repeated use of ps can cause the /proc/sys/fs/file-nr count to go up quite rapidly. In our case it was hitting its 200k limit in just a few days!

The sort of command we were executing often was: count=`ps aux | grep "service" | grep -v grep | grep -v defunct | wc -l`

Things seem to improve when the output of ps aux is written to a file rather than a pipe. The problem seems worse the more output ps produces.
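For anyone who wants to try the file-based variant, here is a minimal sketch of what it might look like. The function name count_in_snapshot and the use of mktemp are my own choices, not something from the email, and I have not verified that this avoids the leak on a 2.6.9 kernel.

```shell
# Count lines in a saved ps snapshot that match a pattern, skipping
# lines that mention grep and defunct (zombie) entries, to mirror the
# filters in the original pipeline.
# count_in_snapshot is a hypothetical helper name, not from the email.
count_in_snapshot() {
    pattern=$1
    snapshot=$2
    grep -- "$pattern" "$snapshot" | grep -v grep | grep -vc defunct
}

# Write the ps output to a temporary file first, then filter the file,
# so that ps itself never writes into a pipe.
snapshot=$(mktemp)
ps aux > "$snapshot"
count=$(count_in_snapshot "service" "$snapshot")
rm -f "$snapshot"
echo "matching processes: $count"
```

The filtering still uses pipes between the grep commands; only ps is kept away from one, which is the part the email singles out.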

Note that the kernel doesn't update file-nr on every allocation - there is latency for performance reasons - so it tends to jump up in multiples of 25. When experimenting you have to be patient and wait a while to see whether running a command repeatedly is causing a leak.
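Because the counter moves in steps, it helps to sample it over time rather than look at it once. The sketch below is one way to do that, assuming the first field of /proc/sys/fs/file-nr is the number of allocated file handles. The function names allocated_handles and watch_file_nr are made up for illustration.

```shell
# Print the first field (allocated file handles) of a file-nr style file.
# The path defaults to the real /proc entry; passing another path is
# only there to make the function easy to try out.
allocated_handles() {
    awk '{ print $1 }' "${1:-/proc/sys/fs/file-nr}"
}

# Sample the count every $1 seconds, $2 times, printing the time,
# the current count, and the change since the previous sample.
watch_file_nr() {
    interval=$1
    samples=$2
    prev=$(allocated_handles)
    i=0
    while [ "$i" -lt "$samples" ]; do
        sleep "$interval"
        now=$(allocated_handles)
        echo "$(date '+%H:%M:%S') allocated=$now delta=$((now - prev))"
        prev=$now
        i=$((i + 1))
    done
}
```

Running something like watch_file_nr 60 30 in one terminal while repeating the suspect command in another should show whether the count creeps up. Given the batching described above, expect long runs of delta=0 followed by a jump.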

The problem is that ps itself doesn't appear to be leaking fds directly - it looks like it's a "virtual" leak in the kernel, perhaps caused by ps's interaction with /proc on a system where processes are being created and killed very often.
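The email doesn't say how the per-process check was done. One rough way to check whether a given process is holding descriptors is to count the entries in its /proc/<pid>/fd directory and compare that over time with the system-wide file-nr figure. This is only a sketch, and fd_count is a hypothetical helper name.

```shell
# Count the entries in a directory of open file descriptors.
# Defaults to the current shell's own fd directory; pass
# /proc/<pid>/fd to inspect another process (this may need root).
fd_count() {
    ls "${1:-/proc/$$/fd}" 2>/dev/null | wc -l | tr -d ' '
}
```

If the per-process counts stay flat while file-nr keeps climbing, that points to the kernel-side leak the email describes rather than to descriptors left open by a process.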

We have worked round this issue by reducing our use of ps to a minimum. Now our application software only leaks a few hundred handles a day, rather than thousands an hour!

I have spent many hours on the net and haven't found any kernel or ps patches for this. We will be upgrading to a later kernel at some point - hopefully that will fix the issue.
