Wednesday, June 08, 2005

More on file-nr, file-max

Not content to leave my "Too many open files in system" problem unexplained, I started digging into the Linux 2.6.8 kernel source code.

kernel/sysctl.c defines the /proc/sys/fs files. There's fs_table, an array of ctl_table structs, which defines file-nr and file-max and points both of them at the files_stat structure, i.e. files_stat holds the data that's displayed when you 'cat file-nr'. file-nr is read-only, file-max is read-write.
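For reference, the two entries look more or less like this. This is paraphrased from my reading of kernel/sysctl.c, not copied verbatim, so the exact constants and flags may differ; the interesting bits are the .data pointers into files_stat and the 0444 vs. 0644 modes:

    /* Paraphrased from fs_table in kernel/sysctl.c (2.6.8); not a verbatim copy. */
    {
        .ctl_name     = FS_NRFILE,
        .procname     = "file-nr",
        .data         = &files_stat,            /* all three numbers come from here */
        .maxlen       = 3*sizeof(int),
        .mode         = 0444,                   /* read-only */
        .proc_handler = &proc_dointvec,
    },
    {
        .ctl_name     = FS_MAXFILE,
        .procname     = "file-max",
        .data         = &files_stat.max_files,  /* only the limit is writable */
        .maxlen       = sizeof(int),
        .mode         = 0644,                   /* root can change the limit */
        .proc_handler = &proc_dointvec,
    },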

fs/file_table.c (and include/linux/fs.h) defines files_stat. As you'd expect, it's a struct that holds nr_files, nr_free_files and max_files members. max_files is what gets changed when you alter /proc/sys/fs/file-max.
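The structure itself is tiny. Roughly as it reads in include/linux/fs.h (paraphrased; the comments are mine):

    /* Roughly as declared in include/linux/fs.h (2.6.8); comments are mine. */
    struct files_stat_struct {
        int nr_files;        /* allocated 'struct file' objects */
        int nr_free_files;   /* stays at zero on 2.6; see below */
        int max_files;       /* the tunable limit, i.e. file-max */
    };
    extern struct files_stat_struct files_stat;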

As described in Documentation/filesystems/proc.txt, nr_free_files is always zero as of the 2.6 kernel. max_files starts out sized so that open files can use roughly 10% of memory (the kernel figures about 1K per open file), but as mentioned, it can be changed via /proc/sys/fs/file-max. There are plenty of examples of startup scripts that write a larger value at boot time, overriding the default where needed.
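The 10% figure comes from files_init() in fs/file_table.c, which (as I read it) assumes an open file costs very roughly 1K of kernel memory and then lets files have a tenth of RAM. A quick back-of-the-envelope check under those assumptions, with a made-up 1 GB machine:

    /* Back-of-the-envelope model of the 2.6 file-max default as I read
     * files_init(): ~1 KB per open file, at most 10% of memory for files.
     * The 1 GB RAM figure is just an example. */
    #include <stdio.h>

    int main(void)
    {
        unsigned long mem_kb    = 1024UL * 1024UL;  /* a hypothetical 1 GB box */
        unsigned long max_files = mem_kb / 10;      /* one file per KB, 10% of RAM */

        printf("default file-max for %lu KB of RAM: about %lu\n", mem_kb, max_files);
        return 0;
    }

On that hypothetical box the default works out to about 104,857, which is in the same ballpark as what /proc/sys/fs/file-max usually shows on a 1 GB machine.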

fs/file_table.c has get_empty_filp(), which does the magic check to see if nr_files > max_files and, if so (and you're not root), denies you the new file. Failing this way puts an info message "VFS: file-max limit reached" into the logs.
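In outline, the check behaves like the little userspace model below. This is my paraphrase of the logic, not the kernel code: is_privileged() stands in for the kernel's root/capability check, malloc() stands in for the slab allocation, and the counter handling is simplified (in the real kernel nr_files is maintained by the cache constructor and destructor, as described next).

    /* Userspace sketch of the limit check in get_empty_filp(); a paraphrase
     * of my reading of fs/file_table.c, not the kernel code itself. */
    #include <stdio.h>
    #include <stdlib.h>

    struct files_stat_struct {
        int nr_files;
        int nr_free_files;
        int max_files;
    };

    static struct files_stat_struct files_stat = { .max_files = 104857 };

    /* Stand-in for the kernel's "are we root?" capability check. */
    static int is_privileged(void)
    {
        return 0;
    }

    static void *get_empty_filp_model(void)
    {
        /* Mirrors the check described above; see the source for the exact comparison. */
        if (files_stat.nr_files > files_stat.max_files && !is_privileged()) {
            /* The kernel logs this with printk() at info level. */
            fprintf(stderr, "VFS: file-max limit %d reached\n",
                    files_stat.max_files);
            return NULL;
        }
        files_stat.nr_files++;   /* simplified; really done in filp_ctor() */
        return malloc(1);        /* stands in for allocating from filp_cachep */
    }

    int main(void)
    {
        files_stat.nr_files = files_stat.max_files + 1;   /* pretend we're past the limit */
        if (get_empty_filp_model() == NULL)
            puts("new file refused, as it would be for a non-root process");
        return 0;
    }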

fs/file_table.c is the only place that alters nr_files. It gets incremented in filp_ctor() and decremented in filp_dtor(). filp_ctor() and filp_dtor(), in turn, get set as function pointers in fs/dcache.c when the "filp" kmem cache (named filp_cachep) gets created. filp_cachep itself, apart from being created in fs/dcache.c, is only referenced in fs/file_table.c, by get_empty_filp() and file_free().
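The cache setup in fs/dcache.c looks more or less like this (paraphrased from memory, so the flags may not be exact); the last two arguments are the constructor and destructor that move nr_files up and down:

    /* Paraphrased from the filp cache creation in fs/dcache.c (2.6.8);
     * the exact flags may differ. */
    filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0,
            SLAB_HWCACHE_ALIGN | SLAB_PANIC,
            filp_ctor,      /* increments files_stat.nr_files */
            filp_dtor);     /* decrements files_stat.nr_files */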

get_empty_filp() and file_free() call kmem_cache_alloc() and kmem_cache_free(), respectively. These functions live in mm/slab.c.

file_free() gets called almost exclusively by put_filp(), also in fs/file_table.c. get_empty_filp() and put_filp() get called from a variety of places, including fs/open.c, fs/pipe.c, mm/shmem.c and net/socket.c.

Thursday, June 02, 2005

"Too many open files in system"

We got this message on one of our servers recently. I went to our sysadmin guy, who "fixed" it by raising the limit in /proc/sys/fs/file-max. So far, so good. Then we started looking at why we were running out, since that server had been fine for quite a while.

/proc/sys/fs/file-nr reports the number of file handles allocated by the system (along with the free count and the limit). We had over 200,000 handles allocated. /usr/bin/lsof lists open files per process, but showed fewer than 4,000 in use.
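For anyone who wants to watch the same numbers, a quick reader for file-nr's three fields, sketched here purely as an illustration (not something we used at the time):

    /* Read /proc/sys/fs/file-nr and print its three fields:
     * allocated handles, free handles, and the file-max limit.
     * Just an illustrative helper. */
    #include <stdio.h>

    int main(void)
    {
        FILE *fp = fopen("/proc/sys/fs/file-nr", "r");
        unsigned long allocated, freed, max;

        if (fp == NULL) {
            perror("/proc/sys/fs/file-nr");
            return 1;
        }
        if (fscanf(fp, "%lu %lu %lu", &allocated, &freed, &max) == 3)
            printf("allocated=%lu free=%lu max=%lu\n", allocated, freed, max);
        fclose(fp);
        return 0;
    }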

I also found an article that explains this file in better detail. It mentioned that there will usually be a difference between file-nr's and lsof's counts, but implied that lsof's should be higher, since lsof also shows network connections, pipes, memory maps, etc.

I started shutting down processes, in hopes that one was holding leaked descriptors, but nothing helped. We finally rebooted, and file-nr's allocated count came back nice and small, less than lsof's count, as expected.

If anyone has an explanation for this, I'd love to hear it.

Wednesday, June 01, 2005

"Could not open relation" in PostgreSQL

A coworker was getting "ERROR: could not open relation with OID nnnnnnn" in PostgreSQL. I poked around, and indeed, the query he was issuing was trying to use an index that I had dropped the day before. Shutting down psql and restarting it cleared the error.

It seemed odd that psql would apparently cache a query plan rather than letting the server handle it. Unless the problem was in the connection rather than in psql itself, and recycling psql just recycled the connection; now that I think about it, that was probably the case.