Yesterday I did a major upgrade of a rather busy website (it should be among the ten most used in my country). Basically, I moved the code from the old 32-bit single-core web server running FreeBSD 4.8, Apache 1.3 and PHP 4.x to a new 64-bit, 4-core server running FreeBSD 7.0, Apache 2.2 and PHP 5.2. As you might imagine, changing so many variables at once caused several surprises. The most interesting one was that Apache died on some pages with signal 4 (SIGILL), which stands for illegal instruction. It's not that uncommon to see Apache die with signal 11 (SIGSEGV) or signal 10 (SIGBUS), which are usually caused by bugs in some PHP modules, but signal 4 was something new to me. A quick Google search turned up some threads where people reported getting rid of it by changing the order of modules in PHP's extensions.ini file, and a bug report about the PHP function preg_replace() sometimes causing it.
I tried both theories, and neither applied to my case. Then I spent some time looking at the core dump with GDB, but in the end I resorted to the good old binary search strategy. To make a long story short, the cause turned out to be a recursive function in the website code that under some circumstances never stopped recursing until it overflowed its stack.
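For illustration, here is a minimal sketch of one way to guard a recursive function against this kind of runaway recursion. It is not the actual site code. The site was written in PHP, and this sketch uses Python only to show the idea. The function `resolve_path`, the `parents` mapping, the `RecursionTooDeep` exception and the limit of 64 are all hypothetical names and values chosen for the example. The assumption is that the recursion follows parent links that may accidentally form a loop.

```python
class RecursionTooDeep(Exception):
    """Raised when the recursion goes deeper than the allowed limit."""


def resolve_path(node_id, parents, depth=0, max_depth=64):
    """Return the chain of ids from the root down to node_id.

    `parents` maps a node id to its parent id (hypothetical data layout).
    Without the depth check, a cycle in `parents` would make this
    function recurse until the stack is exhausted.
    """
    if depth > max_depth:
        # Fail loudly and early instead of exhausting the stack.
        raise RecursionTooDeep(
            "gave up at depth %d while resolving %r" % (depth, node_id)
        )

    parent = parents.get(node_id)
    if parent is None:
        # Reached a root node: the recursion ends here.
        return [node_id]

    return resolve_path(parent, parents, depth + 1, max_depth) + [node_id]


if __name__ == "__main__":
    healthy = {"c": "b", "b": "a"}
    print(resolve_path("c", healthy))

    broken = {"a": "b", "b": "a"}  # a cycle: a -> b -> a -> ...
    try:
        resolve_path("a", broken)
    except RecursionTooDeep as exc:
        print("caught:", exc)
```

With a guard like this, bad data shows up as an ordinary error naming the offending node, which is a lot easier to track down than a dead web server process.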