some fixes for the nginx push module

I have been testing various Comet/Push servers lately and finally decided to use the Nginx Push module. My use case is a bit unusual as far as Comet applications go – I need to keep ~150k TCP connections open 24/7, but there’s no need for broadcasting or message queuing functionality.

I found two memory leaks in the Nginx Push module, version 0.692.

The first one occurs when you send a message to a channel that doesn’t have any listeners. Here’s a patch I wrote for that. I have only tested it with the push_subscriber_concurrency first and push_store_messages off scenario; it might very well break other configurations.

The other, far more annoying, memory leak occurred for every message I sent, and it was rather large (message length + ~200 bytes). It leaked about 27 MB for 100k “hello world!” type test messages, which works out to roughly 270 bytes per message.

I spent several days hunting this one down, and finally it occurred to me that the nginx pool allocator that the module used through the ngx_(p|c)alloc() and ngx_pfree() functions wasn’t really built to free memory in the general case. Unless your allocations were larger than a defined threshold (4k), they were carved out of a memory block whose data structure has no means of freeing individual allocations (it only keeps a pointer just past the last allocation).
So if the small-allocation area ran out of memory, the pool simply grew.
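
To see why ngx_pfree() can’t help with those small allocations, here is a simplified sketch of the small-allocation path in ngx_palloc() (details differ between nginx versions; this is an illustration, not the push module’s code):

#include <ngx_config.h>
#include <ngx_core.h>

/*
 * Sketch of what ngx_palloc() does for sizes <= pool->max: it just bumps
 * the d.last pointer of the current data block. There is no per-allocation
 * bookkeeping, so these allocations can never be freed individually.
 */
static void *
palloc_small_sketch(ngx_pool_t *pool, size_t size)
{
    u_char      *m;
    ngx_pool_t  *p;

    for (p = pool->current; p; p = p->d.next) {
        m = ngx_align_ptr(p->d.last, NGX_ALIGNMENT);

        if ((size_t) (p->d.end - m) >= size) {
            p->d.last = m + size;    /* just move the bump pointer */
            return m;
        }
    }

    /* no existing block has room: the real code appends another block
       (ngx_palloc_block), so the pool only ever grows */
    return NULL;
}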

Larger allocations were kept in a separate list of large blocks and were actually freed on ngx_pfree().

Here’s the actual source of ngx_pfree():

ngx_int_t
ngx_pfree(ngx_pool_t *pool, void *p)
{
    ngx_pool_large_t  *l;
 
    for (l = pool->large; l; l = l->next) {
        if (p == l->alloc) {
            ngx_log_debug1(NGX_LOG_DEBUG_ALLOC, pool->log, 0,
                           "free: %p", l->alloc);
            ngx_free(l->alloc);
            l->alloc = NULL;
 
            return NGX_OK;
        }
    }
 
    return NGX_DECLINED;
}

This allocator is designed that way to be as efficient as possible at allocating memory for requests, which generally have a short lifetime. For that particular use case not freeing small amounts of memory is mostly OK, and the simpler data structure also makes it faster. All the memory blocks were freed anyway when the request was done and ngx_destroy_pool() was called.
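
For example, this is the per-request pattern the pool allocator is built for (an illustrative sketch, not code from nginx or the push module):

#include <ngx_config.h>
#include <ngx_core.h>

/*
 * Nothing needs to be freed individually, because the whole pool is
 * thrown away when the request ends.
 */
static void
request_lifetime_sketch(ngx_log_t *log)
{
    ngx_pool_t  *pool;
    u_char      *small, *large;

    pool = ngx_create_pool(4096, log);   /* created when the request starts */
    if (pool == NULL) {
        return;
    }

    small = ngx_palloc(pool, 128);       /* bump-pointer alloc, cannot be freed individually */
    large = ngx_palloc(pool, 8192);      /* > pool->max, tracked in pool->large */

    if (large != NULL) {
        ngx_pfree(pool, large);          /* only large allocations are really freed */
    }
    (void) small;

    ngx_destroy_pool(pool);              /* everything else is released here, at end of request */
}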

The Push module used this pool allocator in a different way – the pool was created on bootup and never destroyed, so it just grew and grew even though free was called correctly.

So anyway, here’s a second patch that fixes this memory leak by replacing the nginx pool allocator usage with the system allocator. I have tested it with 200k TCP connections and 100M messages, and the memory usage hasn’t changed at all.
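
To give an idea of the kind of substitution the patch makes, here is a hypothetical before/after sketch (not the actual patch; the function name and variables are made up): long-lived per-message buffers go through nginx’s thin wrappers around malloc()/free(), so freeing them really returns the memory to the system.

#include <ngx_config.h>
#include <ngx_core.h>

/*
 * Before, the module did roughly:
 *     buf = ngx_pcalloc(module_pool, len);  ...  ngx_pfree(module_pool, buf);
 * where module_pool lived forever, so small buffers were never returned.
 * After, the same buffer goes through the system allocator:
 */
static ngx_int_t
message_buffer_sketch(size_t len, ngx_log_t *log)
{
    u_char  *buf;

    buf = ngx_alloc(len, log);     /* thin wrapper around malloc() */
    if (buf == NULL) {
        return NGX_ERROR;
    }

    ngx_memzero(buf, len);         /* ngx_pcalloc() zeroed the memory; do it explicitly now */

    /* ... copy the message into buf and hand it to the subscriber ... */

    ngx_free(buf);                 /* wrapper around free(): the memory is actually returned */

    return NGX_OK;
}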

I also ran into a segmentation fault when using push_subscriber_concurrency last. This is probably some kind of concurrency/locking issue, since that setting causes internal broadcasting under some conditions. I haven’t spent any time hunting that bug down since I really needed push_subscriber_concurrency first. Besides, the author of the module said that bug was known and someone was already working on it.

5 thoughts on “some fixes for the nginx push module”

  1. N

    Very cool. Why don’t you fork the original repository, apply your changes, and send a pull request so that Leo can accept it and others can find your hack?

  2. hadara Post author

    Actually I did that a long time ago @ https://github.com/hadara/nginx_http_push_module
    I have sent several emails about it, with a detailed testing description and reasoning, to Leo, and someone sent him a pull request that contained this patch among other things. He specifically rejected these memory leak fixes since he didn’t believe that the memory allocator could behave the way I described.
    My detailed description of the steps required to reproduce the bug is available @ https://github.com/slact/nginx_http_push_module/pull/60

    Anyway, this code has been in production use with a constant 120,000+ client connections and ~100 messages per second for several months now, and it works rather well. There are still some small memory leaks that I haven’t traced down yet, so I have to restart it about once a week.

  3. N

    Ah. I hadn’t realized the pull request contained your fixes… and Leo rejected it despite the existing problem.
    It’s sad to see that he won’t accept it until it actually happens to him, but it’s his choice…

    I don’t think all of those pull requests should be taken into account one way or another, but they do indicate that there are issues and a need for improvement.
    Anyway, I’ll use your fixes until somebody fully takes over the module and manages everyone’s needs. Thanks!

  4. Ariel

    Many thanks for this. Even though Leo’s code is excellent, it’s not flawless. It’s a good idea to let a community build up around this module. Nginx is very fast, highly concurrent, and scales better than any other web server. I also need this for a *very* large scale deployment. Will keep you posted on any bugs I find; I’m 100% willing to help. Cheers!
