Since this year’s European PostgreSQL conference, pgconf.eu, happened to take place in my home country, I just couldn’t pass up the opportunity to go.
I’m not really a huge PostgreSQL user though. I mostly use it for personal projects, but I have also done at least one rather unconventional proof-of-concept project on it at work (a largeish graph DB).
At work, the main production databases that I’m currently responsible for (or have been in the past) are all based on MySQL/MariaDB, but this is mainly because these decisions were made 10+ years ago, when the pros and cons of each choice were quite different from what they are today. For new projects I would most likely choose PostgreSQL.
I won’t go over the talks one by one but rather share some general themes that I noticed.
Zeitgeist
There’s a German word, Zeitgeist, which loosely means the spirit of the time. I have begun to notice that at conferences there’s often some unifying subtopic that is disproportionately important to the particular community at that time. For example, at EuroPython 2010 the zeitgeist was all about concurrency and ways to get around the GIL. At EuroPython 2015 I barely heard anything about concurrency anymore, even though things on that front hadn’t changed much in between; instead everyone was focused on data mining and scientific computing. At pgconf.eu 2016 I felt that the central topic was replication. There were many talks exploring different approaches to replication and various master-slave switchover orchestration tools. It will be interesting to see which solution the community settles on in the next couple of years.
Popularity
PostgreSQL has been gaining a lot of popularity in recent years, probably mainly because of the uncertainty around MySQL after Oracle acquired it. This also means that many commercial companies see opportunities in selling support and consultancy around Postgres-related things. For-profit companies tend to want something that differentiates them from the competition, so they are somewhat inherently motivated to create custom extensions and solutions instead of working together. It certainly felt at times that each company at the conference had a different solution for handling replication and cluster orchestration, each with its own upsides and downsides. Let’s just hope it doesn’t end in a full-scale Unix wars scenario. PostgreSQL has always had a rather rich landscape of forks, so maybe the community has already learned to handle this somehow.
Where to do the complex stuff?
There were several talks about some of the more powerful constructs and capabilities of PostgreSQL: window functions, recursive CTEs, lateral joins, upserts and aggregate filters, as well as various NoSQL capabilities, custom datatypes, foreign tables, operator overloading and support for countless programming languages.
Which brings us to a rather classical dilemma: should we use the various powerful tools that the DB provides and be tied to it as a result, or use the DB as a simple datastore and do the complex stuff in the app? I see it as a continuum where on one end you only use simple queries (probably through an ORM) and on the other end you have monstrosities like Oracle APEX, where even the application itself lives in the DB.
The keynote speaker from Adyen said he believes that the decision to avoid procedures, triggers etc. was the best tech decision they made, even though it was made for completely different reasons. My experience is more or less the same: I think a good rule of thumb is to avoid non-declarative features but be rather liberal with everything else.
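To make the two ends of that continuum a bit more concrete, here is a minimal sketch that computes the same thing both ways: all nodes reachable from a starting node in a small graph. It is not taken from any of the talks. The `edges` table, its contents and the function names are made up for illustration, and it uses Python's built-in SQLite as a stand-in so that it runs without a database server. The recursive CTE is the declarative, in-database variant; the same `WITH RECURSIVE` construct exists in PostgreSQL, although the parameter placeholder syntax depends on the driver.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a hypothetical edge list."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
    conn.executemany(
        "INSERT INTO edges VALUES (?, ?)",
        # Note the cycle 2 -> 3 -> 4 -> 2; both variants must cope with it.
        [(1, 2), (2, 3), (3, 4), (4, 2), (1, 5), (6, 7)],
    )
    return conn


def reachable_in_db(conn, start):
    """Let the database do the traversal with a recursive CTE."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT ?
            UNION  -- UNION (not UNION ALL) drops duplicates, so cycles terminate
            SELECT e.dst FROM edges e JOIN reach r ON e.src = r.node
        )
        SELECT node FROM reach
        """,
        (start,),
    ).fetchall()
    return {row[0] for row in rows}


def reachable_in_app(conn, start):
    """Use the database as a plain datastore and traverse in the app."""
    seen = {start}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        # One simple query per visited node.
        for (dst,) in conn.execute(
            "SELECT dst FROM edges WHERE src = ?", (node,)
        ):
            if dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return seen


if __name__ == "__main__":
    conn = build_db()
    print(sorted(reachable_in_db(conn, 1)))
    print(sorted(reachable_in_app(conn, 1)))
```

Both functions return the same set. The in-database version is a single statement, while the application version issues one query per visited node, which is the kind of cost that moves around depending on where you sit on the continuum.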
Case studies
For me, the most interesting talks at the conference were the ones about real system setups and the problems encountered along the way. There were several of these, starting with the keynote delivered by Michiel Toneman from Adyen, a quickly growing payment processing company currently serving ~60 billion payments per year. They have been undergoing exponential growth for years, which has led to some rather interesting scaling problems. Their master database is currently over 40TB and has 11 tables larger than 1TB. Their largest table is currently around 11TB. Michiel talked about the reasoning and complexities around choosing PostgreSQL for a payment processing company, a field usually dominated by high-cost proprietary databases like Oracle and Sybase. It was interesting that Postgres usage at Skype had been a kind of validation that probably made it an acceptable choice elsewhere.
Another interesting talk was about the problems that Skype has encountered with PostgreSQL. The interesting part for me was that even though we use MySQL, we have still encountered most of the same problems. That’s because many of them are simply what you run into when operating a database under serious load (lock queues, small degradations in IO performance having snowball effects, lagging read replicas, cleaning up bloat etc.).