Tag Archives: PostgreSQL

pgconf.eu 2016

pgconf  2016.11.04 16:15:14

Since this year's European PostgreSQL conference, pgconf.eu, happened to take place in my home country, I just couldn't pass up the opportunity to go.
I'm not really a huge PostgreSQL user, though: I mostly use it for personal projects, but I have also done at least one rather unconventional proof-of-concept project on it at work (a largish graph DB).

At work, the main production databases that I'm currently responsible for (or have been in the past) all run on MySQL/MariaDB, but that is mainly because those decisions were made 10+ years ago, when the pros and cons of each choice were quite different from what they are today. For new projects it's quite likely I would pick PostgreSQL instead.

I won’t go over the talks one by one but rather share some general themes that I noticed.

Zeitgeist

There's a German word, Zeitgeist, which loosely means the spirit of the time. I have begun to notice that conferences often have some unifying subtopic that is, for whatever reason, disproportionately important to that particular community at that particular time. For example, at EuroPython 2010 the zeitgeist was all about concurrency and ways to get around the GIL. At EuroPython 2015 I barely heard anything about concurrency anymore, even though little had changed on that front in between; instead everyone was focused on data mining and scientific computing. At pgconf.eu 2016 the central topic was clearly replication: there were many talks exploring different approaches to replication and various master-slave switchover orchestration tools. It will be interesting to see which solution the community settles on over the next couple of years.

Popularity

PostgreSQL has been gaining a lot of popularity over recent years, probably mostly because of the uncertainty around MySQL after the Oracle acquisition. This also means that many commercial companies see an opportunity in selling support and consultancy around Postgres. For-profit companies generally want something to differentiate themselves from the competition, so they are somewhat inherently motivated to build custom extensions and solutions rather than work together. At times it certainly felt as if every company at the conference had its own solution for replication and cluster orchestration, each with its own upsides and downsides. Let's just hope it doesn't end in a full-scale Unix-wars scenario. PostgreSQL has always had a rather rich landscape of forks, so maybe the community has already learned how to handle this.

Where to do the complex stuff?

There were several talks about the more powerful constructs and capabilities of PostgreSQL: window functions, recursive CTEs, lateral joins, upserts and aggregate filters, as well as the various NoSQL capabilities, custom datatypes, foreign tables, operator overloading and support for countless programming languages.
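
To give a flavour of a couple of these, here is a toy query of my own (not one from the talks) that combines an aggregate FILTER clause (available since 9.4) with a ranking window function, over a made-up orders table:

-- Made-up table purely for illustration.
CREATE TABLE orders(customer_id INT, amount NUMERIC, cancelled BOOL);

-- Per-customer totals: FILTER restricts one aggregate to non-cancelled
-- orders, while RANK() orders customers by their overall spend.
SELECT customer_id,
       SUM(amount) AS total,
       SUM(amount) FILTER (WHERE NOT cancelled) AS completed_total,
       RANK() OVER (ORDER BY SUM(amount) DESC) AS spend_rank
FROM orders
GROUP BY customer_id;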

Which brings us to a rather classic dilemma: should we use the powerful tools the DB provides and be tied to it as a result, or use the DB as a simple datastore and do the complex stuff in the app? I see it as a continuum: at one end you only run simple queries (probably through an ORM), and at the other end you have monstrosities like Oracle APEX, where even the application itself lives in the DB.

The keynote speaker from Adyen said he believes that the decision to avoid stored procedures, triggers and the like was the best technical decision they made, even though it was originally taken for completely different reasons. My experience is much the same: I think a good rule of thumb is to avoid the non-declarative features but be rather liberal with everything else.

Case studies

For me the most interesting talks at the conference were those about real system setups and the problems encountered along the way. There were several of these, starting with the keynote delivered by Michiel Toneman from Adyen, a quickly growing payment processing company currently serving ~60 billion payments per year. They have been undergoing exponential growth for years, which has led to some rather interesting scaling problems: their master database is currently over 40TB, 11 of its tables exceed 1TB, and the largest table is around 11TB. Michiel talked about the reasoning behind, and the complexities of, choosing PostgreSQL for a payment processing company, a field usually dominated by high-cost proprietary databases like Oracle and Sybase. Interestingly, PostgreSQL's use at Skype had served as a kind of validation that probably made it an acceptable choice elsewhere.

Another interesting talk was about the problems Skype has encountered with PostgreSQL. The interesting part for me was that even though we use MySQL, we have still run into most of the same issues. Many of them are simply things you encounter when running a database under serious load: lock queues, small degradations in IO performance having snowball effects, lagging read replicas, cleaning up bloat and so on.

elections

Yesterday the Riigikogu (Estonian parliament) elections took place, and this was already the second election in a row where the electoral committee's website at some point gave up on serving fresh results.

Helmes, the company that built the software, came out today with an interesting scapegoat: the trouble was supposedly the fault of the open-source database engine PostgreSQL; the software they wrote had worked perfectly, and no adequate performance testing had been done because that was supposedly impossible.

First of all, blaming PostgreSQL seems extremely out of place here: a huge number of systems around the world run on it, with data volumes and loads incomparably larger than anything this election system should have had to endure (Skype's user base, for one, sits on PostgreSQL). Personally, it seems to me that blaming Postgres was simply much more convenient than admitting that the thing wasn't tested and tuned properly, because unlike with Oracle, say, there is nobody behind it who would sue you for defamation.

Secondly, saying "our software worked perfectly, it was component X we used that failed" is rather inappropriate, since the developer is generally responsible for the whole. PostgreSQL has proven itself well enough around the world; the question seems to be purely one of application architecture and/or server configuration. It would be nice to hear a comment on this from Hannu Krosing or some other Postgres guru.

And finally, the claim that "we had tested and verified everything on our side and there was nothing more we could do": testing this application under fully realistic load should be quite easy. Estonia, with its ~600,000 votes, is a tiny thing to simulate. The Chinese might have it a bit harder 😛

But so that this wouldn't remain idle pontificating, I threw together a naive elections application on the train to work this morning, to see how long filling such a database, and then working out the winners from it, would take on a random desktop machine.

First, a few assumptions have to be made:

I only create tables for votes and candidates. In reality there would of course be more tables: electoral lists, parties, electoral districts, polling stations and probably a few more that don't come to mind right away. These tables can safely be ignored, though, since apart from the votes table everything else should be fairly constant and pre-populated.

I assume that every vote is a separate record in the votes table. In practice this is probably not the case; more likely each polling station reports vote counts as one record per candidate, along the lines of candidate_X received 1000 votes. That would be significantly easier performance-wise, since instead of 600,000 inserts there would be more like 50,000. I deliberately go with the much worse variant, performance-wise, to see how long it would take.

I assume that a polling station reports all of its votes at once, i.e. each individual vote insert is not its own transaction; rather, inserting all the votes of one polling station is (see the sketch after this list).

I assume that the charts and stats shown to web users are not generated straight off the database; rather, a static page is generated, say, once a minute. There seems to be no reason why the page shown to users should hit the database at all, and apparently that is how it was done, since during the problems the page still loaded quickly, just with stale data. This means I don't need to emulate a couple of hundred thousand different SELECT queries per second hammering the database.
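
The per-station transaction batching from the assumption above would look something like this (a sketch against the votes table defined below; the district and candidate values are made up):

-- One polling station reports all of its votes in a single transaction.
BEGIN;
INSERT INTO votes (electoral_district_id, candidate_id) VALUES
    (1, 106),
    (1, 120),
    (1, 285);  -- ...and so on for every vote from this station
COMMIT;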

Reading their explanation, the impression is that the problem was the query planner making decisions based on stale table statistics (VACUUM ANALYZE had not been run in the meantime), which made it prefer a less efficient execution plan. For example, when a table contains a few hundred records it can be entirely reasonable to walk through all the rows of the table (a full scan) instead of going through an index. The article gives the impression that when the trouble started, the admins looked at the running queries and their execution plans, issued VACUUM ANALYZE and then waited around an hour for the query plans to change. The database IO has to be extremely overloaded for VACUUM ANALYZE to take that long; on my votes table with 600,000 records it took ~2s on an ordinary desktop machine.
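
The effect is easy to reproduce on a small scale (my own sketch, using the votes table and index defined below): run the same query before and after refreshing the statistics and compare the plans.

-- With stale statistics (e.g. gathered while the table was still small)
-- the planner may stick to a sequential scan here:
EXPLAIN SELECT COUNT(*) FROM votes WHERE candidate_id = 42;

-- Refresh the statistics and clean up dead rows.
VACUUM ANALYZE votes;

-- With fresh row estimates the planner can now justify using
-- idx_candidate_id instead of scanning the whole table:
EXPLAIN SELECT COUNT(*) FROM votes WHERE candidate_id = 42;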

So the schema looks like this:

-- Pre-populated list of candidates.
CREATE TABLE candidates(
    candidate_id INT PRIMARY KEY,
    name TEXT
);

-- One record per vote (deliberately the worst-case variant).
CREATE TABLE votes(
    electoral_district_id INT NOT NULL,
    candidate_id INT NOT NULL REFERENCES candidates(candidate_id)
);

-- Used when grouping and filtering votes by candidate.
CREATE INDEX idx_candidate_id ON votes(candidate_id);

And the script that inserts the "election results" is here.
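
For reference, test data of the same shape (1000 candidates with 600 votes each) can also be generated directly in SQL. This is my own sketch, not the linked script:

-- 1000 candidates named candidate_1 .. candidate_1000.
INSERT INTO candidates (candidate_id, name)
SELECT i, 'candidate_' || i
FROM generate_series(1, 1000) AS i;

-- 600,000 votes spread evenly, 600 per candidate; the assignment
-- to the 12 districts is arbitrary.
INSERT INTO votes (electoral_district_id, candidate_id)
SELECT (v % 12) + 1, (v % 1000) + 1
FROM generate_series(1, 600000) AS v;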

Inserting the 600,000 votes with it takes a little under 4 minutes:

hadara@hadara-desktop:~$ python elections.py 
        candidates inserted
        votes inserted
tables filled in: 211.85s

So this is PostgreSQL running with mostly default settings on an ordinary desktop machine (shared_buffers turned up to 256MB; on real servers it would probably be more like 4+GB).
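
For completeness, the current value can be checked from psql (changing it means editing postgresql.conf and restarting the server):

SHOW shared_buffers;
--  shared_buffers
-- ----------------
--  256MB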

Querying the winners from such a database could look, for example, like this:

elections=# SELECT votes.candidate_id, COUNT(*) AS votecount, (SELECT name FROM candidates WHERE candidates.candidate_id=votes.candidate_id) AS candidate_name FROM votes GROUP BY votes.candidate_id ORDER BY votecount DESC LIMIT 10;
 candidate_id | votecount | candidate_name 
--------------+-----------+----------------
          106 |       600 | candidate_106
          120 |       600 | candidate_120
          285 |       600 | candidate_285
          681 |       600 | candidate_681
          866 |       600 | candidate_866
          264 |       600 | candidate_264
          887 |       600 | candidate_887
          601 |       600 | candidate_601
          664 |       600 | candidate_664
          251 |       600 | candidate_251
(10 rows)

This query takes 146ms, and its execution plan looks like this:

elections=# EXPLAIN ANALYZE SELECT votes.candidate_id, COUNT(*) AS votecount, (SELECT name FROM candidates WHERE candidates.candidate_id=votes.candidate_id) AS candidate_name FROM votes GROUP BY votes.candidate_id ORDER BY votecount DESC LIMIT 10;
                                                                     QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=19956.81..19956.83 rows=10 width=4) (actual time=146.603..146.605 rows=10 loops=1)
   ->  Sort  (cost=19956.81..19959.31 rows=1000 width=4) (actual time=146.602..146.603 rows=10 loops=1)
         Sort Key: (count(*))
         Sort Method:  top-N heapsort  Memory: 25kB
         ->  HashAggregate  (cost=11655.00..19935.20 rows=1000 width=4) (actual time=144.690..146.393 rows=1000 loops=1)
               ->  Seq Scan on votes  (cost=0.00..8655.00 rows=600000 width=4) (actual time=0.005..36.190 rows=600000 loops=1)
               SubPlan 1
                 ->  Index Scan using candidates_pkey on candidates  (cost=0.00..8.27 rows=1 width=13) (actual time=0.001..0.001 rows=1 loops=1000)
                       Index Cond: (candidate_id = $0)
 Total runtime: 146.651 ms
(10 rows)
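
As an aside, the correlated subquery that fetches the name (the SubPlan with loops=1000 in the plan above) could just as well be written as a plain join; with only 1000 rows in candidates, both forms are quick:

SELECT c.candidate_id, COUNT(*) AS votecount, c.name AS candidate_name
FROM votes v
JOIN candidates c ON c.candidate_id = v.candidate_id
GROUP BY c.candidate_id, c.name
ORDER BY votecount DESC
LIMIT 10;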

I am in no way involved with the elections and last touched PostgreSQL about 8 years ago, so this is all pretty much shooting from the hip.

Update: Martin Rebane, who has been granted considerably more writing talent than I have, has also written on the same topic.