Database consulting and application development...

...because your data is your business

March 3, 2017, 5:20 p.m.

Slides from my talk at SCALE15x

The slides from my SCALE 15x talk: Minimizing PostgreSQL Major Version Upgrade Downtime Using Slony are available here:

You'll want to download the powerpoint version as it includes some embedded screencasts that don't play on slideshare.

Posted by Jeff Frost
Nov. 10, 2015, 10:31 p.m.

wal-e - avoiding the cron emails

If you're like me, you use wal-e for PITR based backups on many of the PostgreSQL servers you manage.  While setting it up, you likely discovered the --terse option greatly reduces the log output.  However, if you run the backup-push job in cron and have email aliases set up properly, you probably receive emails like this every time the backup-push runs:

NOTICE:  pg_stop_backup complete, all required WAL segments have been archived

Why is that?  Well, the default value for client_min_messages in PostgreSQL is NOTICE, which means the client (wal-e in this case) will be sent messsages of NOTICE and higher priority.

So, how to change that?  You may have gotten this far and realized there isn't a handy command line switch that wal-e exposes.  That's ok though, wal-e uses libpq, so we just need to use the PGOPTIONS environment variable.

If you're using wal-e with envdir like in the documentation examples, then you can just plunk down a PGOPTIONS file in /etc/wal-e.d/env that looks like this:

-c client_min_messages=WARNING

and that's it. If you're using a wrapper script for wal-e instead of envdir, you just need to export that environment variable before calling wal-e like this:

export PGOPTIONS="-c client_min_messages=WARNING"

Now sit back and enjoy a few less cron emails in the morning.

EDIT: The envdir method only works if you have bash as your cron shell SHELL=/bin/bash in the crontab or you call the command like this:

bash -c "envdir /etc/wal-e.d/env /usr/local/bin/wal-e --terse backup-push /var/lib/postgresql/9.4/main"

Posted by Jeff Frost
Sept. 18, 2014, 12:19 p.m.

WAL-E with Rackspace CloudFiles over Servicenet

Found a great walkthrough on setting up WAL-E to use python-swiftclient for storage in Rackspace Cloud Files:

Unfortunately by default, your backups use the public URL for Cloud Files and eat into metered public bandwidth.

The way to work around this is to set the endpoint_type to internalURL instead of the default publicURL.

You do that by setting the following environment variable:


That allows WAL-E to use Servicenet for base backups and WAL archiving which will be much faster and not eat into your metered public bandwidth.

Posted by Jeff Frost
April 29, 2014, 6:04 p.m.

How to start PostgreSQL the Debian way with pg_ctl

Sometimes I find myself wanting to start a PostgreSQL cluster with pg_ctl on a Debian or Ubuntu system.  If you've ever tried this, you know that something like this doesn't work:

vagrant@vagrant-ubuntu-trusty-64:/vagrant$ sudo -u postgres /usr/lib/postgresql/9.3/bin/pg_ctl -D /var/lib/postgresql/9.3/main/ start
server starting
vagrant@vagrant-ubuntu-trusty-64:/vagrant$ postgres cannot access the server configuration file "/var/lib/postgresql/9.3/main/postgresql.conf": No such file or directory

because the Debian packages place the config files in /etc/postgresql/<version>/<cluster name>

So, I used to think that this was the way to start it:

vagrant@vagrant-ubuntu-trusty-64:/vagrant$ sudo -u postgres /usr/lib/postgresql/9.3/bin/pg_ctl -D /etc/postgresql/9.3/main/ start
server starting
vagrant@vagrant-ubuntu-trusty-64:/vagrant$ 2014-04-30 00:59:32 UTC LOG:  database system was shut down at 2014-04-30 00:57:49 UTC
2014-04-30 00:59:32 UTC LOG:  database system is ready to accept connections
2014-04-30 00:59:32 UTC LOG:  autovacuum launcher started

And that does generally work, but the proper way to do it is this:

vagrant@vagrant-ubuntu-trusty-64:/vagrant$ sudo -u postgres /usr/lib/postgresql/9.3/bin/pg_ctl -D /var/lib/postgresql/9.3/main/ -o "-c config_file=/etc/postgresql/9.3/main/postgresql.conf" start
server starting
vagrant@vagrant-ubuntu-trusty-64:/vagrant$ 2014-04-30 01:00:22 UTC LOG:  database system was shut down at 2014-04-30 01:00:16 UTC
2014-04-30 01:00:22 UTC LOG:  database system is ready to accept connections
2014-04-30 01:00:22 UTC LOG:  autovacuum launcher started

The real Debian way is to use pg_ctlcluster like so:

vagrant@vagrant-ubuntu-trusty-64:~$ sudo -u postgres pg_ctlcluster 9.3 main start

Posted by Jeff Frost
April 18, 2014, 3:54 p.m.

Grabbing the statement durations out of a log file using gnu grep

Sometimes I find myself wanting to grep all the query durations out of a file for sorting.  I recently discovered that I can use gnu grep's lookbehind syntax like so:

grep -oP '(?<=duration: )[0-9\.]+' postgresql-Mon.csv

and then sort them numerically like so:

grep -oP '(?<=duration: )[0-9\.]+' postgresql-Mon.csv | sort -n | tail

Posted by Jeff Frost
Nov. 21, 2013, 7:42 p.m.

Ubuntu 3.11.0 kernel not as stable as hoped

As I mentioned in my previous post, Ubuntu recently released a 3.11.0 kernel for Precise.  Unfortunately, while it appeared stable during a long burn-in process with various pgbench workloads and scale sizes, it crashed shortly after putting the machine into service, so make sure you test thoroughly on your hardware. 

This particular hardware was the Dell Poweredge 720 that rackspace uses.

Guess we're going back to custom compiling kernels for a little while longer.

Anyone else have experiences good or bad with it?

I believe this is the same kernel that's shipping with 13.10 Saucy Salamander.

Posted by Jeff Frost
Nov. 15, 2013, 1:34 p.m.

Ubuntu has released a 3.11.0 kernel for 12.04 LTS!

This one is for those of you compiling a custom kernel or staying on 10.04 LTS because of this issue on database servers in the 3.x kernels:

Ubuntu has released linux-image-3.11.0-13-generic for 12.04 in which that fix is included.  It's actually been included since 3.9rc2, but Ubuntu skipped straight to 3.11.

And there was much rejoicing in the PostgreSQL community.  At least from those of us who manage 12.04 LTS PostgreSQL servers!

For those of you who don't recall, here's a link to the big discussion thread:

One item of note on 3.11 is that the kernel.sched_migration_cost setting is now called kernel.sched_migration_cost_ns, so if you are setting that via sysctl, you will need to update your sysctl.conf or sysctl.d file after upgrade.

Now go upgrade your test servers and let us know how it performs!  I've been burning in a new server for about 24 hours with pgbench and the 3.11.0-13-generic kernel and so far it's been stable and a good performer.

Posted by Jeff Frost
Aug. 10, 2013, 6:41 p.m.

Get your PgBouncer pools stats into Ganglia

Just whipped up a quick ruby script to put PgBouncer "show pools" data into ganglia using gmetric.  Based it heavily off the postgres_gmetric script.

You can find the pgbouncer gmetric script here:

Posted by Jeff Frost
Feb. 27, 2013, 5:38 p.m.

Script to add the respository on Debian/Ubuntu

You've probably heard the announcement: Martin Pitt's PPA is being deprecated.  This is in an effort to push folks towards the new official

See the announcement here:

Also, see the deprecation notice on Martin's PPA:

Since we have quite a few customers using Martin's PPA for PostgreSQL packages under Ubuntu, I wrote a little shell script to switch repositories if you're using Martin's PPA and just install the repository if you are not.

You can find it here:

Posted by Jeff Frost
Feb. 14, 2013, 4:24 p.m.

Finding unreplicated tables and sequences in slony with check_postgres and nagios

Since slony doesn't automatically add tables to replication for you, sometimes it's handy to double check that everything is being replicated.  If your slony cluster is named "sample" then you would have a slony schema named "_sample". As such, you can query against the sl_table and sl_sequence tables to find unreplicated tables like so:

SELECT pg_class.OID::regclass as table_name FROM pg_class JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.OID WHERE relkind = 'r' AND pg_class.OID NOT IN (SELECT tab_reloid FROM _sample.sl_table) AND pg_namespace.nspname NOT IN ('information_schema', 'pg_catalog', '_sample')

SELECT pg_class.OID::regclass as sequence_name FROM pg_class JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.OID WHERE relkind = 'S' AND pg_class.OID NOT IN (SELECT seq_reloid FROM _sample.sl_sequence) AND pg_namespace.nspname NOT IN ('information_schema', 'pg_catalog', '_sample');

 Those queries are a bit unpleasant to remember, so I like to make them into views like so:

CREATE OR REPLACE VIEW public.unreplicated_tables AS SELECT pg_class.OID::regclass as table_name FROM pg_class JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.OID WHERE relkind = 'r' AND pg_class.OID NOT IN (SELECT tab_reloid FROM _sample.sl_table) AND pg_namespace.nspname NOT IN ('information_schema', 'pg_catalog', '_sample') AND pg_class.relname <> 'fwd_epc_map';

CREATE OR REPLACE VIEW public.unreplicated_sequences AS SELECT pg_class.OID::regclass as sequence_name FROM pg_class JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.OID WHERE relkind = 'S' AND pg_class.OID NOT IN (SELECT seq_reloid FROM _sample.sl_sequence) AND pg_namespace.nspname NOT IN ('information_schema', 'pg_catalog', '_sample');

Now, that's a little handier because we can just select straight out of the view.  Now that we have these easy to use views, we can use's custom_query check to setup a nagios alert whenever there are unreplicated tables or sequences.

First we need to create the check commands like so:

define command{
        command_name    check_pgsql_slony_unreplicated_tables
        command_line    /usr/lib64/nagios/plugins/ --host $HOSTADDRESS$ --port=5432 --dbuser nagios --dbname mydb --action=custom_query --query="SELECT count(table_name) AS result FROM public.unreplicated_tables" --valtype=integer --warn=1 --critical=5

define command{
        command_name    check_pgsql_slony_unreplicated_sequences
        command_line    /usr/lib64/nagios/plugins/ --host $HOSTADDRESS$ --port=5432 --dbuser nagios --dbname mydb --action=custom_query --query="SELECT count(sequence_name) AS result FROM public.unreplicated_sequences" --valtype=integer --warn=1 --critical=5
I apologize for all the wrapping there.

Now that we have the commands defined, we use them in a service definition:

define service {
    use                          postgres-service
    notifications_enabled        1
    hostgroup_name               slony_primary
    service_description          Slony Unreplicated Tables
    check_command                check_pgsql_slony_unreplicated_tables
    contact_groups               pgexperts

define service {
    use                          postgres-service
    notifications_enabled        1
    hostgroup_name               slony_primary
    service_description          Slony Unreplicated Sequences
    check_command                check_pgsql_slony_unreplicated_sequences
    contact_groups               pgexperts
And that's it! Now we'll receive an alert whenever an unreplicated table or sequence exists on the servers in hostgroup slony_primary.

Posted by Jeff Frost
Feb. 12, 2013, 2:48 p.m.

Russian Doll Syndrome / Please Monitor Your RAID Volumes

As part of setting up nagios monitoring for customers, I always make sure that the RAID volumes are being monitored on the DB servers.  More often than not, I have to tell the customer there are bad disks in one or more of the RAID volumes.  Don't be one of those customers, make sure you monitor the RAID volumes on your PostgreSQL (and other) servers!

So, today I was setting up some of this same monitoring for LSI RAID controllers in a Dell server using the check_sasraid_megaraid nagios plugin.   This plugin requires the LSI SNMP installer as well as net-snmp and a few other miscellaneous packages.

This brings us to the Russian Doll Syndrome.  I download the zip file from the LSI support site and unzip it to find a zip full of zips:

$ unzip

Ok, fine, they have a zip for each architecture, so let's unzip the Linux_x64 one and get on with this:

$ unzip
  inflating: MD5CheckSum.txt        

Oh, sweet, another zip file!

$ unzip
  inflating: readme.txt             
 extracting: sas_snmp_64bit.tar.gz  

And inside that, we find a tarball:

$ tar xvfz sas_snmp_64bit.tar.gz

And finally an RPM.  Yeesh LSI, why couldn't we just have the RPM right inside the file or even just in the top level zip file?

So, all kidding aside, please make sure you're monitoring the RAID volumes on your PostgreSQL servers!

Posted by Jeff Frost
Aug. 23, 2012, 5:57 p.m.

PostgreSQL, NUMA and zone reclaim mode on linux

Lately we've been seeing issues with zone reclaim mode on large memory multi processor NUMA linux systems.

What's NUMA?  It's just an acronym for Non-Uniform Memory Access.  This means that some memory in your system is more expensive for a particular CPU to access than its "local" memory.

You can see how much more distant the kernel considers the different zones by using the numactl command like so:

numactl --hardware
If you've got a modern multiprocessor system, you'll probably see something like this:

available: 2 nodes (0-1)
node 0 size: 48417 MB
node 0 free: 219 MB
node 1 size: 48480 MB
node 1 free: 135 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10 

Here we see a distance of 10 for node 0 to access node 0 memory and 21 for node 0 to access node 1 memory.   What does distance really mean?  It's a cost parameter based on number of "hops" or buses that separate the node from the distant memory.

Now, what's zone reclaim mode?

From the linux kernel docs:


Zone_reclaim_mode allows someone to set more or less aggressive approaches to
reclaim memory when a zone runs out of memory. If it is set to zero then no
zone reclaim occurs. Allocations will be satisfied from other zones / nodes
in the system.

This is value ORed together of

1       = Zone reclaim on
2       = Zone reclaim writes dirty pages out
4       = Zone reclaim swaps pages

zone_reclaim_mode is set during bootup to 1 if it is determined that pages
from remote zones will cause a measurable performance reduction. The
page allocator will then reclaim easily reusable pages (those page
cache pages that are currently not used) before allocating off node pages.

It may be beneficial to switch off zone reclaim if the system is
used for a file server and all of memory should be used for caching files
from disk. In that case the caching effect is more important than
data locality.

Allowing zone reclaim to write out pages stops processes that are
writing large amounts of data from dirtying pages on other nodes. Zone
reclaim will write out dirty pages if a zone fills up and so effectively
throttle the process. This may decrease the performance of a single process
since it cannot use all of system memory to buffer the outgoing writes
anymore but it preserve the memory on other nodes so that the performance
of other processes running on other nodes will not be affected.

Allowing regular swap effectively restricts allocations to the local
node unless explicitly overridden by memory policies or cpuset

I highlighted the text above because PostgreSQL depends heavily on the filesystem cache and disabling zone reclaim mode is desirable in this situation.

There's been a bit of discussion about this on the pgsql-performance mailing list here:

If you've got a modern multi-socket system, odds are good that zone reclaim mode is enabled automatically on boot.  You can check this by looking at /proc/sys/vm/zone_reclaim_mode.

The biggest issue we've seen with zone reclaim mode enabled on customer multi-socket systems is the filesystem cache never filling up even when the database is much larger than RAM.  That's because the system is trying to keep some "local" memory available. After disabling zone_reclaim_mode, the filesystem cache fills up and performance improves.

So, how to disable zone_reclaim_mode?  The best way to do this is via sysctl.  Just add:

vm.zone_reclaim_mode = 0 

to /etc/sysctl.conf, save it and execute sysctl -p to load the new settings into the kernel.

Other interesting non PostgreSQL pages on NUMA/zone_reclaim_mode:

Posted by Jeff Frost
Aug. 2, 2012, 10:16 a.m.

Finding non-UTF8 data in a dump

Recently I was tasked with identifying all rows in a PostgreSQL database that contained non UTF8 data for migrating from SQL_ASCII encoding to UTF8.  Because of the database's size, just using iconv and diff on a dump file wouldn't work.

I started with this perl script from Jon Jensen:

It got me a little way down the road, but what I really wanted was to know which table had the bad row and better yet, be able to easily import the bad rows with a COPY command for cleanup and eventual use as temporary tables that we can be used with UPDATE statements to fix the production DB.

So, I turned to Python.  I found a isUTF8(text) definition here:

I took that and wrote which outputs COPY commands with only the non-UTF8 rows to STDOUT and also outputs a schema-only dump command to STDERR which only dumps the tables with bad data for ease of creating a temporary DB.

You can find the script on github here:

What other methods do you folks use?

Posted by Jeff Frost
May 23, 2012, 7:05 p.m. script to dump all PostgreSQL DBs

For a long time, I've written all my automation scripts in bourne shell and bash with occasional forays into perl.  I really like bash because you're always guaranteed to have bash installed on a modern linux system and so don't need to install perl and 15 modules to get the job done.

That said, I've recently begun using python for more automation.  As an exercise, I've rewritten my script for python 2.7 trying to use only the standard library.

The script pg_dumps all the databases in a PostgreSQL cluster in the custom format allowing you to specify the compression level.  (See this blog post regarding pg_dump compression level performance.)  It also dumps the globals to a bzip2'd SQL file and then removes files in the dump destination directory that are older than the --keep value (default 7 days).  Finally, if it had any errors during the run, it sends the error log file to the specified email address.  It also picks up libpq environment variables, but will override those with anything specified on the command line.

You can find it here:

Posted by Jeff Frost
May 15, 2012, 2:47 p.m.

Slony Upstart Script for Ubuntu 10.04 LTS

 Since Ubuntu is moving towards using upstart instead of SysV init scripts, I thought I'd write a quick upstart script for starting slony compiled and installed in /usr/local/slony.  Upstart 1.4+ supports setuid, but the version of upstart included with Ubuntu 10.04LTS is only 0.6.5 and even on Ubuntu 11.10, it's only 1.3, so instead I use start-stop-daemon's --chuid switch.  The script should be installed as /etc/init/slony.conf. Here's the script:

---cut here---
# slony - slon daemon
# slon is the slony replication daemon

description    "slony slon daemon"

start on net-device-up
stop on runlevel [06]

# can't use setuid until upstart 1.4
#setuid slony
# which is why we use start-stop-daemon with the --chuid switch


    exec start-stop-daemon --start --chuid slony --exec /usr/local/slony/bin/slon -- -f /etc/slon.conf >> /var/log/slony/slony.log 2>&1
end script
---cut here---

This will have the slon daemon run as the slony user and log to /var/log/slony/slony.log.  You'll also need a valid slony config file in /etc/slon.conf.

You can also find the script in the pgexperts upstart-scripts git repo:

Posted by Jeff Frost
April 2, 2012, 5:58 p.m.

Using the row_number() window function to reorder images

One of my customers uses the rails acts_as_list module for maintaining order of images on forms.

Through some application issues they ended up with images that had identical order for the same provider and image_category. So, it ended up looking like this:

select provider_id, image_category_id, position, id from images where image_category_id=1234 order by position;


provider_id | image_category_id | position | id

       9999 |              1234 |        3 | 484900
       9999 |              1234 |        4 | 484899
       9999 |              1234 |        4 | 484897
       9999 |              1234 |        4 | 484896
       9999 |              1234 |        4 | 484894
       9999 |              1234 |        4 | 484893
       9999 |              1234 |        4 | 484892
       9999 |              1234 |        4 | 484890
       9999 |              1234 |        5 | 484889
       9999 |              1234 |        6 | 484887
       9999 |              1234 |        7 | 484886
       9999 |              1234 |        8 | 479818
       9999 |              1234 |        9 | 479817
       9999 |              1234 |       10 | 479815

Note that there are several rows with position=4 above. This was a prevalent issue throughout the images table and not just with this example provider.

So, what they wanted was to reorder the images such that the equal positions were kept together and the overall order stayed the same.  So, for example, it should look like this (sorry for the small font):

 provider_id |image_category_id | position |   id  

        9999 |             1234 |        3 | 484900  ==> Position 1
        9999 |             1234 |        4 | 484899  ==> Position 2
        9999 |             1234 |        4 | 484897  ==> Position 3
        9999 |             1234 |        4 | 484896  ==> Position 4
        9999 |             1234 |        4 | 484894  ==> Position 5
        9999 |             1234 |        4 | 484893  ==> Position 6
        9999 |             1234 |        4 | 484892  ==> Position 7
        9999 |             1234 |        4 | 484890  ==> Position 8
        9999 |             1234 |        5 | 484889  ==> Position 9
        9999 |             1234 |        6 | 484887  ==> Position 10
        9999 |             1234 |        7 | 484886  ==> Position 11
        9999 |             1234 |        8 | 479818  ==> Position 12
        9999 |             1234 |        9 | 479817  ==> Position 13
        9999 |             1234 |       10 | 479815  ==> Position 14

So, I thought about it for a bit, then it occurred to me: why not use the row_number() window function.  So I wrote the following migration SQL script:

CREATE temp TABLE new_image_positions 
, row_number AS position
SELECT provider_type
, provider_id
, image_category_id
, position
, id
, row_number()
partition BY provider_id
, image_category_id
ORDER BY position
FROM images
WHERE provider_type = 'Project'
AND position IS NOT NULL
) x ;

UPDATE images
SET position = n.position
FROM new_image_positions n


And, voila, all the images are now properly ordered:

 provider_id | image_category_id | position |   id   
        9999 |              1234 |        1 | 484900
        9999 |              1234 |        2 | 484899
        9999 |              1234 |        3 | 484897
        9999 |              1234 |        4 | 484896
        9999 |              1234 |        5 | 484894
        9999 |              1234 |        6 | 484893
        9999 |              1234 |        7 | 484892
        9999 |              1234 |        8 | 484890
        9999 |              1234 |        9 | 484889
        9999 |              1234 |       10 | 484887
        9999 |              1234 |       11 | 484886
        9999 |              1234 |       12 | 479818
        9999 |              1234 |       13 | 479817
        9999 |              1234 |       14 | 479815
        9999 |              1234 |       46 | 403052
        9999 |              1234 |       47 | 403051
        9999 |              1234 |       48 | 403050
        9999 |              1234 |       49 | 403049
        9999 |              1235 |        1 | 484916
        9999 |              1235 |        2 | 484915
        9999 |              1235 |        3 | 484912
        9999 |              1235 |        4 | 484910
        9999 |              1236 |        1 | 493000
        9999 |              1236 |        2 | 492999

Posted by Jeff Frost
Dec. 23, 2011, 10:22 a.m.

pg_dump compression settings

After doing the base backup benchmarks, I thought it would be interesting to benchmark pg_dump run locally and remotely using all the different compression settings.  This will allow me to compare the dump times and the space savings as well as see how much dumping remotely slows the process.  Nobody is storing their dumps on the same server, right?

For the record, all these dumps used pg_dump's custom dump format (-Fc), the PGDATA directory is 93GB on disk and resides on a 4 disk RAID10 of 7200 RPM SATA drives.

CPU on the test server: Intel(R) Xeon(R) CPU X3330  @ 2.66GHz.
CPU on the test remote client: Intel(R) Core(TM) i7 CPU 960  @ 3.20GHz.
Network is gigabit.
The total size reported is from du -sh output.
Both client and test server are running postgresql-9.1.2.

For comparison, our best base backup time from the base backup blog post was: 15m52.221s

The results:

Local pg_dump:
compression: 0
time: 19m7.455s
total size: 77G
compression: 1
time: 21m53.128s
total size: 11G
compression: 2
time: 22m27.507s
total size: 11G
compression: 3
time: 24m18.966s
total size: 9.8G
compression: 4
time: 30m10.815s
total size: 9.2G
compression: 5
time: 34m26.119s
total size: 8.3G
compression: 6
time: 41m35.340s
total size: 8.0G
compression: 7
time: 49m4.484s
total size: 7.9G
compression: 8
time: 91m28.689s
total size: 7.8G
compression: 9
time: 103m24.883s
total size: 7.8G

Remote pg_dump:
compression: 0
time: 20m1.363s
total size: 77G
compression: 1
time: 22m9.205s
total size: 11G
compression: 2
time: 22m19.158s
total size: 11G
compression: 3
time: 23m7.426s
total size: 9.8G
compression: 4
time: 26m10.383s
total size: 9.2G
compression: 5
time: 28m57.431s
total size: 8.3G
compression: 6
time: 33m23.939s
total size: 8.0G
compression: 7
time: 38m11.321s
total size: 7.9G
compression: 8
time: 62m27.385s
total size: 7.8G
compression: 9
time: 72m5.123s
total size: 7.8G

So, a few interesting observations:

  • Base backups are indeed faster, but that's no surprise since they have less overhead. 
  • Taking the backup locally is only slightly faster than remotely for the smaller compression levels
  • At compression level 3 and above the faster CPU on the test client becomes a huge benefit
  • Looks like the default compression is level 6. This might not be what you want if you're concerned about minimizing the backup time and have the disk space to spare.

Posted by Jeff Frost
Dec. 6, 2011, 3:45 p.m.

PostgreSQL Base Backup Benchmark Results

For a while now we've sort of anecdotally thought that compression should help with base backups and also that the rsync server is faster than rsync over ssh, so while nothing was happening on our test server, I thought I'd benchmark a few different methods.

Looks like rsync using an rsync server with no compression was fastest, but not by much. Pg_basebackup is right there with rsync using any of the three RC4 ciphers all within a few seconds of one another. Looks like blowfish-cbc comes in next followed by aes128-cbc.

Most interesting is that adding compression quadruples the sync time! That's probably not the case over a slow link, but over gigabit it's definitely the case.

Tests were done with a 93GB PGDATA, gigabit network, linux software 4 disk RAID10 data dir on the first test server copied to linux software 2 disk RAID0 on the second server.

Following are the results:

Without compression:

time rsync test0::data /data/tmp/test-data/ -avP
real 15m52.221s
user 3m10.257s
sys 4m11.460s

time /usr/pgsql-9.1/bin/pg_basebackup -D /data/tmp/test-data -c fast -h test0 -U postgres
real 16m13.440s
user 0m17.314s
sys 2m18.217s

time rsync -e 'ssh -c arcfour' postgres@test0:/data/9.1/data/ /data/tmp/test-data/ -avP
real 16m13.242s
user 8m32.674s
sys 5m28.312s

time rsync -e 'ssh -c arcfour256' postgres@test0:/data/9.1/data/ /data/tmp/test-data/ -avP
real 16m13.656s
user 8m48.356s
sys 6m7.343s

time rsync -e 'ssh -c arcfour128' postgres@test0:/data/9.1/data/ /data/tmp/test-data/ -avP
real 16m3.947s
user 8m44.627s
sys 6m7.499s

time rsync -e 'ssh -c blowfish-cbc' postgres@test0:/data/9.1/data/ /data/tmp/test-data/ -avP
real 26m22.516s
user 19m16.038s
sys 8m45.046s

time rsync -e 'ssh -c aes128-cbc' postgres@test0:/data/9.1/data/ /data/tmp/test-data/ -avP
real 29m30.785s
user 27m31.919s
sys 6m5.154s

time rsync -e 'ssh' postgres@test0:/data/9.1/data/ /data/tmp/test-data/ -avP
real 30m50.301s
user 24m10.951s
sys 9m13.090s

time rsync -e 'ssh -c aes256-ctr' postgres@test0:/data/9.1/data/ /data/tmp/test-data/ -avP
real 38m23.789s
user 31m36.011s
sys 8m39.210s

Our Best Performers With Compression:

time rsync -e 'ssh -c arcfour' postgres@test0:/data/9.1/data/ /data/tmp/test-data/ -avPz
real 62m31.933s
user 14m48.752s
sys 2m42.412s

time rsync test0::data /data/tmp/test-data/ -avPz
real 62m53.699s
user 6m42.846s
sys 1m44.357s

Since Rod, Joshua and Rodrigo's comments suggested pigz and netcat below, I thought I'd give them both a quick try to see if they produced a winner.  Conveniently, gnu-tar has a --use-compress-program option, so you can easily use pigz in this manner.

So, tar over ssh with the arcfour256 cipher using pigz for compression:

time ssh -c arcfour256 test0 "cd /data/9.1/data ; tar --use-compress-program=pigz -cf - ." | (cd /data/tmp/test-data ; tar --use-compress-program=pigz -xvf -)

 real    18m11.901s
user    7m3.587s
sys     3m53.487s

On test0: tar --use-compress-program=pigz -cf - . | nc -l 7000
On test1: time nc test0 7000 | tar --use-compress-program=pigz -xvf -

real    17m46.805s
user    6m17.562s
sys     4m4.563s

So, using tar with pigz over ssh and over netcat are much faster compression methods, they still are a little slower than no compression.  This got me wondering if perhaps the 93GB of PGDATA doesn't compress all that well:

14G     /data/tmp/data-test.tgz

So, that's 6:1 compression. Not bad!  Interestingly, it took gnu-tar + pigz about 17m to create the test tgz file. That's about identical to the time it took to ship it over the network and uncompress/untar it on the other end.

That made me try tar with no compression piped over netcat:

On test0: tar -cf - . | nc -l 7000

On test1: time nc test0 7000 | tar -xvf -

real    15m52.313s
user    0m10.033s
sys     3m17.630s

So, tar with no compression over netcat clocked in almost identical to using rsync's native server.

Anyone have any other tools for me to test?

Posted by Jeff Frost