XFS and EXT4 Testing Redux

In my previous testing post, I declared EXT4 the winner over XFS for my scenario. My coworker, @keyurdg, was unwilling to let XFS lose out and made a few observations:

  • XFS wasn’t *really* being formatted optimally for the RAID stripe size
  • XFS wasn’t being mounted with the inode64 option, which means all of the inodes are kept in the first 2TB. (Side note: inode64 is the default in newer kernels, but not on CentOS 6’s 2.6.32)
  • Single-threaded testing isn’t entirely accurate: although replication is single threaded, InnoDB collects the writes and then flushes them to disk using multiple threads, governed by innodb_write_io_threads.
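The first two points translate into mkfs/mount flags along these lines. This is a sketch only: the device name and stripe geometry below are placeholders, not the array actually tested (for a 10-disk RAID 10, half the spindles carry data, so sw would be 5).

```shell
# Hypothetical geometry: 10-disk RAID 10 with a 256KB per-disk stripe unit.
# su = stripe unit per data disk, sw = number of data-bearing disks.
# /dev/sdb and /var/lib/mysql are placeholder names.
mkfs.xfs -d su=256k,sw=5 /dev/sdb

# Mount with the options used in the tests below, including inode64
# (the default on newer kernels, but not on CentOS 6's 2.6.32).
mount -o noatime,nodiratime,nobarrier,inode64 /dev/sdb /var/lib/mysql
```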

Armed with new data, I have – for real – the last round of testing.

To keep things a bit simpler, I will be comparing each file system on 2TB and 27TB, with 4 threads, which matches the default value for innodb_write_io_threads in MySQL 5.5.

FS    RAID  Size  Mount Options                         Transfer/s    Requests/s  Avg/Request  95%/Request
xfs   10    2T    noatime,nodiratime,nobarrier,inode64  62.588Mb/sec  4005.66     0.88ms       0.03ms
ext4  10    2T    noatime,nodiratime,nobarrier          58.667Mb/sec  3754.66     0.87ms       0.19ms

FS    RAID  Size  Mount Options                         Transfer/s    Requests/s  Avg/Request  95%/Request
xfs   10    27T   noatime,nodiratime,nobarrier,inode64  64.47Mb/sec   4126.06     0.84ms       0.02ms
ext4  10    27T   noatime,nodiratime,nobarrier          49.379Mb/sec  3160.26     1.06ms       0.24ms

XFS finally wins out clearly over EXT4. XFS being dramatically slower on 27T in the earlier tests shows just how much worse inode32 performs compared to inode64, and explains why XFS looked so much better on 2T. Fixing the formatting options pushed XFS over the top easily.

All that’s left to do is set up multiple instances until replication can’t keep up anymore.


One thought on “XFS and EXT4 Testing Redux”

  1. Keyur Govande

    A bit of explanation on why inode64 provides a sweet sweet perf boost.

    XFS divides the available disk space into allocation groups (http://xfs.org/docs/xfsdocs-xml-dev/XFS_Filesystem_Structure/tmp/en-US/html/Allocation_Groups.html) of equal size. So the 26TB disk is approximately split into 32 800GB pieces.

    In inode32 mode, every file in a directory is put into a different allocation group (modulo total number of allocation groups). One benefit of this is if the files keep on growing, they’ll have tons of room to do so without fragmentation. The downside is if you’re reading all the files in the directory, then you need to skip all over the disk to retrieve them.

    In inode64 mode, all files in a directory are in the same AG. This means more fragmentation (if the files continue to grow), but far less seeking. And this is what gives the perf boost.

    NB: The downside of inode64 is the file system cannot be mounted via NFS.
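The round-robin behavior Keyur describes can be sketched as a toy model. This is illustrative only: real XFS allocation is more involved, and the AG count and file names here are made up.

```shell
# Toy model of the placement Keyur describes (illustrative only).
# inode32: the i-th new file in a directory lands in AG (i mod ag_count),
#          scattering a directory's files across the disk.
# inode64: files stay in their parent directory's AG (here, AG 0).
ag_count=32
for i in 0 1 2 3 4; do
  echo "file$i: inode32 -> AG $(( i % ag_count )), inode64 -> AG 0"
done
```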

