XFS and EXT4 Testing Concluded

I had a few more suggestions thrown out at me before I could wrap this one up.

  • Try disabling the RAID controller read-ahead
  • Try a few custom options to XFS
  • Try RAID-10

First, my final “best” state benchmarks for comparison:

FS    RAID  Size  Mount Options                 Transfer/s    Requests/s  Avg/Request  95%/Request
xfs   6     4T    noatime,nodiratime,nobarrier  28.597Mb/sec  1830.24     0.51ms       2.06ms
ext4  6     4T    noatime,nodiratime,nobarrier  32.583Mb/sec  2085.33     0.46ms       1.89ms
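
A quick sanity check on these numbers: dividing the transfer rate by the request rate gives the average size of each request. The sketch below is in Python and assumes the benchmark's "Mb" means 1024 KiB; the function and variable names are made up for illustration.

```python
def kib_per_request(transfer_mib_per_sec, requests_per_sec):
    """Average request size in KiB, assuming 1 Mb = 1024 KiB."""
    return transfer_mib_per_sec * 1024 / requests_per_sec

# (Transfer/s, Requests/s) copied from the baseline table above.
baseline = {
    "xfs": (28.597, 1830.24),
    "ext4": (32.583, 2085.33),
}

request_sizes = {fs: kib_per_request(*row) for fs, row in baseline.items()}

for fs, size in request_sizes.items():
    print(f"{fs}: {size:.2f} KiB per request")
```

Both rows come out at very nearly 16 KiB per request, which suggests a fixed 16 KiB block size in the benchmark. In other words, Transfer/s and Requests/s are two views of the same measurement.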

Disabling the read-ahead was an interesting thought.

FS    RAID  Size  Mount Options                 Transfer/s    Requests/s  Avg/Request  95%/Request
xfs   6     4T    noatime,nodiratime,nobarrier  28.704Mb/sec  1837.07     0.50ms       2.04ms
ext4  6     4T    noatime,nodiratime,nobarrier  32.715Mb/sec  2093.75     0.46ms       1.88ms

It didn’t seem to make any real difference, however.

The second suggestion was to use modified XFS options (mkfs.xfs -f -d sunit=128,swidth=$((512*8)),agcount=32 /dev/sdb2).

FS    RAID  Size  Mount Options                 Transfer/s    Requests/s  Avg/Request  95%/Request
xfs   6     4T    noatime,nodiratime,nobarrier  26.376Mb/sec  1688.07     0.55ms       2.18ms

It’s hard to tell, but it seems these options actually degraded performance: requests per second dropped by roughly 8% compared to the default XFS layout.
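
For reference, mkfs.xfs takes sunit and swidth in units of 512-byte sectors, so the values in the command above are easier to read once converted. Here is a small Python sketch of that conversion. The helper name is hypothetical, and whether the result matches the real array layout is not something the numbers alone can confirm.

```python
SECTOR_BYTES = 512  # mkfs.xfs expresses sunit/swidth in 512-byte units

def xfs_geometry(sunit_sectors, swidth_sectors):
    """Return (stripe unit KiB, stripe width KiB, implied data disks)."""
    stripe_unit_kib = sunit_sectors * SECTOR_BYTES // 1024
    stripe_width_kib = swidth_sectors * SECTOR_BYTES // 1024
    # swidth is meant to be sunit multiplied by the number of data disks.
    data_disks = swidth_sectors // sunit_sectors
    return stripe_unit_kib, stripe_width_kib, data_disks

# Values from the mkfs.xfs command above: sunit=128, swidth=$((512*8)).
geometry = xfs_geometry(128, 512 * 8)
print(geometry)
```

With these options the stripe unit is 64 KiB and the stripe width is 2 MiB, which implies 32 data disks. If the controller's actual stripe size or data-disk count differs, the file system would be aligned to the wrong geometry, which could be one explanation for the slower result.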

The last test was to switch to RAID-10. This would reduce overall storage capacity to 72TB, but given our requirements, this really shouldn’t cause any problem for the project. RAID-10 should give a significant boost to write performance, since writes no longer require the parity calculations that RAID-6 performs.

FS    RAID  Size  Mount Options                 Transfer/s    Requests/s  Avg/Request  95%/Request
xfs   10    36T   noatime,nodiratime,nobarrier  32.808Mb/sec  2099.72     0.46ms       1.80ms
ext4  10    36T   noatime,nodiratime,nobarrier  54.112Mb/sec  3463.17     0.28ms       1.11ms

These numbers back up the improvement to write speed, but XFS still lags behind at larger volume sizes.
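
The 72TB figure follows from RAID-10 mirroring every disk. Below is a rough Python sketch of the usable-capacity arithmetic, assuming 36 disks of 4TB each (which is what the 72TB figure implies) and ignoring file system overhead. The function names and the single-group RAID-6 layout are illustrative assumptions, not a description of how the controller was actually configured.

```python
def raid10_usable_tb(disks, disk_tb):
    """RAID-10 mirrors every disk, so half the raw capacity is usable."""
    return disks * disk_tb / 2

def raid6_usable_tb(disks, disk_tb, groups=1):
    """RAID-6 spends two disks' worth of capacity on parity per group."""
    return (disks - 2 * groups) * disk_tb

DISKS = 36   # assumption, see the paragraph above
DISK_TB = 4  # assumption, decimal terabytes

raid10 = raid10_usable_tb(DISKS, DISK_TB)
raid6 = raid6_usable_tb(DISKS, DISK_TB)
print(f"RAID-10: {raid10:.0f}TB, RAID-6 (single group): {raid6:.0f}TB")
```

Under these assumptions RAID-10 gives up close to half of what a single RAID-6 group would offer, in exchange for cheaper writes.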

Since I had to reconfigure the array anyway, I wanted to try the larger volume size (36T) above and then a smaller size (2T) to try to reproduce my earlier results showing XFS performing better at smaller volume sizes.

FS    RAID  Size  Mount Options                 Transfer/s    Requests/s  Avg/Request  95%/Request
xfs   10    2.2T  noatime,nodiratime,nobarrier  60.066Mb/sec  3844.2      0.25ms       1.00ms
ext4  10    2.2T  noatime,nodiratime,nobarrier  64.766Mb/sec  4145.01     0.23ms       0.90ms

These were by far the best test results I had seen, doubling the results from the original async test.

Testing conclusions

  • XFS seems to be very sensitive to partition size
  • In all but one case, EXT4 performed better on the random read-write tests
  • Know the other caveats of both file systems before picking the one for you
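
The first conclusion is easier to see as a ratio. The Python sketch below takes the Requests/s columns from the tables in this post and computes how far ahead EXT4 is in each configuration; the labels and variable names are just for illustration.

```python
# (label, XFS requests/s, EXT4 requests/s), copied from the tables above.
results = [
    ("RAID-6, 4T", 1830.24, 2085.33),
    ("RAID-10, 36T", 2099.72, 3463.17),
    ("RAID-10, 2.2T", 3844.2, 4145.01),
]

# Ratio above 1.0 means EXT4 served more requests per second than XFS.
ext4_advantage = {label: ext4 / xfs for label, xfs, ext4 in results}

for label, ratio in ext4_advantage.items():
    print(f"{label}: EXT4 handles {ratio:.2f}x the requests of XFS")
```

On the 36T volume EXT4 is roughly 65% ahead, while on the 2.2T volume the gap shrinks to about 8%.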

More EXT4 vs XFS IO Testing

Following my previous post, I got some excellent feedback in the form of comments, tweets and other chat. In no particular order:

  • Commenter Tibi noted that mounting with noatime, nodiratime and nobarrier should all improve performance.
  • Commenter benbradley pointed out a missing flag on some of my sysbench tests which will necessitate re-testing.
  • Former co-worker @preston4tw suggests looking at different IO schedulers. For all past tests, I used deadline, which seems to be best, but re-testing with noop could be useful.
  • Fellow DBA @kormoc encouraged me to try many smaller partitions to limit the number of concurrent fsyncs.

There seem to be plenty of options here that should allow me to re-try my testing with a slightly more consistent method. The consistent difference seems to be in the file system, EXT4 vs XFS, with XFS performing at about half the speed of EXT4.


IO, IO, It’s Off to Testing We Go

In my last post, I learned in disappointing fashion that sometimes you need to start small and work your way up, rather than trying to put together a finished product. This go-round, I’ll talk about my investigation into disk IO.

In an effort to better understand the hardware I have and its capacities, I started off by just trying to get some basic info about the RAID controller and the disks. This hardware in particular is a Supermicro, with an as-yet-unknown RAID controller and 16 4TB disks arranged in RAID 6. Finding out more disk and controller information was the first step. “hdparm -i” wasn’t able to give me much, nor was “cat /sys/class/block/sdb/device/{model,vendor}”. “dmesg” got me to a list of hard disks, Hitachi 7200rpm, and a model number that I could Google. It also got me enough controller information to point to megaraid, which is LSI, which got me over to this MegaCli cheat sheet. Using “MegaCli -AdpAllInfo -aALL” actually got me a great deal of information. (In other news, I now think that Dell’s OMSA command line utility is a lot less terrible after trying to figure out MegaCli.)


Even If You Fail, You Can Still Learn

As many learning experiences do, this one also starts out “So I was working on a project at work and…”. In this case, the end goal is to run as many concurrent copies of MySQL on a single server as possible, each running a different data set and maintaining real-time replication. To that end, I set out to do this on a server with 36 7200rpm 4TB SATA disks, giving me roughly 120TB of available space to work with.

This isn’t an abnormal type of machine for us. Sometimes you simply need a ton of disk space. I’ve been told about one quirk with this particular machine: the RAID controller has some issues with addressing very large virtual disks, so I should create 2 60TB volumes and stitch them together with LVM. Easy enough: pvcreate both volumes, create a volume group and a logical volume out of it and voilà: ~116TB of storage on a single mount point, with xfs as the file system (default options).


Learning to Deal With Learning

Note: This post originally appeared on my former employer’s site (inside.godaddy.com), and has since been removed. Reposting here to share the information.

We here at GoDaddy deploy our MySQL database servers with RAID 10 for performance and reliability. Supporting that, we use hardware RAID with Dell-branded PERC cards. These cards offer a write-back cache to boost write performance. Writes are stored in memory on the RAID controller and then flushed to disk. This provides a noticeable improvement in writes because, from the OS perspective, a write is complete when it hits the cache, not the actual disk. Since data in the cache is volatile, that is, susceptible to power loss, there is also a battery that allows the cache to be preserved in the event of a power loss. This eliminates the possibility of data loss while preserving the speed benefits of a write cache.


A Smattering of Percona Live 2014 Stuff

A real fast list of stuff from the Percona Live 2014 event.

Yahoo’s Performance Analyzer

Yahoo is developing a MySQL performance analyzer that should be released as open source later this year. From the demo, it looks like it pulls in most of the MySQL metrics, shows you a processlist and then lets you drill into a processlist with explain. Will have to keep my eye out for this.

ChatOps with Hubot

GitHub’s Sam Lambert has a set of hubot chatops scripts for MySQL. I was already looking at deploying hubot for the ability to push messages from a remote source into an IRC channel, so this would be a natural fit. He also mentioned using Janky to tie CI with hubot.


Running ElasticSearch, LogStash and Kibana in Docker

As any server farm scales out, it becomes increasingly difficult to Watch All The Things™. I’ve been watching the progress of LogStash+ElasticSearch+Kibana (also known as an ELK stack) for a while and gave it a go this weekend. The trick for me was wanting to run each element inside of a separate Docker container so that I have easily portable elements to scale out with.

A step back. What is Docker? Docker is a container (using LXC) around an application. In short, you install Docker, start a container using a base image (CentOS, Ubuntu, etc.) and then run the container, dropping you into a shell. From here, you configure your application, then save your container. You can stop and start it at any time, relocate it to another server, or generally break it as badly as you want and you’ve done absolutely nothing to your host machine.

ElasticSearch is a data store and search tool for data. It will serve as the place for our logs. LogStash is a log parser. It understands what the source format is and has many output formats (including ElasticSearch). Kibana is a data visualization tool for searching your data store and drawing graphs to help see what’s going on.
