Downsizing to SSDs

System management can be a big deal. At Etsy, we DBAs have been feeling the pain of being spread too thin. You get a nice glibc vulnerability and have to patch and reboot hundreds of servers. There go your plans for the week.

We decided last year to embark on a 2016 mission to get better performance, easier management, and reduced power utilization by shrinking the server farm that holds our user-generated, sharded data.

Continue reading

Source of Truth or Source of Madness?

This year at Etsy, we spun up a “Database Working Group” that talks about all things data. It’s made up of members from many teams: DBA, core development, development tools and data engineering (Hadoop/Vertica). At our last two meetings, we started talking about how many “sources of information” we have in our environment. I hesitate to call them “sources of truth” because in many cases we merely report information to them rather than act on the data they hold. We spent a session whiteboarding all of these sources and drawing the relationships between them. It was a bit overwhelming to actually visualize the madness.

Continue reading

KeyError: ‘/dev/sda’

At Etsy, we have a nice, clean, streamlined build process. We have a command for setting up RAID, and another for OS installation. OS installation comes with automagic for LDAP, Chef roles, etc.

We came across an odd scenario today when a co-worker was building a box that gave the following error:

Traceback (most recent call first):
  File "/usr/lib/anaconda/storage/partitioning.py", line 1066, in allocatePartitions
    disklabel = disklabels[_disk.path]
  File "/usr/lib/anaconda/storage/partitioning.py", line 977, in doPartitioning
    allocatePartitions(storage, disks, partitions, free)
  File "/usr/lib/anaconda/storage/partitioning.py", line 274, in doAutoPartition
    exclusiveDisks=exclusiveDisks)
  File "/usr/lib/anaconda/dispatch.py", line 210, in moveStep
    rc = stepFunc(self.anaconda)
  File "/usr/lib/anaconda/dispatch.py", line 126, in gotoNext
    self.moveStep()
  File "/usr/lib/anaconda/dispatch.py", line 233, in currentStep
    self.gotoNext()
  File "/usr/lib/anaconda/text.py", line 602, in run
    (step, instance) = anaconda.dispatch.currentStep()
  File "/usr/bin/anaconda", line 1131, in <module>
    anaconda.intf.run(anaconda)
KeyError: '/dev/sda'
It suggests a problem setting up partitions on /dev/sda, where we would put the boot partition. It seemed familiar, but I couldn’t recall the solution, and Google, while usually wonderful, got us to a Red Hat support article behind a paywall. A few other results suggested the boot order was incorrect: the OS thought the drives were out of order. Being a Dell box, I checked the virtual drive order, which in my experience has always matched the boot order:
[Screenshot: the Dell virtual drive order]
After the anaconda failure, I went into another terminal to a prompt and checked /proc/partitions. Sure enough, we started at sdb, not sda. Then it hit me: there were four people viewing the console in iDRAC. What if someone else had mounted a virtual disk, and that was /dev/sda? Sure enough:
[Screenshot: an iDRAC virtual media session occupying /dev/sda]
After deleting the virtual media session and rebooting, the OS install ran through without a hitch.
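For next time, here’s the quick check from a shell on the install console. This is just a sketch of what I ran by hand; the awk field positions match the standard /proc/partitions layout:

```shell
# Print whole-disk device names (sda, sdb, ...) in the order the
# kernel enumerated them. Partition entries like sda1 end in a digit
# and are filtered out; a stray iDRAC virtual disk claims a slot here
# just like a real drive does.
awk 'NR > 2 && $4 ~ /^sd[a-z]+$/ {print $4}' /proc/partitions
```

If the list starts at sdb, or there’s one more sd device than you have RAID volumes, go looking for mounted virtual media before blaming the boot order.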
The bonus humor here is that this isn’t the first time we’ve run into this. Hopefully after posting this, Google will index this page and point us to the answer a bit quicker next time.

Operationalizing TokuDB

In my previous post, I talked about implementing multi-threaded replication (MTR) using Percona Server 5.6. The server pairs that are utilizing MTR are also exclusively using the TokuDB storage engine.

I find TokuDB to be a fascinating engine. I can tell I will need to re-watch our DBHangOps session where Tim Callaghan talked about the differences between B-tree and Fractal Tree indexes. There’s also a session on how compression works in TokuDB, and they continue to innovate with read-free replication.
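As a concrete taste, a few of the knobs involved can be poked at from the mysql client. The variable names below are TokuDB server variables as I understand them from Percona Server 5.6 — treat this as a sketch, not gospel, and definitely not our production settings:

```shell
# Inspect cache sizing and durability settings on a TokuDB server.
mysql -e "SHOW GLOBAL VARIABLES LIKE 'tokudb_cache_size'"
mysql -e "SHOW GLOBAL VARIABLES LIKE 'tokudb_commit_sync'"

# Read-free replication on a replica works by skipping unique checks
# and row lookups for row-based replication events.
mysql -e "SET GLOBAL tokudb_rpl_unique_checks = OFF"
mysql -e "SET GLOBAL tokudb_rpl_lookup_rows = OFF"
```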

As with all new technology, there is a learning curve to understanding a new component or system. I thought it appropriate to try to document my experiences on operationalizing TokuDB into our environment. This is nowhere near comprehensive, as I just don’t have enough experience with it yet to know the deeper intricacies of the engine.

Continue reading

XFS and EXT4 Testing Redux

In my post concluding my testing, I declared EXT4 the winner over XFS for my scenario. My coworker, @keyurdg, was unwilling to let XFS lose out and made a few observations:

  • XFS wasn’t *really* being formatted optimally for the RAID stripe size
  • XFS wasn’t being mounted with the inode64 option which means that all of the inodes are kept in the first 2TB. (Side note: inode64 option is default in newer kernels but not on CentOS 6’s 2.6.32)
  • Single-threaded testing isn’t entirely accurate because, although replication is single-threaded, InnoDB collects the writes and then flushes them to disk using multiple threads governed by innodb_write_io_threads.
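Concretely, the first two observations translate into something like the following. The device name and geometry are illustrative, not ours: a 10-disk RAID 6 has 8 data-bearing disks, and I’m assuming a 256 KiB per-disk stripe unit.

```shell
# Align XFS to the RAID geometry: su = per-disk stripe unit,
# sw = number of data-bearing disks (10-disk RAID 6 -> 8).
mkfs.xfs -f -d su=256k,sw=8 /dev/sdb1

# inode64 lets inodes be allocated beyond the first 2TB; it's the
# default on newer kernels but must be explicit on CentOS 6's 2.6.32.
mount -o noatime,nodiratime,nobarrier,inode64 /dev/sdb1 /mnt/data
```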

Armed with new data, I have – for real – the last round of testing.

Continue reading

XFS and EXT4 Testing Concluded

I had a few more suggestions thrown out at me before I could wrap this one up.

  • Try disabling the RAID controller read-ahead
  • Try a few custom options to XFS
  • Try RAID-10

First, my final “best” state benchmarks for comparison:

FS    RAID  Size  Mount Options                 Transfer/s    Requests/s  Avg/Request  95%/Request
xfs   6     4T    noatime,nodiratime,nobarrier  28.597Mb/sec  1830.24     0.51ms       2.06ms
ext4  6     4T    noatime,nodiratime,nobarrier  32.583Mb/sec  2085.33     0.46ms       1.89ms
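For context, these numbers come from sysbench’s fileio benchmark. The exact invocation isn’t reproduced here, so the sizes and thread count below are illustrative (sysbench 0.4-era syntax):

```shell
# Lay down a working set larger than RAM, run a random read/write
# mix for a fixed wall-clock time, then clean up the test files.
sysbench --test=fileio --file-total-size=128G prepare
sysbench --test=fileio --file-total-size=128G --file-test-mode=rndrw \
         --max-time=300 --max-requests=0 --num-threads=16 run
sysbench --test=fileio --file-total-size=128G cleanup
```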

Disabling the read-ahead was an interesting thought.

FS    RAID  Size  Mount Options                 Transfer/s    Requests/s  Avg/Request  95%/Request
xfs   6     4T    noatime,nodiratime,nobarrier  28.704Mb/sec  1837.07     0.50ms       2.04ms
ext4  6     4T    noatime,nodiratime,nobarrier  32.715Mb/sec  2093.75     0.46ms       1.88ms

It didn’t seem to make any real difference, however.


The second suggestion was to use modified XFS options (mkfs.xfs -f -d sunit=128,swidth=$((512*8)),agcount=32 /dev/sdb2).

FS    RAID  Size  Mount Options                 Transfer/s    Requests/s  Avg/Request  95%/Request
xfs   6     4T    noatime,nodiratime,nobarrier  26.376Mb/sec  1688.07     0.55ms       2.18ms

It’s hard to tell, but it seems these actually degraded performance.


The last test was to switch to RAID-10. This would reduce overall storage capacity to 72TB, but given our requirements, this really shouldn’t cause any problem for the project. RAID-10 should have a significant boost to write performance.

FS    RAID  Size  Mount Options                 Transfer/s    Requests/s  Avg/Request  95%/Request
xfs   10    36T   noatime,nodiratime,nobarrier  32.808Mb/sec  2099.72     0.46ms       1.80ms
ext4  10    36T   noatime,nodiratime,nobarrier  54.112Mb/sec  3463.17     0.28ms       1.11ms

These numbers back up the improvement to write speed, but XFS still lags behind at larger volume sizes.

Since I had to reconfigure the array anyway, I wanted to try the larger volume size (36T) above and then a smaller one (2.2T) to try to reproduce my earlier results showing XFS performing better at smaller volume sizes.

FS    RAID  Size  Mount Options                 Transfer/s    Requests/s  Avg/Request  95%/Request
xfs   10    2.2T  noatime,nodiratime,nobarrier  60.066Mb/sec  3844.2      0.25ms       1.00ms
ext4  10    2.2T  noatime,nodiratime,nobarrier  64.766Mb/sec  4145.01     0.23ms       0.90ms

These were by far the best test results I had seen, doubling the numbers from the original async test.


Testing conclusions

  • XFS seems to be very sensitive to partition size
  • In all but one case, EXT4 performed better on the random read-write tests
  • Know the other caveats of both file systems before picking one

More EXT4 vs XFS IO Testing

Following my previous post, I got some excellent feedback in the forms of comments, tweets and other chat. In no particular order:

  • Commenter Tibi noted that ensuring I’m mounting with noatime, nodiratime and nobarrier should all improve performance.
  • Commenter benbradley pointed out a missing flag on some of my sysbench tests which will necessitate re-testing.
  • Former co-worker @preston4tw suggests looking at different IO schedulers. For all past tests I used deadline, which seems to be best, but re-testing with noop could be useful.
  • Fellow DBA @kormoc encouraged me to try many smaller partitions to limit the number of concurrent fsyncs.
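For the scheduler suggestion, switching at runtime is a one-liner. This sketch assumes the data volume is /dev/sdb and a root shell:

```shell
# Show the schedulers available for sdb; the active one is bracketed,
# e.g. "noop anticipatory [deadline] cfq".
cat /sys/block/sdb/queue/scheduler

# Switch to noop on the fly; takes effect immediately, lost on reboot.
echo noop > /sys/block/sdb/queue/scheduler

# To persist on CentOS 6, add elevator=noop to the kernel line
# in /boot/grub/grub.conf.
```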

There seem to be plenty of options here that should allow me to re-try my testing with a slightly more consistent method. The consistent difference seems to be in the file system, EXT4 vs XFS, with XFS performing at about half the speed of EXT4.

Continue reading

IO, IO, It’s Off to Testing We Go

In my last post, I learned in disappointing fashion that sometimes you need to start small and work your way up, rather than trying to put together a finished product. This go-round, I’ll talk about my investigation into disk IO.

In an effort to better understand the hardware I have and its capabilities, I started off by just trying to get some basic info about the RAID controller and the disks. This hardware in particular is a Supermicro box, with an as-yet-unknown RAID controller and 16 4TB disks arranged in RAID 6. Finding out more disk and controller information was the first step. “hdparm -i” wasn’t able to give me much, nor was “cat /sys/class/block/sdb/device/{model,vendor}”. “dmesg” got me to a list of hard disks, Hitachi 7200rpm, and a model number that I could Google. It also got me enough controller information to point to megaraid, which is LSI, which got me over to this MegaCli cheat sheet. Using “MegaCli -AdpAllInfo -aALL” actually got me a great deal of information. (In other news, I now think that Dell’s OMSA command line utility is a lot less terrible after trying to figure out MegaCli.)
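The discovery sequence above condenses to a few commands. The install path is an assumption (the LSI RPMs typically drop MegaCli under /opt/MegaRAID/MegaCli, outside $PATH), and the two subcommands beyond -AdpAllInfo are the standard companions from the same cheat sheet:

```shell
# Controller hints from the kernel ring buffer.
dmesg | grep -i -E 'raid|megasas'

# MegaCli usually isn't on $PATH; adjust to where your package put it.
MEGACLI=/opt/MegaRAID/MegaCli/MegaCli64

$MEGACLI -AdpAllInfo -aALL    # adapter model, firmware, cache, BBU
$MEGACLI -LDInfo -Lall -aALL  # logical (virtual) drive layout
$MEGACLI -PDList -aALL        # every physical disk behind the controller
```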

Continue reading