As any server farm scales out, it becomes increasingly difficult to Watch All The Things™. I’ve been watching the progress of LogStash+ElasticSearch+Kibana (also known as an ELK stack) for a while and gave it a go this weekend. The trick for me was wanting to run each element inside of a separate Docker container so that I have easily portable elements to scale out with.
A step back. What is Docker? Docker is a container (using LXC) around an application. In short, you install Docker, start a container using a base image (CentOS, Ubuntu, etc.) and then run the container, dropping you into a shell. From here, you configure your application, then save your container. You can stop and start it at any time, relocate it to another server, or generally break it as badly as you want and you’ve done absolutely nothing to your host machine.
ElasticSearch is a data store and search engine; it will serve as the home for our logs. LogStash is a log parser: it understands what the source format is and supports many output formats (including ElasticSearch). Kibana is a data visualization tool for searching your data store and drawing graphs to help you see what's going on.
Installing Docker was easy. My target is a CentOS 6 minimal VirtualBox and EPEL has docker-io RPM’s, so this is a simple matter of yum -y install docker-io. Starting the service (service docker start) gets things up and going.
Next step was to download a base image to use. I'm a fan of CentOS, so I use "docker pull centos" as root to fetch the base image. I can confirm it's there by running "docker images". Last thing to do is get that base image running.
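Pulled together, the bootstrap so far amounts to just a handful of commands run as root (a transcript sketch, not something to paste blindly):

```shell
# Install and start Docker from EPEL on CentOS 6
yum -y install docker-io
service docker start

# Fetch the CentOS base image and confirm it landed
docker pull centos
docker images
```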
There is a great tutorial to getting all this stuff together, which I loosely followed but had to adapt to running things within Docker.
The docker run syntax is a bit more complicated than the rest, but after some trial and error, I settled on "docker run -t -i -name elasticsearch -p 9200:9200 centos /bin/bash" to establish my container for elasticsearch.
From my docker container bash prompt, I was able to install the java-1.7.0-openjdk and elasticsearch RPMs, then start elasticsearch (service elasticsearch start). I get a warning about being unable to change the ulimit on open files. I've ignored that one for now and put it on my list of things to look at later. I'm chalking it up to Docker limiting exactly what I can change from within my container (EDIT: Looks like I'm not the only one).
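Inside the container, the session boils down to something like this (a sketch, assuming the elasticsearch yum repository is already configured as the tutorial describes):

```shell
# Inside the elasticsearch container: install Java and ElasticSearch
yum -y install java-1.7.0-openjdk elasticsearch

# Start it up; expect a warning about the open-files ulimit here,
# since Docker restricts what the container is allowed to change
service elasticsearch start
```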
Last thing I did was verify ElasticSearch was really running by using the curl command from the tutorial: "curl 'http://localhost:9200/_search?pretty'".
Normally, installing and running LogStash is just as easy as ElasticSearch, but moving it to a Docker container proved a bit of a challenge. The first thing I ran into was that Docker containers, by default, cannot talk to each other. To solve that, you have to link them. I kicked off a "docker run -t -i -name logstash -link elasticsearch:es -p 9000:9000 centos /bin/bash" and that got me to the bash prompt.
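The alias in -link elasticsearch:es is what names the environment variables Docker injects into the linked container. A small illustration of putting them to use; the address and port values below are hypothetical stand-ins for what Docker would actually populate:

```shell
# Hypothetical values; in a real linked container Docker sets these
# for you, named after the "es" alias given to -link
ES_PORT_9200_TCP_ADDR=172.17.0.2
ES_PORT_9200_TCP_PORT=9200

# Build the ElasticSearch URL that curl (or a config file) can use
ES_URL="http://${ES_PORT_9200_TCP_ADDR}:${ES_PORT_9200_TCP_PORT}"
echo "$ES_URL"
```

Running env at the container's bash prompt is the quickest way to see the full set of variables the link created.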
I installed the java-1.7.0-openjdk and logstash RPMs and used the cookbook guide to set up a syslog collector listening on port 9000 (note: this was my arbitrary port selection). It’s pointing the ElasticSearch destination to my other container which you can get with an environment variable (use env to find what’s populated by using -link in Docker). With that process running, I used another dev VM and reconfigured rsyslog to point its data to my new collector. Upon bouncing rsyslog, the stdout on my logstash process showed instant results of data being accumulated.
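The collector configuration ended up along these lines. This is a sketch in logstash 1.x config syntax rather than my file verbatim, and the host value is a placeholder for whatever the link environment variable reports:

```
input {
  syslog {
    type => "syslog"
    port => 9000           # my arbitrary collector port
  }
}

output {
  stdout { }               # echo events so I can watch them arrive
  elasticsearch_http {
    host => "172.17.0.2"   # placeholder: use the ES_PORT_9200_TCP_ADDR value
  }
}
```

On the sending VM, pointing rsyslog at the collector is a one-line change along the lines of *.* @@collector-host:9000 (the double @ meaning TCP), followed by bouncing rsyslog.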
Re-running the curl for my ElasticSearch container got me data structures that showed my logging events. In a production scenario, I would probably update Puppet to make this configuration across an entire farm.
The last big piece of this puzzle is to get the Kibana 3 dashboard up and going. A new container (docker run -t -i -p 80:80 -name kibana3 centos /bin/bash) gets me my bash prompt. After that, I installed httpd and then extracted the tgz into the /var/www/html directory.
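Inside the kibana3 container, the setup is just a web server plus static files. A sketch, with the tarball name being hypothetical (substitute whichever Kibana 3 release you downloaded):

```shell
# Inside the kibana3 container: Apache plus the Kibana 3 static files
yum -y install httpd tar
tar -xzf kibana-3.0.0.tar.gz --strip-components=1 -C /var/www/html
service httpd start
```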
After starting httpd, I was able to hit my Kibana3 container from a browser. Kibana actually reaches from your browser out to the ElasticSearch container directly.
Kibana is going to take a while to absorb, but I was able to figure out some basics such as picking data to view, using filters and setting up a basic graph.
Where To Go From Here?
My next steps are going to be rewriting what I have done by hand into a handful of Dockerfile examples. These are files that define what a Docker container is and has. One of the caveats to using Docker is that a container runs only one command, and since in my case that's typically /bin/bash, I have to start the services myself. A Dockerfile will let me define what runs without having to constantly restart the service whenever I restart the container.
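As a first pass, the elasticsearch container might reduce to something like this (a sketch, assuming the elasticsearch RPM is reachable by yum; the binary path is the RPM's default install location, and -f keeps the process in the foreground so the container stays up without /bin/bash):

```
FROM centos
RUN yum -y install java-1.7.0-openjdk elasticsearch
EXPOSE 9200
# Foreground mode replaces my manual "service elasticsearch start"
CMD ["/usr/share/elasticsearch/bin/elasticsearch", "-f"]
```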
After that, I need to dig into how ElasticSearch stores its data. Docker isn’t a persistent data store without calling a commit on it. Docker does provide a method for exporting data from a container to its host server, which would make it persistent and easily consumed by a backup process. Unfortunately, that then ties the host with the container, making it much more difficult to move the container around. There is a different pattern in the docker community that establishes a link between a service and its data, loosely called a data-only container. I think it will still take some digging to find what’s best in this scenario.
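In the Docker syntax of the day, the data-only container pattern looks roughly like this; es-data and the /data path are names I've made up for illustration:

```shell
# A container that exists only to own a volume
docker run -name es-data -v /data centos true

# The service container mounts that volume rather than binding to the host,
# so the pair can move together without tying data to one host path
docker run -t -i -name elasticsearch -volumes-from es-data -p 9200:9200 centos /bin/bash
```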
The last thing I can foresee is wanting to limit ElasticSearch access by HTTP realm or IP restrictions.