Running ElasticSearch, LogStash and Kibana in Docker

As any server farm scales out, it becomes increasingly difficult to Watch All The Things™. I’ve been watching the progress of LogStash+ElasticSearch+Kibana (also known as an ELK stack) for a while and gave it a go this weekend. The twist for me was that I wanted to run each element inside a separate Docker container so that I’d have easily portable pieces to scale out with.

A step back. What is Docker? Docker is a container (using LXC) around an application. In short, you install Docker, pull a base image (CentOS, Ubuntu, etc.), and run a container from it, which drops you into a shell. From there, you configure your application and save your container. You can stop and start it at any time, relocate it to another server, or generally break it as badly as you want, and you’ve done absolutely nothing to your host machine.

ElasticSearch is a data store and search engine; it will serve as the home for our logs. LogStash is a log parser: it understands a variety of source formats and can ship to many outputs (including ElasticSearch). Kibana is a data visualization tool for searching your data store and drawing graphs to help you see what’s going on.

Docker

Installing Docker was easy. My target is a CentOS 6 minimal VirtualBox VM, and EPEL has docker-io RPMs, so this is a simple matter of yum -y install docker-io. Starting the service (service docker start) gets things up and going.
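For reference, the whole install boils down to a few commands (assuming EPEL is already configured on the host):

    yum -y install docker-io      # Docker from EPEL
    service docker start          # start the daemon
    chkconfig docker on           # optional: start Docker on boot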

The next step was to download a base image to use. I’m a fan of CentOS, so I ran “docker pull centos” as root to fetch the base image and confirmed it was there by running “docker images”. The last thing to do is get that base image running.
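Both steps, for reference:

    docker pull centos    # fetch the CentOS base image (run as root)
    docker images         # confirm the image is available locally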

ElasticSearch

There is a great tutorial on getting all of this working together, which I loosely followed but had to adapt to running things within Docker.

The docker run syntax is a bit more complicated than the rest, but after some trial and error, I settled on “docker run -t -i -name elasticsearch -p 9200:9200 centos /bin/bash” to establish my container for elasticsearch.
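Breaking that command down:

    # -t -i         : allocate a TTY and keep stdin open (interactive shell)
    # -name         : name the container so -link can reference it later
    # -p 9200:9200  : publish ElasticSearch's HTTP port on the host
    docker run -t -i -name elasticsearch -p 9200:9200 centos /bin/bash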

From my docker container’s bash prompt, I was able to install the java-1.7.0-openjdk and elasticsearch RPMs, then start elasticsearch (service elasticsearch start). I got a warning about being unable to change the ulimit on open files; I’ve ignored that one for now and put it on my list of things to look at later. I’m chalking it up to Docker limiting exactly what I can change from within my container (EDIT: looks like I’m not the only one).
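Inside the container, that amounted to roughly the following (the elasticsearch RPM version and URL are my assumption; grab whichever release is current for you):

    yum -y install java-1.7.0-openjdk
    rpm -Uvh https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.4.2.noarch.rpm
    service elasticsearch start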

ElasticSearch provides both standalone RPMs and a yum repository. For my testing purposes, I just used the direct RPM, but adding the repository would be a solid idea to ensure you stay updated.
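If you go the repository route, the repo file looked roughly like this at the time (the 1.4 version path and URLs are my assumption; check ElasticSearch’s install docs for the current ones):

    # /etc/yum.repos.d/elasticsearch.repo
    [elasticsearch-1.4]
    name=Elasticsearch repository for 1.4.x packages
    baseurl=http://packages.elasticsearch.org/elasticsearch/1.4/centos
    gpgcheck=1
    gpgkey=http://packages.elasticsearch.org/GPG-KEY-elasticsearch
    enabled=1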

The last thing I did was make sure ElasticSearch was really running by using the curl command from the tutorial, curl 'http://localhost:9200/_search?pretty', to verify things were good to go.

LogStash

Normally, installing and running LogStash is just as easy as ElasticSearch, but moving it into a Docker container proved a bit of a challenge. The first thing I ran into was that Docker containers, by default, cannot talk to each other; to solve that, you have to link them. I kicked off “docker run -t -i -name logstash -link elasticsearch:es -p 9000:9000 centos /bin/bash” and that got me to the bash prompt.
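Annotated, the linking piece looks like this (the alias “es” determines the prefix of the environment variables Docker injects):

    # -link elasticsearch:es lets this container reach the elasticsearch
    # container; Docker injects ES_* environment variables and a hosts entry
    docker run -t -i -name logstash -link elasticsearch:es -p 9000:9000 centos /bin/bash

    # inside the container, inspect what -link populated:
    env | grep ES_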

I installed the java-1.7.0-openjdk and logstash RPMs and used the cookbook guide to set up a syslog collector listening on port 9000 (note: this was my arbitrary port selection). The ElasticSearch output points at my other container, whose address is exposed through an environment variable (run env inside the container to see what -link populates). With that process running, I used another dev VM and reconfigured rsyslog to point its data at my new collector. Upon bouncing rsyslog, the stdout of my logstash process immediately showed data coming in.
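Here’s a minimal sketch of the collector config I ended up with. The address comes from ES_PORT_9200_TCP_ADDR (populated by -link), and logstash configs don’t expand environment variables, so paste in whatever env reports; the output options shown are from the 1.4-era elasticsearch plugin:

    # /etc/logstash/conf.d/syslog.conf -- a minimal sketch
    input {
      syslog {
        port => 9000          # my arbitrary port choice
        type => "syslog"
      }
    }
    output {
      elasticsearch {
        host     => "172.17.0.2"   # substitute the value of ES_PORT_9200_TCP_ADDR
        protocol => "http"
      }
      stdout { }              # echo events so you can watch them arrive
    }

On the client VM, the rsyslog change is one line (dockerhost is a stand-in for your Docker host’s address):

    # forward everything to the collector over TCP (@@); use @ for UDP
    *.* @@dockerhost:9000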

Re-running the curl against my ElasticSearch container returned data structures showing my logging events. In a production scenario, I would probably update Puppet to push this configuration across an entire farm.

Kibana

The last big piece of this puzzle is getting the Kibana 3 dashboard up and going. A new container (docker run -t -i -p 80:80 -name kibana3 centos /bin/bash) gets me a bash prompt. After that, I installed httpd and extracted the Kibana tgz into the /var/www/html directory.
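Inside the container, roughly (the Kibana download URL and version are my assumption):

    yum -y install httpd tar
    curl -O https://download.elasticsearch.org/kibana/kibana/kibana-3.1.2.tar.gz
    tar xzf kibana-3.1.2.tar.gz -C /var/www/html --strip-components=1
    service httpd start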

After starting httpd, I was able to hit my Kibana3 container from a browser. Note that Kibana reaches from your browser out to the ElasticSearch container directly, so the browser needs to be able to reach the published 9200 port as well.
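That has a practical consequence: config.js in the Kibana web root has to name an ElasticSearch address the browser can resolve. The stock Kibana 3 default uses the page’s own hostname, which works here because 9200 is published on the same host:

    /* config.js (Kibana 3 web root) -- stock default */
    elasticsearch: "http://"+window.location.hostname+":9200",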

Kibana is going to take a while to absorb, but I was able to figure out some basics such as picking data to view, using filters and setting up a basic graph.

Where To Go From Here?

My next steps are to rewrite what I have done by hand into a handful of Dockerfile examples. These are files that define what a docker container is and has. One of the caveats of Docker is that a container runs only one command, and in my case that’s typically /bin/bash, which means I have to start the services myself. A Dockerfile will let me define what runs automatically so I don’t have to start the services by hand every time I restart the container.
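As a rough sketch of where I’m headed, an elasticsearch Dockerfile might look like this (the RPM URL and version are assumptions; ES 1.x runs in the foreground by default, which is what keeps the container alive):

    # Dockerfile -- a sketch of the elasticsearch container
    FROM centos
    RUN yum -y install java-1.7.0-openjdk
    RUN rpm -Uvh https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.4.2.noarch.rpm
    EXPOSE 9200
    # run in the foreground so the container stays up
    CMD ["/usr/share/elasticsearch/bin/elasticsearch"]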

After that, I need to dig into how ElasticSearch stores its data. A Docker container isn’t a persistent data store unless you commit it. Docker does provide a method for exporting data from a container to its host server, which would make it persistent and easily consumed by a backup process. Unfortunately, that ties the host to the container, making it much more difficult to move the container around. There is a different pattern in the docker community that establishes a link between a service and its data, loosely called a data-only container (sketched below). It will still take some digging to find what’s best in this scenario.
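The data-only pattern, sketched with the same old-style single-dash flags used above (the volume path is an assumption):

    # 1) a container whose only job is to own the volume
    docker run -v /var/lib/elasticsearch -name es-data centos true

    # 2) run elasticsearch with that volume mounted from es-data
    docker run -t -i -volumes-from es-data -name elasticsearch -p 9200:9200 centos /bin/bash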

The last thing I can foresee is wanting to limit ElasticSearch access by HTTP realm or IP restrictions.


6 thoughts on “Running ElasticSearch, LogStash and Kibana in Docker”

  1. Michael Ferranti

    Hi Jeremy, thanks for this tutorial! I found it while researching how people are running data-backed services like ElasticSearch in Docker containers. The thing you mentioned about making data persistent is a pain, even with data-only containers, because your volume is going to be stuck on a single host machine, and if you need to move ES to a bigger node or something, you have to do all that manually. We’re working on an OSS project to solve this issue, and I’d love your feedback. Basically, we’re working on tools to let you define your app as a set of connected containers, deploy them to multiple nodes, and migrate them and their data volumes between hosts with minimal to no downtime. In short, we’re automating all the things that would otherwise have to be done manually to reliably run an app with data inside a container.

    Would love your feedback: https://github.com/ClusterHQ/flocker

    The code is 100% open source and we’re looking for feedback from the community to make it better.

    Cheers!

    Michael
    @ferrantim

    1. ferrantim

      Jeremy,

      Just wanted to follow up and let you know that we just released the first major point release of Flocker.

      Flocker handles multi-node deployment, container migration along with associated volumes, and container networking so you don’t have to update DNS after a migration.

      Would love your thoughts if you have a minute.

      The code is here (https://github.com/ClusterHQ/flocker), and we put together a tutorial for deploying and migrating MongoDB (http://docs.clusterhq.com/en/0.1.0/gettingstarted/tutorial/index.html) and are working on others. Hope to have an ES example soonish.

      Cheers,

      Michael

    1. Michael

      Hey Patrick, sorry for the comment hijacking; I hope Jeremy doesn’t mind. I think this article on deploying and maintaining an ELK cluster using Docker would be useful. It includes the Dockerfiles for each container (ElasticSearch, Logstash and Kibana). It doesn’t run on CoreOS, but CoreOS is one of the Linux distros we’re working on supporting. Cheers, https://clusterhq.com/blog/deploying-multi-node-elasticsearch-logstash-kibana-cluster-using-docker/

