We'll go over these in more detail below.
- Cassandra. We run and recommend 3.8 or newer. See Cassandra.
- The latest (1.0.1 or newer) version of Graphite
- Optional: statsd or something compatible with it. For instrumentation of graphite.
- Optional: Kafka, if you want to buffer data in case metrictank goes down. Kafka 2.0.0 is highly recommended. more info
Note: Cassandra and Kafka require Java, which will be automatically installed by apt as a dependency when we install Cassandra.
Metrictank ingests metrics data. The data can be sent into it directly, or be read from a queue (see Inputs). Metrictank compresses the data into chunks in RAM. A configurable amount of the most recent data is kept in RAM, and the chunks are also saved to Cassandra. You can use a single Cassandra instance or a cluster.
Metrictank also responds to queries: recent data comes out of RAM, and older data is fetched from Cassandra. This happens transparently.
Metrictank maintains an index of metrics metadata for all series it sees. You can use an index that lives entirely in memory, or one backed by Cassandra for persistence.
You can query metrictank directly (it has fast but limited built-in processing, and falls back to graphite when needed), or you can query graphite, which always uses graphite's processing but uses metrictank as a datastore.
We recommend a server with at least 8GB RAM and a few CPUs. You need root access. All the commands shown assume you're root.
Grafana Labs provides 2 repositories:
- raintank: stable repository for official stable releases
- testing: testing repository that has the latest packages which typically bring improvements but possibly also new bugs. These packages are built from git master and are named
<base-release>-<commits-since-release>-<git-hash>
See the installation instructions on those pages for how to enable the repositories for your distribution.
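If you script around package versions, it can help to tell stable releases from testing builds. The following is a minimal sketch that splits a testing package name into the three parts described above. The function name and the example version strings are hypothetical, and the pattern only assumes that the commit count is numeric and that the hash contains no hyphens.

```python
import re

# Matches <base-release>-<commits-since-release>-<git-hash>.
# Assumption: the commit count is all digits and the hash is alphanumeric.
_TESTING_VERSION = re.compile(
    r"^(?P<base>.+)-(?P<commits>\d+)-(?P<git_hash>[0-9A-Za-z]+)$"
)


def parse_testing_version(version):
    """Return (base_release, commits_since_release, git_hash), or None
    if the string does not look like a testing build."""
    match = _TESTING_VERSION.match(version)
    if match is None:
        return None
    return match.group("base"), int(match.group("commits")), match.group("git_hash")
```

A plain release string without the two trailing parts returns None, which is one way to detect that a package came from the stable repository.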
Supported distributions:
- Ubuntu 14.04 (Trusty Tahr), 16.04 (Xenial Xerus)
- Debian 7 (wheezy), 8 (jessie)
- CentOS 6, 7
You need to install these packages:
- metrictank
You can enable our repository and install the metrictank package like so: (Feel free to use the testing repository instead)
curl -s https://packagecloud.io/install/repositories/raintank/raintank/script.deb.sh | bash
apt-get install metrictank
Install Graphite via your preferred method as detailed at http://graphite.readthedocs.io/en/latest/install.html (We hope to provide Debian and Ubuntu packages in the near future.)
Configure graphite with the following settings in local_settings.py:
CLUSTER_SERVERS = ['localhost:6060']
REMOTE_EXCLUDE_LOCAL = False
USE_WORKER_POOL = True
POOL_WORKERS_PER_BACKEND = 8
POOL_WORKERS = 1
REMOTE_FIND_TIMEOUT = 30.0
REMOTE_FETCH_TIMEOUT = 60.0
REMOTE_RETRY_DELAY = 60.0
MAX_FETCH_RETRIES = 2
FIND_CACHE_DURATION = 300
REMOTE_STORE_USE_POST = True
REMOTE_STORE_FORWARD_HEADERS = ["x-org-id"]
REMOTE_PREFETCH_DATA = True
STORAGE_FINDERS = ()
- Add the cassandra repository:
cat << EOF >> /etc/apt/sources.list
deb http://www.apache.org/dist/cassandra/debian 30x main
deb-src http://www.apache.org/dist/cassandra/debian 30x main
EOF
- Run the following to add the GPG key:
gpg --keyserver pgp.mit.edu --recv-keys 0353B12C && gpg --export --armor 0353B12C | apt-key add -
- Run the following to install cassandra:
apt-get update && apt-get install cassandra cassandra-tools
For basic setups, you can just start it with default settings. To tweak schema and settings, see Cassandra.
- Start cassandra:
/etc/init.d/cassandra start
The log - should you need it - is at /var/log/cassandra/system.log
You can optionally run statsd or a statsd-compatible agent for instrumentation of graphite and, if you like, any of your other applications.
You can install the official statsd (see its installation instructions) or an alternative. We recommend raintank/statsdaemon.
Below are instructions for statsd and statsdaemon.
Note:
- <environment> is whatever you choose to call your environment (test, production, dev, ...).
- statsd/statsdaemon will write to metrictank's carbon port on localhost:2003.
Statsdaemon is the recommended option. Install the package from the raintank repository you enabled earlier:
apt-get install statsdaemon
Update the following settings in /etc/statsdaemon.ini:
flush_interval = 1
prefix_rates = "stats.<environment>."
prefix_timers = "stats.<environment>.timers."
prefix_gauges = "stats.<environment>.gauges."
percentile_thresholds = "90,75"
Run it:
systemctl start statsdaemon
Or:
service statsdaemon start
The logs, should you need them:
journalctl -u statsdaemon
If you want to use the original statsd server instead of statsdaemon, see the instructions on the statsd homepage. Set the following options:
flushInterval: 1000
globalPrefix: "stats.<environment>"
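Once statsd or statsdaemon is running, your own applications can send it metrics over UDP. Here is a minimal sketch of sending a counter in the statsd line format (name:value|c). The metric names are hypothetical, and port 8125 is the usual statsd default; it is an assumption here, so check the listen address in your own statsd or statsdaemon config.

```python
import socket


def format_statsd_counter(name, count=1):
    """Build a statsd counter line: '<name>:<count>|c'."""
    return f"{name}:{count}|c"


def send_counter(name, count=1, host="localhost", port=8125):
    """Send one counter increment over UDP (fire and forget).

    Assumption: statsd/statsdaemon listens on UDP port 8125.
    """
    payload = format_statsd_counter(name, count).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

For example, send_counter("myapp.requests") would increment a hypothetical myapp.requests counter, which the agent then flushes to metrictank under the configured prefix.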
You can run a persistent queue in front of metrictank. If your metrictank instances go down, a queue buffers and saves all the data in the meantime. Once the instances come back up, they can replay everything they missed (and more: it's useful to load in older data so that you can serve queries for it out of RAM). A queue is also useful for re-processing older data, for example when you change your aggregations or your Cassandra cluster.
Kafka requires Zookeeper, so set that up first.
- Download zookeeper. Find a mirror at http://www.apache.org/dyn/closer.cgi/zookeeper/, pick a stable zookeeper, and download it to your server.
- Unpack zookeeper. For this guide we'll install it in /opt.
cd /opt
tar -zxvf /root/zookeeper-3.4.9.tar.gz # update path if you downloaded elsewhere.
ln -s /opt/zookeeper-3.4.9 /opt/zookeeper
mkdir /var/lib/zookeeper
- Make a config file for zookeeper:
cat << EOF > /opt/zookeeper/conf/zoo.cfg
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
EOF
- Start zookeeper:
/opt/zookeeper/bin/zkServer.sh start
Kafka 2.0.0 is highly recommended, though older versions work too. more info
- Download kafka from https://archive.apache.org/dist/kafka/2.0.0/kafka_2.12-2.0.0.tgz to your server.
- Unpack kafka. Like zookeeper, we'll do so in /opt.
cd /opt
tar -zxvf /root/kafka_2.12-2.0.0.tgz # update path if you downloaded elsewhere
ln -s /opt/kafka_2.12-2.0.0 /opt/kafka
- Start kafka:
/opt/kafka/bin/kafka-server-start.sh -daemon /opt/kafka/config/server.properties
The log - if you need it - lives at /opt/kafka/logs/server.log
Now edit the file at /etc/metrictank/metrictank.ini. It should be commented enough to guide you through the various options.
You may have to adjust statsd-addr, cassandra-addrs, cassandra-idx's hosts option, and kafka-mdm-in's brokers option if you run any of these services at locations other than the localhost defaults.
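Before starting metrictank, it can be useful to confirm that the services it depends on are reachable at the addresses you configured. The following is a rough sketch that attempts a TCP connection to each one. The service table is an assumption for a single-host setup: 2181 and 9092 are the zookeeper and kafka ports used in this guide, and 9042 is Cassandra's default CQL port. Adjust it to match your config. Statsd is left out because it listens on UDP.

```python
import socket

# Assumed single-host defaults; change these to match metrictank.ini.
DEFAULT_SERVICES = {
    "cassandra": ("localhost", 9042),
    "zookeeper": ("localhost", 2181),
    "kafka": ("localhost", 9092),
}


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services=DEFAULT_SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {
        name: is_port_open(host, port)
        for name, (host, port) in services.items()
    }
```

Calling check_services() returns a dict of service names to booleans. An open port only shows that something is listening there, not that the service is healthy.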
Out of the box, one input is enabled: the Carbon line input.
It uses a default storage-schemas configuration to coalesce every incoming metric into 1 second resolution. You may want to fine-tune this for your needs at /etc/metrictank/storage-schemas.conf (or simply use what you already have in a pre-existing Graphite install).
See the input plugin documentation referenced above for more details.
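To check that the Carbon input works, you can send it a data point by hand. Here is a minimal sketch using the Carbon plaintext protocol, where each line has the form "path value timestamp". It assumes metrictank's carbon port is at localhost:2003 as mentioned earlier; the metric name in the usage note is made up.

```python
import socket
import time


def format_carbon_line(path, value, timestamp=None):
    # One Carbon plaintext line: "<path> <value> <unix-timestamp>" plus newline.
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="localhost", port=2003, timestamp=None):
    """Send a single data point to a Carbon plaintext listener over TCP.

    Assumption: the listener is metrictank's carbon input on localhost:2003.
    """
    line = format_carbon_line(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5.0) as sock:
        sock.sendall(line.encode("ascii"))
```

For example, send_metric("test.quickstart.value", 42) writes one point for a hypothetical series named test.quickstart.value, stamped with the current time.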
If you want to use Kafka, you should enable the Kafka-mdm input plugin.
See the kafka-mdm-in section and set enabled to true.
See the Inputs docs for more details.
Finally, by default memory-idx has enabled set to true, while cassandra-idx has enabled set to false.
This uses the non-persistent index, which starts out fresh at every start of metrictank.
You probably want to disable the memory index and enable cassandra-idx instead (just switch the enabled values around).
See metadata for more details.
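After switching the two enabled values, you may want to double check which index is active. Here is a small sketch that reads ini-style text and reports the enabled flag of both index sections. The function name and the sample text are illustrative only, not the shipped config, and the parsing assumes plain "key = value" lines; a dummy [global] header is prepended so that keys appearing before the first section do not trip the parser.

```python
import configparser

# Illustrative sample only; the real file has many more options.
SAMPLE = """
[memory-idx]
enabled = false

[cassandra-idx]
enabled = true
hosts = localhost:9042
"""


def index_status(ini_text):
    """Return {'memory-idx': bool, 'cassandra-idx': bool} from ini text."""
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, delimiters=("=",)
    )
    # Prepend a dummy section for any keys that come before a section header.
    parser.read_string("[global]\n" + ini_text)
    return {
        section: parser.getboolean(section, "enabled", fallback=False)
        for section in ("memory-idx", "cassandra-idx")
    }
```

To check the real file, you could pass in the contents of /etc/metrictank/metrictank.ini. Seeing both values true, or both false, is a hint that the switch was only half done.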
If using upstart:
service metrictank start
If using systemd:
systemctl start metrictank
Note that metrictank simply logs to stdout. So where the log data ends up depends on your init system.
If using upstart, you can then find the logs at /var/log/upstart/metrictank.log.
With systemd, you can use something like journalctl -f -u metrictank.
In Grafana, you can now add a graphite datasource with url http://<ip>:8080.
If you access Grafana over https, make sure to use proxy mode, otherwise browsers will refuse to load content from the http datasource.
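You can also query the same endpoint without Grafana to confirm that data comes back. The following is a minimal sketch that builds a Graphite render API URL and fetches the result as JSON. It assumes the endpoint at the datasource URL above serves the standard Graphite /render API; the target pattern in the usage note is hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def render_url(base_url, target, since="-5min"):
    """Build a Graphite render URL that asks for JSON output."""
    query = urlencode({"target": target, "from": since, "format": "json"})
    return f"{base_url.rstrip('/')}/render?{query}"


def fetch_series(base_url, target, since="-5min"):
    """Fetch matching series; each item has 'target' and 'datapoints'."""
    with urlopen(render_url(base_url, target, since), timeout=10) as resp:
        return json.load(resp)
```

For instance, fetch_series("http://localhost:8080", "stats.*") would return a list of series, each with a list of [value, timestamp] pairs, or an empty list if nothing matches.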
You can start visualizing the data that's already in there by importing these dashboards:
- Metrictank dashboard: visualizes all metrictank's internal performance metrics, which it sends via statsd/statsdaemon, into itself. This dashboard will not work if you disabled statsd.
- Statsdaemon dashboard: if you use statsdaemon, you can visualize its performance metrics, stored in metrictank.
You're probably interested in loading in some fake data as well, perhaps to benchmark metrictank. A full benchmarking guide is out of scope for this installation guide, but here are some suggestions:
- Use the haggar tool, which simulates independent clients, gradually appearing and sending data at randomized intervals into metrictank's carbon input port. Invoke like so:
./haggar -agents 10 -jitter 1ms
- Use fakemetrics, which has a few modes of operation. One of its useful features is that it can send metrics in metrics2.0 format into kafka. You can do so using:
./fakemetrics -kafka-mdm-tcp-address localhost:9092 -orgs 100 -keys-per-org 1000