Simple service discovery in Prometheus via Consul

Pareto law (Pareto principle, 80/20 principle): “20% of the efforts yield 80% of the result, and the remaining 80% of the efforts give only 20% of the result.”
Wikipedia

Greetings, dear reader!


My first article on Habr is devoted to a simple and, I hope, useful solution that made it convenient for me to collect metrics in Prometheus from heterogeneous servers. I will touch on some details that many people may not have looked into when using Prometheus, and share my approach to organizing lightweight service discovery in it.


You will need Prometheus, HashiCorp Consul, systemd, a little Bash code, and an understanding of what is going on.


If you want to know how all this fits together and how it works, read on.


Prometheus + Bash + Consul


Meet: Prometheus


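Many people first meet Prometheus together with Kubernetes. Prometheus follows a pull model: it goes to each target and collects the metrics itself, so it has to know where the targets are. For Kubernetes, this is solved in prometheus.yml with kubernetes_sd_configs: Prometheus asks kube-apiserver for the IP addresses of the pods.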


scrape_configs:
- job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    insecure_skip_verify: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_ip, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: (.+);(.+)
    replacement: $1:$2
    target_label: __address__
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    regex: (.+)
    target_label: __metrics_path__
  - action: labelmap
    regex: __meta_(kubernetes_namespace|kubernetes_pod_name|kubernetes_pod_label_manifest_sha1|kubernetes_pod_node_name)

This job is about all it takes. The key part is role: pod in kubernetes_sd_configs.


Prometheus, however, is not only about Kubernetes. Besides the prometheus/node_exporter DaemonSet, there are plenty of exporters that run outside Kubernetes: node_exporter, Zookeeper, Kafka, ClickHouse, CEPH, Elasticsearch, Tarantool ...


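Writing targets into static_configs by hand for Kafka or CEPH OSD hosts is tedious. Even when the servers are managed with Ansible, every new CEPH OSD means another edit to prometheus.yml, and the file keeps growing.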


, . prometheus.yml, , job_name . prometheus.yml Kafka 6 . 3. 3 ClickHouse, 4 . CEPH . , — prometheus/node_exporter . prometheus.yml static_configs.


Don’t specify default values unnecessarily: simple, minimal configuration will make errors less likely.

Kubernetes Documentation, Configuration Best Practices, General Configuration Tips.




HashiCorp Consul


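The goal, then, is a short prometheus.yml. Prometheus has a suitable service discovery mechanism for this: consul_sd_configs. With it, Prometheus asks HashiCorp Consul for the list of targets. A job looks like this: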


scrape_configs:
- job_name: SERVICE_NAME
  consul_sd_configs:
  - server: consul.example.com
    scheme: https
    tags: [test] # dev|test|stage|prod|...
    services: [prometheus-SERVICE_NAME-exporter]

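Consul is usually deployed with an agent on every host. Here only the Consul HTTP API is used, and nothing else. Consul has other uses as well: a KV store, a backend for HashiCorp Vault, a provider for Traefik, and it can also answer DNS queries. For this setup none of that is required, and no Consul agent is installed on the hosts. A single Consul whose HTTP API is reachable by Prometheus and by the hosts is enough, preferably over HTTPS.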
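To check by hand which instances of a service Consul knows about, the catalog endpoint of the Consul HTTP API (/v1/catalog/service/<name>, with an optional tag filter) can be queried with cURL. The sketch below only builds the URL; the function name catalog_url is hypothetical, and the commented-out call assumes CONSUL_URL and CONSUL_TOKEN are set as described later in the article.

```shell
# Build the Consul catalog URL that lists the instances of one service,
# optionally filtered by a tag (for example the environment).
# catalog_url is an illustrative helper, not part of consul-service.
catalog_url() {
    base="${1%/}"      # Consul base URL without a trailing slash
    service="$2"       # service name as registered in Consul
    tag="${3:-}"       # optional tag filter
    url="$base/v1/catalog/service/$service"
    if [ -n "$tag" ]; then
        url="$url?tag=$tag"
    fi
    printf '%s\n' "$url"
}

# Example call (needs network access, so it is left commented out):
# curl --silent --fail --header "X-Consul-Token: $CONSUL_TOKEN" \
#     "$(catalog_url "$CONSUL_URL" prometheus-node-exporter prod)"
```

The response is a JSON array with one entry per registered instance, including its address and port.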


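Below is a Kubernetes manifest with a StatefulSet for Consul, published through Traefik, which is enough for service discovery in Prometheus. The Ingress also gives access to Consul's web UI. Traefik obtains the HTTPS certificate from Let's Encrypt using the DNS Challenge.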


consul.yml
# https://consul.io/docs/agent/options.html

---
apiVersion: v1
kind: Service
metadata:
  name: consul
  labels:
    app: consul
spec:
  selector:
    app: consul
  ports:
  - name: http
    port: 8500

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: consul
  labels:
    app: consul
spec:
  serviceName: consul
  selector:
    matchLabels:
      app: consul
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      storageClassName: cephfs
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
  template:
    metadata:
      labels:
        app: consul
    spec:
      automountServiceAccountToken: false
      terminationGracePeriodSeconds: 60
      containers:
      - name: consul
        image: consul:1.6
        volumeMounts:
        - name: data
          mountPath: /consul/data
        args:
        - agent
        - -server
        - -client=0.0.0.0
        - -bind=127.0.0.1
        - -bootstrap
        - -bootstrap-expect=1
        - -disable-host-node-id
        - -dns-port=0
        - -ui
        ports:
        - name: http
          containerPort: 8500
        readinessProbe:
          initialDelaySeconds: 10
          httpGet:
            port: http
            path: /v1/agent/members
        livenessProbe:
          initialDelaySeconds: 30
          httpGet:
            port: http
            path: /v1/agent/members
        resources:
          requests:
            cpu: 0.2
            memory: 256Mi

---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: consul
  labels:
    app: consul
  annotations:
    traefik.ingress.kubernetes.io/frontend-entry-points: http,https
    traefik.ingress.kubernetes.io/redirect-entry-point: https
spec:
  rules:
  - host: consul.example.com
    http:
      paths:
      - backend:
          serviceName: consul
          servicePort: http

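Prometheus can now get its targets from Consul, but something still has to put the services into Consul in the first place. That is the missing piece, and it is filled with Bash.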


Bash and systemd.service


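The servers in question run CoreOS Container Linux, a setup that was presented at DevOpsConf Russia 2018. Software runs in Docker containers, each wrapped in a systemd.service unit. The same approach works on Flatcar Linux, a fork of CoreOS Container Linux.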


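A few words about systemd.service. systemd is the init system and service manager found in most Linux distributions. A systemd.service unit has a [Service] section that supports, among others, the ExecStartPost and ExecStop directives. These are the hooks used to register a service in Consul and to remove it again.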


Take prometheus-node-exporter.service as an example. This is the exporter whose targets would otherwise take on the order of 100 entries in static_configs.


[Unit]
After=docker.service
[Service]
Environment=CONSUL_URL=https://consul.example.com
ExecStartPre=-/usr/bin/docker rm --force %N
ExecStart=/usr/bin/docker run \
    --name=%N \
    --rm=true \
    --network=host \
    --pid=host \
    --volume=/:/rootfs:ro \
    --label=logger=json \
    --stop-timeout=30 \
    prom/node-exporter:v0.18.1 \
    --log.format=logger:stdout?json=true \
    --log.level=error
ExecStartPost=/opt/bin/consul-service register -e prod -n %N -p 9100 -t prometheus,node-exporter
ExecStop=/opt/bin/consul-service deregister -e prod -n %N
ExecStop=-/usr/bin/docker stop %N
Restart=always
StartLimitInterval=0
RestartSec=10
KillMode=process
[Install]
WantedBy=multi-user.target

The interesting part is /opt/bin/consul-service. Here is what it reads from the environment and which arguments it takes.


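Environment variables:


  • CONSUL_URL: the address of the Consul server.
  • CONSUL_TOKEN: the value of the HTTP header «X-Consul-Token», used to authorize requests to the Consul HTTP API. The token is created in the Consul web UI, in the ACLs section.

Arguments:


  • register/deregister: the action to perform. It either adds the service to Consul or removes it.
  • -e: the environment, one of [dev|test|prod|…].
  • -n: the service name, for example prometheus-node-exporter. In a unit file it can be given as %N, which systemd expands to the unit name without its suffix.
  • -p: the port on which Prometheus collects the metrics. It is stored with the service in Consul.
  • -t: a comma-separated list of tags. They are stored with the service in Consul.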
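The command line above could be parsed roughly as follows. This is a minimal sketch and not the actual consul-service source: the function name parse_args and the variable names are hypothetical, and treating -e and -n as mandatory is an assumption based on the unit file, where both are passed to register and deregister.

```shell
# Hypothetical parser for:
#   consul-service register|deregister -e ENV -n NAME [-p PORT] [-t TAGS]
# On success it sets: action, environment, name, port, tags.
parse_args() {
    if [ $# -eq 0 ]; then
        echo "usage: consul-service register|deregister -e ENV -n NAME [-p PORT] [-t TAGS]" >&2
        return 2
    fi
    action="$1"
    shift
    case "$action" in
        register|deregister) ;;
        *)
            echo "unknown action: $action" >&2
            return 2
            ;;
    esac

    environment=""
    name=""
    port=""
    tags=""
    OPTIND=1    # reset so that the function can be called more than once
    while getopts "e:n:p:t:" opt; do
        case "$opt" in
            e) environment="$OPTARG" ;;
            n) name="$OPTARG" ;;
            p) port="$OPTARG" ;;
            t) tags="$OPTARG" ;;
            *) return 2 ;;
        esac
    done

    # Assumption: the environment and the service name are always required.
    if [ -z "$environment" ] || [ -z "$name" ]; then
        echo "both -e and -n are required" >&2
        return 2
    fi
}
```

Calling parse_args "$@" at the top of a script leaves the parsed values in plain shell variables, which keeps the rest of the script free of option handling.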

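consul-service is tied to the life cycle of the systemd.service unit. When the unit starts, ExecStartPost calls consul-service register, the service appears in Consul, and Prometheus discovers it and starts collecting its metrics. When the unit is stopped, ExecStop calls consul-service deregister, the service is removed from Consul, and Prometheus stops collecting its metrics. If a host fails without running ExecStop, the service stays registered and Prometheus reports the target as down, so the same setup can also be used to monitor Service Availability.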
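The register and deregister steps map naturally onto two calls of the Consul agent HTTP API: PUT /v1/agent/service/register and PUT /v1/agent/service/deregister/<service-id>. The following is a sketch under assumptions, not the real consul-service code: the function names are hypothetical, the service ID scheme NAME-ADDRESS is invented for illustration, tags are assumed to be non-empty and free of characters that need JSON escaping, and the caller is expected to include the environment in the tag list.

```shell
# Build the JSON body for PUT /v1/agent/service/register.
# $1 service name, $2 host address, $3 port, $4 comma-separated tags
register_payload() {
    name="$1"
    address="$2"
    port="$3"
    # Turn a,b into a","b so it can be wrapped as a JSON array of strings.
    tags_json=$(printf '%s' "$4" | sed 's/,/","/g')
    printf '{"ID":"%s-%s","Name":"%s","Address":"%s","Port":%s,"Tags":["%s"]}\n' \
        "$name" "$address" "$name" "$address" "$port" "$tags_json"
}

# Register a service instance. Takes the same arguments as register_payload.
# Expects CONSUL_URL and, optionally, CONSUL_TOKEN in the environment.
consul_register() {
    curl --silent --fail --request PUT \
        --header "X-Consul-Token: ${CONSUL_TOKEN:-}" \
        --data "$(register_payload "$@")" \
        "$CONSUL_URL/v1/agent/service/register"
}

# Deregister a service instance. $1 service name, $2 host address
consul_deregister() {
    curl --silent --fail --request PUT \
        --header "X-Consul-Token: ${CONSUL_TOKEN:-}" \
        "$CONSUL_URL/v1/agent/service/deregister/$1-$2"
}
```

For example, consul_register prometheus-node-exporter node1.example.com 9100 prod,prometheus,node-exporter would register one node exporter instance, and consul_deregister prometheus-node-exporter node1.example.com would remove it again.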


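consul-service builds the address of the host from its hostname and the search domain in resolv.conf. The service is therefore registered under a DNS name, which is considered best practice.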
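Combining the hostname with a search domain could look like the sketch below. The helper names are hypothetical, and taking the first domain on the search line is an assumption; the real script may pick the domain differently.

```shell
# Print the first domain from the "search" line of a resolv.conf-style file.
# $1 is the path of the file and defaults to /etc/resolv.conf.
first_search_domain() {
    awk '$1 == "search" && NF >= 2 { print $2; exit }' "${1:-/etc/resolv.conf}"
}

# Join a short hostname and a domain into a fully qualified name.
# If the domain is empty, the hostname is returned unchanged.
fqdn() {
    host="$1"
    domain="$2"
    if [ -n "$domain" ]; then
        printf '%s.%s\n' "$host" "$domain"
    else
        printf '%s\n' "$host"
    fi
}

# Typical use on a host:
# fqdn "$(hostname)" "$(first_search_domain)"
```

Falling back to the bare hostname when no search domain is configured keeps the helper usable on hosts with a minimal resolv.conf.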




consul-service is a Bash script of about 60 lines. Its only dependencies are cURL and Bash. It is published under the MIT license.


With this in place, the scrape_configs entry in prometheus.yml for prometheus/node_exporter looks like this, regardless of the number of hosts:


scrape_configs:
- job_name: node-exporter
  consul_sd_configs:
  - server: consul.example.com
    scheme: https
    tags: [prod]
    services: [prometheus-node-exporter]

tags: [prod] limits the job to production hosts. The tag comes from the -e argument of consul-service, which is stored in Consul as a tag.


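On Flatcar Linux hosts the script is delivered to /opt/bin with Ansible. The playbook consists of the tasks below and can be run on a schedule, for example from crontab.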


tasks:
- name: Create directory "/opt/bin"
  with_items: [/opt/bin]
  file:
    state: directory
    path: "{{ item }}"
    owner: root
    group: root
    mode: 0755
- name: Download "consul-service.sh" to "/opt/bin/consul-service"
  get_url:
    url: https://raw.githubusercontent.com/devinotelecom/consul-service/master/consul-service.sh
    dest: /opt/bin/consul-service
    owner: root
    group: root
    mode: 0755
    force: yes

Ansible needs Python on the managed host, and Flatcar Linux does not ship it, so Python has to be provided on Flatcar Linux first.




