# Kubegems Observability Design

This document describes the observability solution currently implemented in kubegems, covering logging, monitoring, alerting, events, and opentelemetry.

## Monitoring & Alerting

### Architecture

#### Monitoring Plugin Deployment

The monitoring plugin `monitoring` deploys the following resources:

  • kube-prometheus-stack: the core component. It deploys and manages prometheus, alertmanager, and kube-state-metrics via prometheus-operator, and also ships ServiceMonitors for services such as the apiserver and kubelet.
  • kubegems-default-monitor-amconfig: the global alertmanager configuration, referenced in kube-prometheus-stack:
          alertmanager:
            alertmanagerConfiguration:
            name: kubegems-default-monitor-alert-rule
            templates:
            - configMap:
                key: kubegems-email.tmpl
                name: kubegems-email-template
  • kubegems-email-template: the email alert template. Besides being referenced in kube-prometheus-stack, it also has to be referenced in the receiver when we configure an AlertmanagerConfig:
    apiVersion: monitoring.coreos.com/v1alpha1
    kind: AlertmanagerConfig
    ...
    spec:
      receivers:
      - emailConfigs:
        - headers:
          - key: subject
            value: Kubegems alert [{{ .CommonLabels.gems_alertname }}:{{ .Alerts.Firing
              | len }}] in [cluster:{{ .CommonLabels.cluster }}] [namespace:{{ .CommonLabels.gems_namespace
              }}]
          html: '{{ template "email.common.html" . }}'
          ...
  • kubegems-default-record-rule: the preset recording rules
  • alertproxy: the alert proxy component that relays alerts to Feishu, DingTalk, and Aliyun SMS/voice; see https://github.com/kubegems/alertproxy for details

### Usage

See the user documentation at https://www.kubegems.io/docs/category/%E7%9B%91%E6%8E%A7%E4%B8%AD%E5%BF%83

### Monitoring and Alerting Templates

We provide two kinds of monitoring templates:

  1. PromQL templates, for convenient prometheus queries. The preset templates live at https://github.com/kubegems/kubegems/blob/main/config/promql_tpl.yaml and are inserted when the database is initialized. A template has a three-level structure: scope > resource > rule (a rough sketch follows this list).
  2. Dashboard templates, for conveniently creating monitoring dashboards. The preset templates live at https://github.com/kubegems/kubegems/tree/main/config/dashboards and are also inserted when the database is initialized.
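For orientation, the three-level nesting looks roughly like the following hypothetical sketch; the field names and expression are illustrative only, see the linked promql_tpl.yaml for the actual schema:

```yaml
# hypothetical sketch of the scope > resource > rule nesting
- name: containers                  # scope
  resources:
    - name: container               # resource
      rules:
        - name: cpuUsage            # rule
          expr: irate(container_cpu_usage_seconds_total{container!=""}[5m])
          unit: cores
          labels: [namespace, pod, container]
```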

Additionally, the script https://github.com/kubegems/kubegems/blob/main/scripts/export-monitor-tpls/main.go can export the template data of a target cluster in one step, e.g.:

go run scripts/export-monitor-tpls/main.go --context mycluster

  1. system-alert serves as a template file used to create the preset system alert rules in each cluster; kubegems-worker creates and manages these system alerts, currently syncing them once a day:
func (s *AlertRuleSyncTasker) Crontasks() map[string]Task {
	return map[string]Task{
        // sync alert rule state
		"@every 5m": {
			Name:  "sync alertrule state",
			Group: "alertrule",
			Steps: []workflow.Step{{Function: TaskFunction_SyncAlertRuleState}},
		},
        // check alert rule config
		"@every 12h": {
			Name:  "check alertrule config",
			Group: "alertrule",
			Steps: []workflow.Step{{Function: TaskFunction_CheckAlertRuleConfig}},
		},
        // sync system alert rules
		"@daily": {
			Name:  "sync system alertrule",
			Group: "alertrule",
			Steps: []workflow.Step{{Function: TaskFunction_SyncSystemAlertRule}},
		},
	}
}

### Metrics Collection

Managing monitoring collectors in the kubegems UI actually operates on ServiceMonitor resources, so users can define custom scraping there. For middleware onboarded through the integration center, the ServiceMonitor is built into that exporter's helm chart; see https://github.com/kubegems/appstore-charts.
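A collector managed this way is just a regular ServiceMonitor. A minimal sketch (the service name, namespace, and port below are hypothetical; the UI fills them in from the user's input) looks like:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-exporter            # hypothetical exporter service
  namespace: mynamespace
spec:
  endpoints:
    - port: metrics            # service port exposing /metrics
      path: /metrics
      interval: 30s
  namespaceSelector:
    matchNames:
      - mynamespace
  selector:
    matchLabels:
      app.kubernetes.io/name: my-exporter
```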

### Alerting

#### Alert Rules

Monitoring alerts are implemented with the PrometheusRule and AlertmanagerConfig CRDs. However, because kubegems spans multiple clusters, and to make alert rules easier to query and manage, we first save the configured alert rules to the MySQL database before operating on the CRDs, and then synchronize them. Create, update, and delete all follow this logic, with data consistency guaranteed by database transactions:

// Create
func (p *AlertRuleProcessor) CreateAlertRule(ctx context.Context, req *models.AlertRule) error {
	return p.DBWithCtx(ctx).Transaction(func(tx *gorm.DB) error {
		allRules := []models.AlertRule{}
		if err := tx.Find(&allRules, "cluster = ? and namespace = ? and name = ?", req.Cluster, req.Namespace, req.Name).Error; err != nil {
			return err
		}
		if len(allRules) > 0 {
			return errors.Errorf("alert rule %s is already exist", req.Name)
		}
		for _, rec := range req.Receivers {
			if rec.ID > 0 {
				return errors.Errorf("receiver's id should be null when create")
			}
		}
		if err := tx.Omit("Receivers.AlertChannel").Create(req).Error; err != nil {
			return err
		}
		return p.SyncAlertRule(ctx, req)
	})
}

// Update
func (p *AlertRuleProcessor) UpdateAlertRule(ctx context.Context, req *models.AlertRule) error {
	return p.DBWithCtx(ctx).Transaction(func(tx *gorm.DB) error {
		if err := updateReceiversInDB(req, tx); err != nil {
			return errors.Wrap(err, "update receivers")
		}
		if err := tx.Select("expr", "for", "message", "inhibit_labels", "alert_levels", "promql_generator", "logql_generator").
			Updates(req).Error; err != nil {
			return err
		}
		return p.SyncAlertRule(ctx, req)
	})
}

// Delete
	if err := h.withAlertRuleProcessor(c.Request.Context(), c.Param("cluster"), func(ctx context.Context, p *AlertRuleProcessor) error {
		h.SetExtraAuditDataByClusterNamespace(c, req.Cluster, req.Namespace)
		action := i18n.Sprintf(ctx, "delete")
		module := i18n.Sprintf(ctx, "monitor alert rule")
		h.SetAuditData(c, action, module, req.Name)

		return p.DBWithCtx(ctx).Transaction(func(tx *gorm.DB) error {
			if err := tx.First(req, "cluster = ? and namespace = ? and name = ?", req.Cluster, req.Namespace, req.Name).Error; err != nil {
				return err
			}
			if err := tx.Delete(req).Error; err != nil {
				return err
			}
			return p.deleteMonitorAlertRule(ctx, req)
		})
	}); err != nil {
		handlers.NotOK(c, err)
		return
	}

#### Disabling, Enabling, and Blacklist Management

Disabling and enabling alert rules, as well as blacklist management, are all implemented by managing alertmanager silences:

  • Disable an alert: create a silence matching its namespace and alertname labels
  • Enable an alert: delete that silence
  • Add to the blacklist: create a silence matching all labels of the alert message
  • Remove from the blacklist: delete the silence
// Disable an alert
func createSilenceIfNotExist(ctx context.Context, namespace, alertName string, cli agents.Client) error {
	silence, err := getSilence(ctx, namespace, alertName, cli)
	if err != nil {
		return err
	}
	// not found, create it
	if silence == nil {
		silence = &alerttypes.Silence{
			Comment:   fmt.Sprintf("silence for %s", alertName),
			CreatedBy: alertName,
			Matchers: labels.Matchers{
				&labels.Matcher{
					Type:  labels.MatchEqual,
					Name:  prometheus.AlertNamespaceLabel,
					Value: namespace,
				},
				&labels.Matcher{
					Type:  labels.MatchEqual,
					Name:  prometheus.AlertNameLabel,
					Value: alertName,
				},
			},
			StartsAt: time.Now(),
			EndsAt:   time.Now().AddDate(1000, 0, 0), // 1000 years, effectively permanent
		}

		// create
		return cli.DoRequest(ctx, agents.Request{
			Method: http.MethodPost,
			Path:   "/custom/alertmanager/v1/silence/_/actions/create",
			Body:   silence,
		})
	}
	return nil
}

#### Alert Channels

Alert channels are tenant-scoped. Managing them only touches the MySQL database; a channel is applied to the receiver of the corresponding AlertmanagerConfig only when an alert rule that uses it is created or updated. Since we support multiple channel types and each type produces a different receiver, channels are implemented behind an interface:

type ChannelIf interface {
    // convert to an alertmanagerconfig receiver object
	ToReceiver(name string) v1alpha1.Receiver
    // check whether the configuration is valid
	Check() error
    // test whether the alert channel works properly
	Test(alert prometheus.WebhookAlert) error
	String() string
}

To add a new alert channel type, we must not only implement this interface but also add a corresponding case to the UnmarshalJSON method (a hedged sketch of a new channel implementation follows the code below):

// UnmarshalJSON to deserialize []byte
func (m *ChannelConfig) UnmarshalJSON(b []byte) error {
	if string(b) == "null" {
		m.ChannelIf = nil
		return nil
	}
	tmp := struct {
		ChannelType `json:"channelType"`
	}{}
	if err := json.Unmarshal(b, &tmp); err != nil {
		return errors.Wrap(err, "unmarshal channelType")
	}
	switch tmp.ChannelType {
	case TypeWebhook:
		webhook := Webhook{}
		if err := json.Unmarshal(b, &webhook); err != nil {
			return errors.Wrap(err, "unmarshal webhook channel")
		}
		m.ChannelIf = &webhook
	case TypeEmail:
		email := Email{}
		if err := json.Unmarshal(b, &email); err != nil {
			return errors.Wrap(err, "unmarshal email channel")
		}
		m.ChannelIf = &email
	...
	default:
		return fmt.Errorf("unknown channel type: %s", tmp.ChannelType)
	}
	return nil
}
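For illustration, a minimal sketch of a new channel type might look like the following. The type name `MyChat`, its fields, and the `TypeMyChat` constant are hypothetical; the sketch reuses the imports and types of the surrounding package (net/http, net/url, encoding/json, bytes, pkg/errors, and the prometheus-operator v1alpha1 API) and simply renders itself as a webhook receiver:

```go
// MyChat is a hypothetical webhook-style alert channel.
type MyChat struct {
	ChannelType `json:"channelType"` // would be a new constant, e.g. TypeMyChat
	URL         string               `json:"url"`
}

// ToReceiver converts the channel into an alertmanagerconfig receiver.
func (c *MyChat) ToReceiver(name string) v1alpha1.Receiver {
	return v1alpha1.Receiver{
		Name:           name,
		WebhookConfigs: []v1alpha1.WebhookConfig{{URL: &c.URL}},
	}
}

// Check validates the channel configuration.
func (c *MyChat) Check() error {
	if _, err := url.ParseRequestURI(c.URL); err != nil {
		return errors.Wrap(err, "invalid mychat url")
	}
	return nil
}

// Test sends a sample alert to verify that the channel works.
func (c *MyChat) Test(alert prometheus.WebhookAlert) error {
	body, err := json.Marshal(alert)
	if err != nil {
		return err
	}
	resp, err := http.Post(c.URL, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 400 {
		return fmt.Errorf("mychat test failed, status: %s", resp.Status)
	}
	return nil
}

func (c *MyChat) String() string {
	return fmt.Sprintf("mychat channel, url: %s", c.URL)
}
```

A matching `case TypeMyChat:` branch would then be added to ChannelConfig.UnmarshalJSON, unmarshalling into `MyChat` just like the webhook and email cases above.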

## Logging & Alerting

### Architecture

#### Logging Plugin Deployment

The logging plugin `logging` deploys the following resources:

  • logging-operator: banzaicloud's logging-operator; only the operator itself is deployed
  • logging-operator-logging: the logging components managed by logging-operator, including fluentbit and fluentd; it also creates the default ClusterOutput, i.e. kubegems-container-console-output
  • loki: the loki + redis services; note that the remote write to prometheus and the alertmanager address are configured here:
          ruler:
          storage:
            type: local
            local:
              directory: /rules
          rule_path: /tmp/rules
          alertmanager_url: {{ .Values.monitoring.alertmanager.address }}
          remote_write:
            enabled: true
            client:
              url: {{ .Values.monitoring.prometheus.address }}/api/v1/write
  • record-and-alert-rules: the preset recording rules, which periodically query loki, generate metrics, and write them to prometheus via remote write. Log alerting rules can also be configured here:
    apiVersion: v1
    kind: ConfigMap
    metadata:
      annotations:
        bundle.kubegems.io/ignore-options: OnUpdate
      name: kubegems-loki-rules
      namespace: {{ .Release.Namespace }}
    data:
      kubegems-loki-recording-rules.yaml: |-
        groups:
          - interval: 1m
            name: kubegems-loki-recording-rules
            rules:
            - expr: sum(count_over_time({stream=~"stdout|stderr"}[1m])) by (namespace, pod, container)
              record: gems_loki_logs_count_last_1m
            - expr: sum(count_over_time({stream=~"stdout|stderr"} |~ `error` [1m])) by (namespace, pod, container)
              record: gems_loki_error_logs_count_last_1m  

### Usage

See the user documentation at https://www.kubegems.io/docs/category/%E6%97%A5%E5%BF%97%E4%B8%AD%E5%BF%83

### Log Collection

  • Minimal mode: directly collect the logs of every container in the whole namespace:
    apiVersion: logging.banzaicloud.io/v1beta1
    kind: Flow
    metadata:
      name: default
      namespace: mynamespace
    spec:
      filters:
      - prometheus:
          labels:
            container: $.kubernetes.container_name
            flow: default
            namespace: $.kubernetes.namespace_name
            node: $.kubernetes.host
            pod: $.kubernetes.pod_name
          metrics:
          - desc: Total number of log entries collected by each flow
            name: gems_logging_flow_records_total
            type: counter
      globalOutputRefs:
      - kubegems-container-console-output
  • Application log collection: collect the logs of a specified application:
    apiVersion: logging.banzaicloud.io/v1beta1
    kind: Flow
    metadata:
      name: kubegems-eventer-flow
      namespace: kubegems-eventer
    spec:
      globalOutputRefs:
      - kubegems-container-console-output
      match:
      - select:
          labels:
            app.kubernetes.io/name: kubernetes-event-exporter

### Alerting

Loki log alerting works much the same as prometheus monitoring alerting; the difference is:

Monitoring alert rules live in PrometheusRule resources, while log alert rules are mounted into loki as a ConfigMap; that is, they are managed through the ConfigMap kubegems-loki-rules in the namespace kubegems-logging.
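As an illustration (the data key, rule name, threshold, and labels below are made up for this sketch), a log alerting group added alongside the recording rules in that ConfigMap follows the standard loki ruler format:

```yaml
kubegems-loki-alerting-rules.yaml: |-
  groups:
    - name: example-log-alerts
      rules:
        - alert: TooManyErrorLogs
          expr: |
            sum(count_over_time({stream=~"stdout|stderr"} |~ `error` [1m])) by (namespace, pod, container) > 100
          for: 1m
          labels:
            severity: error
          annotations:
            message: '{{ $labels.namespace }}/{{ $labels.pod }} produced more than 100 error lines in the last minute'
```

On the code side, syncing a monitoring alert rule and syncing a logging alert rule differ only in the middle step: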

// Sync a monitoring alert rule
func (p *AlertRuleProcessor) syncMonitorAlertRule(ctx context.Context, alertrule *models.AlertRule) error {
	if err := p.syncEmailSecret(ctx, alertrule); err != nil {
		return errors.Wrap(err, "sync secret failed")
	}
	if err := p.syncPrometheusRule(ctx, alertrule); err != nil {
		return errors.Wrap(err, "sync prometheusrule failed")
	}
	if err := p.syncAlertmanagerConfig(ctx, alertrule); err != nil {
		return errors.Wrap(err, "sync alertmanagerconfig failed")
	}
	return nil
}

// Sync a logging alert rule
func (p *AlertRuleProcessor) syncLoggingAlertRule(ctx context.Context, alertrule *models.AlertRule) error {
	if err := p.syncEmailSecret(ctx, alertrule); err != nil {
		return errors.Wrap(err, "sync secret failed")
	}
	if err := p.syncLokiRules(ctx, alertrule); err != nil {
		return errors.Wrap(err, "sync loki rules failed")
	}
	if err := p.syncAlertmanagerConfig(ctx, alertrule); err != nil {
		return errors.Wrap(err, "sync alertmanagerconfig failed")
	}
	return nil
}

As shown above, logging and monitoring alerts differ only in the second step, so enabling, disabling, blacklist management, and alert channel management work the same as for monitoring alerts and are not repeated here.

## Events

### Plugin Deployment

See https://github.com/kubegems/plugins/blob/main/plugins/eventer/templates/eventer.yaml and note the following configuration items:

  values:
    config:
      kubeQPS: {{ .Values.eventer.kubeQPS }}
      kubeBurst: {{ .Values.eventer.kubeBurst }}

If event collection stops because client-side rate limiting kicks in while collecting cluster events, increase the values above appropriately, for example as sketched below.
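A values override for the plugin (the numbers are illustrative; the `eventer.kubeQPS` and `eventer.kubeBurst` paths are the ones templated above) might look like:

```yaml
eventer:
  kubeQPS: 100
  kubeBurst: 200
```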

### Workflow

  1. The eventer list-watches Event resources
  2. The event information is printed to the console log
  3. fluentbit collects it -> fluentd -> loki

## Opentelemetry

### Deployment Architecture

Note:

  1. In future opentelemetry releases, spanmetrics may no longer live in a processor and may be migrated to a connector
  2. prometheus uses active scraping rather than remote-write, so we configure a ServiceMonitor:
     serviceMonitor:
       enabled: true
       metricsEndpoints:
         - honorLabels: true
           port: metrics
         - honorLabels: true
           port: otlp-metrics

### Integrating with Opentelemetry
To distinguish the `namespace`, `service_name`, `pod`, and `node` that data comes from, applications running on kubegems need the following environment variables:
```yaml
  - name: OTEL_K8S_NODE_NAME
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: spec.nodeName
  - name: OTEL_K8S_POD_NAME
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: metadata.name
  - name: OTEL_SERVICE_NAME
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: metadata.labels['app']
  - name: OTEL_K8S_NAMESPACE
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: metadata.namespace
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: service.name=$(OTEL_SERVICE_NAME),namespace=$(OTEL_K8S_NAMESPACE),node=$(OTEL_K8S_NODE_NAME),pod=$(OTEL_K8S_POD_NAME)
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: http://opentelemetry-collector.observability:4318 # for grpc, change to port 4317
  - name: OTEL_EXPORTER_OTLP_INSECURE
    value: "true"