
mimir-distributed helm chart not following Restricted Pod Security Standard as claimed by Grafana docs #5758

Closed
dorkamotorka opened this issue Aug 16, 2023 · 3 comments

@dorkamotorka

Describe the bug

As per the Grafana Mimir documentation, it should be possible to install the mimir-distributed Helm chart while following the Kubernetes Restricted Pod Security Standard. When I deploy on GKE Autopilot, however, this does not hold true. All components, such as the ruler, compactor, ingester, Alertmanager, and store-gateway, require this Helm configuration:

  containerSecurityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0

in order to avoid errors such as "read-only file system" or "permission denied" when accessing certain directories. The configuration above obviously goes against Kubernetes security best practices. Anybody should be able to reproduce this by simply deploying the mimir-distributed Helm chart onto GKE Autopilot. Note that I'm using GCS buckets for the components whose configuration allows it, but as far as I can tell Mimir still tries to write some temporary files to the local filesystem.
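For reference, a sketch of a containerSecurityContext that satisfies the Restricted profile (illustrative values; readOnlyRootFilesystem is not strictly required by Restricted, but it is a common hardening default):

containerSecurityContext:
  # Illustrative values compatible with the Restricted Pod Security Standard
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
  seccompProfile:
    type: RuntimeDefault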

Output of helm version:

version.BuildInfo{Version:"v3.12.0", GitCommit:"c9f554d75773799f72ceef38c51210f1842a1dea", GitTreeState:"clean", GoVersion:"go1.20.4"}

Output of kubectl version:

Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.16", GitCommit:"51e33fadff13065ae5518db94e84598293965939", GitTreeState:"clean", BuildDate:"2023-07-19T12:26:21Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.14-gke.2700", GitCommit:"20f1946282011a3f0cec885eaafe3decc9c367c9", GitTreeState:"clean", BuildDate:"2023-06-22T09:23:35Z", GoVersion:"go1.19.9 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}

To Reproduce

Deploy the mimir-distributed Helm chart onto a GKE Autopilot cluster.

Expected behavior

I expect to be able to deploy the mimir-distributed Helm chart while complying with the latest Kubernetes security recommendations.

Environment

  • Infrastructure: GKE
  • Deployment tool: Helm
@dimitarvdimitrov
Contributor

Can you share your values file? I'm a bit surprised this is the case, because none of the components should be configured to write to the root filesystem with the default values.yaml.

@dorkamotorka
Author

Hey @dimitarvdimitrov, here you go:

mimir:
  config: |
    usage_stats:
      installation_mode: helm

    activity_tracker:
      filepath: /active-query-tracker/activity.log

    server:
      log_format: "logfmt"
      log_level: "debug"
      grpc_server_max_concurrent_streams: 1000
      grpc_server_max_connection_age: 2m
      grpc_server_max_connection_age_grace: 5m
      grpc_server_max_connection_idle: 1m

    common:
      storage:
        backend: gcs

    multitenancy_enabled: true

    # Check https://grafana.com/docs/mimir/latest/references/configuration-parameters/#frontend when modifying
    frontend:
      # NOTE: This was modified from the *-headless service (Is there a downside?)
      {{- if .Values.query_scheduler.enabled }}
      scheduler_address: {{ template "mimir.fullname" . }}-query-scheduler.{{ .Release.Namespace }}.svc:{{ include "mimir.serverGrpcListenPort" . }}
      {{- end }}
      # Downstream URL of Mimir Querier, because some API calls just directly go to the downstream Querier
      downstream_url: http://{{ template "mimir.fullname" . }}-querier.{{ .Release.Namespace }}.svc:{{ include "mimir.serverHttpListenPort" . }}
      address: {{ template "mimir.fullname" . }}-query-frontend.{{ .Release.Namespace }}.svc
      port: {{ include "mimir.serverGrpcListenPort" . }}

    # Check https://grafana.com/docs/mimir/latest/references/configuration-parameters/#frontend_worker when modifying
    frontend_worker:
      # NOTE: This was modified from the *-headless service (Is there a downside?)
      {{- if .Values.query_scheduler.enabled }}
      scheduler_address: {{ template "mimir.fullname" . }}-query-scheduler.{{ .Release.Namespace }}.svc:{{ include "mimir.serverGrpcListenPort" . }}
      # NOTE: This was modified from the *-headless service (Is there a downside?)
      {{- else }}
      frontend_address: {{ template "mimir.fullname" . }}-query-frontend.{{ .Release.Namespace }}.svc:{{ include "mimir.serverGrpcListenPort" . }}
      {{- end }}

    blocks_storage:
      gcs:
        bucket_name: {{ .Values.blocks_bucket_name }}

    alertmanager_storage:
      gcs:
        bucket_name: {{ .Values.alert_bucket_name }}

    ruler_storage:
      gcs:
        bucket_name: {{ .Values.ruler_bucket_name }}

    ingester:
      ring:
        final_sleep: 0s
        num_tokens: 512
        tokens_file_path: /data/tokens
        unregister_on_shutdown: false
      
    ingester_client:
      grpc_client_config:
        max_recv_msg_size: 104857600
        max_send_msg_size: 104857600

    limits:
      # Limit queries to 500 days. You can override this on a per-tenant basis.
      max_total_query_length: 12000h
      # Adjust max query parallelism to 16x sharding, without sharding we can run 15d queries fully in parallel.
      # With sharding we can further shard each day another 16 times. 15 days * 16 shards = 240 subqueries.
      max_query_parallelism: 240
      # Avoid caching results newer than 10m because some samples can be delayed
      # This prevents caching incomplete results
      max_cache_freshness: 10m

    memberlist:
      abort_if_cluster_join_fails: false
      compression_enabled: false
      join_members:
      - dns+{{ include "mimir.fullname" . }}-gossip-ring.{{ .Release.Namespace }}.svc.{{ .Values.global.clusterDomain }}:{{ include "mimir.memberlistBindPort" . }}
  
    querier:
      # With query sharding we run more but smaller queries. We must strike a balance
      # which allows us to process more sharded queries in parallel when requested, but not overload
      # queriers during non-sharded queries.
      max_concurrent: 16

    query_scheduler:
      # Increase from default of 100 to account for queries created by query sharding
      max_outstanding_requests_per_tenant: 800

    alertmanager:
      data_dir: /data
      enable_api: true
      external_url: /alertmanager
      {{- if .Values.alertmanager.fallbackConfig }}
      fallback_config_file: /configs/alertmanager_fallback_config.yaml
      {{- end }}

    ruler:
      # NOTE: This was modified from the *-headless service (Is there a downside?)
      alertmanager_url: dnssrvnoa+http://_http-metrics._tcp.{{ template "mimir.fullname" . }}-alertmanager.{{ .Release.Namespace }}.svc.{{ .Values.global.clusterDomain }}/alertmanager
      enable_api: true
      query_frontend:
        address: {{ template "mimir.fullname" . }}-query-frontend.{{ .Release.Namespace }}.svc:{{ include "mimir.serverGrpcListenPort" . }}

    runtime_config:
      file: /var/{{ include "mimir.name" . }}/runtime.yaml

compactor:
  containerSecurityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0
ingester:
  zoneAwareReplication:
    enabled: false
  containerSecurityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0
alertmanager:
  containerSecurityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0
ruler:
  containerSecurityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0
store_gateway:
  zoneAwareReplication:
    enabled: false
  containerSecurityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0

minio:
  enabled: false

query_frontend:
  replicas: 1

query_scheduler:
  enabled: true
  replicas: 1

querier:
  replicas: 3

distributor:
  replicas: 1

overrides_exporter:
  enabled: false

nginx:
  enabled: false

gateway:
  enabledNonEnterprise: true
  ingress:
    enabled: false

@dimitarvdimitrov
Contributor

The need to enable root filesystem access comes from Mimir's own default values, which use the current working directory for storing files. The Helm chart overrides these defaults so that only attached volumes are used, so by default the chart doesn't need root filesystem access.

filepath: /active-query-tracker/activity.log

tokens_file_path: /data/tokens

However, since you've set the mimir.config value, these overrides do not propagate down to the rendered ConfigMap. It's best to apply your configuration changes via mimir.structuredConfig instead of mimir.config. Check out Manage the configuration of Grafana Mimir with Helm for more details.
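For illustration, a minimal sketch of the same settings expressed through mimir.structuredConfig, assuming the values you shared above (bucket names are placeholders); everything you don't set here keeps the chart's defaults, including the file paths that point at mounted volumes:

mimir:
  structuredConfig:
    # Only the values you actually change; everything else keeps the chart defaults.
    multitenancy_enabled: true
    common:
      storage:
        backend: gcs
    blocks_storage:
      gcs:
        bucket_name: <blocks-bucket>        # placeholder
    alertmanager_storage:
      gcs:
        bucket_name: <alertmanager-bucket>  # placeholder
    ruler_storage:
      gcs:
        bucket_name: <ruler-bucket>         # placeholder
    limits:
      max_total_query_length: 12000h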

I'm closing this because it seems that this is not an issue with the chart. Reopen if you think the chart is still non-compliant.
