gcpdiag

gcpdiag overview

The commands that were run and an interpretation of their results are as follows.

[gcpdiag run]

[hj@cs-491314827780-default ~ (☸️ |hj-gke:default)]$ gcpdiag
Unable to find image 'us-docker.pkg.dev/gcpdiag-dist/release/gcpdiag:0.55' locally
0.55: Pulling from gcpdiag-dist/release/gcpdiag
1fe172e4850f: Pull complete
caf521ccaac6: Pull complete
3ead6fa29328: Pull complete
5c2a1cbceb83: Pull complete
a8d5f1318db7: Pull complete
358ca086baa3: Pull complete
30197a52639a: Pull complete
65545a04dace: Pull complete
80cf14a4e373: Pull complete
00f344331f1a: Pull complete
5fbfd71d5b4d: Pull complete
Digest: sha256:0b5fcc0fd3e2f1b822cec492b0128f7e1df5173c19990570ee072c80cf6164c4
Status: Downloaded newer image for us-docker.pkg.dev/gcpdiag-dist/release/gcpdiag:0.55
gcpdiag 🩺 - Diagnostics for Google Cloud Platform

Usage:
        gcpdiag COMMAND [OPTIONS]

Commands:
        help     Print this help text.
        lint     Run diagnostics on GCP projects.
        version  Print gcpdiag version.

See: gcpdiag COMMAND --help for command-specific usage.

[gcpdiag lint --project]

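A rough look at the checks that failed in this run:

  1. Serial port logging is not enabled on three VMs (gui-pipe, windows, zipkin-vm). (👎️)

  2. The ops agent is not installed on the GCE instances. (👎️)

  3. The Cloud Storage buckets are not using uniform access. (👍️)

  4. The GKE cluster is not regional. (👌)

  5. The GKE node pool uses the GCE default service account. (👍️)

All in all, it seems worth running once from Cloud Shell and using the results as a reference.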
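The failed checks can also be collected from the lint output mechanically instead of by eye. The following is a minimal sketch, assuming the text layout shown in this note (a rule header line followed by indented "- resource [STATUS]" lines). The function name failed_resources and both regular expressions are illustrative and are not part of gcpdiag.

```python
import re

# Rule header, e.g. "🔎  gce/BP/2021_001: Serial port logging is enabled."
# (assumed layout, taken from the output pasted in this note)
RULE_RE = re.compile(r"^\S+\s+(?P<rule>[a-z]+/[A-Z]+/\d{4}_\d{3}):\s*(?P<title>.*)$")

# Result line, e.g. "   - project/resource        [FAIL] not installed"
RESULT_RE = re.compile(r"^\s+-\s+(?P<resource>\S+)\s+\[\s*(?P<status>[A-Z]+)\s*\]")


def failed_resources(lint_output: str) -> dict[str, list[str]]:
    """Map each rule ID to the resources reported as [FAIL] under it."""
    failures: dict[str, list[str]] = {}
    current_rule = None
    for line in lint_output.splitlines():
        header = RULE_RE.match(line)
        if header:
            current_rule = header.group("rule")
            continue
        result = RESULT_RE.match(line)
        if result and current_rule and result.group("status") == "FAIL":
            failures.setdefault(current_rule, []).append(result.group("resource"))
    return failures


# Small excerpt of the output in this note, used as sample input.
SAMPLE = """\
🔎  gce/BP/2021_001: Serial port logging is enabled.
   - hj-int-20200908/gke-gke-nodes-634bd260-gefn                          [ OK ]
   - hj-int-20200908/gui-pipe                                             [FAIL]

🔎  gke/BP/2022_001: GKE clusters are regional.
   - hj-int-20200908/us-central1-c/gke                                    [FAIL]
      is not regional
"""

if __name__ == "__main__":
    for rule, resources in failed_resources(SAMPLE).items():
        print(rule, resources)
```

To use it on a real run, save the terminal output to a file and pass the file contents to failed_resources. Rules where every resource is OK do not appear in the result.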

[hj@cs-491314827780-default ~ (☸️ |hj-gke:default)]$ gcpdiag lint --project=hj-int-20200908
gcpdiag 🩺  0.55

Starting lint inspection (project: hj-int-20200908)...

🔎  gce/BP/2021_001: Serial port logging is enabled.
   - hj-int-20200908/gke-gke-nodes-634bd260-gefn                          [ OK ]
   - hj-int-20200908/gui-pipe                                             [FAIL]
   - hj-int-20200908/windows                                              [FAIL]
   - hj-int-20200908/zipkin-vm                                            [FAIL]

   Serial port output can be often useful for troubleshooting, and enabling
   serial logging makes sure that you don't lose the information when the VM is
   restarted. Additionally, serial port logs are timestamped, which is useful to
   determine when a particular serial output line was printed.

   https://gcpdiag.dev/rules/gce/BP/2021_001

🔎  gce/BP/2021_002: GCE nodes have an up to date ops agent installed.
   - hj-int-20200908/gui-pipe                                             [FAIL] not installed
   - hj-int-20200908/windows                                              [FAIL] not installed
   - hj-int-20200908/zipkin-vm                                            [FAIL] not installed

   Verify that the ops agent is used by the GCE instances and that the agent is
   recent enough. If the monitoring agent is found it is recommended to upgrade
   to the ops agent.  see:
   https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent

   https://gcpdiag.dev/rules/gce/BP/2021_002

🔎  gce/ERR/2021_003: Google APIs service agent has the Editor role.
   - hj-int-20200908                                                      [ OK ]

🔎  gce/ERR/2021_004: Serial logs don't contain Secure Boot error messages
   - hj-int-20200908/gui-pipe                                             [ OK ]
   - hj-int-20200908/windows                                              [ OK ]
   - hj-int-20200908/zipkin-vm                                            [ OK ]

🔎  gce/ERR/2021_005: Serial logs don't contain mount error messages
   - hj-int-20200908/gke-gke-nodes-634bd260-gefn                          [ OK ]
   - hj-int-20200908/gui-pipe                                             [ OK ]
   - hj-int-20200908/windows                                              [ OK ]
   - hj-int-20200908/zipkin-vm                                            [ OK ]

🔎  gce/WARN/2021_001: GCE instance service account permissions for logging.
   - hj-int-20200908/gui-pipe                                             [ OK ]
   - hj-int-20200908/windows                                              [ OK ]
   - hj-int-20200908/zipkin-vm                                            [ OK ]

🔎  gce/WARN/2021_003: GCE instance service account permissions for monitoring.
   - hj-int-20200908/gui-pipe                                             [ OK ]
   - hj-int-20200908/windows                                              [ OK ]
   - hj-int-20200908/zipkin-vm                                            [ OK ]

🔎  gce/WARN/2021_004: Serial logs don't contain disk full messages
   - hj-int-20200908/gke-gke-nodes-634bd260-gefn                          [ OK ]
   - hj-int-20200908/gui-pipe                                             [ OK ]
   - hj-int-20200908/windows                                              [ OK ]
   - hj-int-20200908/zipkin-vm                                            [ OK ]

🔎  gce/WARN/2021_005: Serial logs don't contain out-of-memory messages
   - hj-int-20200908/gke-gke-nodes-634bd260-gefn                          [ OK ]
   - hj-int-20200908/gui-pipe                                             [ OK ]
   - hj-int-20200908/windows                                              [ OK ]
   - hj-int-20200908/zipkin-vm                                            [ OK ]

🔎  gce/WARN/2021_006: Serial logs don't contain "Kernel panic" messages
   - hj-int-20200908/gke-gke-nodes-634bd260-gefn                          [ OK ]
   - hj-int-20200908/gui-pipe                                             [ OK ]
   - hj-int-20200908/windows                                              [ OK ]
   - hj-int-20200908/zipkin-vm                                            [ OK ]

🔎  gce/WARN/2021_007: Serial logs don't contain "BSOD" messages
   - hj-int-20200908/gke-gke-nodes-634bd260-gefn                          [ OK ]
   - hj-int-20200908/gui-pipe                                             [ OK ]
   - hj-int-20200908/windows                                              [ OK ]
   - hj-int-20200908/zipkin-vm                                            [ OK ]

🔎  gce/WARN/2022_001: GCE connectivity: IAP service can connect to SSH/RDP port on instances.
   - hj-int-20200908/gke-gke-nodes-634bd260-gefn                          [ OK ]
   - hj-int-20200908/gui-pipe                                             [ OK ]
   - hj-int-20200908/windows                                              [ OK ]
   - hj-int-20200908/zipkin-vm                                            [ OK ]

🔎  gce/WARN/2022_002: Instance groups named ports are using unique names.
   - hj-int-20200908/gke-gke-nodes-634bd260-grp                           [ OK ]

🔎  gce/WARN/2022_003: GCE VM instances quota is not near the limit.
   - projects/hj-int-20200908/regions/asia-northeast3                     [ OK ]
   - projects/hj-int-20200908/regions/us-central1                         [ OK ]

🔎  gcs/BP/2022_001: Buckets are using uniform access
   - hj-int-20200908/artifacts.hj-int-20200908.appspot.com                [FAIL]
     it is recommend to use uniform access on your bucket
   - hj-int-20200908/hoon                                                 [FAIL]
     it is recommend to use uniform access on your bucket

   Google recommends using uniform access for a Cloud Storage bucket IAM policy
   https://cloud.google.com/storage/docs/access-
   control#choose_between_uniform_and_fine-grained_access

   https://gcpdiag.dev/rules/gcs/BP/2022_001

🔎  gke/BP/2021_001: GKE system logging and monitoring enabled.
   - hj-int-20200908/us-central1-c/gke                                    [ OK ]

🔎  gke/BP/2022_001: GKE clusters are regional.
   - hj-int-20200908/us-central1-c/gke                                    [FAIL]
      is not regional

   The availability of regional clusters (both control plane and nodes) is
   higher for regional clusters as they are replicated across zones in the
   region. It is recommended to use regional clusters for the production
   workload.

   https://gcpdiag.dev/rules/gke/BP/2022_001

🔎  gke/BP/2022_002: GKE clusters are using unique subnets.
   - hj-int-20200908/us-central1-c/gke                                    [ OK ]

🔎  gke/ERR/2021_001: GKE nodes service account permissions for logging.
   - hj-int-20200908/us-central1-c/gke/nodes                              [ OK ]

🔎  gke/ERR/2021_002: GKE nodes service account permissions for monitoring.
   - hj-int-20200908/us-central1-c/gke/nodes                              [ OK ]

🔎  gke/ERR/2021_004: GKE nodes aren't reporting connection issues to apiserver.
   - hj-int-20200908/us-central1-c/gke                                    [ OK ]

🔎  gke/ERR/2021_005: GKE nodes aren't reporting connection issues to storage.google.com.
   - hj-int-20200908/us-central1-c/gke                                    [ OK ]

🔎  gke/ERR/2021_006: GKE Autoscaler isn't reporting scaleup failures.
   - hj-int-20200908/us-central1-c/gke                                    [ OK ]

🔎  gke/ERR/2021_007: GKE service account permissions.
   - hj-int-20200908                                                      [ OK ]

🔎  gke/ERR/2021_008: Google APIs service agent has Editor role.
   - hj-int-20200908                                                      [ OK ]

🔎  gke/ERR/2021_009: Version skew between cluster and node pool.
   - hj-int-20200908/us-central1-c/gke/nodes                              [ OK ]

🔎  gke/ERR/2021_010: Check internal peering forwarding limits which affect GKE.
   - hj-int-20200908/us-central1-c/gke                                    [ OK ]

🔎  gke/ERR/2021_011: ip-masq-agent not reporting errors
   - hj-int-20200908/us-central1-c/gke                                    [ OK ]

🔎  gke/ERR/2021_012: Node pool service account exists and not is disabled.
   - hj-int-20200908/us-central1-c/gke/nodes                              [ OK ]

🔎  gke/ERR/2021_013: GKE cluster firewall rules are configured.
   - hj-int-20200908/us-central1-c/gke                                    [ OK ]

🔎  gke/ERR/2021_015: GKE connectivity: node to pod communication.
   - hj-int-20200908/us-central1-c/gke                                    [ OK ]

🔎  gke/ERR/2022_001: GKE connectivity: pod to pod communication.
   - hj-int-20200908/us-central1-c/gke                                    [ OK ]

🔎  gke/ERR/2022_002: GKE nodes of private clusters can access Google APIs and services.
   - hj-int-20200908/us-central1-c/gke                                    [ OK ]

🔎  gke/SEC/2021_001: GKE nodes don't use the GCE default service account.
   - hj-int-20200908/us-central1-c/gke/nodes                              [FAIL]
     node pool uses the GCE default service account

   The GCE default service account has more permissions than are required to run
   your Kubernetes Engine cluster. You should either use GKE Workload Identity
   or create and use a minimally privileged service account.

   https://gcpdiag.dev/rules/gke/SEC/2021_001

🔎  gke/WARN/2021_003: GKE cluster size close to maximum allowed by pod range
   - hj-int-20200908/us-central1-c/gke                                    [ OK ] 1/1024 nodes used.

🔎  gke/WARN/2021_004: GKE system workloads are running stable.
   - hj-int-20200908/us-central1-c/gke                                    [ OK ]

🔎  gke/WARN/2021_005: GKE nodes have good disk performance.
   - hj-int-20200908/us-central1-c/gke                                    [ OK ]

🔎  gke/WARN/2021_006: GKE nodes aren't reporting conntrack issues.
   - hj-int-20200908/us-central1-c/gke                                    [ OK ]

🔎  gke/WARN/2021_007: GKE nodes have enough free space on the boot disk.
   - hj-int-20200908/us-central1-c/gke                                    [ OK ]

🔎  gke/WARN/2021_009: GKE nodes use a containerd image.
   - hj-int-20200908/us-central1-c/gke/nodes                              [ OK ]

🔎  gke/WARN/2022_001: GKE clusters with workload identity are regional.
   - hj-int-20200908/us-central1-c/gke                                    [ OK ]

🔎  gke/WARN/2022_002: GKE metadata concealment is not in use
   - hj-int-20200908/us-central1-c/gke/nodes                              [ OK ]

🔎  gke/WARN/2022_003: GKE service account permissions to manage project VPC firewall rules.
   - hj-int-20200908/us-central1-c/gke                                    [ OK ]

🔎  gke/WARN/2022_004: Cloud Logging API enabled when GKE logging is enabled
   - hj-int-20200908/us-central1-c/gke                                    [ OK ]

🔎  iam/SEC/2021_001: No service accounts have the Owner role
   - hj-int-20200908                                                      [ OK ]

Rules summary: 24 skipped, 40 ok, 5 failed
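The "Rules summary" line is convenient for a quick pass/fail decision, for example in a script that runs the lint periodically. The sketch below assumes the exact wording of the summary line shown above. The name parse_summary is illustrative and is not a gcpdiag API.

```python
import re

# Assumed format, copied from this run: "Rules summary: 24 skipped, 40 ok, 5 failed"
SUMMARY_RE = re.compile(
    r"Rules summary:\s*(?P<skipped>\d+) skipped,\s*(?P<ok>\d+) ok,\s*(?P<failed>\d+) failed"
)


def parse_summary(text: str) -> dict[str, int]:
    """Return the skipped/ok/failed counts from the first summary line in text."""
    match = SUMMARY_RE.search(text)
    if match is None:
        raise ValueError("no 'Rules summary' line found")
    return {name: int(value) for name, value in match.groupdict().items()}


if __name__ == "__main__":
    counts = parse_summary("Rules summary: 24 skipped, 40 ok, 5 failed")
    print(counts)
    # A non-zero failed count could be used to flag the run for review.
    print("needs review:", counts["failed"] > 0)
```

For this run the failed count of 5 matches the five rules marked [FAIL] in the output: gce/BP/2021_001, gce/BP/2021_002, gcs/BP/2022_001, gke/BP/2022_001, and gke/SEC/2021_001.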
