Node Affinity

Affinity methods overview
Node Affinity
Usage example
Conditions for requiredDuringSchedulingIgnoredDuringExecution
matchExpressions operators
Conditions for preferredDuringSchedulingIgnoredDuringExecution
Closing

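Affinity methods overview

nodeSelector: a simple form of Node Affinity
Node Affinity: run only on Nodes that match (or do not match) specific conditions
Inter-Pod Affinity: run in a domain (Node, zone, etc.) where specific Pods are present
Inter-Pod Anti-Affinity: run in a domain (Node, zone, etc.) where specific Pods are absent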

Node Affinity

Node Affinity allows more flexible scheduling than nodeSelector.
In exchange, it is also more complex to configure.

Node Affinity supports the following settings.

requiredDuringSchedulingIgnoredDuringExecution: required conditions for Pod placement
- Location: spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution
preferredDuringSchedulingIgnoredDuringExecution: preferred conditions for Pod placement
- Location: spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution

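If a Node does not meet the requiredDuringSchedulingIgnoredDuringExecution conditions, the Pod is never scheduled onto that Node.
In contrast, a Node that does not meet the preferredDuringSchedulingIgnoredDuringExecution conditions can still receive the Pod when no better-matching Node is available.

Like nodeSelector, Node Affinity basically uses Node labels as its conditions.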
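
The difference between the two settings can be sketched as "filter, then rank". The Python below is only a toy model, not the scheduler's implementation: pick_node, the zone and disk labels, and the node names are all hypothetical, and the real scheduler also weighs resources and other scoring inputs.

```python
from typing import Callable, Dict, Optional

Labels = Dict[str, str]


def pick_node(
    nodes: Dict[str, Labels],
    required: Callable[[Labels], bool],
    preferred: Callable[[Labels], bool],
) -> Optional[str]:
    """Toy model: 'required' filters Nodes, 'preferred' only ranks the survivors."""
    # Required conditions: Nodes that fail them are never candidates.
    candidates = [name for name, labels in nodes.items() if required(labels)]
    if not candidates:
        return None  # no feasible Node, so the Pod would stay unscheduled
    # Preferred conditions: favor matching Nodes, but fall back to any candidate.
    return max(candidates, key=lambda name: preferred(nodes[name]))


# Hypothetical cluster with made-up labels.
nodes = {
    "node-1": {"zone": "a"},
    "node-2": {"zone": "a", "disk": "ssd"},
    "node-3": {"zone": "b", "disk": "ssd"},
}

in_zone_a = lambda labels: labels.get("zone") == "a"
has_ssd = lambda labels: labels.get("disk") == "ssd"
has_nvme = lambda labels: labels.get("disk") == "nvme"

chosen = pick_node(nodes, in_zone_a, has_ssd)
fallback = pick_node(nodes, in_zone_a, has_nvme)
unschedulable = pick_node(nodes, lambda labels: labels.get("zone") == "c", has_ssd)
```

In this sketch, a preference that no Node satisfies still yields a Node, while a required condition that no Node satisfies yields None.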

Usage example

apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:  # required conditions
        nodeSelectorTerms:  # conditions
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:  # preferred conditions
      - weight: 1  # priority
        preference:  # conditions
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: k8s.gcr.io/pause:2.0

Note: example from the official documentation

Conditions for requiredDuringSchedulingIgnoredDuringExecution

The official documentation states the following.

If you specify multiple nodeSelectorTerms associated with nodeAffinity types, then the pod can be scheduled onto a node if one of the nodeSelectorTerms can be satisfied.
If you specify multiple matchExpressions associated with nodeSelectorTerms, then the pod can be scheduled onto a node only if all matchExpressions is satisfied.
If you remove or change the label of the node where the pod is scheduled, the pod won’t be removed. In other words, the affinity selection works only at the time of scheduling the pod.

In other words:

If the matchExpressions of any one nodeSelectorTerms[] entry are satisfied, the Node qualifies (OR).
If any one condition inside nodeSelectorTerms[].matchExpressions[] fails, that matchExpressions fails as a whole (AND).
Node Affinity is evaluated only at scheduling time. It does not affect Pods that are already scheduled.

requiredDuringSchedulingIgnoredDuringExecution example:
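
The OR/AND rule above can be sketched in a few lines of Python. This is an illustrative model, not Kubernetes code: each matchExpressions[] entry is represented as a predicate over the Node's labels, and label_in, label_exists, and node_matches are hypothetical helper names.

```python
from typing import Callable, Dict, List

Labels = Dict[str, str]
Expression = Callable[[Labels], bool]  # one matchExpressions[] entry as a predicate
Term = List[Expression]                # one nodeSelectorTerms[] entry


def label_in(key: str, *values: str) -> Expression:
    """Predicate for 'operator: In'."""
    return lambda labels: labels.get(key) in values


def label_exists(key: str) -> Expression:
    """Predicate for 'operator: Exists'."""
    return lambda labels: key in labels


def node_matches(labels: Labels, terms: List[Term]) -> bool:
    """OR across terms, AND across the expressions inside one term."""
    # An empty term is treated as matching nothing, following the API
    # description of NodeSelectorTerm.
    return any(bool(term) and all(expr(labels) for expr in term) for term in terms)


terms = [
    [label_in("cloud.google.com/gke-nodepool", "default-pool", "pool-1"), label_exists("foo")],
    [label_in("cloud.google.com/gke-nodepool", "default-pool"), label_in("kubernetes.io/os", "linux")],
]

node_a = {"cloud.google.com/gke-nodepool": "pool-1", "foo": "x"}
node_b = {"cloud.google.com/gke-nodepool": "default-pool", "kubernetes.io/os": "linux"}
node_c = {"cloud.google.com/gke-nodepool": "pool-1", "kubernetes.io/os": "linux"}
```

Here node_a passes through the first term, node_b through the second, and node_c fails both: it lacks foo for the first term and is in the wrong pool for the second.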

apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:  # required conditions
        nodeSelectorTerms:  # conditions
        - matchExpressions:  # OR
          - key: cloud.google.com/gke-nodepool  # AND
            operator: In  # In operator
            values:
            - default-pool
            - pool-1
          - key: node-label-name  # AND
            operator: NotIn  # NotIn operator
            values:
            - label1
            - label2
            - label3
          - key: foo  # AND
            operator: Exists  # Exists operator
          - key: bar  # AND
            operator: DoesNotExist  # DoesNotExist operator
          - key: node-label-number  # AND
            operator: Gt  # Gt operator
            values:
            - "100"  # a single integer, written as a string
          - key: node-label-number  # AND
            operator: Lt  # Lt operator
            values:
            - "200"
        - matchExpressions:  # OR
          - key: cloud.google.com/gke-nodepool  # AND
            operator: In
            values:
            - default-pool
          - key: kubernetes.io/os  # AND
            operator: In
            values:
            - linux
  containers:
  - name: nginx
    image: nginx

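matchExpressions operators

The example above uses each of the following matchExpressions operators.

In: the Node's value for the key label is contained in values
NotIn: the Node's value for the key label is not contained in values (a Node without the label also matches)
Exists: the Node has a label with the specified key
DoesNotExist: the Node has no label with the specified key
Gt: the Node's value for the key label, read as an integer, is greater than the single value in values
Lt: the Node's value for the key label, read as an integer, is less than the single value in values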
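
As a rough illustration of these operators, the following Python sketch evaluates one expression against a Node's labels. The matches function is a hypothetical helper written for this post, not part of any Kubernetes client library, and it only approximates the documented behavior.

```python
from typing import Dict, List, Optional


def matches(labels: Dict[str, str], key: str, operator: str,
            values: Optional[List[str]] = None) -> bool:
    """Evaluate a single matchExpressions entry against a Node's labels."""
    values = values or []
    present = key in labels
    if operator == "In":
        return present and labels[key] in values
    if operator == "NotIn":
        # A Node without the label also satisfies NotIn.
        return not present or labels[key] not in values
    if operator == "Exists":
        return present
    if operator == "DoesNotExist":
        return not present
    if operator in ("Gt", "Lt"):
        # Gt and Lt take exactly one value and compare as integers.
        if not present or len(values) != 1:
            return False
        try:
            node_value, bound = int(labels[key]), int(values[0])
        except ValueError:
            return False
        return node_value > bound if operator == "Gt" else node_value < bound
    raise ValueError(f"unknown operator: {operator}")


labels = {"node-label-number": "150", "foo": "x"}
```

For example, matches(labels, "node-label-number", "Gt", ["100"]) and matches(labels, "node-label-number", "Lt", ["200"]) both hold for this Node, so it would satisfy the Gt/Lt pair in the manifest above.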

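Conditions for preferredDuringSchedulingIgnoredDuringExecution

Each preference under preferredDuringSchedulingIgnoredDuringExecution holds a single matchExpressions[] list rather than a list of nodeSelectorTerms, so its conditions are combined with AND only.
In exchange, each preference has a weight attribute that sets its priority.

The official documentation states the following.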

The weight field in preferredDuringSchedulingIgnoredDuringExecution is in the range 1-100. For each node that meets all of the scheduling requirements (resource request, RequiredDuringScheduling affinity expressions, etc.), the scheduler will compute a sum by iterating through the elements of this field and adding “weight” to the sum if the node matches the corresponding MatchExpressions.

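In other words, when a Node matches several preferences, their weights are added together, and the Node with the highest resulting score is the most preferred for scheduling.
The weight must be in the range 1-100.

preferredDuringSchedulingIgnoredDuringExecution example: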

apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:  # preferred conditions
      - weight: 1  # priority
        preference:  # conditions
          matchExpressions:
          - key: cloud.google.com/gke-nodepool  # AND
            operator: In
            values:
            - default-pool
            - pool-1
          - key: foo  # AND
            operator: Exists
      - weight: 3  # priority
        preference:  # conditions
          matchExpressions:
          - key: node-label-name  # AND
            operator: NotIn
            values:
            - label1
            - label2
            - label3
          - key: bar  # AND
            operator: DoesNotExist
  containers:
  - name: nginx
    image: nginx

In this example, a Node that satisfies both preferences at once gets a weight of 1 + 3 = 4.
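
The weight arithmetic can be checked with a small Python sketch. It models only the node-affinity part of the score under simplifying assumptions: the real scheduler combines this sum with other scoring inputs, and affinity_score and the sample Nodes are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Labels = Dict[str, str]
Expression = Callable[[Labels], bool]
Preference = Tuple[int, List[Expression]]  # (weight, matchExpressions as predicates)


def affinity_score(labels: Labels, preferences: List[Preference]) -> int:
    """Sum the weights of every preference whose expressions all match (AND)."""
    total = 0
    for weight, expressions in preferences:
        if not 1 <= weight <= 100:
            raise ValueError("weight must be in the range 1-100")
        if expressions and all(expr(labels) for expr in expressions):
            total += weight
    return total


# The two preferences from the manifest above, written as predicates.
preferences: List[Preference] = [
    (1, [lambda labels: labels.get("cloud.google.com/gke-nodepool") in ("default-pool", "pool-1"),
         lambda labels: "foo" in labels]),
    (3, [lambda labels: labels.get("node-label-name") not in ("label1", "label2", "label3"),
         lambda labels: "bar" not in labels]),
]

both = {"cloud.google.com/gke-nodepool": "pool-1", "foo": "1"}
second_only = {"cloud.google.com/gke-nodepool": "pool-2"}
neither = {"bar": "1"}
```

With these inputs the Node labeled both scores 1 + 3 = 4, second_only scores 3, and neither scores 0, so the first Node would be the most preferred by node affinity alone.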

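Closing

This post introduced Node Affinity.
Along the way, it also filled in matchExpressions, which came up earlier in the Workloads post but was left unexplained there.
That is, matchExpressions can also be used in a ReplicaSet's selector (with the In, NotIn, Exists, and DoesNotExist operators).

Thank you for reading this far.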
