K8s_Member:
Hi there,
I have questions related to topologyKey.

  1. Is topologyKey required for podAffinity & podAntiAffinity?
  2. When the scheduler takes action, it first filters on .labelSelector.matchExpressions to identify which nodes the new pod could be scheduled to.
    But if for some reason, like insufficient resources, the new pod can't be placed on the same node, the scheduler will rely on topologyKey, e.g. topology.kubernetes.io/zone or kubernetes.io/hostname.
    So the new pod can still be placed on a different node but in the same zone, right?

I’ve found the Stack Overflow question “What is topologyKey in pod affinity?” useful, but I don’t totally understand it.

Alistair Mackay:

  1. Yes (some output below cut for brevity):
$ k explain po.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution

KIND:       Pod
VERSION:    v1

FIELD: requiredDuringSchedulingIgnoredDuringExecution <[]PodAffinityTerm>

DESCRIPTION:
    If the affinity requirements specified by this field are not met at
    
FIELDS:
  labelSelector <LabelSelector>

  namespaceSelector     <LabelSelector>

  namespaces    <[]string>

  topologyKey   <string> -required-
    Empty topologyKey is not allowed.
  2. topologyKey really only does something when you actually have a topology, e.g. you have a cluster with many nodes deployed in a cloud. In AWS, the topology is the Availability Zone (AZ) and is referred to using the automatically generated node label topology.kubernetes.io/zone. The other clouds have a similar concept. For redundancy, a cloud region (e.g. US East) is a collection of more than one geographically separated (within ~100km) data centers, each data center being an AZ, which means an entire region does not fail if one data center goes offline. Use topology to ensure that pods matching the selector get spread out across the AZs to improve resiliency.
    For the one/two node clusters used in labs and the exam, the label kubernetes.io/hostname is sufficient - this tries to create a spread across available nodes so that two pods matching the affinity don’t end up on the same node, if there is a choice of nodes.
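As a sketch of the node-level spread described above, a Deployment using required anti-affinity on kubernetes.io/hostname could look like this (the app: web label and nginx image are just illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: never co-locate two matching pods on the same node.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values: ["web"]
              topologyKey: kubernetes.io/hostname
      containers:
        - name: web
          image: nginx
```

With required anti-affinity, a third replica on a two-node cluster would stay Pending; switch to preferredDuringSchedulingIgnoredDuringExecution if you want a soft preference instead.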

K8s_Member:
Thanks sir.
So the order is .labelSelector.matchExpressions first, then topologyKey, right?
Ex: I have 3 nodes in AWS: 2 in AZ-a, 1 in AZ-b.

  1. Step 1: kube-scheduler identifies that there are pods matching .labelSelector.matchExpressions on nodes in AZ-a and AZ-b.
    a. At the same time, one matching pod already resides on a node in AZ-a.
  2. Step 2: So the scheduler will try to schedule the new pod onto the node in AZ-b, to spread the pods across as many AZs as possible, right?
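The zone-level spread in the steps above could be expressed with soft anti-affinity on the zone key, for example (the app: web label is an assumption for illustration):

```yaml
affinity:
  podAntiAffinity:
    # Soft rule: prefer zones that don't already host a matching pod,
    # so pods spread across AZs when possible but still schedule otherwise.
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values: ["web"]
          topologyKey: topology.kubernetes.io/zone
```

Because this is "preferred" rather than "required", a third replica in the two-AZ example would still be scheduled, landing in whichever zone the scheduler scores best.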

Alistair Mackay:
That should be it. If you search around enough, you might find somebody has flowcharted the scheduler process.