Prevent Spring Boot application closing until all current requests are finished

You could increase the terminationGracePeriodSeconds, the default is 30 seconds. But unfortunately, there's nothing to prevent a cluster admin from force deleting your pod, and there's all sorts of reasons the whole node could go away.


We did a combination of the above to resolve our problem.

  • increased the terminationGracePeriodSeconds to the absolute maximum we expect to see in production
  • added livenessProbe to prevent Traefik routing to our pod too soon
  • introduced a pre-stop hook injecting a pause and invoking a monitoring script:
    1. Monitored netstat for ESTABLISHED connections to our process (pid 1) with a Foreign Address of our cluster Traefik service
    2. sent TERM to pid 1

Note that because we send TERM to pid 1 from the monitoring script the pod will terminate at this point and the terminationGracePeriodSeconds never gets hit (it's there as a precaution)

Here's the script:

#!/bin/sh

while [ "$(/bin/netstat -ap 2>/dev/null | /bin/grep http-alt.*ESTABLISHED.*1/java | grep -c traefik-ingress-service)" -gt 0 ]
do
  sleep 1
done

kill -TERM 1

Here's the new pod spec:

containers:
  - env:
    - name: spring_profiles_active
      value: dev
    image: container.registry.host/project/app:@@version@@
    imagePullPolicy: Always
    lifecycle:
      preStop:
        exec:
          command:
          - /bin/sh
          - -c
          - sleep 5 && /monitoring.sh
    livenessProbe:
      httpGet:
        path: /actuator/health
        port: 8080
      initialDelaySeconds: 60
      periodSeconds: 20
      timeoutSeconds: 3
    name: app
    ports:
    - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /actuator/health
        port: 8080
      initialDelaySeconds: 60
    resources:
      limits:
        cpu: 2
        memory: 2Gi
      requests:
        cpu: 2
        memory: 2Gi
  imagePullSecrets:
  - name: app-secret
  serviceAccountName: vault-auth
  terminationGracePeriodSeconds: 86400