Comprehensive operational runbook, diagnostic commands, and utility script references for the MCM Platform.

MCM Platform — Helper Scripts & Commands

This guide serves as a comprehensive operational runbook for managing, diagnosing, and maintaining the Multi-Cloud Management (MCM) platform. It details environment-specific control commands (3-Tier vs. Single VM), host-level resource diagnostics, container platform statistics, SSL/TLS certificate inspections, and utility script executions.

1. Platform Control & Monitoring (Docker Swarm & Systemd)

These commands are used to check the cluster topology, service health, scheduling, and to manage the platform's background system daemon.

Cluster Node Status

Verify that all virtual machines are joined, ready, and active inside the cluster management plane (run on Access Node for 3-tier):

docker node ls

Explanation: Lists all nodes participating in the Swarm cluster with their status (Ready/Down), availability (Active/Pause/Drain), and manager status (Leader or Reachable for manager nodes such as the Access Node; blank for worker nodes).

Service Replica Health

Verify that all backend application and database services are running with their target replicas (run on Access Node for 3-tier):

docker service ls

Explanation: Returns a high-level list of all deployed services, their scheduling mode, and replica counts in running/desired form. Healthy services display 1/1, meaning the running task count matches the desired count and the task has passed its health checks. Initialization tasks (prefixed with mcm_init-) transition to 0/1 or exit once they complete database schema migrations or routing setup, so 0/1 is expected for them.
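The replica check described above can be scripted. The following is a minimal sketch, not part of the MCM tooling: the function name under_replicated is hypothetical, and it assumes the input comes from docker service ls with a --format template that prints the service name and replica count separated by a space.

```shell
# Print services whose running replica count differs from the desired count.
# Input (stdin): lines of "<service-name> <running>/<desired>".
# Services prefixed with mcm_init- are skipped, because 0/1 is expected for them.
under_replicated() {
  awk '$1 !~ /^mcm_init-/ {
    split($2, r, "/")
    if (r[1] != r[2]) print $1, $2
  }'
}

# Usage on the Access Node:
#   docker service ls --format '{{.Name}} {{.Replicas}}' | under_replicated
```

Empty output means every non-init service has reached its desired replica count.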

Stack Task Mapping

Inspect the scheduling and actual node mapping of container instances across the cluster (run on Access Node for 3-tier):

docker stack ps mcm

Explanation: Displays which node each service task is scheduled on, its desired and current state (e.g., Running, Preparing, Shutdown, or Failed), and a short error message in the ERROR column if a container failed to start. Add --no-trunc to see the full error text.
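To surface only the tasks that report an error, the task list can be filtered. This is an illustrative sketch: the function name tasks_with_errors and the pipe-separated layout are assumptions chosen for this example, fed by a docker stack ps --format template.

```shell
# Print tasks that report an error.
# Input (stdin): lines of "<name>|<node>|<current-state>|<error>".
tasks_with_errors() {
  awk -F'|' '$4 != "" { printf "%s on %s: %s (%s)\n", $1, $2, $4, $3 }'
}

# Usage on the Access Node:
#   docker stack ps mcm --no-trunc \
#     --format '{{.Name}}|{{.Node}}|{{.CurrentState}}|{{.Error}}' | tasks_with_errors
```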

Tail Service Application Logs

To inspect application-level outputs and debug issues, tail the stdout/stderr logs of a specific cluster service (run on Access Node for 3-tier):

# General service tailing
docker service logs mcm_<service-name> --tail 100 --follow

# View Keycloak IAM initialization logs
docker service logs mcm_keycloak --tail 100

# View Backend Core API connectivity logs
docker service logs mcm_mcm-api --tail 100

Explanation: docker service logs fetches aggregated logs from all container replicas associated with that service across all cluster hosts.

Restart a Specific Service / Container

To restart any individual service or container in the Swarm cluster (run on Access Node for 3-tier):

docker service update --force mcm_<service-name>

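Explanation: Forces Swarm to redeploy the service's tasks even though its configuration has not changed, replacing replicas according to the service's rolling-update settings. The rest of the platform keeps running, but a service with a single replica is briefly unavailable while its task is replaced.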

Tear Down the Stack

To safely stop and delete all service configurations, containers, and networks in the platform stack (run on Access Node for 3-tier):

docker stack rm mcm

Explanation: Removes the orchestrated stack deployment. This tears down running containers and virtual networks without deleting persistent volumes (databases remain intact).

Systemd Service Management

Control the background system service wrapper:

# Check service status
sudo systemctl status mcm.service

# Start the platform
sudo systemctl start mcm.service

# Stop the platform
sudo systemctl stop mcm.service

# Restart the platform
sudo systemctl restart mcm.service

# Enable auto-start at system boot
sudo systemctl enable mcm.service

# Disable auto-start at system boot
sudo systemctl disable mcm.service

Explanation: Manages the host operating system service state for the platform orchestrator daemon.
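For a quick one-line check in scripts, the active and enabled states can be combined. This is a sketch under the assumption that the unit is named mcm.service as in the commands above; the function name unit_summary is hypothetical.

```shell
# Print the active and enabled state of a systemd unit on one line.
# Defaults to mcm.service; "|| true" keeps a non-zero systemctl exit
# (e.g., an inactive unit) from aborting scripts that use "set -e".
unit_summary() {
  unit=${1:-mcm.service}
  active=$(systemctl is-active "$unit" 2>/dev/null || true)
  enabled=$(systemctl is-enabled "$unit" 2>/dev/null || true)
  echo "$unit: active=${active:-unknown} enabled=${enabled:-unknown}"
}

# Usage:
#   unit_summary
#   unit_summary docker.service
```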

Service Log Inspection

View system logs generated by the systemd daemon:

# View live real-time log streams
sudo journalctl -u mcm.service -f

# View the last 100 lines of log output
sudo journalctl -u mcm.service -n 100 --no-pager

# View logs generated within a specific timeframe
sudo journalctl -u mcm.service --since "30 minutes ago" --no-pager

Explanation: Displays the stdout/stderr output that journald captured from the systemd wrapper service, which is helpful for troubleshooting container startup or host launch script execution.

2. Host-Level Resource Diagnostics (Common)

Before running the installer or when diagnosing performance degradation, use these commands on any VM to inspect host hardware resources.

Checking System Memory

Verify available, used, and cached RAM on the host virtual machine:

free -h

Explanation: Displays total RAM, used and free memory, buffer/cache usage, and swap usage. The -h flag prints values in human-readable units (e.g., Gi, Mi). The available column is the best estimate of how much memory new processes can use without swapping.

Checking Disk Space

Verify storage capacity and disk usage across all mounted file systems:

df -h

Explanation: Shows disk partition allocations. Ensure the root partition (/) has at least 15–20% free space to prevent write blocks on database and container volumes.
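The free-space guideline above can be checked automatically. The sketch below is illustrative only: the function name mounts_over_threshold is hypothetical, the default of 85% used corresponds to the 15% free-space lower bound, and it assumes POSIX-format df output with mount points that contain no spaces.

```shell
# Print mount points whose used percentage exceeds a limit (default 85).
# Input (stdin): output of "df -P" (one filesystem per line, header first).
mounts_over_threshold() {
  awk -v limit="${1:-85}" 'NR > 1 {
    used = $5
    sub(/%/, "", used)
    if (used + 0 > limit + 0) print $6, $5
  }'
}

# Usage:
#   df -P | mounts_over_threshold 85
```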

CPU Core Count

Identify the number of logical processor cores available on the VM:

nproc

Explanation: Returns the active processor count, which helps verify if the instance size matches the minimum required specifications.
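A pre-install check can compare measured values against the required minimums. This is a minimal sketch: the function name check_min is hypothetical, and the thresholds in the usage comments are placeholders, not documented MCM requirements.

```shell
# Compare a measured integer value against a required minimum.
# Usage: check_min <label> <actual> <required>
# Returns non-zero when the actual value is below the minimum.
check_min() {
  if [ "$2" -ge "$3" ]; then
    echo "OK   $1: $2 (minimum $3)"
  else
    echo "FAIL $1: $2 (minimum $3)"
    return 1
  fi
}

# Usage (placeholder thresholds; substitute the real minimum specifications):
#   check_min "CPU cores" "$(nproc)" 4
#   check_min "RAM (MiB)" "$(free -m | awk '/^Mem:/ {print $2}')" 8192
```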

Process CPU & Memory Utilization

Inspect real-time CPU and memory usage of the top running processes:

top -bn1 | head -22

Explanation: Runs the system monitor in batch mode (-b), performs a single iteration (-n1), and prints the header plus the top 15 resource-consuming processes to diagnose CPU spikes or memory leaks.

3. Container Platform Diagnostics (Common)

Use these commands on any cluster VM to inspect the local container engine daemon, storage pools, and performance.

Container Engine Info

Inspect the container platform daemon's configuration, active runtime, and overall state:

docker info

Explanation: Returns system-wide information including the number of running/paused/stopped containers, active storage driver (usually overlay2), and network plugin configurations.

Container Storage Allocation

Check disk space usage allocated to images, container runtimes, and local volumes:

docker system df

Explanation: Displays a breakdown of space consumed by images, active container writable layers, local volumes, and build cache, highlighting reclaimable space.

Live Container Statistics

View real-time CPU, memory, network, and disk I/O usage statistics for all running containers on the current host:

docker stats --no-stream

Explanation: Outputs a single snapshot (--no-stream) of resource consumption for all active local containers, helping identify memory-heavy or CPU-unbounded services.
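To rank containers by memory usage, the snapshot can be reduced to two columns and sorted. This is a sketch: the function name top_memory_containers is hypothetical, and the input layout assumes a docker stats --format template that prints the memory percentage first.

```shell
# Print the N containers with the highest memory percentage (default 5).
# Input (stdin): lines of "<mem-percent> <container-name>", e.g. "12.50% mcm_keycloak.1.abc".
top_memory_containers() {
  LC_ALL=C sort -rn | head -n "${1:-5}"
}

# Usage on any cluster VM:
#   docker stats --no-stream --format '{{.MemPerc}} {{.Name}}' | top_memory_containers 5
```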

Host Network Port Bindings

Identify which processes or sockets are occupying specific network ports:

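sudo ss -tlnp | grep -E ':(80|443)[[:space:]]'

Explanation: Lists TCP sockets (-t) in listening mode (-l) with numeric addresses and ports (-n) and the owning process (-p; sudo is needed to see processes owned by other users). The trailing [[:space:]] in the pattern keeps ports such as 8080 or 4430 from matching. Helpful for verifying that ports 80 and 443 are free before deploying the access node gateway.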
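A plain substring match on ss output can also catch ports such as 8080, so a script is safer comparing the port field exactly. The sketch below assumes the default ss column layout, where the fourth column is the local address and port; the function name ports_in_use is hypothetical.

```shell
# Print which of the given ports appear as listening ports.
# Usage: ss -tln | ports_in_use <port>...
# Input (stdin): output of "ss -tln" (header first; 4th column is Local Address:Port).
ports_in_use() {
  awk -v ports="$*" '
    BEGIN { n = split(ports, p, " "); for (i = 1; i <= n; i++) want[p[i]] = 1 }
    NR > 1 {
      m = split($4, a, ":")          # the port is the last colon-separated piece
      if (a[m] in want) print a[m]
    }' | sort -un
}

# Usage:
#   ss -tln | ports_in_use 80 443
```

Empty output means none of the listed ports is bound on the host.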

4. SSL/TLS Certificate and Keystore Inspections (Common)

These commands help inspect and validate TLS certificate files and Java Keystores directly on the VM filesystems.

Certificate File Verification

List the certificates directory contents for a specific service:

# E.g., for the Core API service
ls -la /var/lib/mcm/mcm-api/certs/

Explanation: Lists the private keys, certificate files, and truststores generated for the microservice.

Certificate Validity & Expiration

Check the issue and expiration dates of a certificate file to diagnose TLS handshake errors:

openssl x509 -in /var/lib/mcm/mcm-api/certs/mcm-api.crt -noout -dates

Explanation: Reads the X.509 certificate file (-in) and prints only the start and expiration dates (-dates) without outputting the raw certificate text (-noout).
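For monitoring, the expiration date can be converted into a number of days remaining. This is a sketch that assumes GNU date (for the -d option) and the notAfter= line printed by openssl x509 -enddate; the function name days_until_expiry is hypothetical.

```shell
# Print the whole days remaining until a certificate expires
# (negative if it has already expired).
# Usage: days_until_expiry "<notAfter=... line>" [now-epoch-seconds]
days_until_expiry() {
  end_date=${1#notAfter=}                               # strip the openssl prefix
  end_epoch=$(date -u -d "$end_date" +%s) || return 1   # requires GNU date
  now_epoch=${2:-$(date -u +%s)}
  echo $(( (end_epoch - now_epoch) / 86400 ))
}

# Usage:
#   days_until_expiry "$(openssl x509 -in /var/lib/mcm/mcm-api/certs/mcm-api.crt -noout -enddate)"
```

The optional second argument exists only to make the calculation repeatable, for example in tests.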

Java Keystore Inspection

Inspect the certificates and credentials contained inside a Java keystore (keystore.p12):

keytool -list -v -keystore /var/lib/mcm/mcm-api/certs/keystore.p12 -storepass <KEYSTORE_PASSWORD>

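Explanation: Lists keystore entries in verbose mode (-v) to verify that the private key and certificate chain are correctly imported and valid. Omitting -storepass makes keytool prompt for the password, which keeps it out of the shell history and process list.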

Step 1: Download the Upgrade Archive

# Registry credentials (replace with your own values)
REGISTRY_USER="your_username"
REGISTRY_PASS="your_password"

# Download the archive from the registry repository
wget --user="$REGISTRY_USER" --password="$REGISTRY_PASS" -O mcm_artifacts_[NEW_VERSION].tar.gz "http://92.204.249.45:8080/repository/mcm-artifacts/[NEW_VERSION]/mcm_artifacts_[NEW_VERSION].tar.gz"

Step 2: Extract the Archive & Navigate

tar -xzvf mcm_artifacts_[NEW_VERSION].tar.gz
cd mcm_artifacts
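Before extracting, it can be worth confirming that the download is a readable gzip-compressed tarball, since an interrupted transfer or an authentication error page would otherwise only fail partway through extraction. This is a sketch, not part of the MCM scripts; the function name verify_archive is hypothetical.

```shell
# Check that a .tar.gz file can be listed end to end without extracting it.
# Usage: verify_archive <path-to-archive>
verify_archive() {
  if tar -tzf "$1" >/dev/null 2>&1; then
    echo "archive OK: $1"
  else
    echo "archive unreadable: $1" >&2
    return 1
  fi
}

# Usage (same placeholder file name as the download step):
#   verify_archive mcm_artifacts_[NEW_VERSION].tar.gz && tar -xzvf mcm_artifacts_[NEW_VERSION].tar.gz
```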

Step 3: Run the Upgrade Script

Execute the upgrade script with root privileges:

# Full upgrade (reloads images and configurations)
sudo ./upgrade.sh

# Configuration-only upgrade (skips loading large image files)
sudo ./upgrade.sh --skip-images

Explanation:
- ./upgrade.sh updates the container image references, resyncs configurations, and rolls out updates to the service containers without deleting database tables or volumes.
- The --skip-images flag is useful when you have only modified environment files (user_config.env) or certificate stores and want to apply configuration changes quickly without reloading large offline image archives.

Launch the Platform (`start.sh` on Access Node for 3-tier)

Triggers the cluster nodes to pull/evaluate configurations and launch the stack:

sudo /opt/mcm/scripts/start.sh

Explanation: Safely provisions the runtime networks and initiates stack scheduling.

Stop the Platform (`stop.sh` on Access Node for 3-tier)

Gracefully shuts down running service tasks and removes the stack from the runtime:

sudo /opt/mcm/scripts/stop.sh

Explanation: Sends SIGTERM signals to running containers to allow database transactions to complete before shutting down.

Restart the Platform (`restart.sh` on Access Node for 3-tier)

Performs a sequential stop-and-start lifecycle sequence:

sudo /opt/mcm/scripts/restart.sh

Explanation: Executes the stop script, waits for networks to clear, and runs the start script to apply new configuration values or reload certificates cleanly.
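When stopping and starting manually instead of using restart.sh, a small polling helper can stand in for the wait step. This is a generic sketch and not a description of how restart.sh is implemented: the function name wait_until is hypothetical, and the usage comment assumes the stack's networks carry the mcm_ prefix that Swarm derives from the stack name.

```shell
# Retry a command until it succeeds or a timeout elapses.
# Usage: wait_until <timeout-seconds> <interval-seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if the timeout is reached first.
wait_until() {
  timeout=$1
  interval=$2
  shift 2
  elapsed=0
  until "$@"; do
    [ "$elapsed" -ge "$timeout" ] && return 1
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# Usage: wait up to 120 seconds for the stack networks to disappear after a stop.
#   wait_until 120 5 sh -c '! docker network ls --format "{{.Name}}" | grep -q "^mcm_"'
```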
