
MCM Platform — Troubleshooting Guide

This guide provides diagnostics, command references, and troubleshooting scenarios to identify and resolve issues with the MCM Platform deployment.

1. General Diagnostics

1.1 Checking System Resources

To check the available and used memory (RAM) on the host virtual machine:

free -h

To check disk space usage across all mounted file systems:

df -h

To view the number of CPU cores available to the system:

nproc

To inspect CPU and memory usage of the top running processes:

top -bn1 | head -20
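The disk check above can be turned into a quick threshold test. The following is a minimal sketch, not part of the MCM tooling: the function name disk_over_threshold and the 85% default are assumptions chosen for illustration. It reads `df -P` output on standard input and assumes mount points contain no spaces.

```shell
# Print mount points whose usage is at or above a threshold (default 85%).
# Reads `df -P` output on stdin, so it also works on saved output.
# Hypothetical helper: the name and default threshold are illustrative only.
disk_over_threshold() {
  threshold="${1:-85}"
  awk -v t="$threshold" 'NR > 1 {
    use = $5                 # "Capacity" column, e.g. "90%"
    sub(/%/, "", use)
    if (use + 0 >= t) print $6 " " $5
  }'
}

# Usage on a live host:
#   df -P | disk_over_threshold 85
```

An empty result means no file system has reached the threshold.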

1.2 Docker Daemon & Container Diagnostics

To check the Docker daemon's overall status, configuration, and driver details:

docker info

To view Docker disk usage statistics (images, containers, local volumes):

docker system df

To view live CPU and memory utilization statistics of all running containers:

docker stats --no-stream

To list all containers (including stopped or exited ones) with their status and exposed ports:

docker ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"

1.3 Systemd Service Status

To inspect the status of the MCM background systemd service:

sudo systemctl status mcm.service

To view the system logs generated by the MCM service in the last hour:

sudo journalctl -u mcm.service --since "1 hour ago"

2. Container Fails to Start

Symptom: One or more containers show a status of Restarting or Exited in the docker ps -a output.

2.1 Diagnosis Steps

To inspect the application logs for a failing container:

docker logs <container-name>

To list only the containers that are currently failing their health checks:

docker ps --filter "health=unhealthy"

To verify whether host port 80 or 443 is already occupied by another web server (e.g., Apache, Nginx); sudo is needed to see the owning process names:

sudo ss -tlnp | grep -E ':(80|443)[[:space:]]'

2.2 Common Causes & Resolutions

Cause	Resolution
Dependent service not healthy	Wait for MongoDB, Keycloak, or Elasticsearch to become fully healthy before starting application microservices.
Host port conflict	Stop the conflicting application service, or change the port mappings in `/etc/mcm/user_config.env`.
Out of memory	Increase host VM memory, or disable unused compose profiles to reduce the RAM footprint.
Missing certificates	Regenerate missing certificates using `/opt/mcm/scripts/generate-certs.sh`.
Missing secrets	Verify `/etc/mcm/secrets.env` exists and contains generated passwords.
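Before working through the causes above, it can help to reduce the container list to the failing entries only. This is a minimal sketch under stated assumptions: the function name failing_containers is hypothetical, and the status prefixes (Exited, Restarting) and the (unhealthy) suffix are the strings Docker normally prints in the Status column.

```shell
# Filter "name<TAB>status" lines down to containers that are exited,
# restarting, or reported as unhealthy.
# Hypothetical helper: not part of the MCM scripts.
failing_containers() {
  awk -F '\t' '$2 ~ /^(Exited|Restarting)/ || $2 ~ /\(unhealthy\)/ { print $1 }'
}

# Usage on a live host:
#   docker ps -a --format '{{.Names}}\t{{.Status}}' | failing_containers
```

Each printed name can then be passed to docker logs for closer inspection.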

3. Certificate Errors

Symptom: Service logs show PKIX path building failed, SSL handshake failed, or certificate verify failed when communicating internally or externally.

3.1 Diagnosis Steps

To verify if certificates exist in the configuration directory for a specific service:

ls -la /var/lib/mcm/mcm-api/certs/

To verify the validity and expiration dates of a specific service certificate file:

openssl x509 -in /var/lib/mcm/mcm-api/certs/mcm-api.crt -noout -dates

To list certificates and details contained inside the Java keystore:

keytool -list -v -keystore /var/lib/mcm/mcm-api/certs/keystore.p12 -storepass <KEYSTORE_PASSWORD>
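Reading expiry dates by eye is error-prone, so the date check can also be expressed as a pass/fail test. The sketch below relies on the -checkend option of openssl x509; the function name cert_valid_for_days and the 30-day default are assumptions for illustration.

```shell
# Return 0 if the certificate stays valid for at least DAYS more days,
# non-zero if it expires sooner or cannot be read.
# Hypothetical helper: the name and the 30-day default are illustrative.
cert_valid_for_days() {
  cert="$1"
  days="${2:-30}"
  openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null
}

# Usage on a live host (path taken from the commands above):
#   cert_valid_for_days /var/lib/mcm/mcm-api/certs/mcm-api.crt 30 || echo "expires within 30 days"
```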

3.2 Resolution

To force-regenerate all self-signed certificates and reload the truststores, perform these steps:

First, remove the current configuration fingerprint:

sudo rm /var/lib/mcm/.config_fingerprint

Next, restart the platform to generate new certificates and recreate keystores/truststores:

sudo bash /opt/mcm/scripts/restart.sh

4. Keycloak Authentication Issues

Symptom: Users cannot log in, the UI returns 401 Unauthorized, or logs report Invalid client secret.

4.1 Diagnosis Steps

To view the recent logs for Keycloak to check for client authorization errors:

docker logs --tail 50 keycloak

To inspect the PostgreSQL database logs associated with Keycloak storage:

docker logs --tail 20 keycloak-postgres

4.2 Resolution

Check that the KC_MCM_CLIENT_SECRET value in /etc/mcm/secrets.env matches the configuration inside Keycloak.
Verify that the Keycloak realm import file at /opt/mcm/keycloak/import/mcm.json was loaded successfully on initial startup.
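To compare the stored secret with the value configured in Keycloak, the entry first has to be read out of the env file. The following is a minimal sketch, assuming the file uses plain unquoted KEY=VALUE lines; the helper names env_value and env_fingerprint are hypothetical. The fingerprint variant prints a short SHA-256 prefix so two copies of a secret can be compared without displaying it.

```shell
# Print the value of KEY from a KEY=VALUE env file (last assignment wins).
# Assumes unquoted values; hypothetical helper, not part of the MCM scripts.
env_value() {
  key="$1"
  file="$2"
  sed -n "s/^${key}=//p" "$file" | tail -n 1
}

# Print the first 12 hex characters of the value's SHA-256 digest
# instead of the secret itself.
env_fingerprint() {
  env_value "$1" "$2" | tr -d '\n' | sha256sum | cut -c1-12
}

# Usage on a live host:
#   sudo sh -c '. ./helpers.sh; env_fingerprint KC_MCM_CLIENT_SECRET /etc/mcm/secrets.env'
# (helpers.sh is a placeholder for wherever these functions are saved)
```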

To restart Keycloak individually to reload configurations:

docker restart keycloak

5. MongoDB Connection Failures

Symptom: Backend services throw a MongoSocketException or report authentication failed.

5.1 Diagnosis Steps

To check if the MongoDB database service is up and responding to administrative pings:

docker exec mongodb mongosh --eval "db.adminCommand('ping')"

To test application user authentication directly inside the database:

docker exec mongodb mongosh -u mcm -p <MONGO_INITDB_MCM_PASSWORD> --authenticationDatabase mcm --eval "db.getName()"

To check MongoDB logs for connection limits or auth failures:

docker logs --tail 50 mongodb

5.2 Resolution

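Confirm that the MONGO_INITDB_MCM_PASSWORD value in /etc/mcm/secrets.env matches the password stored for the mcm user inside the database.
If the password in the file was changed after the database was first initialized, the database still holds the old one; either restore the original value in the file or update the user's password inside MongoDB so that the two match.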

To restart the MongoDB container:

docker restart mongodb

6. Elasticsearch Cluster Health Issues

Symptom: Search operations fail, or cluster health endpoints return errors.

6.1 Diagnosis Steps

To check the overall cluster health status (e.g., green, yellow, red):

curl -sk -u elastic:<ELASTIC_PASSWORD> "https://localhost:9200/_cluster/health?pretty"

To check shard allocation and see if there are unassigned shards:

curl -sk -u elastic:<ELASTIC_PASSWORD> "https://localhost:9200/_cat/allocation?v"

To view the status, size, and document count of all Elasticsearch indices:

curl -sk -u elastic:<ELASTIC_PASSWORD> "https://localhost:9200/_cat/indices?v"

6.2 Resolution

Status Yellow: Normal for a single-node deployment (since replica shards cannot be allocated to other nodes). No action is required.
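Status Red: At least one primary shard is unassigned. Check whether the VM has run out of disk space: with default settings, Elasticsearch stops allocating shards to a node above 85% disk usage and marks indices read-only once usage crosses the 95% flood-stage watermark.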
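The status rules above can be applied directly to the JSON returned by the cluster health endpoint. This is a minimal sketch: the function name es_health_verdict is hypothetical, and the status field is extracted with sed rather than a full JSON parser, which is sufficient for this response but not a general approach.

```shell
# Read a _cluster/health JSON response on stdin and print a one-line verdict.
# Returns 1 if no status field is found.
# Hypothetical helper: not part of the MCM scripts.
es_health_verdict() {
  es_status="$(sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1)"
  case "$es_status" in
    green)  echo "green: all shards allocated" ;;
    yellow) echo "yellow: replicas unassigned (expected on a single node)" ;;
    red)    echo "red: at least one primary shard unassigned" ;;
    *)      echo "unknown: no status field found"; return 1 ;;
  esac
}

# Usage on a live host:
#   curl -sk -u elastic:<ELASTIC_PASSWORD> "https://localhost:9200/_cluster/health" | es_health_verdict
```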

7. APISIX Gateway Routing Errors

Symptom: Accessing the UI or APIs returns 502 Bad Gateway, 404 Not Found, or routing fails.

7.1 Diagnosis Steps

To inspect the APISIX proxy error and access logs:

docker logs --tail 50 apisix

To view the initialization logs of APISIX to verify routes were imported:

docker logs init-apisix

To check if the backend etcd configuration store is running and healthy:

docker exec etcd etcdctl endpoint health

7.2 Resolution

Check that the init-apisix initialization container exited successfully with code 0.
Verify that upstream backend microservices are healthy and reachable.
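The exit-code check for the initialization container can be scripted with docker inspect. The following is a minimal sketch; the function name container_exited_ok is hypothetical, and it treats any state other than "exited" with code 0 as a failure.

```shell
# Succeed only if the named container is in the "exited" state with code 0.
# Returns 2 if the container cannot be inspected (for example, it does not exist).
# Hypothetical helper: not part of the MCM scripts.
container_exited_ok() {
  state="$(docker inspect --format '{{.State.Status}} {{.State.ExitCode}}' "$1" 2>/dev/null)" || return 2
  [ "$state" = "exited 0" ]
}

# Usage on a live host:
#   container_exited_ok init-apisix && echo "routes imported" || docker logs init-apisix
```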

To restart the APISIX container:

docker restart apisix

8. Service Health Check Failures

Symptom: The container status in docker ps remains (unhealthy).

8.1 Diagnosis Steps

To check if all required MCM Docker images are loaded in the host environment:

docker images | grep revdau

If images are missing, navigate to the image archive directory:

cd /root/mcm_artifacts/images

Then, manually reload all bundled Docker images from their archives:

for img in *.tar; do echo "Loading $img..."; docker load -i "$img"; done

To inspect the raw health check diagnostic history and failure logs of a container:

docker inspect --format='{{json .State.Health}}' <container-name> | python3 -m json.tool

To manually trigger a health check curl request from inside the container:

docker exec <container-name> curl -sk https://localhost:<PORT>/health
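After restarting a service, it is often more practical to wait for the health status to settle than to re-run docker ps by hand. This is a minimal polling sketch; the function name wait_healthy and the default timeout and interval are assumptions, and a container without a health check is reported as "unknown".

```shell
# Poll a container's health status until it is "healthy" or the timeout
# (in seconds) is exceeded. Returns 0 on healthy, 1 on timeout.
# Hypothetical helper: name and defaults are illustrative only.
wait_healthy() {
  name="$1"
  timeout="${2:-120}"
  interval="${3:-5}"
  elapsed=0
  while [ "$elapsed" -le "$timeout" ]; do
    health="$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null)" || health="unknown"
    [ "$health" = "healthy" ] && return 0
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "$name still '$health' after ${timeout}s" >&2
  return 1
}

# Usage on a live host:
#   wait_healthy <container-name> 180 || docker logs --tail 50 <container-name>
```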

9. Disk Space Issues

Symptom: Database writes fail, containers crash, or logs show no space left on device.

9.1 Diagnosis Steps

To check the free disk space on all VM partitions:

df -h

To measure storage used by the Docker storage directory:

sudo du -sh /var/lib/docker/

To measure storage used by Docker persistent volumes:

sudo du -sh /var/lib/docker/volumes/

To see how Docker itself accounts for that space (images, containers, local volumes, build cache):

docker system df
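Container log files are a frequent cause of disk exhaustion, so it is worth identifying the largest ones before removing anything. The following is a minimal sketch, assuming the default json-file log driver and the default Docker data directory; the function name largest_json_logs is hypothetical.

```shell
# List the N largest *-json.log files under DIR, largest first, sizes in KiB.
# Hypothetical helper: defaults assume Docker's standard data directory.
largest_json_logs() {
  dir="${1:-/var/lib/docker/containers}"
  count="${2:-5}"
  find "$dir" -type f -name '*-json.log' -exec du -k {} + | sort -rn | head -n "$count"
}

# Usage on a live host (run from a root shell, since the directory
# is normally readable by root only):
#   largest_json_logs /var/lib/docker/containers 5
```

The directory name in each result is the full container ID, which can be matched against docker ps -a --no-trunc.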

9.2 Resolution

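To remove stopped containers, unused networks, unused build cache, and all images not used by any container (the -a flag goes beyond dangling images, so bundled MCM images that are not in use may need to be reloaded from the image archives afterwards):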

docker system prune -a -f

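To prune Docker volumes that are not used by any container (their data is deleted permanently; on Docker 23.0 and later only anonymous volumes are removed unless -a is added):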

docker volume prune -f

To truncate large Docker JSON log files without restarting containers:

sudo sh -c 'truncate -s 0 /var/lib/docker/containers/*/*-json.log'

To perform a complete teardown and clean up the platform directories:

sudo bash /opt/mcm/scripts/cleanup.sh
