MCM Platform — Troubleshooting Guide
This guide provides diagnostics, command references, and troubleshooting scenarios to identify and resolve issues with the MCM Platform deployment.
1. General Diagnostics
1.1 Checking System Resources
To check the available and used memory (RAM) on the host virtual machine:
free -h
To check disk space usage across all mounted file systems:
df -h
To view the number of CPU cores available to the system:
nproc
To inspect CPU and memory usage of the top running processes:
top -bn1 | head -20
1.2 Docker Daemon & Container Diagnostics
To check the Docker daemon's overall status, configuration, and driver details:
docker info
To view Docker disk usage statistics (images, containers, local volumes):
docker system df
To take a one-time snapshot of CPU and memory utilization for all running containers:
docker stats --no-stream
To list all containers (including stopped or exited ones) with their status and exposed ports:
docker ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
1.3 Systemd Service Status
To inspect the status of the MCM background systemd service:
sudo systemctl status mcm.service
To view the system logs generated by the MCM service in the last hour:
sudo journalctl -u mcm.service --since "1 hour ago"
2. Container Fails to Start
Symptom: One or more containers show a status of Restarting or Exited in the docker ps -a output (plain docker ps lists running containers only).
2.1 Diagnosis Steps
To inspect the application logs for a failing container:
docker logs <container-name>
To list only the containers that are currently failing their health checks:
docker ps --filter "health=unhealthy"
To check whether host ports 80 or 443 are already occupied by another web server (e.g., Apache, Nginx). The trailing whitespace match keeps ports such as 8080 out of the results, and sudo is needed to see processes owned by other users:
sudo ss -tlnp | grep -E ":(80|443)[[:space:]]"
2.2 Common Causes & Resolutions
| Cause | Resolution |
|---|---|
| Dependent service not healthy | Wait for MongoDB, Keycloak, or Elasticsearch to become fully healthy before starting application microservices. |
| Host port conflict | Stop the conflicting application service, or change the port mappings in /etc/mcm/user_config.env. |
| Out of memory | Increase host VM memory, or disable unused compose profiles to reduce the RAM footprint. |
| Missing certificates | Regenerate missing certificates using /opt/mcm/scripts/generate-certs.sh. |
| Missing secrets | Verify /etc/mcm/secrets.env exists and contains generated passwords. |
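The "dependent service not healthy" case can be handled by polling Docker's health status before starting anything that depends on it. The following is a minimal sketch, not part of the MCM tooling: `container_health` and `wait_healthy` are hypothetical helper names, and the default timeout and poll interval are arbitrary assumptions.

```shell
#!/usr/bin/env bash
# Sketch: poll a container's Docker health status until it is "healthy".
# container_health and wait_healthy are hypothetical helpers, not MCM scripts.

# Print the health status of a container, or "none" if it has no health check.
container_health() {
  docker inspect \
    --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
    "$1" 2>/dev/null
}

# wait_healthy <container> [timeout_seconds] [poll_interval_seconds]
# Returns 0 once the container reports "healthy", 1 on timeout.
wait_healthy() {
  local name=$1 timeout=${2:-120} interval=${3:-5} waited=0 status
  while [ "$waited" -le "$timeout" ]; do
    status=$(container_health "$name")
    if [ "$status" = "healthy" ]; then
      echo "$name is healthy"
      return 0
    fi
    echo "$name: ${status:-unknown} (waited ${waited}s)"
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "$name did not become healthy within ${timeout}s" >&2
  return 1
}
```

For example, `wait_healthy mongodb 180 && wait_healthy keycloak 180` blocks until both containers report healthy or one of them times out. The container names are the ones used by the commands elsewhere in this guide.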
3. Certificate Errors
Symptom: Service logs show PKIX path building failed, SSL handshake failed, or certificate verify failed when communicating internally or externally.
3.1 Diagnosis Steps
To verify if certificates exist in the configuration directory for a specific service:
ls -la /var/lib/mcm/mcm-api/certs/
To verify the validity and expiration dates of a specific service certificate file:
openssl x509 -in /var/lib/mcm/mcm-api/certs/mcm-api.crt -noout -dates
To list certificates and details contained inside the Java keystore:
keytool -list -v -keystore /var/lib/mcm/mcm-api/certs/keystore.p12 -storepass <KEYSTORE_PASSWORD>
3.2 Resolution
To force-regenerate all self-signed certificates and reload the truststores, perform these steps:
First, remove the current configuration fingerprint:
sudo rm /var/lib/mcm/.config_fingerprint
Next, restart the platform to generate new certificates and recreate keystores/truststores:
sudo bash /opt/mcm/scripts/restart.sh
4. Keycloak Authentication Issues
Symptom: Users cannot log in, the UI returns 401 Unauthorized, or logs report Invalid client secret.
4.1 Diagnosis Steps
To view the recent logs for Keycloak to check for client authorization errors:
docker logs --tail 50 keycloak
To inspect the PostgreSQL database logs associated with Keycloak storage:
docker logs --tail 20 keycloak-postgres
4.2 Resolution
- Check that the KC_MCM_CLIENT_SECRET value in /etc/mcm/secrets.env matches the configuration inside Keycloak.
- Verify that the Keycloak realm import file at /opt/mcm/keycloak/import/mcm.json was loaded successfully on initial startup.
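Before comparing the secret against Keycloak, it is worth confirming that the key is present in the secrets file at all. The sketch below assumes that /etc/mcm/secrets.env uses plain KEY=value lines, which this guide does not state explicitly; `check_env_keys` is a hypothetical helper name. It only detects missing or empty values and cannot tell whether a value matches what Keycloak has stored.

```shell
#!/usr/bin/env bash
# Sketch: report keys that are missing or empty in a KEY=value env file.
# check_env_keys is a hypothetical helper; the file format is an assumption.

# check_env_keys <file> <KEY>...
# Prints each missing or empty key. Returns 0 if all keys are set,
# 1 if any key is missing or empty, 2 if the file cannot be read.
check_env_keys() {
  local file=$1 key value missing=0
  shift
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 2
  fi
  for key in "$@"; do
    # Take the last assignment of KEY, allowing an optional "export " prefix.
    value=$(sed -n "s/^\(export \)\{0,1\}${key}=//p" "$file" | tail -n 1)
    # Strip one pair of surrounding double or single quotes, if present.
    value=${value#\"}; value=${value%\"}
    value=${value#\'}; value=${value%\'}
    if [ -z "$value" ]; then
      echo "missing or empty: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

A typical call would be `check_env_keys /etc/mcm/secrets.env KC_MCM_CLIENT_SECRET`, run from a shell that has read access to the file (for example a root shell).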
To restart Keycloak individually to reload configurations:
docker restart keycloak
5. MongoDB Connection Failures
Symptom: Backend services throw a MongoSocketException or report authentication failed.
5.1 Diagnosis Steps
To check if the MongoDB database service is up and responding to administrative pings:
docker exec mongodb mongosh --eval "db.adminCommand('ping')"
To test application user authentication directly inside the database:
docker exec mongodb mongosh -u mcm -p <MONGO_INITDB_MCM_PASSWORD> --authenticationDatabase mcm --eval "db.getName()"
To check MongoDB logs for connection limits or auth failures:
docker logs --tail 50 mongodb
5.2 Resolution
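- Confirm that the database password MONGO_INITDB_MCM_PASSWORD in /etc/mcm/secrets.env matches the password stored inside the database.
- If the password in the secrets file was changed after the database was initialized, the two no longer match and must be re-aligned: either restore the original value in the secrets file or update the user's password inside MongoDB.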
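One way to confirm that the two passwords agree is to run the authentication test with the value read straight from the secrets file, which also keeps the password out of the shell history. This is a sketch under assumptions: the secrets file is taken to contain an unquoted MONGO_INITDB_MCM_PASSWORD=value line, and `env_value` and `mongo_auth_check` are hypothetical helper names. The mongosh invocation is the one shown in the diagnosis steps.

```shell
#!/usr/bin/env bash
# Sketch: test MongoDB authentication with the password from the secrets file.
# env_value and mongo_auth_check are hypothetical helpers, not MCM scripts.

# env_value <file> <KEY>: print the value of the last KEY=value line in the file.
env_value() {
  sed -n "s/^$2=//p" "$1" | tail -n 1
}

# mongo_auth_check [secrets_file]
# Returns 0 if login succeeds, 1 if it fails, 2 if no password was found.
mongo_auth_check() {
  local secrets=${1:-/etc/mcm/secrets.env} password
  password=$(env_value "$secrets" MONGO_INITDB_MCM_PASSWORD)
  if [ -z "$password" ]; then
    echo "MONGO_INITDB_MCM_PASSWORD is not set in $secrets" >&2
    return 2
  fi
  if docker exec mongodb mongosh --quiet -u mcm -p "$password" \
       --authenticationDatabase mcm --eval "db.getName()" >/dev/null 2>&1; then
    echo "login succeeded: secrets file and database agree"
  else
    # A failure here can also mean the container is down, not only a mismatch.
    echo "login failed: password mismatch or MongoDB unreachable"
    return 1
  fi
}
```

Note that the password is still passed as a command-line argument to mongosh, so it remains visible in the process list while the check runs.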
To restart the MongoDB container:
docker restart mongodb
6. Elasticsearch Cluster Health Issues
Symptom: Search operations fail, or cluster health endpoints return errors.
6.1 Diagnosis Steps
To check the overall cluster health status (e.g., green, yellow, red):
curl -sk -u "elastic:<ELASTIC_PASSWORD>" "https://localhost:9200/_cluster/health?pretty"
To check shard allocation per node and see whether there are unassigned shards:
curl -sk -u "elastic:<ELASTIC_PASSWORD>" "https://localhost:9200/_cat/allocation?v"
To view the status, size, and document count of all Elasticsearch indices:
curl -sk -u "elastic:<ELASTIC_PASSWORD>" "https://localhost:9200/_cat/indices?v"
6.2 Resolution
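- Status Yellow: All primary shards are assigned but at least one replica is not. This is normal for a single-node deployment, since replica shards cannot be allocated to another node. No action is required.
- Status Red: At least one primary shard is unassigned. Check whether the VM has run out of disk space: with default settings, Elasticsearch stops allocating shards to a node above 85% disk usage, tries to move shards off it above 90%, and marks its indices read-only once usage crosses 95%.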
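The status interpretation above can be scripted for quick checks. The sketch below extracts the status field from the cluster health response without requiring jq; `es_status` and `es_advice` are hypothetical helper names, and the sed pattern assumes the response contains a single "status" field, as the cluster health response does.

```shell
#!/usr/bin/env bash
# Sketch: interpret the "status" field of an Elasticsearch cluster health response.
# es_status and es_advice are hypothetical helpers, not MCM scripts.

# es_status: read cluster health JSON on stdin and print the status value.
es_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# es_advice <status>: print a one-line interpretation of the status.
es_advice() {
  case "$1" in
    green)  echo "green: all primary and replica shards are assigned" ;;
    yellow) echo "yellow: replicas unassigned, expected on a single node" ;;
    red)    echo "red: a primary shard is unassigned, check disk space" ;;
    *)      echo "unknown status '$1': check that Elasticsearch is reachable" ;;
  esac
}
```

For example: `es_advice "$(curl -sk -u "elastic:<ELASTIC_PASSWORD>" "https://localhost:9200/_cluster/health" | es_status)"`.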
7. APISIX Gateway Routing Errors
Symptom: Accessing the UI or APIs returns 502 Bad Gateway, 404 Not Found, or routing fails.
7.1 Diagnosis Steps
To inspect the APISIX proxy error and access logs:
docker logs --tail 50 apisix
To view the initialization logs of APISIX to verify routes were imported:
docker logs init-apisix
To check if the backend etcd configuration store is running and healthy:
docker exec etcd etcdctl endpoint health
7.2 Resolution
- Check that the init-apisix initialization container exited successfully with code 0.
- Verify that upstream backend microservices are healthy and reachable.
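The exit code of the initialization container can be read from Docker's container state rather than inferred from the logs. A minimal sketch, with `init_job_ok` as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: confirm that a one-shot init container exited with code 0.
# init_job_ok is a hypothetical helper, not an MCM script.

# init_job_ok [container]
# Returns 0 if the container exited with code 0, 1 if it is still running
# or exited with another code, 2 if it cannot be inspected.
init_job_ok() {
  local name=${1:-init-apisix} state status code
  state=$(docker inspect --format '{{.State.Status}} {{.State.ExitCode}}' "$name" 2>/dev/null) || {
    echo "$name: container not found" >&2
    return 2
  }
  status=${state% *}   # text before the space, e.g. "exited"
  code=${state#* }     # text after the space, e.g. "0"
  if [ "$status" = "exited" ] && [ "$code" = "0" ]; then
    echo "$name completed successfully"
    return 0
  fi
  echo "$name: status=$status exit_code=$code, see 'docker logs $name'" >&2
  return 1
}
```

Calling `init_job_ok` with no argument checks the init-apisix container named in this section.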
To restart the APISIX container:
docker restart apisix
8. Service Health Check Failures
Symptom: The container status in docker ps remains (unhealthy).
8.1 Diagnosis Steps
To check if all required MCM Docker images are loaded in the host environment:
docker images | grep revdau
If images are missing, navigate to the image archive directory:
cd /root/mcm_artifacts/images
Then, manually reload all bundled Docker images from their archives:
for img in *.tar; do echo "Loading $img..."; docker load -i "$img"; done
To inspect the raw health check diagnostic history and failure logs of a container:
docker inspect --format='{{json .State.Health}}' <container-name> | python3 -m json.tool
To manually trigger a health check curl request from inside the container:
docker exec <container-name> curl -sk https://localhost:<PORT>/health
9. Disk Space Issues
Symptom: Database writes fail, containers crash, or logs show no space left on device.
9.1 Diagnosis Steps
To check the free disk space on all VM partitions:
df -h
To measure storage used by the Docker storage directory (root access is required to read it):
sudo du -sh /var/lib/docker/
To measure storage used by Docker persistent volumes:
sudo du -sh /var/lib/docker/volumes/
To check disk space usage inside Docker:
docker system df
9.2 Resolution
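To remove stopped containers, unused networks, the build cache, and all images not used by at least one container (the -a flag goes beyond dangling images, so removed MCM images must be reloaded from the archives as described in section 8.1):
docker system prune -a -f
To prune unused Docker volumes (this permanently deletes the data they hold):
docker volume prune -f
To truncate large Docker JSON log files without restarting containers (the wildcard must be expanded by a root shell, because the directory is not readable by regular users):
sudo sh -c 'truncate -s 0 /var/lib/docker/containers/*/*-json.log'
To perform a complete teardown and clean up the platform directories: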
sudo bash /opt/mcm/scripts/cleanup.sh
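Before resorting to a full teardown, it can help to see which file systems are actually near capacity. The sketch below filters the POSIX-format output of df; `over_threshold` is a hypothetical helper name, the default of 85 percent is an arbitrary assumption, and mount points containing spaces are not handled.

```shell
#!/usr/bin/env bash
# Sketch: list file systems at or above a disk usage threshold.
# over_threshold is a hypothetical helper, not an MCM script.

# over_threshold [percent]
# Reads `df -P` output on stdin and prints "<mount point> <use%>" for every
# file system whose usage is at or above the threshold (default 85).
over_threshold() {
  awk -v limit="${1:-85}" 'NR > 1 {
    use = $5              # Capacity column, e.g. "91%"
    sub(/%/, "", use)     # drop the percent sign
    if (use + 0 >= limit + 0) print $6, use "%"
  }'
}
```

For example, `df -P | over_threshold 90` prints only the mount points that are at least 90% full, and prints nothing when every file system is below the threshold.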