Troubleshooting
Installation and Verification
If you encounter issues installing or configuring the prerequisites, follow these steps:
Ensure the NVIDIA Container Toolkit is properly installed and configured.
Verify that your GPU drivers are compatible with the CUDA version that the NIM container requires.
Check that your user has the necessary permissions to run Docker commands.
Consult the Configuring a NIM section for additional configuration options.
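The first three checks above can be scripted. The following is a minimal sketch, not an official diagnostic tool. It assumes `nvidia-smi` and `docker` are on the PATH, and it assumes the NVIDIA Container Toolkit was registered as a Docker runtime named `nvidia`; a setup that exposes GPUs another way will not show that entry. The helper names `run_check` and `has_nvidia_runtime` are hypothetical.

```python
import json
import subprocess


def run_check(cmd, timeout=30):
    """Run a command and return (succeeded, output text)."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except OSError as exc:
        # Covers a missing binary or one that is not executable.
        return False, f"could not run {cmd[0]}: {exc}"
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout}s"
    output = proc.stdout.strip() or proc.stderr.strip()
    return proc.returncode == 0, output


def has_nvidia_runtime(runtimes_json):
    """Return True if the JSON runtime map from `docker info` has an 'nvidia' key."""
    try:
        runtimes = json.loads(runtimes_json)
    except ValueError:
        return False
    return isinstance(runtimes, dict) and "nvidia" in runtimes


def main():
    # Driver check: nvidia-smi reports each GPU and the installed driver version.
    ok, out = run_check(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]
    )
    print("GPU driver visible:", ok, "|", out)

    # Permission check: `docker ps` fails if this user cannot reach the daemon.
    ok, out = run_check(["docker", "ps"])
    print("Docker usable by this user:", ok)
    if not ok:
        print("  ", out)

    # Toolkit check (assumption: the toolkit is registered as the 'nvidia' runtime).
    ok, out = run_check(["docker", "info", "--format", "{{json .Runtimes}}"])
    print("NVIDIA runtime registered with Docker:", ok and has_nvidia_runtime(out))


if __name__ == "__main__":
    main()
```

The script only reports what it finds; compare the reported driver version against the requirements listed in the Prerequisites.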
Running NIMs
If you encounter issues running NIM for Cosmos, follow these steps:
Check that your hardware meets the Prerequisites.
Verify the NIM container is running properly with docker ps.
Ensure your request parameters are within supported ranges.
Check server logs for detailed error messages.
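The container and log checks above can be sketched as a short script. It assumes the `docker` CLI is on the PATH and that the server logs are available through `docker logs`. The search term `cosmos` is a placeholder; substitute whatever appears in your container name or image.

```python
import subprocess


def parse_docker_ps(output):
    """Parse tab-separated `docker ps` rows into dicts with name, image, status."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        name, image, status = line.split("\t", 2)
        rows.append({"name": name, "image": image, "status": status})
    return rows


def find_running(rows, needle):
    """Return rows whose name or image contains `needle` and whose status is 'Up'."""
    return [
        r for r in rows
        if (needle in r["name"] or needle in r["image"])
        and r["status"].startswith("Up")
    ]


def main():
    needle = "cosmos"  # placeholder search term, adjust to your deployment
    fmt = "{{.Names}}\t{{.Image}}\t{{.Status}}"
    try:
        ps = subprocess.run(
            ["docker", "ps", "--format", fmt],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        print("Could not run docker ps:", exc)
        return

    matches = find_running(parse_docker_ps(ps.stdout), needle)
    if not matches:
        print("No running container matched", repr(needle))
        return

    for row in matches:
        print(row["name"], "|", row["status"])
        # Show the most recent log lines, where startup and request errors appear.
        logs = subprocess.run(
            ["docker", "logs", "--tail", "50", row["name"]],
            capture_output=True, text=True, timeout=30,
        )
        print(logs.stdout + logs.stderr)


if __name__ == "__main__":
    main()
```

If no container matches, `docker ps -a` also lists stopped containers, which helps tell a container that exited at startup from one that was never created.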
Metrics Collection
If you encounter issues with metrics collection or visualization, follow these steps:
Ensure the NIM container is running and accessible at the expected address.
Verify that Prometheus can reach the NIM metrics endpoint.
Check Prometheus logs for any scraping errors.
Confirm that the metrics are available by directly accessing the metrics endpoint.
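One way to access the metrics endpoint directly is to fetch it and list the metric names it exposes. The sketch below assumes the endpoint serves the Prometheus text exposition format. The URL `http://localhost:8000/v1/metrics` is an assumption for illustration; use the address and path from your own deployment.

```python
import urllib.request


def metric_names(text):
    """Return the sorted, de-duplicated metric names in a Prometheus text payload."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like: name{label="value"} 1.0
        parts = line.split("{", 1)[0].split()
        if parts:
            names.add(parts[0])
    return sorted(names)


def fetch_metrics(url, timeout=5):
    """Fetch the raw metrics text, or return None if the endpoint is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError as exc:  # URLError and HTTPError are subclasses of OSError
        print("Could not reach", url, "->", exc)
        return None


def main():
    url = "http://localhost:8000/v1/metrics"  # assumed address, adjust as needed
    text = fetch_metrics(url)
    if text is None:
        return
    names = metric_names(text)
    print(len(names), "metrics exposed")
    for name in names:
        print(" ", name)


if __name__ == "__main__":
    main()
```

If this script lists metrics but Prometheus shows none, the problem is more likely in the Prometheus scrape configuration or in network reachability between Prometheus and the NIM container than in the NIM itself.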