Artificial intelligence (AI) and machine learning (ML) products and solutions are quickly becoming commonplace and are shaping our computing experiences as never before. Interactive speech assistants (e.g., Alexa and Google Home), visual search, and recommendation engines are just a few of the consumer applications available today on our phones, websites, and e-commerce platforms. The impact of machine learning is broadening further with enterprise applications in health sciences (e.g., IBM Watson), finance, security, data centers, and cyber surveillance.
These AI applications and solutions are now more viable than ever, thanks to modern machine learning and deep learning tools such as TensorFlow and Caffe, and to GPUs built specifically to perform parallel operations on large amounts of data, e.g., multiplying matrices that contain tens or hundreds of thousands of numbers. Processing large data sets through the same algorithm, both for learning and for intelligent inference, is a common operation in machine learning and deep learning applications.
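To make the point about parallelism concrete, here is a minimal sketch in plain Python (not how TensorFlow or Caffe implement it). It shows that every cell of a matrix product depends only on one row and one column of the inputs, so all cells could in principle be computed at the same time. The function names `output_cell` and `matmul` are illustrative only.

```python
from typing import List

Matrix = List[List[float]]


def output_cell(a: Matrix, b: Matrix, row: int, col: int) -> float:
    """Compute one cell of a @ b using only row `row` of a and column `col` of b."""
    return sum(a[row][k] * b[k][col] for k in range(len(b)))


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices cell by cell.

    No cell depends on any other cell, which is why this workload
    can be spread across many parallel processing units.
    """
    if any(len(r) != len(b) for r in a):
        raise ValueError("columns of a must equal rows of b")
    return [
        [output_cell(a, b, i, j) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


if __name__ == "__main__":
    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

This loop runs the cells one after another; the point is only that nothing forces that order.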
However, one significant challenge remains: deploying, configuring, and running these complex tools while managing their interdependencies, versioning, and compatibility with servers and GPUs. For example, to run TensorFlow, users need to make sure that they have the correct BIOS version on their server, compatible Windows or Linux drivers, and the right CUDA library for the specific GPU and server on which they want to run their AI workload. If any of these components is misconfigured or at an incompatible version, the AI application will not function correctly or will perform very poorly.
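The kind of check described above can be sketched as a small script. This is only an illustration under stated assumptions: the component names and minimum versions in `REQUIRED_MINIMUMS` are hypothetical placeholders, not real TensorFlow or CUDA requirements, and the installed versions are passed in by hand rather than queried from the system.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, ...]

# Hypothetical minimum versions for an imaginary framework release.
# Real requirements must come from the framework's own documentation.
REQUIRED_MINIMUMS: Dict[str, str] = {
    "bios": "2.4",
    "gpu_driver": "410.0",
    "cuda": "10.0",
}


def parse_version(text: str) -> Version:
    """Turn '10.0' into (10, 0) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def find_incompatibilities(
    installed: Dict[str, str],
    required: Dict[str, str] = REQUIRED_MINIMUMS,
) -> List[str]:
    """Return one message per component that is missing or older than required."""
    problems = []
    for component, minimum in required.items():
        found = installed.get(component)
        if found is None:
            problems.append(f"{component}: not installed (need >= {minimum})")
        elif parse_version(found) < parse_version(minimum):
            problems.append(f"{component}: found {found}, need >= {minimum}")
    return problems


if __name__ == "__main__":
    # Example inventory of one server (made-up values).
    server = {"bios": "2.10", "gpu_driver": "396.26", "cuda": "10.0"}
    for message in find_incompatibilities(server):
        print(message)
```

Even this toy version shows why the problem is tedious by hand: each component has its own version scheme, and a single stale entry is enough to break the stack.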