
Modular MAX: High-Performance GenAI Serving

Serve the latest GenAI models with the MAX Container, a GPU-accelerated serving environment with support for 500+ optimized models (https://builds.modular.com/), OpenAI API compatibility (https://docs.modular.com/max/api/serve), and enterprise-grade performance across diverse hardware and compute services.
Purchase this listing from Webvar in AWS Marketplace using your AWS account. In AWS Marketplace, you can quickly launch pre-configured software with just a few clicks. AWS handles billing and payments, and charges appear on your AWS bill.

About

The Modular Platform is an open and fully-integrated suite of AI libraries and tools that accelerates model serving and scales GenAI deployments. It abstracts away hardware complexity so you can run the most popular open models with industry-leading GPU and CPU performance without any code changes.

Our ready-to-deploy Docker container removes the complexity of deploying your own GenAI endpoint. And unlike other serving solutions, Modular enables customization across the entire stack: you can tailor everything from the serving pipeline and model architecture all the way down to the metal by writing custom ops and GPU kernels in Mojo. Most importantly, Modular is hardware-agnostic and free from vendor lock-in (no CUDA required), so your code runs seamlessly across diverse systems.

MAX is a high-performance AI serving framework tailored for GenAI workloads. It provides low-latency, high-throughput inference via advanced model serving optimizations like prefix caching and speculative decoding. An OpenAI-compatible serving endpoint executes native MAX and PyTorch models across GPUs and CPUs, and can be customized at the model and kernel level.
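
Because the endpoint speaks the OpenAI API, tooling built on the official OpenAI SDK can talk to it by changing only the base URL. The following is a minimal sketch, assuming a MAX container already running locally on port 8000 and serving a Llama 3.1 model under the Hugging Face ID modularai/Llama-3.1-8B-Instruct-GGUF; the URL, placeholder key, and model name are assumptions to adjust for your deployment.

    from openai import OpenAI

    # Point the standard OpenAI Python client at the MAX endpoint instead of
    # api.openai.com. Base URL, API key, and model name are assumptions --
    # adjust them to match your running container.
    client = OpenAI(
        base_url="http://localhost:8000/v1",
        api_key="EMPTY",  # MAX does not require a real key by default (assumption)
    )

    response = client.chat.completions.create(
        model="modularai/Llama-3.1-8B-Instruct-GGUF",  # the model the container is serving
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "In two sentences, what does speculative decoding speed up?"},
        ],
        max_tokens=128,
    )
    print(response.choices[0].message.content)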

The MAX Container (max-nvidia-full) is a Docker image that packages the MAX Platform, pre-configured to serve hundreds of popular GenAI models on NVIDIA GPUs. This container is ideal for users seeking a fully optimized, out-of-the-box solution for deploying AI models.
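
As a rough illustration of the out-of-the-box flow, the sketch below starts the container from Python with the docker SDK (docker-py) and publishes the serving port. The image reference (modular/max-nvidia-full:latest), the --model-path flag, the HF_TOKEN variable, and port 8000 are assumptions drawn from the public MAX container documentation and may differ for the Marketplace build; see https://docs.modular.com/max/container for the authoritative instructions.

    import os
    import docker  # docker-py: pip install docker

    client = docker.from_env()

    # Start the MAX container on all visible NVIDIA GPUs and expose the
    # OpenAI-compatible endpoint on port 8000. Image name, flags, and env
    # vars are assumptions -- verify them against the MAX container docs.
    container = client.containers.run(
        "modular/max-nvidia-full:latest",                          # assumed image reference
        command=["--model-path", "modularai/Llama-3.1-8B-Instruct-GGUF"],
        environment={"HF_TOKEN": os.environ.get("HF_TOKEN", "")},  # needed only for gated models
        ports={"8000/tcp": 8000},
        device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],
        volumes={os.path.expanduser("~/.cache/huggingface"): {"bind": "/root/.cache/huggingface", "mode": "rw"}},
        detach=True,
    )
    print(f"MAX container started: {container.short_id}")

Mounting the Hugging Face cache is optional; it simply avoids re-downloading model weights on every restart.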

Key capabilities include:

High-performance serving: Serve 500+ AI models from Hugging Face with industry-leading performance across NVIDIA GPUs.

Flexible, portable serving: Deploy with a single Docker container across various GPUs (B200, H200, H100, A100, A10, L40 and L4) and compute services (EC2, EKS, AWS Batch, etc.) without compatibility issues.

OpenAI API compatibility: Seamlessly integrate with applications adhering to the OpenAI API specification; see the sketch after this list.
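
To make that compatibility concrete, here is a hedged sketch, reusing the same assumed local endpoint as above, that discovers the served models through the standard /v1/models route and streams a chat completion exactly as an application would against the hosted OpenAI API.

    from openai import OpenAI

    # Same assumed local MAX endpoint as in the earlier example.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    # List the models the container is currently serving (standard /v1/models route).
    served = [m.id for m in client.models.list()]
    print("Serving:", served)

    # Stream tokens from the first served model, as with any OpenAI-compatible backend.
    stream = client.chat.completions.create(
        model=served[0],
        messages=[{"role": "user", "content": "In one sentence, what is prefix caching?"}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()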

For detailed information on container contents and instance compatibility, refer to the MAX Containers Documentation (https://docs.modular.com/max/container).

To access our full Modular platform, check out https://www.modular.com/


