The Inference Server - Llama.cpp - CUDA - NVIDIA Container - Ubuntu 22
Purchase this listing from Webvar in AWS Marketplace using your AWS account. In AWS Marketplace, you can quickly launch pre-configured software with just a few clicks. AWS handles billing and payments, and charges on your AWS bill.About
The Inference server offers the full infrastructure to run fast inference on GPUs.
It includes llama.cpp inference, latest CUDA and NVIDIA Docker container toolkit.
Leverage the multitude of models freely available to run inference with 8 bit or lower quantized models which makes inference possible on e.g. 16 GB or 24 GB memory GPUs.
Llama.cpp offer efficient inference of quantized models in interactive and server mode. It features
Plain C/C++ implementation without dependencies
2-bit, 3-bit, 4-bit, 5-bit, 6-bit and 8-bit integer quantization support
Running inference on GPU and CPU simultaneously allowing to run larger models in case GPU memory is insufficient
AVX, AVX2 and AVX512 support for x86 architectures
Supported models: LLaMA, LLaMA 2, Falcon, Alpaca, GPT4All, Chinese LLaMA / Alpaca and Chinese LLaMA-2 / Alpaca-2, Vigogne (French), Vicuna, Koala, OpenBuddy (Multilingual), Pygmalion 7B / Metharme 7B, WizardLM, Baichuan-7B and its derivations (such as baichuan-7b-sft), Aquila-7B / AquilaChat-7B, Starcoder models, Mistral AI v0.1, Refact
Here is our guide How to use the AI SP Inference Server
The Inference server supports in addition
llama-cpp-python: OpenAI API compatible Llama.cpp inference server
Open Interpreter: let language models run code on your computer. An open-source, locally running implementation of OpenAIs Code Interpreter.
Tabby coding assistant: a self-hosted AI coding assistant, offering an open-source alternative to GitHub Copilot
Includes remote desktop access via NICE DCV high-end remote desktops or via ssh (putty, ...).
Related Products
show moreHow it works?
Search
Search 25000+ products and services vetted by AWS.
Request private offer
Our team will send you an offer link to view.
Purchase
Accept the offer in your AWS account, and start using the software.
Manage
All your transactions will be consolidated into one bill in AWS.
Create Your Marketplace with Webvar!
Launch your marketplace effortlessly with our solutions. Optimize sales processes and expand your reach with our platform.