
The Inference Server - Llama.cpp - CUDA - NVIDIA Container - Ubuntu 22

Run AI inference on your own server for coding support, creative writing, summarizing, and more, without sharing data with other services. The Inference Server has everything you need to run state-of-the-art inference on GPU servers. It includes llama.cpp inference, the latest CUDA release, and NVIDIA Docker container support, plus support for llama-cpp-python, Open Interpreter, and the Tabby coding assistant.
Purchase this listing from Webvar in AWS Marketplace using your AWS account. In AWS Marketplace, you can quickly launch pre-configured software with just a few clicks. AWS handles billing and payments, and charges appear on your AWS bill.

About

The Inference Server provides the full infrastructure to run fast inference on GPUs.

It includes llama.cpp inference, the latest CUDA release, and the NVIDIA Container Toolkit for Docker.

Leverage the multitude of freely available models: 8-bit or lower quantization makes inference possible on GPUs with, for example, 16 GB or 24 GB of memory.
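As a rough illustration of why quantization matters for GPU memory, here is a minimal Python sketch (not part of the product; the bits-per-weight figures are approximations, and real inference needs extra room for the KV cache and scratch buffers):

```python
# Estimate the weight-file size of a quantized model to see why 8-bit or
# lower quantization fits on 16 GB or 24 GB GPUs. Values are approximate;
# GGUF files add metadata, and inference needs additional working memory.

BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q5_K": 5.5, "Q4_K": 4.5, "Q2_K": 2.6}  # approximate

def model_size_gb(n_params_billion: float, quant: str) -> float:
    """Rough weight size in GiB for a model at the given quantization level."""
    bits = BITS_PER_WEIGHT[quant]
    return n_params_billion * 1e9 * bits / 8 / 1024**3

for quant in BITS_PER_WEIGHT:
    print(f"13B model at {quant}: ~{model_size_gb(13, quant):.1f} GB")
# 13B at Q8_0 is ~12.9 GB (fits a 16 GB GPU); at Q4_K it is ~6.8 GB.
```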

Llama.cpp offers efficient inference of quantized models in interactive and server modes. It features:

Plain C/C++ implementation without dependencies

2-bit, 3-bit, 4-bit, 5-bit, 6-bit and 8-bit integer quantization support

Running inference on GPU and CPU simultaneously, which allows larger models to run when GPU memory alone is insufficient (see the sketch below)

AVX, AVX2 and AVX512 support for x86 architectures

Supported models: LLaMA, LLaMA 2, Falcon, Alpaca, GPT4All, Chinese LLaMA / Alpaca and Chinese LLaMA-2 / Alpaca-2, Vigogne (French), Vicuna, Koala, OpenBuddy (Multilingual), Pygmalion 7B / Metharme 7B, WizardLM, Baichuan-7B and its derivations (such as baichuan-7b-sft), Aquila-7B / AquilaChat-7B, Starcoder models, Mistral AI v0.1, Refact
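To show how the GPU/CPU split works in practice, here is a minimal sketch using llama-cpp-python, which this server ships with. The model path and layer count are illustrative assumptions; n_gpu_layers controls how many transformer layers are offloaded to the GPU, and the rest run on the CPU:

```python
# Minimal sketch: partial GPU offload with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-13b.Q4_K_M.gguf",  # hypothetical local GGUF file
    n_gpu_layers=35,  # offload 35 layers to the GPU; use -1 to offload all layers
    n_ctx=2048,       # context window size
)

out = llm("Q: Name the planets in the solar system. A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```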

Here is our guide: How to use the AI SP Inference Server.

In addition, the Inference Server supports:

llama-cpp-python: an OpenAI-API-compatible llama.cpp inference server (see the example after this list)

Open Interpreter: lets language models run code on your computer. An open-source, locally running implementation of OpenAI's Code Interpreter.

Tabby coding assistant: a self-hosted AI coding assistant, offering an open-source alternative to GitHub Copilot
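To illustrate the OpenAI-compatible mode, here is a hedged sketch: assuming the llama-cpp-python server has been started locally (for example with `python3 -m llama_cpp.server --model <path-to-gguf>`, which listens on port 8000 by default), the standard OpenAI Python client can talk to it without any data leaving the machine:

```python
# Minimal sketch: query a locally running llama-cpp-python server through the
# standard OpenAI Python client (openai>=1.0). The base_url, port, and model
# name below are assumptions based on the server's documented defaults.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # the local inference server
    api_key="sk-no-key-needed",           # the local server does not check the key
)

resp = client.chat.completions.create(
    model="local-model",  # placeholder; the server answers with whichever model it loaded
    messages=[{"role": "user", "content": "Summarize llama.cpp in one sentence."}],
)
print(resp.choices[0].message.content)
```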

Includes remote desktop access via NICE DCV high-performance remote desktop or via SSH (e.g., PuTTY).
