
DeepSeek R1 Distill Qwen 1.5B

A self-hosted, production-ready DeepSeek-R1-Distill-Qwen-1.5B model (https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) running in your private AWS cloud! A single-click installation sets up all the essential infrastructure in your own cloud environment, and you get an API endpoint that is ready for your queries and scales automatically with demand. Best of all, because the service runs entirely in your cloud, your data remains secure and confidential and never leaves your private environment. Experience peace of mind and unleash the full potential of the DeepSeek R1 models today!
Purchase this listing from Webvar in AWS Marketplace using your AWS account. In AWS Marketplace, you can quickly launch pre-configured software with just a few clicks. AWS handles billing and payments, and the charges appear on your AWS bill.

About

This service offers a hosted version of the DeepSeek-R1-Distill-Qwen-1.5B model (https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B), which operates entirely within your private cloud. After you subscribe to the listing, a CloudFormation deployment starts in your AWS account, setting up an EKS cluster that runs an inference service for the DeepSeek-R1-Distill-Qwen-1.5B model. Once the installation is complete, an API endpoint is made available for your queries.
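For illustration, here is a minimal sketch of querying the deployed endpoint from Python. It assumes the service exposes an OpenAI-compatible chat-completions API; the actual endpoint URL, path, authentication, and response schema are provided after installation and may differ.

```python
import requests

# Hypothetical endpoint URL; substitute the one returned by the CloudFormation stack.
ENDPOINT = "https://your-endpoint.example.com/v1/chat/completions"

payload = {
    "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    "messages": [
        {"role": "user", "content": "What is the sum of the first 100 positive integers?"}
    ],
    "max_tokens": 512,
    "temperature": 0.6,
}

# Send the request and print the model's reply (assumes an OpenAI-style response body).
response = requests.post(ENDPOINT, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```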

The DeepSeek-R1-Distill-Qwen-1.5B model is fine-tuned from the open-source Qwen model using samples generated by DeepSeek-R1. The DeepSeek team showed that the reasoning patterns discovered with reinforcement learning in the 671B-parameter DeepSeek-R1 model can be distilled into small dense models without much loss in capability. This 1.5B checkpoint is the smallest of those distillations.

DeepSeek-R1-Distill-Qwen-1.5B punches well above its weight on math- and code-heavy reasoning while still fitting on a single laptop GPU (~4 GB in 8-bit). Use it whenever you need solid chain-of-thought performance under tight VRAM or latency budgets.
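As a sketch of that small-footprint claim, the published checkpoint can also be run locally with Hugging Face Transformers in 8-bit quantization (requires the transformers, accelerate, and bitsandbytes packages). The generation settings below are illustrative, not the listing's defaults.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

# Load the tokenizer and the model with 8-bit weights to keep VRAM usage low.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# Build a chat-formatted prompt and generate a reasoning-style answer.
messages = [{"role": "user", "content": "What is 17 * 23? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, temperature=0.6, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```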

Architecture: 1.78B-parameter decoder-only Transformer (Qwen2.5-Math-1.5B base) distilled from the 671B-parameter DeepSeek-R1 reasoning model

Context length: 32,768 tokens (inherits Qwen2.5 long-context support)

How it works

Search

Search 25,000+ products and services vetted by AWS.

Request private offer

Our team will send you an offer link to view.

Purchase

Accept the offer in your AWS account, and start using the software.

Manage

All your transactions are consolidated into a single AWS bill.
