Large-scale inference at small-scale cost

The developer platform that revolutionizes inference at scale

Built for developers, by developers

We hate rate limits, too.

If you're investing in a premium service, it should unlock faster growth for you, not slow you down with unexpected limits and hidden fees.

It's time for a smarter solution.

Introducing
Adaptive Inference

Real-time

Sub-second latency for live demands

Asynchronous

Low cost for one-off requests with flexible timing

Batch

Low cost for high-volume bulk processing

from openai import OpenAI
# OpenAI compatible API
client = OpenAI(
    base_url="https://api.acloudapp.ai/v1",
    api_key="my_acloudapp_api_key"
)
response = client.chat.completions.create(
    model="acloudapp/Meta-Llama-3.1-405B-Instruct-Turbo",
    messages=[
        {
            "role": "user",
            "content": "Provide an analysis of market trends in AI."
        }
    ]
)
print(response.choices[0].message.content)

from openai import OpenAI
# OpenAI compatible API
client = OpenAI(
    base_url="https://api.acloudapp.ai/v1",
    api_key="my_acloudapp_api_key"
)
response = client.chat.completions.create(
    model="acloudapp/Meta-Llama-3.1-405B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Analyze market trends."}],
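    # acloudapp-specific options: run asynchronously and deliver the result
    # to the callback URL within the completion window
    metadata={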
        "@acloudapp.ai": {
            "callback_url": "https://my-webhook-receiver/callback",
            "async": True,
            "completion_window": "24h"
        }
    }
)
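
The asynchronous request above names a callback_url, so something has to be listening at that address. Below is a minimal sketch of such a receiver using only the Python standard library. It assumes the result arrives as an HTTP POST with a JSON body; the payload schema is not described on this page, so the body is stored as-is rather than unpacked into fields. The names record_callback, CallbackHandler, serve, and port 8000 are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory store for payloads delivered to the webhook.
received_results = []


def record_callback(raw_body: bytes):
    """Decode a JSON callback body, remember it, and return it.

    Assumption: the callback body is JSON. Its schema is unknown here,
    so the parsed value is kept whole.
    """
    payload = json.loads(raw_body.decode("utf-8"))
    received_results.append(payload)
    return payload


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            record_callback(self.rfile.read(length))
        except ValueError:
            self.send_response(400)  # body was not valid UTF-8 JSON
        else:
            self.send_response(204)  # accepted, nothing to return
        self.end_headers()


def serve(port: int = 8000):
    """Listen for callbacks until interrupted (port is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), CallbackHandler).serve_forever()
```

Calling serve() starts the listener. In production the endpoint would also need to be reachable over HTTPS and should verify that each request really comes from the provider.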

from openai import OpenAI
# OpenAI compatible API
client = OpenAI(
    base_url="https://api.acloudapp.ai/v1",
    api_key="my_acloudapp_api_key"
)
# Upload LLM requests
batch_input_file = client.files.create(
    file=open("batch_llm_requests.json", "rb"),
    purpose="batch"
)
# Start adaptive inference
batch_request = client.batches.create(
    input_file_id=batch_input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)
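
The batch example above uploads a request file without showing its contents. The sketch below builds one, assuming the service accepts the same input layout as OpenAI's Batch API: one JSON object per line, each with custom_id, method, url, and body. That compatibility is an assumption, and the helper names build_batch_lines and write_batch_file are hypothetical.

```python
import json

MODEL = "acloudapp/Meta-Llama-3.1-405B-Instruct-Turbo"


def build_batch_lines(prompts):
    """Return one JSON-encoded chat completion request per prompt."""
    lines = []
    for index, prompt in enumerate(prompts):
        request = {
            "custom_id": f"request-{index}",  # matches results back to inputs
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": MODEL,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(request))
    return lines


def write_batch_file(prompts, path="batch_llm_requests.json"):
    """Write the requests to disk, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(build_batch_lines(prompts)) + "\n")
```

For example, write_batch_file(["Analyze market trends.", "Summarize this week's AI news."]) produces a two-line file ready for client.files.create.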

LLAMA 3.3, LLAMA 3.1, and DEEPSEEK-R1 MODELS

Prices are per million input/output tokens

Model size	Real-time	1 hour	3 hours	6 hours	12 hours	24 hours
8B	$0.18	$0.50	$0.58	$0.57	$0.56	$0.55
70B	$0.70	$0.59	$0.33	$0.30	$0.28	$0.25
405B	$3.50	$0.75	$0.60	$0.45	$0.00	$0.59
DeepSeek-R1	$2.00	$1.00	$0.50	$0.30	$0.70	$0.80

Some terms and restrictions may apply
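
To show how the per-million-token prices translate into a bill, here is a small worked calculation. It assumes a single listed rate covers input and output tokens alike, which the table does not spell out. The 10-million-token workload is hypothetical, and the two rates are taken from the 70B row.

```python
def estimate_cost(total_tokens: int, price_per_million: float) -> float:
    """Cost in dollars for a token count at a per-million-token price."""
    return total_tokens / 1_000_000 * price_per_million


# Rates from the 70B row of the pricing table.
REAL_TIME_70B = 0.70
WINDOW_24H_70B = 0.25

tokens = 10_000_000  # hypothetical workload
real_time_cost = estimate_cost(tokens, REAL_TIME_70B)
window_cost = estimate_cost(tokens, WINDOW_24H_70B)
print(f"real-time: ${real_time_cost:.2f}, 24-hour window: ${window_cost:.2f}")
```

At these rates the same workload costs $7.00 in real time and $2.50 with a 24-hour completion window.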

Why developers love acloudapp

High volume by design

Experience seamless scalability and best-in-class rate limits, ensuring uninterrupted performance.

Predictable completion windows

Choose a completion window that suits your needs, from one hour to a full day.

Unmatched value

Achieve top-tier performance and reliability at half the cost of leading providers.
