The developer platform that revolutionizes inference at scale
Built for developers by developers
If you're investing in a premium service, it should unlock faster growth for you, not slow you down with unexpected limits and hidden fees.
It's time for a smarter solution.
Sub-second latency for live demands

    from openai import OpenAI

    # OpenAI-compatible API
    client = OpenAI(
        base_url="https://api.acloudapp.ai/v1",
        api_key="my_acloudapp_api_key",
    )

    # Real-time request: the response is returned immediately
    response = client.chat.completions.create(
        model="acloudapp/Meta-Llama-3.1-405B-Instruct-Turbo",
        messages=[
            {
                "role": "user",
                "content": "Provide an analysis of market trends in AI.",
            }
        ],
    )
    print(response.choices[0].message.content)

Low-cost for flexible timing, one-off requests

    from openai import OpenAI

    # OpenAI-compatible API
    client = OpenAI(
        base_url="https://api.acloudapp.ai/v1",
        api_key="my_acloudapp_api_key",
    )

    # Asynchronous request: the result is delivered to the callback URL
    # within the completion window
    response = client.chat.completions.create(
        model="acloudapp/Meta-Llama-3.1-405B-Instruct-Turbo",
        messages=[{"role": "user", "content": "Analyze market trends."}],
        metadata={
            "@acloudapp.ai": {
                "callback_url": "https://my-webhook-receiver/callback",
                "async": True,
                "completion_window": "24h",
            }
        },
    )

Low-cost for high-volume, bulk processing

    from openai import OpenAI

    # OpenAI-compatible API
    client = OpenAI(
        base_url="https://api.acloudapp.ai/v1",
        api_key="your_acloudapp_api_key",
    )

    # Upload LLM requests
    batch_input_file = client.files.create(
        file=open("batch_llm_requests.json", "rb"),
        purpose="batch",
    )

    # Start adaptive inference
    batch_request = client.batches.create(
        input_file_id=batch_input_file.id,
        endpoint="v1/chat/completions",
        completion_window="24h",
    )
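The bulk-processing example uploads `batch_llm_requests.json` but does not show what the file contains. The sketch below is one way to generate it, assuming the platform accepts the same input layout as the OpenAI Batch API (one JSON object per line with `custom_id`, `method`, `url`, and `body`). This page does not confirm that layout, and the `build_batch_file` helper and the `request-N` ID scheme are hypothetical names used only for illustration.

```python
import json
from pathlib import Path


def build_batch_file(prompts, model, path):
    """Write one chat-completion request per line and return the request count."""
    lines = []
    for i, prompt in enumerate(prompts):
        request = {
            # Hypothetical ID scheme, used to match results back to inputs.
            "custom_id": f"request-{i}",
            "method": "POST",
            # Assumed to mirror the OpenAI Batch API request layout.
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(request))
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


if __name__ == "__main__":
    count = build_batch_file(
        ["Analyze market trends.", "Summarize recent AI research."],
        model="acloudapp/Meta-Llama-3.1-405B-Instruct-Turbo",
        path="batch_llm_requests.json",
    )
    print(f"Wrote {count} requests")
```

The resulting file can then be passed to `client.files.create(...)` as in the bulk-processing example.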
Prices are per million input/output tokens
MODEL SIZE | Real time | 1 hour | 3 hours | 6 hours | 12 hours | 24 hours |
---|---|---|---|---|---|---|
8B | $0.18 | $0.50 | $0.58 | $0.57 | $0.56 | $0.55 |
70B | $0.70 | $0.59 | $0.33 | $0.30 | $0.28 | $0.25 |
405B | $3.50 | $0.75 | $0.60 | $0.45 | $0.00 | $0.59 |
DeepSeek-R1 | $2.00 | $1.00 | $0.50 | $0.30 | $0.70 | $0.80 |
Some terms and restrictions may apply
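To make the per-million-token pricing concrete, here is a rough sketch that estimates the cost of a workload and picks the cheapest completion window that still meets a deadline. The prices are copied from the 70B row of the table above. The sketch assumes a single rate covers both input and output tokens, which is one reading of the pricing note, and the function names are illustrative, not part of any SDK.

```python
# USD per million tokens, copied from the 70B row of the pricing table.
PRICES_70B = {
    "real_time": 0.70,
    "1h": 0.59,
    "3h": 0.33,
    "6h": 0.30,
    "12h": 0.28,
    "24h": 0.25,
}

# Hours each window may take; real time is treated as zero.
WINDOW_HOURS = {"real_time": 0, "1h": 1, "3h": 3, "6h": 6, "12h": 12, "24h": 24}


def estimate_cost(total_tokens, price_per_million):
    """Return the estimated cost in USD for a token count at a given rate."""
    return total_tokens / 1_000_000 * price_per_million


def cheapest_window(prices, max_hours):
    """Return the lowest-priced window that completes within max_hours."""
    eligible = {w: p for w, p in prices.items() if WINDOW_HOURS[w] <= max_hours}
    return min(eligible, key=eligible.get)


if __name__ == "__main__":
    tokens = 50_000_000
    for window in ("real_time", "24h"):
        cost = estimate_cost(tokens, PRICES_70B[window])
        print(f"{window}: ${cost:.2f}")
    print("Cheapest window within 6 hours:", cheapest_window(PRICES_70B, 6))
```

For a 50-million-token job on the 70B tier, this works out to $35.00 in real time versus $12.50 with a 24-hour window.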
Experience seamless scalability and best-in-class rate limits, ensuring uninterrupted performance.
Choose a timeframe that suits your needs, from one hour to a full day.
Achieve top-tier performance and reliability at half the cost of leading providers.