Fireworks AI

Inference Optimization • US

Total Models: 242

Free Models: 228

Paid Models: 14

Fast inference platform

🆓 Free Models (228)

Model	Context	Capabilities
Cogito v1 Preview Llama 70B cogito-v1-preview-llama-70b	-	function_calling
Mistral Large 3 675B Instruct 2512 mistral-large-3-fp8	-	vision, function_calling
Ministral 3 14B Instruct 2512 ministral-3-14b-instruct-2512	-	vision, function_calling
Ministral 3 8B Instruct 2512 ministral-3-8b-instruct-2512	-	vision, function_calling
Ministral 3 3B Instruct 2512 ministral-3-3b-instruct-2512	-	vision, function_calling
KAT Coder kat-coder	-
FARE-20B fare-20b	-
KAT Dev 32B kat-dev-32b	-
KAT Dev 72B Exp kat-dev-72b-exp	-	function_calling
OpenAI gpt-oss-safeguard-120b gpt-oss-safeguard-120b	-	function_calling
OpenAI gpt-oss-safeguard-20b gpt-oss-safeguard-20b	-	function_calling
NVIDIA Nemotron Nano 2 VL nemotron-nano-v2-12b-vl	-	vision
Qwen 3 4B Instruct 2507 qwen3-4b-instruct-2507	-
NVIDIA Nemotron Nano 9B v2 nvidia-nemotron-nano-9b-v2	-	function_calling
NVIDIA Nemotron Nano 12B v2 nvidia-nemotron-nano-12b-v2	-	function_calling
Qwen3 Next 80B A3B Thinking qwen3-next-80b-a3b-thinking	-
Qwen3 Next 80B A3B Instruct qwen3-next-80b-a3b-instruct	-
Qwen3 Coder 480B Instruct BF16 qwen3-coder-480b-instruct-bf16	-
GLM-4.5V glm-4p5v	-	vision, function_calling
GLM-4.5-Air glm-4p5-air	-	function_calling
Qwen3 30B A3B Thinking 2507 qwen3-30b-a3b-thinking-2507	-	function_calling
GLM-4.5 glm-4p5	-	function_calling
Qwen3 30B A3B Instruct 2507 qwen3-30b-a3b-instruct-2507	-
Kimi K2 Instruct kimi-k2-instruct	-	function_calling
ERNIE-4.5-300B-A47B-PT ernie-4p5-300b-a47b-pt	-
ERNIE-4.5-21B-A3B-PT ernie-4p5-21b-a3b-pt	-
MiniMax-M1-80k minimax-m1-80k	-
Rolm OCR rolm-ocr	-	vision
InternVL3 78B internvl3-78b	-	vision
InternVL3 38B internvl3-38b	-	vision
InternVL3 8B internvl3-8b	-	vision
DeepSeek R1 0528 Distill Qwen3 8B deepseek-r1-0528-distill-qwen3-8b	-	function_calling
Dobby Mini Unhinged Plus Llama 3.1 8B dobby-mini-unhinged-plus-llama-3-1-8b	-
Devstral-Small-2505 devstral-small-2505	-
Qwen2.5 1.5B Instruct qwen2p5-1p5b-instruct	-
DeepSeek Prover V2 deepseek-prover-v2	-
Qwen3 1.7B qwen3-1p7b	-	function_calling
Qwen3 4B qwen3-4b	-	function_calling
Qwen3 32B qwen3-32b	-	function_calling
Qwen3 0.6B qwen3-0p6b	-	function_calling
Qwen3 14B qwen3-14b	-	function_calling
Qwen3 30B-A3B qwen3-30b-a3b	-	function_calling
Gemma 3 27B Instruct gemma-3-27b-it	-
Cogito v1 Preview Qwen 32B cogito-v1-preview-qwen-32b	-	function_calling
Mixtral MoE 8x7B Instruct mixtral-8x7b-instruct	-
Mixtral MoE 8x22B Instruct mixtral-8x22b-instruct	-	function_calling
Llama 3 70B Instruct llama-v3-70b-instruct	-
Llama 3 8B Instruct llama-v3-8b-instruct	-
Llama Guard v2 8B llama-guard-2-8b	-
Llama 3 8B Instruct (HF version) llama-v3-8b-instruct-hf	-
Llama 3 70B Instruct (HF version) llama-v3-70b-instruct-hf	-
Gemma 2B Instruct gemma-2b-it	-
Phi-3 Mini 128k Instruct phi-3-mini-128k-instruct	-
Phi-3.5 Vision Instruct phi-3-vision-128k-instruct	-	vision
Mistral 7B Instruct v0.3 mistral-7b-instruct-v3	-	function_calling
Qwen2 72B Instruct qwen2-72b-instruct	-
Qwen2 7B Instruct qwen2-7b-instruct	-
Dolphin 2.9.2 Qwen2 72B dolphin-2-9-2-qwen2-72b	-
DeepSeek Coder 1.3B Base deepseek-coder-1b-base	-
CodeQwen 1.5 7B code-qwen-1p5-7b	-
CodeGemma 2B codegemma-2b	-
CodeGemma 7B codegemma-7b	-
Gemma 2 9B Instruct gemma2-9b-it	-
DeepSeek Coder V2 Lite Instruct deepseek-coder-v2-lite-instruct	-
DeepSeek Coder V2 Lite Base deepseek-coder-v2-lite-base	-
DeepSeek Coder V2 Instruct deepseek-coder-v2-instruct	-
Llama 3.1 70B Instruct llama-v3p1-70b-instruct	-	function_calling
Mistral Nemo Base 2407 mistral-nemo-base-2407	-
Llama 3.1 405B Instruct llama-v3p1-405b-instruct	-	function_calling
Mistral Nemo Instruct 2407 mistral-nemo-instruct-2407	-
Llama 3.1 8B Instruct llama-v3p1-8b-instruct	-
FireFunction V2 firefunction-v2	-	function_calling
Llama 3.1 405B Instruct Long llama-v3p1-405b-instruct-long	-
DeepSeek V2.5 deepseek-v2p5	-
Llama 3.2 1B Instruct llama-v3p2-1b-instruct	-
Llama 3.2 3B Instruct llama-v3p2-3b-instruct	-
Qwen2.5 7B qwen-v2p5-7b	-
Qwen2.5 14B Instruct qwen-v2p5-14b-instruct	-
Llama 3.2 90B Vision Instruct llama-v3p2-90b-vision-instruct	-	vision
Llama 3.2 11B Vision Instruct llama-v3p2-11b-vision-instruct	-	vision
Llama 3.2 1B llama-v3p2-1b	-
Llama 3.2 3B llama-v3p2-3b	-
Llama Guard v3 1B llama-guard-3-1b	-
DeepSeek R1 Distill Llama 70B deepseek-r1-distill-llama-70b	-
DeepSeek R1 Distill Qwen 32B deepseek-r1-distill-qwen-32b	-
DeepSeek R1 Distill Qwen 1.5B deepseek-r1-distill-qwen-1p5b	-
DeepSeek R1 Distill Qwen 7B deepseek-r1-distill-qwen-7b	-
DeepSeek R1 Distill Llama 8B deepseek-r1-distill-llama-8b	-
DeepSeek R1 Distill Qwen 14B deepseek-r1-distill-qwen-14b	-
Mistral Small 24B Instruct 2501 mistral-small-24b-instruct-2501	-
Mixtral 8x7B mixtral-8x7b	32K	chat, function_calling
Dobby-Unhinged-Llama-3.3-70B dobby-unhinged-llama-3-3-70b-new	-
QWQ 32B qwq-32b	-
DeepSeek R1 (Basic) deepseek-r1-basic	-
Mixtral 8x22B mixtral-8x22b	65K	chat, function_calling
DeepSeek V3 deepseek-v3	64K	chat, function_calling
Qwen2.5-VL 3B Instruct qwen2p5-vl-3b-instruct	-	vision
Qwen2.5-VL 7B Instruct qwen2p5-vl-7b-instruct	-	vision
Qwen2.5-VL 72B Instruct qwen2p5-vl-72b-instruct	-	vision
Llama 4 Scout Instruct (Basic) llama4-scout-instruct-basic	-	vision, function_calling
Llama 4 Maverick Instruct (Basic) llama4-maverick-instruct-basic	-	vision, function_calling
Mistral 7B v0.2 mistral-7b-v0p2	-
DeepSeek Coder 7B Base deepseek-coder-7b-base	-
Hermes 2 Pro Mistral 7B hermes-2-pro-mistral-7b	-	function_calling
OpenChat 3.5 0106 openchat-3p5-0106-7b	-
Pythia 12B pythia-12b	-
Snorkel Mistral PairRM DPO snorkel-mistral-7b-pairrm-dpo	-
Phind CodeLlama 34B v1 phind-code-llama-34b-v1	-
Phind CodeLlama 34B v2 phind-code-llama-34b-v2	-
Nous Hermes 2 Mixtral 8x7B DPO nous-hermes-2-mixtral-8x7b-dpo	-
Phind CodeLlama 34B Python v1 phind-code-llama-34b-python-v1	-
Nous Hermes Llama2 70B nous-hermes-llama2-70b	-
Gemma 7B gemma-7b	-
Nous Capybara 7B V1.9 nous-capybara-7b-v1p9	-
Code Llama 70B Python code-llama-70b-python	-
Code Llama 70B Instruct code-llama-70b-instruct	-
Code Llama 34B Instruct code-llama-34b-instruct	-
Code Llama 70B code-llama-70b	-
Code Llama 34B code-llama-34b	-
Code Llama 34B Python code-llama-34b-python	-
Code Llama 13B Instruct code-llama-13b-instruct	-
Code Llama 13B Python code-llama-13b-python	-
Code Llama 13B code-llama-13b	-
Code Llama 7B Instruct code-llama-7b-instruct	-
Code Llama 7B code-llama-7b	-
OpenHermes 2.5 Mistral 7B openhermes-2p5-mistral-7b	-
OpenHermes 2 Mistral 7B openhermes-2-mistral-7b	-
Dolphin 2.6 Mixtral 8x7b dolphin-2p6-mixtral-8x7b	-
Nous Hermes Llama2 7B nous-hermes-llama2-7b	-
Toppy M 7B toppy-m-7b	-
Mistral 7B Instruct v0.2 mistral-7b-instruct-v0p2	-
Nous Hermes Llama2 13B nous-hermes-llama2-13b	-
Chronos Hermes 13B v2 chronos-hermes-13b-v2	-
DeepSeek Coder 7B Base v1.5 deepseek-coder-7b-base-v1p5	-
DeepSeek Coder 7B Instruct v1.5 deepseek-coder-7b-instruct-v1p5	-
DeepSeek Coder 33B Instruct deepseek-coder-33b-instruct	-
Qwen1.5 72B Chat qwen1p5-72b-chat	-
Mistral 7B OpenOrca openorca-7b	-
Gemma 7B Instruct gemma-7b-it	-
FireFunction V1 firefunction-v1	-	function_calling
MythoMax L2 13B mythomax-l2-13b	-
Mixtral MoE 8x7B Instruct (HF version) mixtral-8x7b-instruct-hf	-
Llama 2 70B llama-v2-70b	-
Llama Guard 7B llamaguard-7b	-
Cogito v1 Preview Llama 3B cogito-v1-preview-llama-3b	-	function_calling
Cogito v1 Preview Llama 8B cogito-v1-preview-llama-8b	-	function_calling
Zephyr 7B Beta zephyr-7b-beta	-
Mistral 7B mistral-7b	-
Mistral 7B Instruct v0.1 mistral-7b-instruct-4k	-
Llama 2 7B llama-v2-7b	-
Llama 2 7B Chat llama-v2-7b-chat	-
Llama 2 13B llama-v2-13b	-
Llama 2 13B Chat llama-v2-13b-chat	-
Qwen2.5 0.5B Instruct qwen2p5-0p5b-instruct	-
Firesearch OCR V6 firesearch-ocr-v6	-	vision
Qwen2-VL 72B Instruct qwen2-vl-72b-instruct	-	vision
Qwen2-VL 7B Instruct qwen2-vl-7b-instruct	-	vision
Qwen2-VL 2B Instruct qwen2-vl-2b-instruct	-	vision
Qwen QWQ 32B Preview qwen-qwq-32b-preview	-
Llama 3 8B llama-v3-8b	-
Qwen2.5-Coder 32B qwen2p5-coder-32b	-
Qwen2.5-Coder 32B Instruct qwen2p5-coder-32b-instruct	-
Qwen2.5-Coder 32B Instruct 128K qwen2p5-coder-32b-instruct-128k	-
Qwen2.5-Coder 32B Instruct 32K RoPE qwen2p5-coder-32b-instruct-32k-rope	-
Qwen2.5-Coder 32B Instruct 64k qwen2p5-coder-32b-instruct-64k	-
Qwen2.5-Coder 3B qwen2p5-coder-3b	-
Qwen2.5-Coder 0.5B qwen2p5-coder-0p5b	-
Qwen2.5-Coder 14B qwen2p5-coder-14b	-
Qwen2.5-Coder 14B Instruct qwen2p5-coder-14b-instruct	-
Qwen2.5-Coder 3B Instruct qwen2p5-coder-3b-instruct	-
Qwen2.5-Coder 0.5B Instruct qwen2p5-coder-0p5b-instruct	-
Llama Guard 3 8B llama-guard-3-8b	-
DeepSeek V2 Lite Chat deepseek-v2-lite-chat	-
FLUX.1 [schnell] flux-1-schnell	-
Llama 3.1 Nemotron 70B llama-v3p1-nemotron-70b-instruct	-
Qwen2.5-Math 72B Instruct qwen2p5-math-72b-instruct	-
Llama 3.1 70B Instruct 1B llama-v3p1-70b-instruct-1b	-
Qwen2.5 7B Instruct qwen2p5-7b-instruct	-
Qwen2.5 7B qwen2p5-7b	-
Qwen2.5 14B qwen2p5-14b	-
Qwen2.5 14B Instruct qwen2p5-14b-instruct	-
Qwen2.5 32B qwen2p5-32b	-
Qwen2.5 32B Instruct qwen2p5-32b-instruct	-
Qwen2.5 72B qwen2p5-72b	-
Qwen2.5 72B Instruct qwen2p5-72b-instruct	-	function_calling
Qwen2.5-Coder 1.5B Instruct qwen2p5-coder-1p5b-instruct	-
Qwen2.5-Coder 1.5B qwen2p5-coder-1p5b	-
Qwen2.5-Coder 7B qwen2p5-coder-7b	-
Qwen2.5-Coder 7B Instruct qwen2p5-coder-7b-instruct	-
Kimi K2.5 kimi-k2p5	-	vision, function_calling
MiniMax-M2.1 minimax-m2p1	-	function_calling
GLM-4.7 glm-4p7	-	function_calling
DeepSeek V3.2 deepseek-v3p2	-	function_calling
Qwen3 VL 235B A22B Thinking qwen3-vl-235b-a22b-thinking	-	vision, function_calling
Qwen3 VL 235B A22B Instruct qwen3-vl-235b-a22b-instruct	-	vision, function_calling
Kimi K2 Instruct 0905 kimi-k2-instruct-0905	-	function_calling
DeepSeek V3.1 deepseek-v3p1	-	function_calling
OpenAI gpt-oss-120b gpt-oss-120b	-	function_calling
OpenAI gpt-oss-20b gpt-oss-20b	-
Qwen3 235B A22B qwen3-235b-a22b	-	function_calling
Kimi K2 Thinking kimi-k2-thinking	-	function_calling
GLM-4.6 glm-4p6	-	function_calling
Qwen3 235B A22B Thinking 2507 qwen3-235b-a22b-thinking-2507	-
Qwen3 Coder 480B A35B Instruct qwen3-coder-480b-a35b-instruct	-	function_calling
Qwen3 235B A22B Instruct 2507 qwen3-235b-a22b-instruct-2507	-	function_calling
Llama 3.3 70B Instruct llama-v3p3-70b-instruct	-
DeepSeek R1 05/28 deepseek-r1-0528	-	function_calling
Qwen3 Coder 30B A3B Instruct qwen3-coder-30b-a3b-instruct	-
DeepSeek R1 (Fast) deepseek-r1	-
Cogito 671B v2.1 cogito-671b-v2-p1	-
MiniMax-M2 minimax-m2	-	function_calling
Qwen3 VL 30B A3B Thinking qwen3-vl-30b-a3b-thinking	-	vision, function_calling
Qwen3 VL 30B A3B Instruct qwen3-vl-30b-a3b-instruct	-	vision, function_calling
DeepSeek V3.1 Terminus deepseek-v3p1-terminus	-	function_calling
Cogito v1 Preview Qwen 14B cogito-v1-preview-qwen-14b	-	function_calling
Qwen2.5-VL 32B Instruct qwen2p5-vl-32b-instruct	-	vision
DeepSeek V3 03-24 deepseek-v3-0324	-	function_calling
GLM-4.7 Flash glm-4p7-flash	-
Molmo2-8B molmo2-8b	-	vision
Molmo2-4B molmo2-4b	-	vision
Qwen3 Omni 30B A3B Instruct qwen3-omni-30b-a3b-instruct	-	vision, function_calling
Gemma 3 12B Instruct gemma-3-12b-it	-
Gemma 3 4B Instruct gemma-3-4b-it	-
Seed OSS 36B Instruct seed-oss-36b-instruct	-	function_calling
Devstral Small 2 24B Instruct 2512 devstral-small-2-24b-instruct-2512	-	vision, function_calling
NVIDIA Nemotron Nano 3 30B A3B nemotron-nano-3-30b-a3b	-	function_calling
Qwen3 VL 32B Instruct qwen3-vl-32b-instruct	-	vision
Qwen3-VL-8B-Instruct qwen3-vl-8b-instruct	-	vision

💰 Paid Models (14)

Model	Input/1M	Cached in/1M	Output/1M	Context	Capabilities
Qwen3 8B qwen3-8b	$0.100	—	FREE	-	function_calling
Llama 3.1 8B llama-3-1-8b	$0.200	—	$0.200	131K	chat, function_calling
Qwen3 235B qwen3-235b	$0.220	—	$0.880	256K	chat, function_calling
Llama 4 Maverick llama-4-maverick	$0.220	—	$0.880	1.0M	chat, function_calling
Models up to 16B parameters models-up-to-16b-parameters	$0.500	—	$1.00	-
Models up to 16B parameters models-up-to-16b	$0.500	—	$1.00	-
Qwen 3.5 9B qwen-3.5-9b	$0.660	$0.132	$2.00	65K
Llama 3.3 70B llama-3-3-70b	$0.900	—	$0.900	131K	chat, function_calling
Qwen 3.6 27B qwen-3.6-27b	$1.86	$0.372	$5.59	65K
Models 16.1B - 80B models-16.1b-80b	$3.00	—	$6.00	-
Llama 3.1 405B llama-3-1-405b	$3.00	—	$3.00	131K	chat, function_calling
Models 80B - 300B (e.g. Qwen3-235B, gpt-oss-120B) models-80b-300b	$6.00	—	$12.00	-
Models >300B (e.g. DeepSeek V3, Kimi K2) models-300b-plus	$10.00	—	$20.00	-
Models >300B (e.g. DeepSeek V3, Kimi K2) models-over-300b	$10.00	—	$20.00	-
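
The prices in the table above are quoted per 1 million tokens, so the cost of a request scales linearly with its token counts. The Python sketch below shows that arithmetic for two rows copied from the table. The `Price` class, the `PRICES` dictionary, and the `estimate_cost` function are hypothetical names used for illustration only and are not part of any Fireworks SDK. It also assumes that cached input tokens are billed at the "Cached in/1M" rate in place of the regular input rate, and that a model with no cached price bills them at the normal input rate; check the official pricing page before relying on either assumption.

```python
from dataclasses import dataclass
from typing import Optional

TOKENS_PER_UNIT = 1_000_000  # table prices are quoted per 1M tokens


@dataclass(frozen=True)
class Price:
    """Per-1M-token prices in USD (hypothetical helper type)."""
    input_per_m: float
    output_per_m: float
    cached_input_per_m: Optional[float] = None  # None where the table has no cached price


# Two rows copied from the paid-models table above.
PRICES = {
    "llama-3-3-70b": Price(input_per_m=0.900, output_per_m=0.900),
    "qwen-3.5-9b": Price(input_per_m=0.660, output_per_m=2.00, cached_input_per_m=0.132),
}


def estimate_cost(model_id: str, input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumption: cached_tokens is the part of input_tokens served from cache
    and is billed at the cached rate instead of the regular input rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    price = PRICES[model_id]
    # Fall back to the regular input rate when no cached price is listed.
    cached_rate = (price.cached_input_per_m
                   if price.cached_input_per_m is not None
                   else price.input_per_m)
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * price.input_per_m
             + cached_tokens * cached_rate
             + output_tokens * price.output_per_m)
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # 1M input + 1M output tokens on Llama 3.3 70B: 0.90 + 0.90 USD
    print(f"{estimate_cost('llama-3-3-70b', 1_000_000, 1_000_000):.3f}")
    # 2M input (1M of it cached) + 500K output on Qwen 3.5 9B
    print(f"{estimate_cost('qwen-3.5-9b', 2_000_000, 500_000, cached_tokens=1_000_000):.3f}")
```

Under these assumptions the first call works out to 1.800 USD and the second to 0.660 + 0.132 + 1.000 = 1.792 USD. The size-tier rows (for example `models-16.1b-80b`) could be added to the dictionary in the same way.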