## Background
NVIDIA offers free AI model APIs via integrate.api.nvidia.com. But are they reliable during peak hours? I ran two rounds of tests to find out.
## Test Method
- 3 calls per model, fixed prompt: “Reply OK only”
- 30-second timeout
- Tested at 2 AM (off-peak) and 5 PM (peak)
## Results
| Model | Off-peak success | Peak success | Avg response | Verdict |
|---|---|---|---|---|
| mistral-small-4-119b | 3/3 | 2/3 | 0.69s | ⭐ Fastest |
| nemotron-3-super-120b | 3/3 | 3/3 | 10.1s | ✅ Most stable |
| qwen3.5-122b | 3/3 | 2/3 | 6.9s | ✅ Usable |
| kimi-k2.5 | 2/3 | 0/3 | Timeout | ❌ Dead at peak |
| deepseek-v3.2 | 0/3 | 0/3 | Timeout | ❌ Always dead |
| glm-4.7 | 0/3 | 0/3 | 404 | ❌ Endpoint missing |
## Conclusion
Only 3 models survive peak hours:
- Mistral Small 4 — blazing fast at 0.69s average, though it dropped one of three peak calls
- Nemotron 3 Super — the only model with a 100% success rate, but slow (7-14s) because it emits reasoning output
- Qwen 3.5 — usable, though it occasionally times out at peak
The other three (Kimi, DeepSeek, GLM) are completely unusable during peak hours.
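If you build on these free endpoints, one way to cope with peak-hour flakiness is a fallback chain ordered by these results: try the fastest model first and fall back to the slow-but-stable one last. A minimal sketch; `ask_with_fallback` and the ordering are my own illustration, not part of any NVIDIA API.

```python
# Preference order from the results above: fastest first, most stable last.
FALLBACK_ORDER = ["mistral-small-4-119b", "qwen3.5-122b", "nemotron-3-super-120b"]


def ask_with_fallback(models, call):
    """Try each model in order with `call(model) -> reply or None`;
    return (model, reply) from the first one that answers, or (None, None)."""
    for model in models:
        reply = call(model)
        if reply is not None:
            return model, reply
    return None, None
```

Plugging in a real `call` (e.g. the harness above) means a peak-hour timeout on Mistral silently retries on Qwen, and Nemotron only takes the 7-14s latency hit when both faster models fail.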