September 2024
Running Llama 3.1 70B on a Single Consumer-Grade GPU (RTX 4090 24GB) at 60 Tokens/s