Mistral AI recently released Mixtral 8x7B, a sparse mixture of experts (SMoE) large language model (LLM). The model contains 46.7B total parameters, but only about 12.9B of them are active for any given token, so it performs inference at roughly the speed and cost of a 13B-parameter model. On several LLM benchmarks, it outperformed both Llama 2 70B and GPT-3.5, the model powering ChatGPT.
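In an SMoE layer, a small gating network selects a few expert feed-forward blocks for each token, which is why only a fraction of the total parameters are exercised per token; Mixtral routes each token to 2 of its 8 experts in every layer. The snippet below is a minimal PyTorch sketch of that top-2 routing idea, with placeholder dimensions and a naive per-expert loop rather than Mixtral's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Illustrative sparse mixture-of-experts feed-forward layer.

    A gating network scores all experts, the top-k are selected per token,
    and only those experts run. Dimensions are small placeholders, not
    Mixtral's real sizes.
    """
    def __init__(self, d_model=128, d_ff=512, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.gate(x)                             # (tokens, n_experts)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)  # keep 2 of 8 experts per token
        top_w = F.softmax(top_w, dim=-1)                  # renormalize the kept weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e              # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += top_w[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = SparseMoELayer()
tokens = torch.randn(4, 128)
print(layer(tokens).shape)  # torch.Size([4, 128])
```

Because each token touches only 2 of the 8 expert blocks (plus the shared attention and gating weights), the compute per token is a fraction of what a dense model with the same total parameter count would need, which is the source of the speed and cost advantage described above.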
Mixtral 8x7B has a context length of 32k tokens and handles Spanish, French, Italian, German, and English. Besides the base Mixtral 8x7B, Mistral AI also released Mixtral 8x7B Instruct, a variant fine-tuned to follow instructions.
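Assuming the weights are available on the Hugging Face Hub under mistralai/Mixtral-8x7B-Instruct-v0.1 (not stated in this excerpt), a minimal transformers sketch for querying the Instruct variant might look like this; note that the full 46.7B-parameter model needs on the order of 90+ GB of GPU memory in 16-bit precision, so quantized variants are commonly used in practice.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical setup: model id and prompt are illustrative, not from the article.
model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# The Instruct variant expects a chat-style prompt; apply_chat_template builds it.
messages = [{"role": "user", "content": "Summarize the Mixtral 8x7B release in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=120)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```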