Hi, I've created a mistral-large deployment. I want to experiment with
moving my prompts from GPT-4 to Mistral. Currently we perform RAG by
passing the `extra_body` parameter to `client.chat.completions.create`
with GPT-4. This works fine for GPT-4, but when I switch to a `client =
MistralClient()` I can ...
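For context, a hedged sketch of what the GPT-4 side of this setup typically looks like: the Azure OpenAI "On Your Data" extension accepts a `data_sources` entry smuggled in through the openai SDK's `extra_body` escape hatch. The endpoint, index name, and key below are placeholders, not values from the thread:

```python
# Sketch of an Azure OpenAI "On Your Data" payload passed via extra_body.
# All endpoint/index/key values are placeholders.
def build_rag_extra_body(endpoint: str, index_name: str, api_key: str) -> dict:
    return {
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": endpoint,
                    "index_name": index_name,
                    "authentication": {"type": "api_key", "key": api_key},
                },
            }
        ]
    }

# Usage (not executed here):
# response = client.chat.completions.create(
#     model="gpt-4",
#     messages=[{"role": "user", "content": "What does our handbook say about PTO?"}],
#     extra_body=build_rag_extra_body(
#         "https://<search-resource>.search.windows.net", "docs-index", "<key>"
#     ),
# )
```

Because `extra_body` is an openai-SDK-specific passthrough, a different client such as `MistralClient` will not necessarily accept or forward it, which is likely where the migration breaks.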
@mimillet disable flash attention if your GPU does not support it. Here:

```python
model_kwargs = dict(
    use_cache=False,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",  # loads the model with flash-attention support
    torch_dtype=torch.bfloat16,
    device_map=None,
)
```
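Flash Attention 2 requires an Ampere-or-newer GPU (CUDA compute capability 8.0+). A minimal sketch of choosing the setting at runtime instead of hardcoding it; the `"sdpa"` fallback is the PyTorch scaled-dot-product-attention backend supported by recent `transformers` releases:

```python
def pick_attn_implementation(major: int, minor: int) -> str:
    """Map a GPU's CUDA compute capability (major, minor) to a
    transformers attn_implementation string."""
    # flash_attention_2 needs compute capability >= 8.0 (Ampere or newer)
    return "flash_attention_2" if major >= 8 else "sdpa"

# Example wiring with torch (not executed here):
# import torch
# major, minor = torch.cuda.get_device_capability()
# model_kwargs["attn_implementation"] = pick_attn_implementation(major, minor)
```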
Hello @RohanP1810, yes, you can use Mistral models with chat completions
on Azure AI services to perform RAG by combining Azure Cognitive Search
for retrieval with Azure OpenAI Service for generation. This approach
combines the strengths of retrieval-based and generative
models to provid...
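The retrieve-then-generate pattern described above can be sketched as follows. `retrieve()` and the completion call are placeholders standing in for the search and chat services, not a specific Azure API:

```python
# Minimal retrieve-then-generate sketch. retrieve() and the model name
# are hypothetical placeholders for the search and generation services.
def build_grounded_prompt(question: str, passages: list[str]) -> list[dict]:
    """Fold retrieved passages into a system message so the model
    answers from the supplied context."""
    context = "\n\n".join(passages)
    return [
        {"role": "system", "content": "Answer using only the context below.\n\n" + context},
        {"role": "user", "content": question},
    ]

# Usage (not executed here):
# passages = retrieve(question, top_k=3)          # Azure Cognitive Search step
# messages = build_grounded_prompt(question, passages)
# answer = client.chat.completions.create(model="mistral-large", messages=messages)
```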