NVIDIA researchers recently introduced ChatQA, a family of conversational question answering (QA) models that aims to achieve GPT-4-level accuracy.
ChatQA spans model sizes from 7B to 70B parameters. Extensive evaluations across 10 conversational QA datasets show that the top-performing ChatQA-70B model not only outperforms GPT-3.5-turbo but also performs on par with GPT-4. Notably, these results are achieved without relying on any synthetic data from ChatGPT models.
The team behind ChatQA proposes a two-stage instruction tuning method that significantly improves zero-shot conversational QA results from large language models (LLMs). To handle retrieval in conversational QA, a dense retriever is fine-tuned on a multi-turn QA dataset, delivering results comparable to state-of-the-art query rewriting models at lower deployment cost.
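To make the retrieval idea concrete, here is a minimal sketch of the kind of contrastive objective commonly used to fine-tune dense retrievers on multi-turn data: the full dialogue history plus the current question is embedded as the query, and an InfoNCE-style loss pulls it toward the gold passage and away from in-batch negatives. The function names, the history-concatenation scheme, and the temperature value are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def build_query(dialogue_turns, question):
    # Hypothetical scheme: concatenate prior turns with the current question
    # so a single-turn retriever can consume multi-turn context.
    return " ".join(dialogue_turns + [question])

def info_nce_loss(query_emb, passage_embs, positive_idx, temperature=0.05):
    """Contrastive (InfoNCE) loss over candidate passages.

    query_emb:    (d,)  embedding of the dialogue history + question.
    passage_embs: (n, d) candidates; one positive, the rest negatives.
    """
    # Cosine similarity between the query and every candidate passage.
    q = query_emb / np.linalg.norm(query_emb)
    p = passage_embs / np.linalg.norm(passage_embs, axis=1, keepdims=True)
    scores = p @ q / temperature
    # Softmax cross-entropy against the positive passage (stabilized).
    scores -= scores.max()
    probs = np.exp(scores) / np.exp(scores).sum()
    return -np.log(probs[positive_idx])
```

In training, minimizing this loss over many (dialogue, positive passage) pairs teaches the retriever to rank the relevant passage highest without any query rewriting step.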
NVIDIA also shows that fine-tuning a single-turn query retriever on its curated conversational QA data performs comparably to the state-of-the-art LLM-based query rewriting model, without the extra inference time and potential API costs that rewriting incurs.
NVIDIA’s ChatQA also advances the handling of scenarios where no answer exists. Incorporating a small number of “unanswerable” samples during tuning significantly improves the model’s ability to decline such questions. On an evaluation of unanswerable cases, the leading ChatQA-70B model exhibits only a slight performance gap compared with GPT-4.
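The unanswerable-sample idea can be sketched as simple data augmentation: pair a question with an unrelated context so that the gold answer becomes a canned refusal. The function name, the `ratio` parameter, and the exact refusal string below are assumptions for illustration, not the paper's specifics.

```python
import random

# Hypothetical canned reply for questions the context cannot answer.
UNANSWERABLE_RESPONSE = "Sorry, I cannot find the answer in the given context."

def add_unanswerable_samples(examples, contexts, ratio=0.05, seed=0):
    """Augment QA training data with a small fraction of unanswerable samples.

    Each synthetic sample reuses a real question but attaches an unrelated
    context, so the gold answer becomes the canned refusal response.
    """
    rng = random.Random(seed)
    n_extra = max(1, int(ratio * len(examples)))
    augmented = list(examples)
    for _ in range(n_extra):
        ex = rng.choice(examples)
        # Pick a context that does not belong to this example.
        wrong_ctx = rng.choice([c for c in contexts if c != ex["context"]])
        augmented.append({"context": wrong_ctx,
                          "question": ex["question"],
                          "answer": UNANSWERABLE_RESPONSE})
    return augmented
```

Training on a mix like this teaches the model to refuse rather than hallucinate when the retrieved context lacks the answer.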
For more details, refer to the ChatQA paper.
NVIDIA is not alone in pursuing GPT-4-level capabilities. Google is likely to launch Gemini Ultra at any moment, and Mistral CEO Arthur Mensch announced on French national radio that the company will release an open-source GPT-4-level model in 2024.