9 Tips to Design Hallucination-Free RAG/LLM Systems
And in our case (see https://mltblog.com/4fPuvTb), with no training and zero parameters! By zero parameters, I mean no neural network parameters (the typical "40B" you see attached to many LLMs stands for 40 billion parameters, also called weights). We do, however, have a few intuitive parameters that you can fine-tune in real time.
Tips to make your system hallucination-free:
- We use sub-LLMs specific to each topic (each covering part of a large corpus), so mixing unrelated items is much less likely to happen (a routing sketch follows this list).
- In the base version, the output returned is unaltered rather than reworded, since rewording can cause hallucinations.
- The system shows a high-level structured summary first, with the category, tags, and agents attached to each item; the user can click on the items of most interest based on that summary, reducing the risk of a mismatch.
- The user can specify agents, tags, or categories in the UI, which is much more than a prompt box. He can also include negative keywords, joint keywords that must appear together in the corpus, put a higher weight on the first keyword in the prompt, or favor the most recent material in the results (see the query-options sketch after this list).
- Standard Python text-processing libraries can themselves cause hallucinations. For instance, stemming maps "project" and "projected" to the same stem, even though they carry different meanings in the corpus. We still use these libraries, but with workarounds such as un-stemming to avoid issues that can lead to hallucinations (see the scoring sketch after this list).
- We attach a relevancy score to each item in the prompt results, ranging from 0 to 10. If we cannot find highly relevant information in your augmented corpus, even after using a synonyms dictionary, the score will be low, telling you that the system knows this particular item is weak. You can choose not to show items with a low score, though they sometimes contain unexpectedly interesting information (the reason to keep them).
- We show links and references, all coming from reliable sources. The user can double-check in case of doubt.
- We suggest alternate keywords (related concepts) to use in your next prompts.
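To make the sub-LLM idea concrete, here is a minimal routing sketch. The topic names, keyword sets, and the `answer` method on each sub-LLM are illustrative assumptions, not our actual implementation; the point is simply that a prompt is matched to one topic-specific sub-corpus rather than to the whole corpus.

```python
# Hypothetical topic -> keyword map; in practice each topic has its own
# sub-corpus, index, and embeddings (a "sub-LLM").
TOPIC_KEYWORDS = {
    "statistics": {"regression", "variance", "sampling"},
    "llm":        {"embedding", "token", "transformer"},
}

def route_to_sub_llm(prompt: str, sub_llms: dict):
    """Pick the sub-LLM whose topic keywords best overlap the prompt."""
    words = set(prompt.lower().split())
    best_topic, best_overlap = None, 0
    for topic, keywords in TOPIC_KEYWORDS.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best_topic, best_overlap = topic, overlap
    if best_topic is None:
        return "No matching sub-corpus; please refine the prompt."
    return sub_llms[best_topic].answer(prompt)  # hypothetical sub-LLM interface
```

Because retrieval never crosses topic boundaries, an answer about (say) sampling cannot be contaminated by unrelated material from another topic.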
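The query options mentioned above (negative keywords, joint keywords, extra weight on the first keyword, recency) can be represented as a small structure applied at retrieval time. The field names and the scoring formula below are assumptions for illustration only; the actual UI exposes similar controls, not this exact API.

```python
from dataclasses import dataclass, field

@dataclass
class QueryOptions:
    keywords: list[str]                                      # first keyword may get a higher weight
    negative_keywords: set[str] = field(default_factory=set) # items containing these are dropped
    joint_keywords: set[str] = field(default_factory=set)    # must all co-occur in an item
    first_keyword_boost: float = 1.5                         # assumed boost factor
    favor_recent: bool = False

def keep_item(item_text: str, item_year: int, opts: QueryOptions, cutoff_year: int = 2023) -> bool:
    """Filter out items that violate negative / joint keyword or recency constraints."""
    text = item_text.lower()
    if any(neg in text for neg in opts.negative_keywords):
        return False
    if opts.joint_keywords and not all(kw in text for kw in opts.joint_keywords):
        return False
    if opts.favor_recent and item_year < cutoff_year:
        return False
    return True

def score_item(item_text: str, opts: QueryOptions) -> float:
    """Simple keyword-match score, with extra weight on the first prompt keyword."""
    text = item_text.lower()
    score = 0.0
    for i, kw in enumerate(opts.keywords):
        if kw.lower() in text:
            score += opts.first_keyword_boost if i == 0 else 1.0
    return score
```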
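Finally, a sketch of the two related mechanisms discussed above: an "un-stem" map that keeps distinct words such as "project" and "projected" distinguishable even after a stemmer collapses them, and a 0-10 relevancy score computed with the help of a synonyms dictionary. The dictionaries and the scoring formula are illustrative assumptions, not the exact production logic.

```python
# Hypothetical synonyms dictionary used to expand prompt keywords.
SYNONYMS = {"llm": {"large language model"}, "gpu": {"graphics card"}}

def build_unstem_map(corpus_tokens, stem):
    """Map each stem back to the set of original tokens, so that 'project'
    and 'projected' remain distinguishable after stemming."""
    unstem = {}
    for token in corpus_tokens:
        unstem.setdefault(stem(token), set()).add(token)
    return unstem

def relevancy_score(prompt_keywords, item_tokens):
    """Fraction of prompt keywords (or their synonyms) found in the item,
    rescaled to the 0-10 range shown to the user."""
    item = set(item_tokens)
    hits = 0
    for kw in prompt_keywords:
        candidates = {kw} | SYNONYMS.get(kw, set())
        if candidates & item:
            hits += 1
    return round(10 * hits / max(len(prompt_keywords), 1))
```

A low score signals that the augmented corpus has little to offer for that keyword combination, which is exactly the information a user needs to decide whether to trust or discard the item.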
Blueprint: Next-Gen Enterprise RAG & LLM 2.0 - Nvidia PDFs Use Case
In my most recent articles and books, I discussed our radically different approach to building enterprise LLMs from scratch, without training, hallucinations, prompt engineering, or GPUs, while delivering higher accuracy at a much lower cost, safely, at scale, and at lightning speed (in-memory). It is also far easier to adapt to specific corpuses and business needs, to fine-tune, and to modify, giving you full control over all the components, based on a small number of intuitive parameters and explainable AI.
Now I have assembled everything into a well-structured 9-page document (plus 20 pages of code) with one-click links to the sources, including our internal library, deep retrieval PDF parser, real-life input corpus, backend tables, and so on. Access to all of this is offered only to those acquiring the paper. Our technology is so different from standard LLMs that we call it LLM 2.0.
This technical paper is much more than a compact version of past documentation. It highlights new features such as un-stemming to boost exhaustivity, multi-index, relevancy score vectors, multi-level chunking, various multi-token types (some originating from the knowledge graph) and how they are leveraged, as well as pre-assigned multimodal agents. I also discuss the advanced UI, far more than a prompt box, with unaltered concise structured output, suggested keywords for a deeper dive, agent or category selection to increase focus, and relevancy scores. Of special interest: the simplified, improved architecture, and an upgrade to process word associations in large chunks (embeddings) even faster.
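As a rough illustration of what multi-level chunking means in this context, the sketch below splits a document into large chunks and each large chunk into smaller sub-chunks, so retrieval can match at either granularity. The chunk sizes and the character-based splitting rule are assumptions for illustration; the paper describes the actual scheme.

```python
def multi_level_chunks(text: str, big: int = 2000, small: int = 400):
    """Split text into large chunks, then each large chunk into sub-chunks."""
    chunks = []
    for i in range(0, len(text), big):
        section = text[i:i + big]
        sub_chunks = [section[j:j + small] for j in range(0, len(section), small)]
        chunks.append({"section": section, "sub_chunks": sub_chunks})
    return chunks
```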