Şevket Ayaksız

    Raising the Bar: Unveiling RAG for More Accurate and Reliable Large Language Models

    By ayaksız, January 24, 2024 (3 min read)

    The problems: LLM hallucinations and limited context
    Training an LLM often takes a long time and expensive resources, sometimes months of run time on dozens of state-of-the-art server GPUs such as NVIDIA H100s. Keeping an LLM completely up to date by retraining it from scratch is a non-starter, although the less expensive process of fine-tuning the base model on newer data can help.

    Fine-tuning has its drawbacks, however: adding new functionality (such as the code generation added to Code Llama) can reduce functionality present in the base model (such as the general-purpose queries handled well by Llama).

    What happens if you ask an LLM whose training data ended in 2022 about something that occurred in 2023? There are two possibilities: either it realizes it doesn’t know, or it doesn’t. In the former case, it will typically tell you about its training data, e.g., “As of my last update in January 2022, I had information on….” In the latter case, it will try to give you an answer based on older, similar but irrelevant data, or it might outright make things up (hallucinate).

    To avoid triggering LLM hallucinations, it sometimes helps to mention the date of an event or a relevant web URL in your prompt. You can also supply a relevant document, but providing long documents (whether by supplying the text or the URL) works only until the LLM’s context limit is reached; the model stops reading at that point. Context limits differ among models: two Claude models offer a 100K-token context window, which works out to about 75,000 words and is much larger than that of most other LLMs.
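
The context-limit arithmetic can be sketched in a few lines. This is a rough estimate only: the 0.75 words-per-token ratio is derived from the 100K tokens to 75,000 words figure quoted here, and the names `estimate_tokens`, `fits_in_context`, and `reserved_for_answer` are hypothetical. A real check would use the model's own tokenizer.

```python
import math

# Assumption: about 0.75 words per token, from "100K tokens ~ 75,000 words".
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Rough token estimate from a whitespace word count (not a real tokenizer)."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def fits_in_context(text: str,
                    context_limit: int = 100_000,
                    reserved_for_answer: int = 1_000) -> bool:
    """Return True if the text plus room for the reply fits in the window."""
    return estimate_tokens(text) + reserved_for_answer <= context_limit


if __name__ == "__main__":
    document = " ".join(["word"] * 80_000)  # a synthetic 80,000-word document
    print(estimate_tokens(document), fits_in_context(document))
```

A document that fails this check is exactly the case where pasting the whole text into the prompt stops working.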

    The solution: Ground the LLM with facts
    As you can guess from the title and beginning of this article, one answer to both of these problems is retrieval-augmented generation (RAG). At a high level, RAG combines an internet or document search with a language model in a way that avoids the issues you would run into doing the two steps manually, such as search output that exceeds the language model’s context limit.

    The first step in RAG is to use the query for an internet, document, or database search, and to vectorize the source information into a dense, high-dimensional form, typically by generating embedding vectors and storing them in a vector database. This is the retrieval phase.
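
As a minimal sketch of the vectorize-and-store step, the code below splits source text into chunks and keeps each chunk next to its vector in a plain Python list. Everything here is an illustrative assumption: `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, the list stands in for a vector database, and the chunk size of 50 words is arbitrary.

```python
import hashlib
import math

DIM = 64  # toy dimensionality; real embedding models use far more dimensions


def toy_embed(text: str, dim: int = DIM) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words,
    scaled to unit length. It captures word overlap, not meaning."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def chunk(text: str, max_words: int = 50) -> list[str]:
    """Split a document into pieces of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def build_index(documents: list[str]) -> list[tuple[str, list[float]]]:
    """Store (chunk text, vector) pairs; a list stands in for a vector database."""
    index = []
    for doc in documents:
        for piece in chunk(doc):
            index.append((piece, toy_embed(piece)))
    return index


if __name__ == "__main__":
    index = build_index(["RAG grounds a language model with retrieved facts."])
    print(len(index), len(index[0][1]))
```

Chunking matters because the pieces, not whole documents, are what later get ranked and handed to the model.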

    Then you vectorize the query itself and run FAISS or another similarity search, typically with a cosine similarity metric, against the vector database. The search extracts the most relevant portions (the top K items) of the source information, which are presented to the LLM along with the query text. This is the augmentation phase.
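
The similarity search can be illustrated with a brute-force version in plain Python. This is only a sketch: it is not FAISS, the three-dimensional vectors and chunk labels are made up for the example, and a production system would use an indexed search rather than scanning every vector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float],
          index: list[tuple[str, list[float]]],
          k: int = 2) -> list[tuple[float, str]]:
    """Score every stored chunk against the query and keep the k best."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up vectors for illustration; real ones come from an embedding model.
index = [
    ("chunk about RAG", [1.0, 0.0, 0.0]),
    ("chunk about GPUs", [0.0, 1.0, 0.0]),
    ("chunk about retrieval", [0.6, 0.0, 0.8]),
]
query_vec = [0.9, 0.0, 0.1]

if __name__ == "__main__":
    for score, text in top_k(query_vec, index, k=2):
        print(f"{score:.3f}  {text}")
```

Only the top K chunks go into the prompt, which is how RAG keeps the retrieved material under the context limit.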

    Finally, the LLM, referred to in the original Facebook AI paper as a seq2seq model, generates an answer. This is the generation phase.
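
To make the hand-off to the model concrete, here is a minimal sketch of assembling the retrieved passages and the query into one prompt. The prompt wording, the `max_chars` budget, and the `llm` callable are all assumptions; `stub_llm` is a placeholder so the example runs without any model or API.

```python
from typing import Callable


def build_prompt(question: str, passages: list[str], max_chars: int = 4000) -> str:
    """Combine retrieved passages and the question, stopping at a size budget."""
    context_parts = []
    used = 0
    for number, passage in enumerate(passages, start=1):
        entry = f"[{number}] {passage}"
        if used + len(entry) > max_chars:
            break  # stay within the (assumed) budget for context text
        context_parts.append(entry)
        used += len(entry)
    context = "\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "If the context is not sufficient, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question: str, passages: list[str], llm: Callable[[str], str]) -> str:
    """Generation step: send the augmented prompt to any text-in, text-out model."""
    return llm(build_prompt(question, passages))


def stub_llm(prompt: str) -> str:
    """Hypothetical placeholder for a real model call."""
    return f"(stub reply to a {len(prompt)}-character prompt)"


if __name__ == "__main__":
    passages = ["RAG combines a search step with a language model."]
    print(answer("What is RAG?", passages, stub_llm))
```

Telling the model to rely on the supplied context, and to admit when it is insufficient, is one common way to discourage answers drawn from stale training data.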
