Co-Authors: Muhammad Alif Ramadhan & Muhamad Adamy Rayeuk

Retrieval-based methods search through a database of existing text to find relevant information for a given query.

Generative models create new content from user input, using deep learning techniques such as large-scale pre-training. By combining the two approaches and leveraging external knowledge sources, RAG can provide more comprehensive and informative responses.

High-level overview of the RAG workflow

RAG has three key components: an embedding model, a database, and a relevant-context retrieval method.

Key Components of RAG

Let’s start with embeddings. Embeddings are one of the key building blocks of Large Language Models (LLMs). They are continuous vector representations of words or tokens that capture their semantic meaning in a high-dimensional space.

💡
LLM (Large Language Model) is a type of computational model designed for natural language processing tasks such as language generation.
Word2Vec embeddings, Linear relationship (source: tensorflow.org)

For example, the picture above shows that words such as man and woman are different words but carry related meanings, so their vector representations sit close together in the vector space. This closeness is what makes semantic retrieval possible.
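If you want to see this linear structure for yourself, the sketch below reproduces the classic king − man + woman ≈ queen analogy using gensim’s pretrained Word2Vec vectors. The model name here is just one commonly available download, not something this article depends on.

```python
# A minimal sketch of the Word2Vec linear relationship, using gensim's
# downloader (the model name is one common pretrained choice).
import gensim.downloader as api

# Downloads the pretrained Google News vectors on first use (large file).
vectors = api.load("word2vec-google-news-300")

# "king" - "man" + "woman" should land near "queen" in vector space.
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # e.g. [('queen', 0.71...)]
```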

Embedding Model

We use embedding models to enable LLMs to comprehend and reason over high-dimensional data. Embedding models are algorithms trained to encapsulate information into dense representations in a multidimensional space.

How embedding works

In other words, embedding models create fixed-length vector representations of text, focusing on semantic meaning for tasks like similarity comparison.

💡
You can find a list of embedding models in the MTEB leaderboard hosted on Hugging Face.
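As a minimal sketch of what an embedding model does in practice, here is how you might embed a few sentences with the sentence-transformers library. The model name all-MiniLM-L6-v2 is just one popular example from that leaderboard, not necessarily the one we use.

```python
# A minimal sketch of turning text into fixed-length vectors with
# sentence-transformers (the model name is one popular example).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "How do I reset my password?",
    "Steps to recover a forgotten password",
    "Today's lunch menu",
]

# encode() returns one fixed-length vector per input sentence.
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 384) -- this model produces 384-dimensional vectors
```

The first two sentences will end up with nearby vectors, while the third lands far away, which is exactly what similarity comparison relies on.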

This vector data needs to be stored in a database, right? This is where the vector database comes to the rescue.

Database

According to Oracle, a database is an organized collection of structured information, or data, typically stored electronically in a computer system. We use both vector and NoSQL databases to store and retrieve data.

Vector Database

A vector database is a collection of data stored as mathematical representations. Vector databases make it easier for machine learning models to remember previous inputs, allowing machine learning to power search, recommendation, and text-generation use cases.

Representation of vectors stored in a Vector database

Data can be identified based on similarity metrics instead of exact matches, making it possible for a computer model to understand data contextually.
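To illustrate what “similarity instead of exact match” means, here is a tiny cosine-similarity computation in plain NumPy; the vectors are made-up toy values, not real embeddings.

```python
# A small illustration of similarity-based matching: cosine similarity
# scores vectors by angle, so semantically close texts score high
# even when their words don't match exactly.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.9, 0.1, 0.3])
doc_a = np.array([0.8, 0.2, 0.25])  # points in nearly the same direction
doc_b = np.array([0.1, 0.9, 0.7])   # points somewhere else entirely

print(cosine_similarity(query, doc_a))  # ~0.99, a strong match
print(cosine_similarity(query, doc_b))  # ~0.36, noticeably weaker
```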

NoSQL Database

While vector databases are optimized for the storage and retrieval of vector data, NoSQL databases are optimized for the storage and retrieval of unstructured data.

💡
Unstructured data is information that either does not have a pre-defined data model or is not organized in a pre-defined manner.

Unstructured data lacks a strict format or schema, making it challenging for conventional databases to manage. Yet, this unstructured data holds immense potential for AI, machine learning, and modern search engines.

In our use case, we incorporate a wide range of unstructured text data into our knowledge base. This includes documents such as reports, invoices, records, emails, and outputs from various productivity applications.
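As a hedged sketch of how such documents might land in a NoSQL store, here is a minimal MongoDB example using pymongo. The connection string, collection names, and sample documents are all illustrative, not our actual setup.

```python
# A minimal sketch of storing unstructured documents in a NoSQL store
# (MongoDB via pymongo). All names and data here are illustrative only.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
collection = client["knowledge_base"]["documents"]

# Documents can vary in shape -- no fixed schema is required.
collection.insert_one({
    "type": "invoice",
    "source": "email",
    "text": "Invoice #1042: 3x security audit, due 2024-07-01 ...",
})
collection.insert_one({
    "type": "report",
    "title": "Q2 incident summary",
    "body": "Two phishing attempts were detected and contained ...",
})
```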

Relevant Context Retrieval Method

The next challenge is retrieving relevant context-based information in response to user queries.

User prompt, with contextual knowledge.

Now that the machine has enough knowledge, we want it to answer our questions based on the knowledge we provided.

How the relevant context retrieval method works

First, we search our NoSQL database for the top x documents most relevant to the user’s prompt.

Then, we convert both the user’s prompt and the top x documents into vector representations.

We then use these vectors to search our vector database, filtering the results to retrieve the most relevant context.
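Putting the three steps together, here is a hedged, self-contained sketch with tiny in-memory stand-ins for the NoSQL store and the vector database. Every name, model, and document here is illustrative, not our production stack.

```python
# A sketch of the three retrieval steps above, with in-memory stand-ins
# for the NoSQL store (a list of dicts) and the vector database
# (cosine-similarity ranking). Names and data are illustrative only.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model, as before

documents = [
    {"id": 1, "text": "Password resets are handled by the IT helpdesk."},
    {"id": 2, "text": "Invoices must be approved within five business days."},
    {"id": 3, "text": "Phishing emails should be reported immediately."},
]

def keyword_search(prompt: str, limit: int) -> list[dict]:
    # Step 1: crude keyword overlap, standing in for a real NoSQL text index.
    words = set(prompt.lower().split())
    scored = [(len(words & set(d["text"].lower().split())), d) for d in documents]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [d for score, d in scored[:limit] if score > 0]

def retrieve_context(prompt: str, top_x: int = 3, top_k: int = 1) -> list[str]:
    candidates = keyword_search(prompt, limit=top_x)

    # Step 2: embed the prompt and the candidate documents.
    prompt_vec = model.encode(prompt)
    doc_vecs = model.encode([d["text"] for d in candidates])

    # Step 3: rank candidates by cosine similarity, standing in for a
    # filtered vector-database query, and keep the most relevant ones.
    sims = doc_vecs @ prompt_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(prompt_vec)
    )
    ranked = np.argsort(sims)[::-1][:top_k]
    return [candidates[i]["text"] for i in ranked]

print(retrieve_context("How do I reset my password"))
```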

Generating response based on relevant context

After acquiring the relevant context, we’ll incorporate it, along with predefined rules and the user’s specific prompt, into the LLM’s input.

The LLM will then process this information and generate a tailored response.
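As a final sketch, here is how the retrieved context, some predefined rules, and the user’s prompt might be folded into a single LLM call. The OpenAI client and model name are just one example of an LLM API, and the template wording is our own illustration, not the article’s exact rules.

```python
# A hedged sketch of the final step: fold the retrieved context and some
# predefined rules into the LLM input. Model name and template wording
# are illustrative examples.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer(prompt: str, context: list[str]) -> str:
    system = (
        "Answer using ONLY the provided context. "
        "If the context is insufficient, say you don't know."
    )
    user = "Context:\n" + "\n---\n".join(context) + f"\n\nQuestion: {prompt}"

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return response.choices[0].message.content

print(answer("How do I reset my password?",
             ["Password resets are handled by the IT helpdesk."]))
```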


This article was also published in Bahasa Indonesia in the 4th edition of ITSEC Buzz Magazine. Check it out!

ITSEC BUZZ Magazine - 4th Edition
The fourth edition of ITSEC Buzz magazine is out! Inside, you will find seven engaging articles that anyone can enjoy, even without a background in IT or IT security. The first four articles describe how cyberattacks intersect directly with everyday human life. In fact, most of us are not even aware…

Hope this helps. See you in the next article! 👋
