Using Filterable Fields in Azure AI Search for RAG-based GenAI Applications

With the huge hype around Retrieval-Augmented Generation (RAG) in GenAI applications, vector databases have come to play a vital role in them.

Azure AI Search is a leading cloud-based solution for indexing and querying a variety of data sources, and it also serves as a vector database that is widely used in production-level applications.

Similar to any cloud resource, Azure AI Search has different pricing tiers and limitations. For instance, if you choose the Standard S1 pricing tier, you can create a maximum of 50 indexes.

Recently, a use case arose where I would have needed more than 50 individual indexes in a single AI Search resource: a RAG-based application for an e-library, where each book should be indexed with a logical separation from the others.

The easiest way forward would be to create a separate index inside the vector database for each book. This was not feasible, since the maximum number of indexes in S1 is 50, fewer than the number of books in the library. Being cost-conscious, there was no room to go for Standard S2 or a higher pricing tier, and the actual storage required for all the indexes was not even 100 GB. So, S1 was the tier to go with, combined with an approach that keeps a separation between the embeddings of each book.

My approach was to add an additional metadata field to the index and make it filterable. The book’s name is then stored in that field and used to filter for the particular book when querying through the API.
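To make the idea concrete, here’s a minimal sketch (my own illustration, not code from the walkthrough below) of the OData $filter expression that such a field enables; the helper name book_filter is hypothetical:

```python
def book_filter(book_name: str) -> str:
    """Return an OData $filter expression that matches a single book."""
    # The field name 'book_name' matches the filterable field added to the index.
    return f"book_name eq '{book_name}'"

print(book_filter("Oliver_Twist"))  # book_name eq 'Oliver_Twist'
```

This is the same expression that later appears in the "filter" parameter of the retrieval request.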

The embedding was done using the ada-002 embedding model, and GPT-4o was used as the foundational model within the application.

Let’s walk through how that was done with the aid of the LangChain framework.

01. Import required libraries

We use the LangChain Python library to orchestrate LLM-based operations within the application.

import os
from langchain_community.vectorstores.azuresearch import AzureSearch
from langchain_openai import AzureOpenAIEmbeddings
from langchain_community.document_loaders import DirectoryLoader


from azure.search.documents.indexes.models import (
    SearchableField,
    SearchField,
    SearchFieldDataType,
    SimpleField
)

02. Initiate the embedding model

The ada-002 model has been used as the embedding model for this application. You can use any embedding model of your choice here.

# Configuration values are read from environment variables
embeddings: AzureOpenAIEmbeddings = AzureOpenAIEmbeddings(
    azure_deployment=os.environ["EMBEDDING_MODEL"],
    openai_api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
)
embedding_function = embeddings.embed_query

03. Create the structure of the index within the vector database

We define the structure of the index, adding an extra metadata field to it. This field must be filterable, since we will filter the content in the index by its value.

index_fields = [
    SimpleField(
        name="id",
        type=SearchFieldDataType.String,
        key=True,
        filterable=True,
    ),
    SearchableField(
        name="content",
        type=SearchFieldDataType.String,
        searchable=True,
    ),
    SearchField(
        name="content_vector",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=len(embedding_function("Text")),
        vector_search_profile_name="myHnswProfile",
    ),
    SearchableField(
        name="metadata",
        type=SearchFieldDataType.String,
        searchable=True,
    ),
    # Additional field to store the name of the book
    SearchableField(
        name="book_name",
        type=SearchFieldDataType.String,
        filterable=True,
    )
]

04. Create the Azure AI Search vector database

We create the Azure AI Search vector store with the custom field configuration defined above.

vector_store: AzureSearch = AzureSearch(
    azure_search_endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    azure_search_key=os.environ["AZURE_SEARCH_KEY"],
    index_name="index_of_books",
    embedding_function=embedding_function,
    fields=index_fields,
)

05. Loading documents from a local directory

Any loader available in LangChain can be used to load the content. In this context, the pages array contains all the pages from a single book.

loader = DirectoryLoader("../", glob="**/*")
pages = loader.load()

06. Add the additional metadata field for each document object

The content of the book is read as LangChain documents. We should set a value for the custom filterable field that we defined in the index structure. We can assign the book’s name to that field with a simple loop; the value should be changed when the pages of a new book are loaded through the loader.

for page in pages:
    metadata = page.metadata
    metadata["book_name"] = "Oliver_Twist"
    page.metadata = metadata

07. Adding the embeddings to the vector store

The customised content can now be added to the vector store.

vector_store.add_documents(pages)

The Retrieval Process

As mentioned in the use case, the point of adding a custom filterable field to the index is to retrieve only the required documents when answering a user query. For instance, if we only want answers from the book “Oliver Twist”, we should read only the embeddings from that particular book. This can be done with the filter argument when passing the request through the Azure OpenAI API. Here’s the sample body of the JSON request I sent to the API to get the filtered content. The filtration follows the OData $filter syntax.

{
  "data_sources": [
    {
      "type": "azure_search",
      "parameters": {
        "filter": "book_name eq 'Oliver_Twist'",
        "endpoint": "https://<SEARCH_RESOURCE>.search.windows.net",
        "key": "<AZURE_SEARCH_KEY>",
        "index_name": "index_of_books",
        "semantic_configuration": "azureml-default",
        "authentication": {
          "type": "system_assigned_managed_identity",
          "key": null
        },
        "embedding_dependency": null,
        "query_type": "vector_simple_hybrid",
        "in_scope": true,
        "role_information": "You are an AI assistant that finds information from the books in the library.",
        "strictness": 3,
        "top_n_documents": 4,
        "embedding_endpoint" : "<EMBEDDING_MODEL>",
        "embedding_key": "<AZURE_OPENAI_API_KEY>"
      }
    }
  ],
  "messages": [
    {
      "role": "system",
      "content": "You are an AI assistant that finds information from the books in the library."
    },
    {
      "role": "user",
      "content": "Please provide me with the summary of the book."
    }
  ],
  "deployment": "gpt-4o",
  "temperature": 0.7,
  "top_p": 0.95,
  "max_tokens": 800,
  "stop": null,
  "stream": false
}

Note the "filter" parameter in the request, which retrieves only the embeddings with the particular book name. Multiple filterable fields are also possible and can be used in complex applications. Keep in mind that filterable fields make the search slightly slower, but they are convenient in use cases that need logical separation and filtering of the embeddings within the vector store.
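When several filterable fields are in play, the expressions can be combined with and. Here’s a small hypothetical helper (my own sketch, not from the application above) that also escapes single quotes inside values, since OData string literals require doubling them:

```python
def combine_filters(**fields: str) -> str:
    """Build an OData $filter expression ANDing equality checks per field."""
    parts = []
    for name, value in fields.items():
        escaped = value.replace("'", "''")  # OData escapes ' by doubling it
        parts.append(f"{name} eq '{escaped}'")
    return " and ".join(parts)

print(combine_filters(book_name="Alice's Adventures", language="en"))
# book_name eq 'Alice''s Adventures' and language eq 'en'
```

The escaping matters as soon as book titles contain apostrophes; without it the filter expression becomes invalid.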

I’d be happy to hear about interesting use cases you have built with similar patterns. 🙂

Empowering Conversations with ChatGPT: The Art of Prompt Engineering

The release of ChatGPT in November 2022 gave a mega boost to the world of AI. With that, everyone started playing around with Large Language Models (LLMs) and exploring their possibilities.

While opening up interesting possibilities, new approaches and methodologies for interacting with LLMs came into play. Prompt engineering is one of the major domains that even the general public is interested in right now.

To get myself familiar with the context and not feel left behind, I followed several online resources, including the Prompt Engineering for Developers course from Deeplearning.ai. Here are some interesting points that caught my attention. This brief would be helpful for anyone using ChatGPT or any other LLM, not only in development scenarios but also for general daily tasks.

What’s actually Prompt Engineering?

Without any hesitation, I asked ChatGPT this question 😀

Prompt engineering refers to the process of designing and refining prompts to elicit desired responses from a language model like ChatGPT. It involves crafting the initial instruction or query in a way that effectively communicates the desired task or information to the model. Prompt engineering can significantly influence the quality and relevance of the model’s responses.

ChatGPT

In simple terms,

Prompt engineering is the methodology of giving clear instructions to LLMs. It helps the language model understand the instructions clearly and produce a more precise output.

LLMs in the GPT family (GPT-3, GPT-3.5, GPT-4 etc.) are trained to predict the next word in a given sequence. Given that, the instructions we provide to the model should be specific and understandable. There are no hard-and-fast rules for writing prompts, but the more precise, the better.

There are two key principles we should keep in mind when prompting.

  1. Write clear and specific instructions.
  2. Give the model time to “think”.

Be clear! Be Precise!

There are many tactics we can follow to make our instructions clear and easy for an LLM to understand.

Use delimiters to indicate distinct parts of the input

Take the example of using a GPT model to summarize a particular text. It’s always better to clearly indicate which part of the prompt is the instruction and which is the actual text to be summarized. You can use any delimiter you feel comfortable with. Here I’m using double quotes to mark the text to be summarised.
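The example itself isn’t reproduced in the text, so here is a sketch of what such a delimited prompt could look like (the wording is my own):

```python
# The text to summarize is wrapped in double quotes so the model can
# clearly separate the instruction from the content.
text = (
    "Sunny Sands Hotel sits right by the beach in Galle. "
    "The rooms are clean and the staff is friendly."
)
prompt = (
    "Summarize the text enclosed in double quotes into a single sentence.\n"
    f'"{text}"'
)
print(prompt)
```

With the delimiter in place, the model is far less likely to treat part of the content as an instruction.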

Ask for structured outputs

When it comes to LLM-aided application development, we use OpenAI APIs to perform several natural language processing tasks. For example, we can use the GPT models to extract key entities and the sentiment from a set of product reviews. When using the output of the model in a software system, it’s much easier to consume the output in a structured format such as a JSON object or HTML.

Here’s a prompt which takes feedback for a hotel as the input and gives a structured JSON output. This comes in handy in many analytics scenarios and when integrating OpenAI APIs in production environments.
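The prompt itself isn’t shown in the text; here is a sketch of how it could be phrased (the wording and JSON keys are my own assumptions):

```python
review = "The room was spotless and the staff were friendly, but breakfast was cold."
# Ask the model to reply only with JSON so the response can be parsed directly.
prompt = (
    "Identify the key entities and the overall sentiment in the hotel review "
    "below, and respond only with a JSON object using the keys "
    "'entities', 'sentiment' and 'summary'.\n"
    f'Review: "{review}"'
)
print(prompt)
```

Constraining the output to fixed keys makes the response safe to feed straight into json.loads in a downstream system.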

It’s always good to go step by step

Even for humans, it’s better to receive instructions as steps to complete a particular task. It works well with LLMs too. Here’s an example of performing a summarization and two translations of a customer review through a single prompt. Observe the structure of the prompt and how it guides the LLM to the desired output. Make sure you construct your prompt as procedural instructions.
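The original prompt isn’t reproduced in the text, so here is a sketch of how such a procedural prompt could be phrased (my own wording, with the review shortened):

```python
review = (
    "I recently stayed at Sunny Sands Hotel in Galle, Sri Lanka, "
    "and it was amazing!"
)
# Numbered steps guide the model through the task one stage at a time.
prompt = f"""Perform the following actions on the review enclosed in double quotes:
1 - Summarize the review in one sentence.
2 - Translate the summary into French.
3 - Translate the summary into Spanish.
4 - Output a JSON object with the keys: review, English_summary, French_summary, Spanish_summary.

Review: "{review}"
"""
print(prompt)
```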

Here’s the output in JSON.

{
  "review": "I recently stayed at Sunny Sands Hotel in Galle, Sri Lanka, and it was amazing! The hotel is right by the beautiful beach, and the rooms were clean and comfy. The staff was friendly, and the food was delicious. I also loved the pool and the convenient location near tourist attractions. Highly recommend it for a memorable stay in Sri Lanka!",
  "English_summary": "A delightful stay at Sunny Sands Hotel in Galle, Sri Lanka, with its beautiful beachfront location, clean and comfortable rooms, friendly staff, delicious food, lovely pool, and convenient proximity to tourist attractions.",
  "French_summary": "Un séjour enchanté à l'hôtel Sunny Sands à Galle, Sri Lanka, avec son emplacement magnifique en bord de plage, ses chambres propres et confortables, son personnel amical, sa délicieuse cuisine, sa belle piscine et sa proximité pratique des attractions touristiques.",
  "Spanish_summary": "Una estancia encantadora en el hotel Sunny Sands en Galle, Sri Lanka, con su hermosa ubicación frente a la playa, habitaciones limpias y cómodas, personal amable, comida deliciosa, hermosa piscina y ubicación conveniente cerca de atracciones turísticas."
}

Hallucination is one of the major limitations of LLMs. It occurs when the model produces coherent-sounding, verbose but inaccurate information due to a lack of understanding of cause and effect. Following proper prompting methodologies and tactics can prevent hallucinations to some extent, but not 100%.

It’s always the developer’s responsibility to develop AI systems that follow responsible AI principles and to be accountable for the process.

Feel free to share your experiences with prompt engineering and how you are using LLMs in your development scenarios.

Happy coding!

Do we really need AI?

DALL-E Artwork

Since the launch of ChatGPT last November, not only the tech community but also the general public has started peeping into the world of AI. As mentioned in my article “AI Summer is Here”, organisations are looking for avenues where they can use the power of AI and advanced analytics to empower their business processes and gain a competitive advantage.

Though everyone is eager to use AI, are we really ready? Are we there?

These are my thoughts on the pathway an organisation may follow to adopt AI in their business processes with a sensible return on investment.

First of all, there’s a key concept we should keep in mind: “AI is not a wizard or a magical thing that can do everything.” It’s a man-made concept built upon mathematics, statistics and computer science, which we can use as a toolset for certain tasks.

We want to use AI! We want to do something with it! OR We want to do everything with it!

Hold on… Though there’s a ‘trend’ for AI, you should not jump at it without analysing your business use cases thoroughly. You should first identify what value the organisation is going to gain from using AI or any advanced analytics capability. Most likely you can’t do everything with AI (yet). It’s all about identifying the correct use case and the correct approach that aligns with your business process.

Let’s not focus on doing something with AI. Let’s focus on doing the right thing with it.

We have a lot of data! So, we are there, right?

Data is the key asset we have when it comes to any analytical use case. Most organisations have been collecting data from their processes since day 1. The problem lies in the way data is managed and how the data assets and the platform are maintained. Some may have a proper data warehouse or lakehouse architecture which is properly managed with CI/CD concepts etc., but some may have spreadsheets sitting on a local computer which they call their “data”!

The very first thing an organisation should do before moving into advanced analytics is streamline its data platform. Implementing a proper data warehouse or data lake architecture which follows something similar to the Medallion architecture would be essential before moving into any analytics workloads.

If the organisation is growing and plans to use machine learning and data science in a broader perspective, it is strongly recommended to enable MLOps capabilities within the organisation. That would provide a clear platform for model development, maintenance and monitoring.

Having a lot of data doesn’t mean you are right on track. Clearing out the path and streamlining the data management process is essential.

Do we really need to use AI or advance analytics?

This question is real! I have seen many cases where people tend to invest in advanced analytics even before getting their business processes aligned with modern infrastructure needs. It’s not something that you can just enable with the click of a button. Before investing your time and money in AI, first make sure your IT infrastructure, data platforms, workforce and IT processes are up to date and ready to expand for future needs.

For example, say you are running a retail business and are planning to use machine learning to perform sales forecasts. If your daily sales data is still sitting on a local SQL Server and that’s the same infrastructure you are going to use for predictive analytics, that’s definitely going to be a failure. First make sure your data is secured (maybe on cloud infrastructure) and ready to be exposed to an analytical platform without interfering with the daily business operations.

ChatGPT can do everything right?

ChatGPT can chat! 😀

As I stated previously, AI is not a wizard. Neither is ChatGPT. You can use ChatGPT and its underlying OpenAI models mostly for natural-language-processing tasks and for code generation and understanding (keep in mind that it’s not going to be 100% accurate). If your use case is related to NLP, then the GPT-3 models may be an option.

When to use generative AI?

Variational Autoencoders, Generative Adversarial Networks and many other architectures have risen and continue to advance the field of AI. That has given a huge boost to the domain of generative AI. These models are capable of generating new examples that resemble examples from a given dataset.

It is being used in diverse fields such as natural language processing (GPT-3, GPT-4), computer vision (DALL-E), speech recognition and many more.

OpenAI is the leading generative AI service provider in the domain right now, and Microsoft Azure offers Azure OpenAI, an enterprise-level serving of OpenAI services with additional advantages like security and governance from the Azure cloud.

If you are thinking about using generative AI in your business use case, I strongly recommend going through the following considerations.

If you have said yes to all of the above, then OpenAI may be the right cognitive service to go with.

If that’s not the case, you have to look at other machine learning paradigms and off-the-shelf ML models, like cognitive services, which may cater better to the scenario.

Ok! What should be the first step?

Take a deep dive into your business processes and identify the gaps in digital transformation. After addressing those ground-level issues, look at the data landscape and analyse the possibilities of performing analytical tasks with it. Start with a small, non-critical use case and then move on to the complex scenarios.

If you have any specific use cases in mind and want to see how AI/machine learning can help optimize those processes, I’m more than happy to have a discussion with you.

By the way, the image at the top was generated by DALL-E with the prompt “A cute image of a robot toy sitting near a lake – digital art”.

Unlocking the Power of Language with GPT-3

Yes, this is all about the hype of ChatGPT. It’s obvious that most of us are obsessed with it and spending a lot of time with that amazing tool, even as a regular search engine! (Is that bye-bye, Google? 😀)

I thought of discussing the underlying mechanics of ChatGPT: Large Language Models (LLMs), and the applicability of these giants in intelligent application development.

What actually is ChatGPT?

ChatGPT is a conversational AI model developed by OpenAI. It uses the GPT-3 architecture, which is based on Transformer neural networks. GPT-3 is a large language model with about 175 billion parameters. The model has been trained on a huge corpus of text data (about 45 TB) to generate human-like responses to text inputs. Most of the training data was harvested from the public internet. ChatGPT can perform a variety of language tasks such as answering questions, generating text, translating languages, and more.

ChatGPT is only a single use case of a massive body of research. The underlying power is the neural network architecture GPT-3. Let’s dig in step by step while discussing the following points.

What are Large Language Models (LLMs)?

LLMs are deep learning models that can recognize, summarize, translate, predict and generate text and other content based on knowledge gained from massive datasets. As the name suggests, these language models are trained on massive amounts of textual data using unsupervised learning. (Yes, there’s no data labelling involved.) BLOOM from Hugging Face, ESMFold from Meta AI, Gato from DeepMind, BERT from Google, MT-NLG from Nvidia & Microsoft, and GPT-3 from OpenAI are some of the LLMs in the AI space.

Large language models are among the most successful applications of transformer models. They aren’t just for teaching machines human languages, but for understanding proteins, writing software code and much more.

What are Transformers?

The encoder-decoder structure of the Transformer architecture (taken from “Attention Is All You Need”)

Transformers? Are we going to talk about Bumblebee here? Actually not!

Transformers are a type of neural network architecture (similar to Convolutional Neural Networks, Recurrent Neural Networks etc.) designed for processing sequential data such as text, speech, or time-series data. They were introduced in the 2017 research paper “Attention Is All You Need”. Transformers use self-attention mechanisms to process the input sequence and compute a weighted sum of the features at each position, allowing the model to efficiently process sequences of varying length and capture long-range dependencies. They have been successful in many natural language processing tasks such as machine translation and have become a popular choice in recent years.
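To make the self-attention idea concrete, here’s a toy NumPy sketch of scaled dot-product attention (an illustration of the mechanism only, not production code):

```python
import numpy as np

def self_attention(Q, K, V):
    """Scaled dot-product attention: each position attends to every position."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # pairwise similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over positions
    return weights @ V                                # weighted sum of values

x = np.random.rand(4, 8)       # 4 tokens with 8-dimensional embeddings
out = self_attention(x, x, x)  # self-attention: Q, K, V all come from x
print(out.shape)               # (4, 8)
```

Because every token looks at every other token in one step, dependencies between distant positions are captured directly rather than propagated step by step as in an RNN.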

For a deep learning enthusiast, this may sound familiar from the RNN architecture, which is mostly used for learning sequential tasks. Unlike RNNs, transformers are capable of capturing long-term dependencies, which makes them so capable in complex natural language processing tasks.

GPT stands for “Generative Pre-trained Transformer”. As the name implies it’s built with the blessing of transformers.

Alright… now GPT-3 is the hero here! So, what’s cool about GPT-3?

  • GPT-3 is one successful innovation among LLMs (it’s not the only LLM in the world).
  • The model itself has no knowledge; its strength lies in its ability to predict the subsequent word(s) in a sequence. It is not designed to store or retrieve factual information.
  • It’s a pre-trained machine learning model. You cannot download or retrain the model since it’s massive! (Fine-tuning with your own data is possible.)
  • GPT-3 is accessed through a closed API, for which you need an API key.
  • GPT-3 is good mostly for English language tasks.
  • A bit of a downside: the outputs can be biased and abusive, since the model learns from data fetched from the public internet.

If you are really interested in learning the science behind GPT-3, I would recommend taking a look at the paper: Language Models are Few-Shot Learners

What’s OpenAI?

OpenAI, a research organisation founded in 2015, is the creator of the GPT-3 architecture. GPT-3 is not the only interesting innovation from OpenAI. If you have seen AI-generated art created from a natural language phrase as the input, it is most probably from the DALL-E 2 neural network, which is also from OpenAI.

OpenAI offers a set of APIs which developers can easily adopt in their intelligent application development tasks.

Check the OpenAI APIs here: https://beta.openai.com/overview    

What can be the use cases of GPT-3?

We all know ChatGPT is ground-breaking. Our focus should be on exploring the approaches by which we can use its underlying architecture (GPT-3) in application development.

Since the beginning of deep neural networks, there has been a lot of research and innovation in the computer vision space. Networks like ResNet were ground-breaking and even surpassed the human accuracy level in tasks like image classification on the ImageNet dataset. We got the advantage of having pre-trained state-of-the-art networks for computer vision tasks without bothering with large training datasets.

LLMs like GPT-3 address the lack of such networks for natural language analysis tasks. Simply put, GPT-3 is a massive pre-trained knowledge base that can understand language.

There are many interesting use cases of GPT-3 as a language model, including but not limited to:

  • Dynamic chatbots for customer service use cases which provide more human-like interaction with users.
  • Intelligent document management by generating smart tagging/ paraphrasing, summarizing textual documents.
  • Content generation for websites, news articles, educational materials etc.
  • Advanced text classification tasks
  • Sentiment analysis
  • Semantic search capabilities which provide natural language query capability.
  • Text translation, keyword identification etc.
  • Programming code generation and code optimisation

Since GPT-3 can be fine-tuned with a given set of training data, the possibilities are limitless with the natural language understanding capability it has. You can be creative and come up with the next big idea that improves the productivity of your business.
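As an illustration of what fine-tuning input looks like, the GPT-3 fine-tuning API of that era expected training data as JSONL prompt/completion pairs; here’s a minimal sketch (the example reviews are made up):

```python
import json

# Each line of the training file is one JSON object with a prompt and
# the completion the model should learn to produce for it.
examples = [
    {"prompt": "Classify the review: 'Great stay!' ->", "completion": " positive"},
    {"prompt": "Classify the review: 'Never again.' ->", "completion": " negative"},
]
jsonl = "\n".join(json.dumps(e) for e in examples)
print(jsonl)
```

The resulting file would then be uploaded to the fine-tuning endpoint to produce a customised model.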

What is Azure OpenAI?

Azure OpenAI is a collaboration between Microsoft’s Azure cloud platform and OpenAI, aimed at providing cloud-based access to OpenAI’s cutting-edge AI models and tools. The partnership provides a seamless platform for developers and organizations to build, deploy, and scale AI applications and services, leveraging the computing resources and technology of the Azure cloud.

Users can access the service through REST APIs, the Python SDK, or through Azure OpenAI Studio, the web-based interface dedicated to OpenAI services.

In enterprise application development scenarios, using OpenAI services through Azure makes integration much easier.

Azure OpenAI reached general availability very recently, and I’m pretty sure there will be vast improvements to the product in the coming days.

Let’s keep our eyes open and start innovating on ways in which we can use this superpower wisely.