Conversational Document Search Using Azure AI Search

Efficient and precise document search capabilities are not just practical and convenient in today’s business world – they are essential for data-dependent organisations.

 

We’re proud to announce that we’ve introduced Document Search in the Observation Deck 4.0 release, enabling users to query their data conversationally in natural language.

 

 

This functionality integrates with our platform’s existing capabilities and makes user interaction more intuitive and efficient. Using AI, the system interprets natural-language queries and returns relevant results with citations from the documents and data that the user has previously selected and uploaded. It is a significant step towards making information retrieval as effortless as conversing with an assistant, streamlining the discovery of critical data and insights hidden in vast stores of documents.

 

 

Powered by Azure AI Search

 

The backbone of our new Document Search feature is Azure AI Search, a cloud-based search service enabling developers to swiftly set up and deploy sophisticated search experiences across web, mobile, and Azure enterprise applications.

 

It integrates with other Azure services, offering a robust platform for developing AI-powered search solutions. Unlike traditional keyword-based search, Azure AI Search can use machine learning models to capture the context and meaning of the data, so searches focus more on the meaning behind the words than on the words themselves.

 

 

The Anatomy of the AI Search Service

 

This search functionality requires three main elements: Azure AI Search, a data container storing the actual documents and data, and the app from which the user queries the data:

 

[Figure: Anatomy of the AI Search Service]

 

On top of this architecture, we integrated a conversational large language model (LLM) from OpenAI into the app, making the conversation driven by Azure AI Search more natural and effective.
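
One way to picture how search results and the LLM fit together is the prompt that the app assembles before calling the model. The following is a minimal sketch rather than the Observation Deck implementation: the `Passage` shape, the function name and the prompt wording are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    """A retrieved chunk plus the document it came from (hypothetical shape)."""
    source: str
    text: str


def build_grounded_prompt(question: str, passages: list[Passage]) -> str:
    """Assemble a prompt that asks the model to answer only from the passages."""
    # Number each passage so the model can cite it as [1], [2], ...
    numbered = "\n".join(
        f"[{i}] ({p.source}) {p.text}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the sources below. "
        "Cite sources by their number, e.g. [1].\n\n"
        f"Sources:\n{numbered}\n\n"
        f"Question: {question}"
    )


# Illustrative passages; in practice these would come from the search service.
passages = [
    Passage("policy.pdf", "Employees accrue 23 days of leave per year."),
    Passage("faq.md", "Unused leave can be carried over until March."),
]
prompt = build_grounded_prompt("How many leave days do I get?", passages)
print(prompt)
```

Numbering the sources is what lets the model's answer point back to specific documents, which the app can then display as citations.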

 

Search Service

Azure AI Search, formerly known as “Azure Cognitive Search”, is a search engine that supports vector, full-text, and hybrid search. It offers rich indexing with integrated data chunking and vectorisation, as well as a robust query syntax for precise information retrieval. Because it runs on Azure’s infrastructure, it benefits from the platform’s scalability, security, and data management services. It also supports semantic ranking and integrates with other Azure services for data ingestion and AI enrichment, offering a comprehensive solution for fast, reliable information retrieval.

 

Within this service, an indexer connects to and retrieves data from the configured external data sources where users have stored their documents and data; it then processes and loads this data into an index. Indexes are where the data is stored and organised in a searchable format. Essentially, indexers automate the data ingestion process, while indexes serve as the structured databases that users query against to find relevant information.

 

Indexers are highly configurable: we can control the frequency of data updates, define custom field mappings between the source data and the index structure, and apply data transformations or enrichments during indexing through cognitive skills. We configured the skillset for our document search functionality to work with OpenAI embeddings, meaning that texts are chunked and vectorised using an embedding model from OpenAI, in this case text-embedding-ada-002 (Ada-2).
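
To make these configuration options concrete, here is a hedged sketch of what an indexer definition might look like when expressed as a Python dictionary. The property names follow the Azure AI Search REST API for indexers as we understand it and should be checked against the API version you use. Every resource name and the `title` target field are hypothetical.

```python
import json


def make_indexer_definition(name, data_source, index, skillset, interval="PT2H"):
    """Return a dict shaped like an Azure AI Search indexer definition."""
    return {
        "name": name,
        "dataSourceName": data_source,   # where the documents live
        "targetIndexName": index,        # where the searchable content ends up
        "skillsetName": skillset,        # enrichment steps, e.g. chunking and embedding
        # ISO 8601 duration: run the indexer every two hours.
        "schedule": {"interval": interval},
        # Map a blob metadata field onto a (hypothetical) index field.
        "fieldMappings": [
            {"sourceFieldName": "metadata_storage_name", "targetFieldName": "title"}
        ],
    }


definition = make_indexer_definition(
    name="docs-indexer",             # hypothetical names throughout
    data_source="docs-blob-source",
    index="docs-index",
    skillset="chunk-and-embed",
)
print(json.dumps(definition, indent=2))
```

The schedule, the field mappings and the skillset reference correspond to the three kinds of control described above.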

 

Using an embedding model for document search offers a more nuanced understanding of the content than simple keyword matching. It maps documents and queries into a shared vector space, where the closeness of two vectors reflects their semantic similarity. This approach retrieves documents that are contextually relevant to the query even if they don’t contain the exact query terms, leading to more accurate and meaningful search results.
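
As an illustration of similarity in a vector space, the sketch below ranks two documents against a query using cosine similarity. The three-dimensional vectors are made up for readability and the document names are hypothetical; real Ada-2 embeddings have 1536 dimensions, and the search service performs this comparison internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings", invented for this example.
query = [0.9, 0.1, 0.0]
docs = {
    "holiday policy": [0.8, 0.2, 0.1],
    "server maintenance": [0.0, 0.2, 0.9],
}

# Rank documents from most to least similar to the query.
ranked = sorted(
    docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True
)
print(ranked)
```

The document whose vector points in nearly the same direction as the query ranks first, whether or not the two share any words.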

 

 

The Data Container

 

The data container is a secure storage area where users can upload and manage their documents and data. This centralised repository keeps all necessary information readily accessible for indexing and searching, supports a variety of document formats (.pdf, .txt, .md, .csv), and maintains data integrity. As mentioned above, it is configured primarily to feed our indexer and its skillset, which prepare the data for use by our search service chatbot.

We can add multiple cognitive skills to our indexer to increase the value of the search. For example, by adding an OCR (Optical Character Recognition) cognitive skill, we can also search graphical documents. These skillsets and other exciting features are explained in detail at this link.

 

 

The App

 

The app, a user-friendly interface within Observation Deck, enables users to conduct searches, discuss results, and view additional information such as actual citations from their documents. It’s designed for ease of use, allowing non-technical users to effectively query their data using natural language, to customise their search experience, and to navigate swiftly through results to find the necessary information.

 

[Figure: App results on Observation Deck]

 

 

Further Optimisations

 

A query against an Azure AI Search index runs through two execution layers:

  • Retrieval (L1) – quickly retrieves documents using keyword search, vector search, or a hybrid method that combines both, returning roughly the top 50 documents to pass to the next layer.
  • Ranking (L2) – refines these results using deep learning models for semantic ranking, so that the most relevant documents appear at the top.
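
The two layers can be sketched as a small pipeline. This is a toy model of the idea, not how the service is implemented: term overlap stands in for L1 retrieval, a simple term-density score stands in for the deep learning ranker in L2, and all function names and documents are hypothetical.

```python
def retrieve(query, corpus, top_k=50):
    """L1 stand-in: keep documents sharing at least one term with the query."""
    terms = set(query.lower().split())
    scored = [
        (len(terms & set(text.lower().split())), doc_id)
        for doc_id, text in corpus.items()
    ]
    hits = [(score, doc_id) for score, doc_id in scored if score > 0]
    # Highest overlap first; ties broken by document id for determinism.
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in hits[:top_k]]


def term_density(query, text):
    """Toy L2 scorer: the fraction of the document's words that match the query."""
    terms = set(query.lower().split())
    words = text.lower().split()
    return sum(word in terms for word in words) / len(words)


def rerank(query, candidates, corpus, scorer):
    """L2 stand-in: reorder only the L1 candidates with a costlier scorer."""
    return sorted(candidates, key=lambda d: scorer(query, corpus[d]), reverse=True)


corpus = {
    "a": "leave the building by the rear exit policy and many other unrelated words here",
    "b": "annual leave policy for employees",
    "c": "server maintenance window",
}
query = "leave policy"
candidates = retrieve(query, corpus)
final = rerank(query, candidates, corpus, term_density)
print(candidates, final)
```

The design point is that the expensive scorer only ever sees the short candidate list produced by the cheap first layer, which keeps the overall query fast.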

 

A Microsoft blog post details a performance study of different index search methods and concludes that combining hybrid retrieval with semantic ranking gives the most accurate results.

 

Hybrid retrieval combines traditional keyword search with vector-based search to locate relevant documents efficiently. Semantic ranking then uses advanced language models to reorder those results by relevance.
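
Azure AI Search merges the keyword and vector result lists of a hybrid query using Reciprocal Rank Fusion (RRF). The sketch below shows the general technique in plain Python; the document ids are invented, and `k=60` is a commonly used constant that we assume here rather than a documented setting of our deployment.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists: each list adds 1 / (k + rank) to a document's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists for the same query.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that appear in both lists accumulate two contributions and rise above documents found by only one method. RRF also works on ranks rather than raw scores, so the keyword and vector scores never need to be put on a common scale.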

 

This method combines the precision of keyword search, which captures specific terms, with vector search, which aligns semantically with the query’s intent, even across languages. Tests across various customer and academic datasets confirm that hybrid retrieval with semantic ranking significantly outperforms other methods. The results are more accurate and valuable for end-users, and the generative AI performs better because it is grounded in the most contextually appropriate content.

 

 

Conclusions

 

Benefits of Azure AI Search

  • Semantic search: Semantic ranking, together with other search settings, returns results closely aligned with the user’s search intent.
  • Comprehensive indexing: Capable of indexing various document formats, facilitating the location of information across different data types.
  • Scalability: Built to scale, ensuring it can handle growing data volumes seamlessly.
  • Security and compliance: Azure’s robust security framework protects sensitive data while complying with global regulations.

 

How Document Search Changes the Game

Integrating Azure AI Search into Observation Deck transforms how businesses access and use their data:

  • Faster decision-making: Reduces the time spent searching for information, meaning quicker, more informed decisions.
  • Discovery of insights: AI-driven search reveals valuable insights that might otherwise be missed, enabling a deeper understanding of business and market dynamics.
  • Customisable searches: Users can customise their data, search settings and AI model settings to ensure that results meet their specific information needs and business objectives.

 

If you want to see how Azure AI Search can help your organisation save time reading documents and extract value from your data, or to understand how Observation Deck can bring executive insights to your fingertips, simply contact us and our team of experts will be happy to help you!

Eric M
eric.macia@clearpeaks.com