Deep engine

This topic describes the features and benchmark results of the Deep engine.

Overview

The Deep engine is a solution designed specifically for agentic search scenarios. It offers the following core advantages:

  • Complex problem handling: The engine understands and decomposes complex queries, using multi-path, multi-step retrieval to gather comprehensive information. In the benchmarks in this topic, it scores well above snippet-based Google search on FRAMES and BrowseComp, and slightly above Exa Deep on BrowseComp.

  • Broader index data: The index covers both Chinese and English results, delivering excellent performance on datasets in both languages.

  • Rich main text: Search results include a brief snippet (under 500 characters) and a detailed mainText (up to 50,000 characters). This provides richer raw data, leading to better performance on complex tasks.

Performance

We benchmarked the Deep engine against other search engines on public datasets such as SimpleQA, FRAMES, and BrowseComp. The results are as follows:

SimpleQA

SimpleQA is a factual question-answering benchmark from OpenAI that evaluates the accuracy of large language models (LLMs) and retrieval-augmented generation (RAG) systems on short, fact-based questions.

Evaluation method

  • Judge Model: qwen3.5-plus

  • Answer Model: qwen3.5-plus

  • Search results are added to the answer model's context using simplified logic similar to that of ChineseSimpleQA. The --search-engine parameter is an addition made for this evaluation.

  • The evaluation uses a random sample of 1,000 examples.

 python -m simple-evals.simple_evals --model qwen3.5-plus --eval simpleqa --examples 1000 --search-engine Deep
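The context augmentation described above can be pictured roughly as follows. This is a minimal sketch, not the evaluation's actual code: the function name build_augmented_prompt, the "title" key, the prompt wording, and the max_chars cutoff are all assumptions made for illustration. Only the snippet and mainText field names come from this topic.

```python
def build_augmented_prompt(question, results, field="snippet", max_chars=2000):
    """Prepend numbered search results to a question.

    `results` is a list of dicts. The "title" key and the overall result
    shape are assumed for illustration; `field` selects "snippet" or "mainText".
    """
    blocks = []
    for i, result in enumerate(results, start=1):
        text = (result.get(field) or "")[:max_chars]
        if not text:
            continue  # skip results with nothing usable in the chosen field
        blocks.append(f"[{i}] {result.get('title', '')}\n{text}")
    context = "\n\n".join(blocks)
    return f"Search results:\n{context}\n\nQuestion: {question}\nAnswer:"
```

Passing field="mainText" with a larger max_chars would correspond to the mainText configurations reported for FRAMES and BrowseComp.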

Evaluation results

Search engine               Score    incorrect    not_attempted
LLM Only (no web search)    0.467    0.511        0.022
Deep engine (snippet)       0.876    0.103        0.021
Google (snippet, SERP)      0.818    0.124        0.058
Exa Auto (snippet)          0.619    0.186        0.195
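In each row, the three values sum to approximately 1, which suggests they are the shares of questions graded correct, incorrect, and not attempted. Under that assumption, a per-question list of grades could be summarized as in the sketch below. The function name and the lowercase grade labels are hypothetical and are not taken from the evaluation code.

```python
from collections import Counter


def summarize_grades(grades):
    """Return the share of each grade label, rounded to three decimals.

    `grades` holds one label per question: "correct", "incorrect",
    or "not_attempted" (label names assumed for illustration).
    """
    total = len(grades)
    if total == 0:
        raise ValueError("no grades to summarize")
    counts = Counter(grades)
    return {
        label: round(counts[label] / total, 3)
        for label in ("correct", "incorrect", "not_attempted")
    }
```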

FRAMES

FRAMES (Factuality, Retrieval, And reasoning MEasurement Set) is a standardized evaluation framework from Google Research. It contains 824 challenging multi-hop questions. Each question requires retrieving and combining content from 2 to 15 Wikipedia articles to answer. Unlike simple factual Q&A benchmarks like SimpleQA, FRAMES focuses on complex tasks that require multiple retrievals and information integration, more closely mirroring real-world user queries.

Evaluation method

  • Judge Model: qwen3.5-plus

  • Answer Model: qwen3.5-plus

  • Search results are added to the answer model's context using simplified logic similar to that of ChineseSimpleQA. The --search-engine parameter is an addition made for this evaluation.

  • The evaluation uses the full dataset of 824 examples.

 python -m simple-evals.simple_evals --model qwen3.5-plus --eval frames --search-engine Deep

Evaluation results

Search engine               Score    incorrect    not_attempted
LLM Only (no web search)    0.071    0.544        0.385
Deep engine (mainText)      0.627    0.340        0.033
Google (snippet, SERP)      0.360    0.535        0.104
Exa Deep (mainText)         0.708    0.275        0.017

BrowseComp

BrowseComp is a challenging web browsing benchmark from OpenAI that tests an agent's ability to perform deep information retrieval and multi-step reasoning. With 1,266 tasks, it requires agents to browse, navigate, and reason through multiple steps, simulating complex scenarios far beyond single-turn searches.

Evaluation method

  • Judge Model: qwen3.5-plus

  • Answer Model: qwen3.5-plus

  • A simple ReAct agent is used, with a 10-minute inference time limit and a maximum of 20 rounds of LLM interaction.

  • The evaluation uses a sample of 100 examples.

 python -m simple-evals.simple_evals --model qwen3.5-plus --eval browsecomp --search-engine Deep
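The round and time budget described above can be sketched as a small loop. This is an illustrative outline under stated assumptions, not the agent used in the evaluation: run_agent, llm_step, and search are hypothetical names, and the action format ("search" or "answer") is invented for the example. Only the limits of 20 rounds and 10 minutes come from this topic.

```python
import time


def run_agent(question, llm_step, search, max_rounds=20,
              time_limit_s=600.0, clock=time.monotonic):
    """Run a minimal ReAct-style loop with a round and wall-clock budget.

    llm_step(question, observations) must return ("search", query) or
    ("answer", text). search(query) returns whatever the search engine yields.
    Both callables are supplied by the caller (hypothetical interfaces).
    """
    deadline = clock() + time_limit_s
    observations = []
    for _ in range(max_rounds):
        if clock() >= deadline:
            break  # time budget used up before the next LLM call
        action, payload = llm_step(question, observations)
        if action == "answer":
            return payload
        # Any other action is treated as a search request.
        observations.append((payload, search(payload)))
    return None  # budget exhausted without a final answer
```

The time check runs only between rounds, so a single slow LLM or search call can still overrun the limit. A stricter harness would also need per-call timeouts.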

Evaluation results

Search engine               Score    incorrect
LLM Only (no web search)    0.04     0.96
Deep engine (mainText)      0.26     0.74
Google (snippet, SERP)      0.14     0.86
Exa Deep (mainText)         0.23     0.77

Features

The engine supports common advanced retrieval parameters and is optimized to be agent-friendly.

Date range search

In addition to the TimeRange parameter, you can set startPublishedDate and endPublishedDate in advancedParams to restrict results to a publication date range. Dates use the YYYY-MM-DD format.

"advancedParams": {
  "startPublishedDate": "2024-12-01",
  "endPublishedDate": "2025-01-31"
}
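When the date range is built programmatically, a small helper keeps the format consistent. The sketch below only produces the advancedParams fragment shown above. The helper name is hypothetical, and the check that the start date does not fall after the end date is an assumption about sensible input rather than a documented API rule.

```python
from datetime import date


def published_date_range(start, end):
    """Build the advancedParams date fields from two datetime.date values."""
    if start > end:
        # Assumed sanity check; the API's own validation is not documented here.
        raise ValueError("start date must not be after end date")
    return {
        "startPublishedDate": start.isoformat(),  # YYYY-MM-DD
        "endPublishedDate": end.isoformat(),
    }
```

For example, published_date_range(date(2024, 12, 1), date(2025, 1, 31)) yields the same two fields as the fragment above.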

Number of results

You can request 1 to 50 results, depending on your scenario and processing capacity. The default is 10.

"advancedParams": {
  "numResults": "10"
}
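A client can check the requested count against the documented range of 1 to 50 before sending a request. This is a minimal sketch with a hypothetical helper name. It emits the value as a string only because the fragment above does so, and whether the API also accepts a number is not stated here.

```python
def num_results_param(requested=10):
    """Build the advancedParams numResults field, enforcing the 1 to 50 range."""
    if not 1 <= requested <= 50:
        raise ValueError("numResults must be between 1 and 50")
    # String value mirrors the documented example fragment.
    return {"numResults": str(requested)}
```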

Extended main text

Traditional search engines return only search-result pages with snippets relevant to the user's query. In agentic search scenarios, snippets are often insufficient, so the agent has to crawl the result pages itself. The Deep engine returns not only a page snippet but also the full, original main text, which gives the agent ample context to solve complex problems.

Limitations

  1. The Deep engine has relatively high latency (around 10 s). It is suitable for latency-insensitive scenarios, such as generating research reports or running offline tasks. Using it in latency-sensitive scenarios, such as real-time conversations, can degrade the user experience.

  2. Search results may occasionally contain limited main text, for example, on multimodal pages (such as video pages) or pages that require a user login. We recommend processing both the page snippet and the main text.
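One way to follow the recommendation in item 2 is to fall back to the snippet whenever the main text is thin. The sketch below assumes each result is a dict with snippet and mainText keys. The helper name and the 200-character threshold are arbitrary choices for illustration, not documented values.

```python
def best_text(result, min_main_text_chars=200):
    """Pick the text to pass to the agent for one search result.

    Uses mainText when it is long enough. Otherwise combines the snippet
    with whatever main text exists, so that neither source is discarded.
    """
    main_text = (result.get("mainText") or "").strip()
    snippet = (result.get("snippet") or "").strip()
    if len(main_text) >= min_main_text_chars:
        return main_text
    # Thin or missing main text (for example, video or login-gated pages).
    return "\n".join(part for part in (snippet, main_text) if part)
```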

Usage

For detailed instructions, see the engineType = Deep section in the IQS Unified Search API documentation.

Release notes

Release date      Description
May 11, 2026      General Availability (GA) of the Deep engine
April 9, 2026     Private preview of the Deep engine