Skip to content

fix(core): process all URLs in web_fetch instead of only the first#22212

Open
bdmorgan wants to merge 1 commit intomainfrom
fix/web-fetch-process-all-urls
Open

fix(core): process all URLs in web_fetch instead of only the first#22212
bdmorgan wants to merge 1 commit intomainfrom
fix/web-fetch-process-all-urls

Conversation

@bdmorgan
Copy link
Copy Markdown
Collaborator

Summary

  • The web_fetch tool accepts up to 20 URLs but only processed urls[0] in both execute() and executeFallback() paths
  • Refactored executeFallback() to iterate all valid URLs via a new executeFallbackForUrl() helper
  • Updated execute() to rate-limit-check and validate (private IP) all URLs, not just the first
  • Each URL now receives a fair share of the content budget (MAX_CONTENT_LENGTH / urls.length) rather than the full limit
  • Abort signal is now propagated to retry logic in fallback mode

Changes

  • packages/core/src/tools/web-fetch.ts
    • Extracted single-URL fetch logic into executeFallbackForUrl(url, perUrlContentBudget, signal)
    • executeFallback() now iterates all URLs, collects content from each, and sends combined content to the fallback LLM
    • execute() rate-limit and private-IP checks now iterate all URLs
    • Partial failures are tolerated: if some URLs fail but others succeed, the successful content is still processed

Test plan

  • Pre-commit hooks (lint + prettier) passing
  • Preflight (npm run preflight) passing
  • All 52 web-fetch tests passing
  • Manual test with multi-URL prompts to verify all URLs are fetched

The web_fetch tool accepts up to 20 URLs but only processed urls[0]
in both execute and fallback paths. Now iterates all URLs for
rate-limit checks and private IP validation in execute(), and fetches
all URLs in fallback mode via a new executeFallbackForUrl() helper.

Each URL receives a fair share of the content budget
(MAX_CONTENT_LENGTH / urls.length) rather than the full limit.
Abort signal is now propagated to retry logic in fallback mode.
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the web_fetch tool's capability to handle multiple URLs by ensuring that all provided URLs are processed, rather than just the initial one. It introduces robust error handling, distributes content budget fairly across URLs, and extends critical security and performance checks like rate limiting and private IP validation to cover all requests. This improves the tool's reliability and utility when dealing with multi-source web content.

Highlights

  • Multi-URL Processing: The web_fetch tool now correctly processes all provided URLs (up to 20) instead of only the first one, addressing a previous limitation.
  • Fallback Refactoring: The executeFallback() method was refactored to iterate through all valid URLs, collecting content from each and combining it for the fallback LLM.
  • Comprehensive Checks: Rate limiting and private IP validation are now applied to all URLs in the execute() path, enhancing security and preventing abuse.
  • Content Budget Distribution: The maximum content length is now divided among all fetched URLs, ensuring fair allocation and preventing a single large URL from consuming the entire budget.
  • Abort Signal Propagation: The abort signal is now correctly propagated to the retry logic within the fallback mechanism, allowing for proper cancellation.
  • Partial Failure Tolerance: The fallback mechanism now tolerates partial failures, processing content from successfully fetched URLs even if others fail, improving robustness.
Changelog
  • packages/core/src/tools/web-fetch.ts
    • Extracted single-URL fetching logic into a new private helper method executeFallbackForUrl.
    • Modified executeFallback to iterate over all valid URLs, calling executeFallbackForUrl for each, and aggregating their content.
    • Implemented a per-URL content budget for executeFallbackForUrl based on the total MAX_CONTENT_LENGTH and the number of URLs.
    • Updated the retryWithBackoff call in executeFallbackForUrl to propagate the signal parameter.
    • Adjusted the fallbackPrompt to include content from multiple URLs and report errors for failed fetches.
    • Updated the returnDisplay message in executeFallback to list all successfully fetched URLs.
    • Modified the error message in executeFallback to be generic for multiple URLs.
    • Changed execute to loop through all URLs for rate limit checks.
    • Modified execute to check for private IPs across all URLs using urls.some().
Activity
  • Pre-commit hooks (lint + prettier) passing.
  • Preflight (npm run preflight) passing.
  • All 52 web-fetch tests passing.
  • Manual test with multi-URL prompts to verify all URLs are fetched is pending.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the web_fetch tool to correctly handle multiple URLs, iterating over all of them for rate limiting and private IP checks, and dividing the content budget. However, a high-severity indirect prompt injection vulnerability was identified in the executeFallback method, where untrusted content from fetched URLs is directly concatenated into a prompt for the fallback LLM, potentially allowing an attacker to manipulate its behavior. This issue aligns with the rule to avoid including untrusted input in LLM content. Additionally, consider improving performance by fetching URLs in parallel instead of sequentially to enhance responsiveness when multiple URLs are provided.

Comment on lines 317 to 325
const fallbackPrompt = `The user requested the following: "${this.params.prompt}".

I was unable to access the URL directly. Instead, I have fetched the raw content of the page. Please use the following content to answer the request. Do not attempt to access the URL again.
I was unable to access the URL(s) directly. Instead, I have fetched the raw content. Please use the following content to answer the request. Do not attempt to access the URLs again.

---
${textContent}
---
`;
${contentParts.join('\n\n')}
${errors.length > 0 ? `\nNote: Some URLs could not be fetched: ${errors.join('; ')}` : ''}`;
const result = await geminiClient.generateContent(
{ model: 'web-fetch-fallback' },
[{ role: 'user', parts: [{ text: fallbackPrompt }] }],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The executeFallback method is vulnerable to indirect prompt injection. It constructs a prompt for the LLM by concatenating untrusted data—specifically the content fetched from external URLs and error messages from failed fetch attempts—directly into the prompt string. An attacker who controls the content of a fetched URL or can manipulate the HTTP response (e.g., the status text) can inject malicious instructions that the LLM might follow. This could lead to the LLM outputting misleading information, performing unauthorized actions if the output is used by other tools, or exfiltrating data.

To remediate this, consider the following:

  1. Use Structured Delimiters: Wrap untrusted content in clear, hard-to-spoof delimiters and instruct the LLM to treat everything within those delimiters as data, not instructions.
  2. Sanitize Input: Sanitize the fetched content and error messages to remove or escape potential injection sequences.
  3. Constrain the LLM: Use a separate, highly-constrained LLM call to summarize or extract information from the untrusted content before including it in the main prompt.
  4. Escape User Input: Ensure that this.params.prompt is properly escaped when included in the fallbackPrompt to prevent direct injection if the user prompt contains quotes.
References
  1. To prevent prompt injection, avoid including user-provided input in content passed to the LLM (llmContent). This principle extends to any untrusted external data, which should be handled with returnDisplay if needed for display, or sanitized/constrained if used in prompts.

Comment on lines +285 to +301
for (const rawUrl of urls) {
try {
const textContent = await this.executeFallbackForUrl(
rawUrl,
perUrlContentBudget,
signal,
);
contentParts.push(
`--- Content from ${rawUrl} ---\n${textContent}\n---`,
);
fetchedUrls.push(rawUrl);
} catch (e) {
// eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion
const error = e as Error;
errors.push(`Error fetching ${rawUrl}: ${error.message}`);
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

While the logic to handle multiple URLs in the fallback is correct, fetching them sequentially in a for...of loop can be inefficient and slow, especially when many URLs are provided. To improve performance, these independent network requests should be executed in parallel.

You can use Promise.allSettled to fire off all fetch requests concurrently and then process the results, which aligns well with the existing logic for handling both successful fetches and errors.

    const fetchPromises = urls.map((rawUrl) =>
      this.executeFallbackForUrl(rawUrl, perUrlContentBudget, signal),
    );

    const results = await Promise.allSettled(fetchPromises);

    results.forEach((result, index) => {
      const rawUrl = urls[index];
      if (result.status === 'fulfilled') {
        contentParts.push(
          `--- Content from ${rawUrl} ---\n${result.value}\n---`,
        );
        fetchedUrls.push(rawUrl);
      } else {
        // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion
        const error = result.reason as Error;
        errors.push(`Error fetching ${rawUrl}: ${error.message}`);
      }
    });

@github-actions
Copy link
Copy Markdown

Size Change: +1.26 kB (0%)

Total Size: 26.6 MB

Filename Size Change
./bundle/gemini.js 26.1 MB +1.26 kB (0%)
ℹ️ View Unchanged
Filename Size
./bundle/node_modules/@google/gemini-cli-devtools/dist/client/main.js 221 kB
./bundle/node_modules/@google/gemini-cli-devtools/dist/src/_client-assets.js 227 kB
./bundle/node_modules/@google/gemini-cli-devtools/dist/src/index.js 11.5 kB
./bundle/node_modules/@google/gemini-cli-devtools/dist/src/types.js 132 B
./bundle/sandbox-macos-permissive-open.sb 890 B
./bundle/sandbox-macos-permissive-proxied.sb 1.31 kB
./bundle/sandbox-macos-restrictive-open.sb 3.36 kB
./bundle/sandbox-macos-restrictive-proxied.sb 3.56 kB
./bundle/sandbox-macos-strict-open.sb 4.82 kB
./bundle/sandbox-macos-strict-proxied.sb 5.02 kB

compressed-size-action

@gemini-cli gemini-cli bot added the status/need-issue Pull requests that need to have an associated issue. label Mar 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

status/need-issue Pull requests that need to have an associated issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant