Docs
Tool Development
Browser Tool

BrowserTool

Use BrowserTool class to create a new tool that interacts with web browsers. This tool can be used to automate web browsing tasks, such as opening a webpage, clicking on elements, and extracting information from web pages.

from npiai import BrowserTool
 
class MyTool(BrowserTool):
    def __init__(self):
        super().__init__(
            name='<Provide the name of your tool here>',
            description='<Provide a brief description of your tool here>',
            system_prompt='<Provide a system prompt for LLMs to interact with your tool here>',
        )

Attributes

name

  • Description: The name of the tool.
  • Returns: str

description

  • Description: A brief description of the tool.
  • Returns: str

system_prompt

  • Description: A system prompt for LLMs to interact with the tool.
  • Returns: str

tools

  • Description: A list containing the JSON schema of the functions registered with the tool.
  • Returns: List[ChatCompletionToolParam]. See the Schema Generation section for more information.

Sync Methods

add_tool

  • Usage:

    tool.add_tool(tool_a, tool_b, agent_a, agent_b)
  • Description: Adds tools or agents to the tool.

  • Arguments:

    • *tools (BaseTool): The tools to add. They can be either a Tool or an Agent returns by the npiai.agent.wrap function.
  • Returns: None

use_hitl

  • Usage:

    tool.use_hitl(hitl_handler)
  • Description: Attach the given HITL(human-in-the-loop) handler to this tool and all its sub-tools.

  • Arguments:

    • hitl (HITLHandler): The HITL handler to attach.
  • Returns: None

Async Methods

start

  • Usage:

    await tool.start()
  • Description: Starts the tool.

  • Arguments: None

  • Returns: None

end

  • Usage:

    await tool.end()
  • Description: Ends the tool.

  • Arguments: None

  • Returns: None

call

  • Usage:

    results = await tool.call(llm_response.tool_calls)
  • Description: Calls the function corresponding to the LLM tool calls and returns the results.

  • Arguments:

    • tool_calls (List[ChatCompletionMessageToolCall]): The tool calls from the LLM response.
  • Returns: List[ChatCompletionMessageToolCall]

    • role: "tool"
    • tool_call_id (str): The ID of the tool call.
    • content (str): The result of the corresponding function call.

Browser-related Methods

The BrowserTool class provides several methods to interact with web browsers. Here are some of the key methods:

get_text

  • Usage:

    text = await tool.get_text()
  • Description: Extracts the text content of the current web page as markdown.

  • Arguments: None

  • Returns: str

goto_blank

  • Usage:

    await tool.goto_blank()
  • Description: Go to about:blank page. This can be used as a cleanup function when a session finishes.

  • Arguments: None

  • Returns: None

get_screenshot

  • Usage:

    screenshot = await tool.get_screenshot()
  • Description: Takes a screenshot of the current web page.

  • Arguments: None

  • Returns: str | None: The base64-encoded image data or None if the screenshot fails.

get_page_url

  • Usage:

    url = await tool.get_page_url()
  • Description: Gets the URL of the current web page.

  • Arguments: None

  • Returns: str

get_page_title

  • Usage:

    title = await tool.get_page_title()
  • Description: Gets the title of the current web page.

  • Arguments: None

  • Returns: str

is_scrollable

  • Usage:

    scrollable = await tool.is_scrollable()
  • Description: Checks if the current web page is scrollable.

  • Arguments: None

  • Returns: bool

click

  • Usage:

    await tool.click(element_handle)
  • Description: Clicks on the specified element on the web page.

  • Arguments:

  • Returns: None

fill

  • Usage:

    await tool.fill(element_handle, text)
  • Description: Fills the specified text into the input field of the element on the web page.

  • Arguments:

  • Returns: None

select

  • Usage:

    await tool.select(element_handle, value)
  • Description: Selects the specified value from the dropdown list element on the web page.

  • Arguments:

  • Returns: None

enter

  • Usage:

    await tool.enter(element_handle)
  • Description: Presses the Enter key on the specified element on the web page.

  • Arguments:

  • Returns: None

scroll

  • Usage:

    await tool.scroll()
  • Description: Scrolls the web page down by one viewport height to reveal more content.

  • Arguments: None

  • Returns: None

back_to_top

  • Usage:

    await tool.back_to_top()
  • Description: Scrolls the web page back to the top.

  • Arguments: None

  • Returns: None