Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Google Search API #3049

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

MohammedNagdy
Copy link

Why are these changes needed?

Bing is great in retrieving results. However, variety of search engines can be advantageous to other AI developers .

Related issue number

There is no issue related it is a new feature.

Example of Using Websurfer Agent with Google or Bing

Google

  • You can get the Google API key and the Google CX from the Google Search API. They are essential for the google search to work.
# Google
web_surfer = WebSurferAgent(
    name="Websurfer",
    llm_config=llm_configurations,
    browser_name="google",
    browser_config={"viewport_size": 4096, "api_key": os.environ["GOOGLE_API_KEY"], "cx": os.environ["GOOGLE_CX"]}
)

Bing

# Bing
web_surfer = WebSurferAgent(
    name="Websurfer",
    llm_config=llm_configurations,
    browser_name="bing",
    browser_config={"viewport_size": 4096, "api_key": "1111"]}
)

Checks

@MohammedNagdy
Copy link
Author

MohammedNagdy commented Jul 1, 2024 via email

name: Yifan Zeng
title: PhD student at Oregon State University
url: https://xhmy.github.io/
sonichi:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems there is a problem with CRLF and LR format for file ending with this file.

@sonichi sonichi requested a review from joshkyh July 4, 2024 15:10
@sonichi
Copy link
Collaborator

sonichi commented Jul 4, 2024

@MohammedNagdy Could you please fix the conflict with main? Thanks.

@MohammedNagdy
Copy link
Author

@sonichi I fixed the conflict with the main branch. Could you take a look again?

Copy link

gitguardian bot commented Jul 20, 2024

️✅ There are no secrets present in this pull request anymore.

If these secrets were true positive and are still valid, we highly recommend you to revoke them.
Once a secret has been leaked into a git repository, you should consider it compromised, even if it was deleted immediately.
Find here more information about risks.


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@@ -40,6 +43,7 @@ def __init__(
llm_config: Optional[Union[Dict, Literal[False]]] = None,
summarizer_llm_config: Optional[Union[Dict, Literal[False]]] = None,
default_auto_reply: Optional[Union[str, Dict, None]] = "",
browser_name: str = "bing",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we use enum for this lookup table?

from ...code_utils import content_str
from ...oai.openai_utils import filter_config
from ...token_count_utils import count_token, get_max_token_limit

logger = logging.getLogger(__name__)

BROWSERS = {"google": GoogleTextBrowser, "bing": BingTextBrowser}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here enums too, or at least some configuration so that users can register new text browsers if they implement new ones (like wikimedia servers for example) maybe a better idea is to pass the textbroser instance to the agent factory itself

pass


class SimpleTextBrowser:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could have a better name to highlight that this is the class to be derived for creating other TextBrowsers, TextBroswerBase ? Also do we want to have a class with virtual methods to override or a pure abstract and then SimpleTextBrowser is just an implementation of it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@colombod I believe a base class would suffice and we can override the methods we need from there. More than this is honestly over engineering and will add unnecessary complexity. Let me know your opinion.

def _fetch_page(self, url: str) -> None:
try:
# Prepare the request parameters
request_kwargs = self.request_kwargs.copy() if self.request_kwargs is not None else {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered the benefits of using Playwright as opposed to a get request? A lot of pages will need dom composition and js execution to fully compose a usable page. Llamaindex web laoder is doing the same. That will make a much more reliable user experience and will meet expectations for we scraping tasks

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not thought about using a third party library. As daunting as it might sound, keeping it simple and as it was was my first approach. However, we can go with either Playwright or Llamaindex web loader both are fine. But then we will need to make sure that the whole library is implementing the requests the same way. Let me know your thoughts. @colombod

Copy link
Collaborator

@colombod colombod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added some comments and would like to discuss those further

@MohammedNagdy
Copy link
Author

@colombod I added some changes let me know what do you think?

}

@classmethod
def create_browser(cls, name: str):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this change!

def create_browser(cls, name: str):
"""Factory method to create a text browser instance based on the name."""
if name in cls.browser_classes:
return cls.browser_classes[name]()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we need doing this look up we could really just use enums : https://docs.python.org/3/library/enum.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
4 participants