feat(ee): Add a service to fetch website content and prepare a persona of Captain Assistant (#12732)
This PR is the first in a series that simplifies the process of building an assistant. The new flow will only require the user’s website: we’ll crawl it automatically, identify the business name and what the business does, and then generate a suggested assistant persona, complete with a proposed name and description.

Sample output from this service:

Example: tooljet.com

<img width="795" height="217" alt="Screenshot 2025-10-25 at 2 55 04 PM" src="https://github.com/user-attachments/assets/9cb3594a-9c9c-4970-a0a1-4c9c8869c193" />

Example: replit.com

<img width="797" height="176" alt="Screenshot 2025-10-25 at 2 56 42 PM" src="https://github.com/user-attachments/assets/6a1b4266-aab6-455f-a5e3-696d3a8243c9" />
```diff
@@ -19,6 +19,20 @@ class Captain::Tools::SimplePageCrawlService
     ReverseMarkdown.convert @doc.at_xpath('//body'), unknown_tags: :bypass, github_flavored: true
   end
 
+  def meta_description
+    meta_desc = @doc.at_css('meta[name="description"]')
+    return nil unless meta_desc && meta_desc['content']
+
+    meta_desc['content'].strip
+  end
+
+  def favicon_url
+    favicon_link = @doc.at_css('link[rel*="icon"]')
+    return nil unless favicon_link && favicon_link['href']
+
+    resolve_url(favicon_link['href'])
+  end
+
   private
 
   def sitemap?
```
```diff
@@ -35,4 +49,12 @@ class Captain::Tools::SimplePageCrawlService
       absolute_url
     end
   end
+
+  def resolve_url(url)
+    return url if url.start_with?('http')
+
+    URI.join(@external_link, url).to_s
+  rescue StandardError
+    url
+  end
 end
```
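`resolve_url` leaves absolute URLs alone, joins anything else against the crawled page's URL, and falls back to the raw string if joining fails. The sketch below copies that logic into a standalone method so its behaviour on typical `href` values can be checked with only the standard library; `base_url` stands in for `@external_link`, and the sample URLs are hypothetical.

```ruby
require 'uri'

# Standalone copy of resolve_url from the diff; base_url replaces @external_link.
def resolve_url(base_url, url)
  # Anything starting with "http" is treated as already absolute.
  return url if url.start_with?('http')

  URI.join(base_url, url).to_s
rescue StandardError
  # e.g. URI::InvalidURIError for strings that are not valid URI references
  url
end

base = 'https://example.com/docs/intro'

resolve_url(base, 'https://cdn.example.com/icon.png') # => "https://cdn.example.com/icon.png"
resolve_url(base, '/favicon.ico')                     # => "https://example.com/favicon.ico"
resolve_url(base, 'img/logo.png')                     # => "https://example.com/docs/img/logo.png"
resolve_url(base, '//cdn.example.com/icon.png')       # => "https://cdn.example.com/icon.png"
resolve_url(base, 'bad url with spaces')              # => "bad url with spaces"
```

The `start_with?('http')` check is a plain prefix test, so a relative path that happens to begin with `http` (for example `httpdocs/icon.png`) is returned unresolved; `URI.parse(url).absolute?` would be a stricter test for an absolute URL.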