Files
leadchat/lib/tasks/search_test_data.rake
Vinay Keerthi 59cbf57e20 feat: Advanced Search Backend (#12917)
## Description

Implements comprehensive search functionality with advanced filtering
capabilities for Chatwoot (Linear: CW-5956).

This PR adds:
1. **Time-based filtering** for contacts and conversations (SQL-based
search)
2. **Advanced message search** with multiple filters
(OpenSearch/Elasticsearch-based)
- **`from` filter**: Filter messages by sender (format: `contact:42` or
`agent:5`)
   - **`inbox_id` filter**: Filter messages by specific inbox
- **Time range filters**: Filter messages using `since` and `until`
parameters (Unix timestamps in seconds)
- **90-day limit enforcement**: Automatically limits searches to the
last 90 days to prevent performance issues

The implementation extends the existing `Enterprise::SearchService`
module for advanced features and adds time filtering to the base
`SearchService` for SQL-based searches.

## API Documentation

### Base URL
All search endpoints follow this pattern:
```
GET /api/v1/accounts/{account_id}/search/{resource}
```

### Authentication
All requests require authentication headers:
```
api_access_token: YOUR_ACCESS_TOKEN
```

---

## 1. Search All Resources

**Endpoint:** `GET /api/v1/accounts/{account_id}/search`

Returns results from all searchable resources (contacts, conversations,
messages, articles).

### Parameters
| Parameter | Type | Description | Required |
|-----------|------|-------------|----------|
| `q` | string | Search query | Yes |
| `page` | integer | Page number (15 items per page) | No |
| `since` | integer | Unix timestamp (contacts/conversations only) | No
|
| `until` | integer | Unix timestamp (contacts/conversations only) | No
|

### Example Request
```bash
curl -X GET "https://app.chatwoot.com/api/v1/accounts/1/search?q=customer" \
  -H "api_access_token: YOUR_ACCESS_TOKEN"
```

### Example Response
```json
{
  "payload": {
    "contacts": [...],
    "conversations": [...],
    "messages": [...],
    "articles": [...]
  }
}
```

---

## 2. Search Contacts

**Endpoint:** `GET /api/v1/accounts/{account_id}/search/contacts`

Search contacts by name, email, phone number, or identifier with
optional time filtering.

### Parameters
| Parameter | Type | Description | Required |
|-----------|------|-------------|----------|
| `q` | string | Search query | Yes |
| `page` | integer | Page number (15 items per page) | No |
| `since` | integer | Unix timestamp - filter by last_activity_at | No |
| `until` | integer | Unix timestamp - filter by last_activity_at | No |

### Example Requests

**Basic search:**
```bash
curl -X GET "https://app.chatwoot.com/api/v1/accounts/1/search/contacts?q=john" \
  -H "api_access_token: YOUR_ACCESS_TOKEN"
```

**Search contacts active in the last 7 days:**
```bash
SINCE=$(date -v-7d +%s)
curl -X GET "https://app.chatwoot.com/api/v1/accounts/1/search/contacts?q=john&since=${SINCE}" \
  -H "api_access_token: YOUR_ACCESS_TOKEN"
```

**Search contacts active between 30 and 7 days ago:**
```bash
SINCE=$(date -v-30d +%s)
UNTIL=$(date -v-7d +%s)
curl -X GET "https://app.chatwoot.com/api/v1/accounts/1/search/contacts?q=john&since=${SINCE}&until=${UNTIL}" \
  -H "api_access_token: YOUR_ACCESS_TOKEN"
```

### Example Response
```json
{
  "payload": {
    "contacts": [
      {
        "id": 42,
        "email": "john@example.com",
        "name": "John Doe",
        "phone_number": "+1234567890",
        "identifier": "user_123",
        "additional_attributes": {},
        "created_at": 1701234567
      }
    ]
  }
}
```

---

## 3. Search Conversations

**Endpoint:** `GET /api/v1/accounts/{account_id}/search/conversations`

Search conversations by display ID, contact name, email, phone number,
or identifier with optional time filtering.

### Parameters
| Parameter | Type | Description | Required |
|-----------|------|-------------|----------|
| `q` | string | Search query | Yes |
| `page` | integer | Page number (15 items per page) | No |
| `since` | integer | Unix timestamp - filter by last_activity_at | No |
| `until` | integer | Unix timestamp - filter by last_activity_at | No |

### Example Requests

**Basic search:**
```bash
curl -X GET "https://app.chatwoot.com/api/v1/accounts/1/search/conversations?q=billing" \
  -H "api_access_token: YOUR_ACCESS_TOKEN"
```

**Search conversations active in the last 24 hours:**
```bash
SINCE=$(date -v-1d +%s)
curl -X GET "https://app.chatwoot.com/api/v1/accounts/1/search/conversations?q=billing&since=${SINCE}" \
  -H "api_access_token: YOUR_ACCESS_TOKEN"
```

**Search conversations from last month:**
```bash
SINCE=$(date -v-30d +%s)
UNTIL=$(date +%s)
curl -X GET "https://app.chatwoot.com/api/v1/accounts/1/search/conversations?q=billing&since=${SINCE}&until=${UNTIL}" \
  -H "api_access_token: YOUR_ACCESS_TOKEN"
```

### Example Response
```json
{
  "payload": {
    "conversations": [
      {
        "id": 123,
        "display_id": 45,
        "inbox_id": 1,
        "status": "open",
        "messages": [...],
        "meta": {...}
      }
    ]
  }
}
```

---

## 4. Search Messages (Advanced)

**Endpoint:** `GET /api/v1/accounts/{account_id}/search/messages`

Advanced message search with multiple filters powered by
OpenSearch/Elasticsearch.

### Prerequisites
- OpenSearch/Elasticsearch must be running (`OPENSEARCH_URL` env var
configured)
- Account must have `advanced_search` feature flag enabled
- Messages must be indexed in OpenSearch

### Parameters
| Parameter | Type | Description | Required |
|-----------|------|-------------|----------|
| `q` | string | Search query | Yes |
| `page` | integer | Page number (15 items per page) | No |
| `from` | string | Filter by sender: `contact:{id}` or `agent:{id}` |
No |
| `inbox_id` | integer | Filter by specific inbox ID | No |
| `since` | integer | Unix timestamp - searches from this time (max 90
days ago) | No |
| `until` | integer | Unix timestamp - searches until this time | No |

### Important Notes
- **90-Day Limit**: If `since` is not provided, searches default to the
last 90 days
- If `since` exceeds 90 days, returns `422` error: "Search is limited to
the last 90 days"
- All time filters use message `created_at` timestamp

### Example Requests

**Basic message search:**
```bash
curl -X GET "https://app.chatwoot.com/api/v1/accounts/1/search/messages?q=refund" \
  -H "api_access_token: YOUR_ACCESS_TOKEN"
```

**Search messages from a specific contact:**
```bash
curl -X GET "https://app.chatwoot.com/api/v1/accounts/1/search/messages?q=refund&from=contact:42" \
  -H "api_access_token: YOUR_ACCESS_TOKEN"
```

**Search messages from a specific agent:**
```bash
curl -X GET "https://app.chatwoot.com/api/v1/accounts/1/search/messages?q=refund&from=agent:5" \
  -H "api_access_token: YOUR_ACCESS_TOKEN"
```

**Search messages in a specific inbox:**
```bash
curl -X GET "https://app.chatwoot.com/api/v1/accounts/1/search/messages?q=refund&inbox_id=3" \
  -H "api_access_token: YOUR_ACCESS_TOKEN"
```

**Search messages from the last 7 days:**
```bash
SINCE=$(date -v-7d +%s)
curl -X GET "https://app.chatwoot.com/api/v1/accounts/1/search/messages?q=refund&since=${SINCE}" \
  -H "api_access_token: YOUR_ACCESS_TOKEN"
```

**Search messages between specific dates:**
```bash
SINCE=$(date -v-30d +%s)
UNTIL=$(date -v-7d +%s)
curl -X GET "https://app.chatwoot.com/api/v1/accounts/1/search/messages?q=refund&since=${SINCE}&until=${UNTIL}" \
  -H "api_access_token: YOUR_ACCESS_TOKEN"
```

**Combine all filters:**
```bash
SINCE=$(date -v-14d +%s)
curl -X GET "https://app.chatwoot.com/api/v1/accounts/1/search/messages?q=refund&from=contact:42&inbox_id=3&since=${SINCE}" \
  -H "api_access_token: YOUR_ACCESS_TOKEN"
```

**Attempt to search beyond 90 days (returns error):**
```bash
SINCE=$(date -v-120d +%s)
curl -X GET "https://app.chatwoot.com/api/v1/accounts/1/search/messages?q=refund&since=${SINCE}" \
  -H "api_access_token: YOUR_ACCESS_TOKEN"
```

### Example Response (Success)
```json
{
  "payload": {
    "messages": [
      {
        "id": 789,
        "content": "I need a refund for my purchase",
        "message_type": "incoming",
        "created_at": 1701234567,
        "conversation_id": 123,
        "inbox_id": 3,
        "sender": {
          "id": 42,
          "type": "contact"
        }
      }
    ]
  }
}
```

### Example Response (90-day limit exceeded)
```json
{
  "error": "Search is limited to the last 90 days"
}
```
**Status Code:** `422 Unprocessable Entity`

---

## 5. Search Articles

**Endpoint:** `GET /api/v1/accounts/{account_id}/search/articles`

Search help center articles by title or content.

### Parameters
| Parameter | Type | Description | Required |
|-----------|------|-------------|----------|
| `q` | string | Search query | Yes |
| `page` | integer | Page number (15 items per page) | No |

### Example Request
```bash
curl -X GET "https://app.chatwoot.com/api/v1/accounts/1/search/articles?q=installation" \
  -H "api_access_token: YOUR_ACCESS_TOKEN"
```

### Example Response
```json
{
  "payload": {
    "articles": [
      {
        "id": 456,
        "title": "Installation Guide",
        "slug": "installation-guide",
        "portal_slug": "help",
        "account_id": 1,
        "category_name": "Getting Started",
        "status": "published",
        "updated_at": 1701234567
      }
    ]
  }
}
```

---

## Technical Implementation

### SQL-Based Search (Contacts, Conversations, Articles)
- Uses PostgreSQL `ILIKE` queries by default
- Optional GIN index support via `search_with_gin` feature flag for
better performance
- Time filtering uses `last_activity_at` for contacts/conversations
- Returns paginated results (15 per page)

### Advanced Search (Messages)
- Powered by OpenSearch/Elasticsearch via Searchkick gem
- Requires `OPENSEARCH_URL` environment variable
- Requires `advanced_search` account feature flag
- Enforces 90-day lookback limit via
`Limits::MESSAGE_SEARCH_TIME_RANGE_LIMIT_DAYS`
- Validates inbox access permissions before filtering
- Returns paginated results (15 per page)

---

## Type of change

- [x] New feature (non-breaking change which adds functionality)
- [x] Enhancement (improves existing functionality)

---

## How Has This Been Tested?

### Unit Tests
- **Contact Search Tests**: 3 new test cases for time filtering
(`since`, `until`, combined)
- **Conversation Search Tests**: 3 new test cases for time filtering
- **Message Search Tests**: 10+ test cases covering:
  - Individual filters (`from`, `inbox_id`, time range)
  - Combined filters
  - Permission validation for inbox access
  - Feature flag checks
  - 90-day limit enforcement
  - Error handling for exceeded time limits

### Test Commands
```bash
# Run all search controller tests
bundle exec rspec spec/controllers/api/v1/accounts/search_controller_spec.rb

# Run search service tests (includes enterprise specs)
bundle exec rspec spec/services/search_service_spec.rb
```

### Manual Testing Setup
A rake task is provided to create 50,000 test messages across multiple
inboxes:

```bash
# 1. Create test data
bundle exec rake search:setup_test_data

# 2. Start OpenSearch
mise elasticsearch-start

# 3. Reindex messages
rails runner "Message.search_index.import Message.all"

# 4. Enable feature flag
rails runner "Account.first.enable_features('advanced_search')"

# 5. Test via API or Rails console
```

---

## Checklist

- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my code
- [x] I have commented on my code, particularly in hard-to-understand
areas
- [x] I have made corresponding changes to the documentation (this PR
description)
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged and published in downstream
modules

---

## Additional Notes

### Requirements
- **OpenSearch/Elasticsearch**: Required for advanced message search
  - Set `OPENSEARCH_URL` environment variable
  - Example: `export OPENSEARCH_URL=http://localhost:9200`
- **Feature Flags**:
  - `advanced_search`: Account-level flag for message advanced search
- `search_with_gin` (optional): Account-level flag for GIN-based SQL
search

### Performance Considerations
- 90-day limit prevents expensive long-range queries on large datasets
- GIN indexes recommended for high-volume search on SQL-based resources
- OpenSearch/Elasticsearch provides faster full-text search for messages

### Breaking Changes
- None. All new parameters are optional and backward compatible.

### Frontend Integration
- Frontend PR tracking advanced search UI will consume these endpoints
- Time range pickers should convert JavaScript `Date` to Unix timestamps
(seconds)
- Date conversion: `Math.floor(date.getTime() / 1000)`

### Error Handling
- Invalid `from` parameter format is silently ignored (filter not
applied)
- Time range exceeding 90 days returns `422` with error message
- Missing `q` parameter returns `422` (existing behavior)
- Unauthorized inbox access is filtered out (no error, just excluded
from results)

---------

Co-authored-by: iamsivin <iamsivin@gmail.com>
Co-authored-by: Sivin Varghese <64252451+iamsivin@users.noreply.github.com>
Co-authored-by: Pranav <pranav@chatwoot.com>
Co-authored-by: Muhsin Keloth <muhsinkeramam@gmail.com>
2026-01-07 15:30:49 +05:30

184 lines
6.4 KiB
Ruby
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# rubocop:disable Metrics/BlockLength
namespace :search do
desc 'Create test messages for advanced search manual testing across multiple inboxes'
task setup_test_data: :environment do
puts '🔍 Setting up test data for advanced search...'
account = Account.first
unless account
puts '❌ No account found. Please create an account first.'
exit 1
end
agents = account.users.to_a
unless agents.any?
puts '❌ No agents found. Please create users first.'
exit 1
end
puts "✅ Using account: #{account.name} (ID: #{account.id})"
puts "✅ Found #{agents.count} agent(s)"
# Create missing inbox types for comprehensive testing
puts "\n📥 Checking and creating inboxes..."
# API inbox
unless account.inboxes.exists?(channel_type: 'Channel::Api')
puts ' Creating API inbox...'
account.inboxes.create!(
name: 'Search Test API',
channel: Channel::Api.create!(account: account)
)
end
# Web Widget inbox
unless account.inboxes.exists?(channel_type: 'Channel::WebWidget')
puts ' Creating WebWidget inbox...'
account.inboxes.create!(
name: 'Search Test WebWidget',
channel: Channel::WebWidget.create!(account: account, website_url: 'https://example.com')
)
end
# Email inbox
unless account.inboxes.exists?(channel_type: 'Channel::Email')
puts ' Creating Email inbox...'
account.inboxes.create!(
name: 'Search Test Email',
channel: Channel::Email.create!(
account: account,
email: 'search-test@example.com',
imap_enabled: false,
smtp_enabled: false
)
)
end
inboxes = account.inboxes.to_a
puts "✅ Using #{inboxes.count} inbox(es):"
inboxes.each { |i| puts " - #{i.name} (ID: #{i.id}, Type: #{i.channel_type})" }
# Create 10 test contacts
contacts = []
10.times do |i|
contacts << account.contacts.find_or_create_by!(
email: "test-customer-#{i}@example.com"
) do |c|
c.name = Faker::Name.name
end
end
puts "✅ Created/found #{contacts.count} test contacts"
target_messages = 50_000
messages_per_conversation = 100
total_conversations = target_messages / messages_per_conversation
puts "\n📝 Creating #{target_messages} messages across #{total_conversations} conversations..."
puts " Distribution: #{inboxes.count} inboxes × #{total_conversations / inboxes.count} conversations each"
start_time = 2.years.ago
end_time = Time.current
time_range = end_time - start_time
created_count = 0
failed_count = 0
conversations_per_inbox = total_conversations / inboxes.count
conversation_statuses = [:open, :resolved]
inboxes.each do |inbox|
conversations_per_inbox.times do
# Pick random contact and agent for this conversation
contact = contacts.sample
agent = agents.sample
# Create or find ContactInbox
contact_inbox = ContactInbox.find_or_create_by!(
contact: contact,
inbox: inbox
) do |ci|
ci.source_id = "test_#{SecureRandom.hex(8)}"
end
# Create conversation
conversation = inbox.conversations.create!(
account: account,
contact: contact,
inbox: inbox,
contact_inbox: contact_inbox,
status: conversation_statuses.sample
)
# Create messages for this conversation (50 incoming, 50 outgoing)
50.times do
random_time = start_time + (rand * time_range)
# Incoming message from contact
begin
Message.create!(
content: Faker::Movie.quote,
account: account,
inbox: inbox,
conversation: conversation,
message_type: :incoming,
sender: contact,
created_at: random_time,
updated_at: random_time
)
created_count += 1
rescue StandardError => e
failed_count += 1
puts "❌ Failed to create message: #{e.message}" if failed_count <= 5
end
# Outgoing message from agent
begin
Message.create!(
content: Faker::Movie.quote,
account: account,
inbox: inbox,
conversation: conversation,
message_type: :outgoing,
sender: agent,
created_at: random_time + rand(60..600),
updated_at: random_time + rand(60..600)
)
created_count += 1
rescue StandardError => e
failed_count += 1
puts "❌ Failed to create message: #{e.message}" if failed_count <= 5
end
print "\r🔄 Progress: #{created_count}/#{target_messages} messages created..." if (created_count % 500).zero?
end
end
end
puts "\n\n✅ Successfully created #{created_count} messages!"
puts "❌ Failed: #{failed_count}" if failed_count.positive?
puts "\n📊 Summary:"
puts " - Total messages: #{Message.where(account: account).count}"
puts " - Total conversations: #{Conversation.where(account: account).count}"
min_date = Message.where(account: account).minimum(:created_at)&.strftime('%Y-%m-%d')
max_date = Message.where(account: account).maximum(:created_at)&.strftime('%Y-%m-%d')
puts " - Date range: #{min_date} to #{max_date}"
puts "\nBreakdown by inbox:"
inboxes.each do |inbox|
msg_count = Message.where(inbox: inbox).count
conv_count = Conversation.where(inbox: inbox).count
puts " - #{inbox.name} (#{inbox.channel_type}): #{msg_count} messages, #{conv_count} conversations"
end
puts "\nBreakdown by sender type:"
puts " - Incoming (from contacts): #{Message.where(account: account, message_type: :incoming).count}"
puts " - Outgoing (from agents): #{Message.where(account: account, message_type: :outgoing).count}"
puts "\n🔧 Next steps:"
puts ' 1. Ensure OpenSearch is running: mise elasticsearch-start'
puts ' 2. Reindex messages: rails runner "Message.search_index.import Message.all"'
puts " 3. Enable feature: rails runner \"Account.find(#{account.id}).enable_features('advanced_search')\""
puts "\n💡 Then test the search with filters via API or Rails console!"
end
end
# rubocop:enable Metrics/BlockLength