Describe the bug
In v4.8.0, when an audio message is received, the system enqueues
Messages::AudioTranscriptionJob even if OpenAI and Captain are disabled.
This causes a Faraday::UnauthorizedError (401) which crashes the Sidekiq
job and breaks the pipeline for that message.
To Reproduce
Disable OpenAI/Captain integrations.
Send an audio message to an inbox.
Check Sidekiq logs and observe the 401 crash in
AudioTranscriptionService.
What this PR does
Adds a rescue Faraday::UnauthorizedError block inside
AudioTranscriptionService#perform. Instead of crashing the worker, it
logs a warning and gracefully exits, allowing the job to complete
successfully.
Note: This fixes the backend crash. However, there is still a frontend
reactivity issue where the audio player UI requires an F5 to load the
media, which has been reported in Issue #11013.
---------
Co-authored-by: Eloi Junior Seganfredo <eloi@seganfredo.local>
Co-authored-by: Aakash Bakhle <48802744+aakashb95@users.noreply.github.com>
Co-authored-by: Muhsin Keloth <muhsinkeramam@gmail.com>
## Summary
This PR reduces duplicate failure noise for audio transcription jobs
that fail with permanent HTTP 400 responses, and fixes a file-format
edge case causing intermittent 400s.
Sentry issue: [CHATWOOT-99E /
6660541334](https://chatwoot-p3.sentry.io/issues/6660541334/)
## Confirmed root cause
For some attachments, the stored filename had no extension (example:
`speech`, content type `audio/mpeg`).
When the temporary transcription upload file was created without an
extension, OpenAI returned:
`Unrecognized file format` (HTTP 400).
## Scope of changes
1. `Messages::AudioTranscriptionJob`
- Keeps `discard_on Faraday::BadRequestError` to avoid retry storms on
permanent request errors.
- Adds explicit Rails warning logs for discarded jobs with
attachment/job/status context.
2. `Messages::AudioTranscriptionService`
- Keeps guaranteed temp file cleanup via `ensure`.
- Ensures temp upload files include an extension when the original
filename has none, derived from blob `content_type`.
- This addresses intermittent failures like extensionless `audio/mpeg`
files.
## Reproduction
Enable audio transcription for an account and process an audio
attachment whose stored filename has no extension (for example `speech`)
but valid audio content type (`audio/mpeg`).
Before this fix, OpenAI transcription could return HTTP 400
`Unrecognized file format` for that attachment while similar attachments
with extensions succeeded.
## Testing
Ran:
`bundle exec rubocop
enterprise/app/jobs/messages/audio_transcription_job.rb
enterprise/app/services/messages/audio_transcription_service.rb`
Result: both modified files pass lint with no offenses.
We’ve been watching Sidekiq workers climb from ~600 MB at boot to
1.4–1.5 GB after an hour whenever attachment-heavy jobs run. This PR is
an experiment to curb that growth by streaming attachments instead of
loading the whole blob into Ruby: reply-mailer inline attachments,
Telegram uploads, and audio transcriptions now read/write in chunks. If
this keeps RSS stable in production we’ll keep it; otherwise we’ll roll
it back and keep digging
We now support searching within the actual message content, email
subject lines, and audio transcriptions. This enables a faster, more
accurate search experience going forward. Unlike the standard message
search, which is limited to the last 3 months, this search has no time
restrictions.
The search engine also accounts for small variations in queries. Minor
spelling mistakes, such as searching for slck instead of Slack, will
still return the correct results. It also ignores differences in accents
and diacritics, so searching for Deja vu will match content containing
Déjà vu.
We can also refine searches in the future by criteria such as:
- Searching within a specific inbox
- Filtering by sender or recipient
- Limiting to messages sent by an agent
Fixes https://github.com/chatwoot/chatwoot/issues/11656
Fixes https://github.com/chatwoot/chatwoot/issues/10669
Fixes https://github.com/chatwoot/chatwoot/issues/5910
---
Rake tasks to reindex all the messages.
```sh
bundle exec rake search:all
```
Rake task to reindex messages from one account only
```sh
bundle exec rake search:account ACCOUNT_ID=1
```