fix(messages): reduce audio transcription 400 retry noise (#13487)

## Summary

This PR reduces duplicate failure noise for audio transcription jobs that fail with permanent HTTP 400 responses, and fixes a file-format edge case causing intermittent 400s.

Sentry issue: [CHATWOOT-99E / 6660541334](https://chatwoot-p3.sentry.io/issues/6660541334/)

## Confirmed root cause

For some attachments, the stored filename had no extension (example: `speech`, content type `audio/mpeg`). When the temporary transcription upload file was created without an extension, OpenAI returned `Unrecognized file format` (HTTP 400).

## Scope of changes

1. `Messages::AudioTranscriptionJob`
   - Keeps `discard_on Faraday::BadRequestError` to avoid retry storms on permanent request errors.
   - Adds explicit Rails warning logs for discarded jobs with attachment/job/status context.
2. `Messages::AudioTranscriptionService`
   - Keeps guaranteed temp file cleanup via `ensure`.
   - Ensures temp upload files include an extension when the original filename has none, derived from blob `content_type`.
   - This addresses intermittent failures like extensionless `audio/mpeg` files.

## Reproduction

Enable audio transcription for an account and process an audio attachment whose stored filename has no extension (for example `speech`) but valid audio content type (`audio/mpeg`). Before this fix, OpenAI transcription could return HTTP 400 `Unrecognized file format` for that attachment while similar attachments with extensions succeeded.

## Testing

Ran:

`bundle exec rubocop enterprise/app/jobs/messages/audio_transcription_job.rb enterprise/app/services/messages/audio_transcription_service.rb`

Result: both modified files pass lint with no offenses.
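
## Illustrative sketch: extension fallback

The extension fallback described under "Scope of changes" can be sketched in plain Ruby without Rails. This is only an approximation for illustration: `SUBTYPE_ALIASES`, `extension_for`, and `upload_name` are hypothetical names, and `File.extname` stands in for the Active Storage filename helper used in the service.

```ruby
# Hypothetical standalone sketch; not the service's actual code.
# Maps non-standard MIME subtypes to the extension they usually imply.
SUBTYPE_ALIASES = {
  'x-m4a' => 'm4a',
  'x-wav' => 'wav',
  'x-mp3' => 'mp3'
}.freeze

# Returns an extension guessed from a content type, or nil if none can be derived.
def extension_for(content_type)
  # Drop any parameters ("; codecs=...") and keep the part after the slash.
  subtype = content_type.to_s.downcase.split(';').first.to_s.split('/').last.to_s.strip
  return nil if subtype.empty?

  # Unknown subtypes are used as-is (for example "mpeg" for "audio/mpeg").
  SUBTYPE_ALIASES.fetch(subtype, subtype)
end

# Builds the temp file name, appending an extension only when the
# original filename has none.
def upload_name(key, filename, content_type)
  name = "#{key}-#{filename}"
  return name unless File.extname(filename).empty?

  ext = extension_for(content_type)
  ext ? "#{name}.#{ext}" : name
end

puts upload_name('abc123', 'speech', 'audio/mpeg')     # abc123-speech.mpeg
puts upload_name('abc123', 'voice.mp3', 'audio/mpeg')  # abc123-voice.mp3
```

Falling back to the raw subtype keeps the mapping table small, at the cost of passing through subtypes that may not be useful extensions; a content type with no slash is treated as its own subtype.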
2026-02-16 23:55:13 -08:00
parent 9cd7c4ef89
commit 61eaa098ae
3 changed files with 55 additions and 8 deletions
--- a/enterprise/app/services/messages/audio_transcription_service.rb
+++ b/enterprise/app/services/messages/audio_transcription_service.rb
@@ -31,12 +31,20 @@ class Messages::AudioTranscriptionService< Llm::LegacyBaseOpenAiService
  end

  def fetch_audio_file
+    blob = attachment.file.blob
    temp_dir = Rails.root.join('tmp/uploads/audio-transcriptions')
    FileUtils.mkdir_p(temp_dir)
-    temp_file_path = File.join(temp_dir, "#{attachment.file.blob.key}-#{attachment.file.filename}")
+    temp_file_name = "#{blob.key}-#{blob.filename}"
+
+    if blob.filename.extension_without_delimiter.blank?
+      extension = extension_from_content_type(blob.content_type)
+      temp_file_name = "#{temp_file_name}.#{extension}" if extension.present?
+    end
+
+    temp_file_path = File.join(temp_dir, temp_file_name)

    File.open(temp_file_path, 'wb') do |file|
-      attachment.file.blob.open do |blob_file|
+      blob.open do |blob_file|
        IO.copy_stream(blob_file, file)
      end
    end
@@ -49,13 +57,12 @@ class Messages::AudioTranscriptionService< Llm::LegacyBaseOpenAiService
    return transcribed_text if transcribed_text.present?

    temp_file_path = fetch_audio_file
-
    transcribed_text = nil

    File.open(temp_file_path, 'rb') do |file|
      response = @client.audio.transcribe(
        parameters: {
-          model: 'whisper-1',
+          model: WHISPER_MODEL,
          file: file,
          temperature: 0.4
        }
@@ -63,10 +70,10 @@ class Messages::AudioTranscriptionService< Llm::LegacyBaseOpenAiService
      transcribed_text = response['text']
    end

-    FileUtils.rm_f(temp_file_path)
-
    update_transcription(transcribed_text)
    transcribed_text
+  ensure
+    FileUtils.rm_f(temp_file_path) if temp_file_path.present?
  end

  def instrumentation_params(file_path)
@@ -90,4 +97,15 @@ class Messages::AudioTranscriptionService< Llm::LegacyBaseOpenAiService

    message.reindex
  end
+
+  def extension_from_content_type(content_type)
+    subtype = content_type.to_s.downcase.split(';').first.to_s.split('/').last.to_s
+    return if subtype.blank?
+
+    {
+      'x-m4a' => 'm4a',
+      'x-wav' => 'wav',
+      'x-mp3' => 'mp3'
+    }.fetch(subtype, subtype)
+  end
 end
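
The move from an inline `FileUtils.rm_f` to an `ensure` clause means the temp file is removed even when the transcription request raises. A minimal sketch of that pattern in plain Ruby, assuming a hypothetical helper named `with_temp_upload` (the service itself inlines this logic rather than using such a helper):

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: writes data to dir/name, yields the path,
# and removes the file whether or not the block raises.
def with_temp_upload(dir, name, data)
  path = nil
  FileUtils.mkdir_p(dir)
  path = File.join(dir, name)
  File.binwrite(path, data)
  yield path
ensure
  # path stays nil if setup failed before it was assigned.
  FileUtils.rm_f(path) if path
end

Dir.mktmpdir do |dir|
  begin
    with_temp_upload(dir, 'speech.mpeg', 'fake-bytes') do |_path|
      raise IOError, 'simulated upload failure'
    end
  rescue IOError
    # The error still propagates to the caller; only cleanup is guaranteed.
  end
  puts Dir.children(dir).empty? # true
end
```

The nil guard in the `ensure` clause mirrors the `if temp_file_path.present?` check in the diff, which covers the early return taken before any temp file is created.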