feat: Add company backfill migration for existing contacts (Part 1) (#12657)

## Description

Implements company backfill migration infrastructure for existing
contacts. This is **Part 1 of 2** for the company model production
rollout as described in
[CW-5726](https://linear.app/chatwoot/issue/CW-5726/company-model-setting-it-up-on-production).

Creates jobs and services to associate existing contacts with companies
based on their email domains, filtering out free email providers (gmail,
yahoo, etc.) and disposable addresses.
 

**What's included:**
- Business email detector service with ValidEmail2 (uses
`disposable_domain?` to avoid DNS lookups)
- Per-account batch job to process contacts for one account
- Orchestrator job to iterate all accounts
- Rake task: `bundle exec rake companies:backfill`

~~*NOTE*: I'm using a hard-coded approach to determine if something is a
"business" email by filtering out emails that are usually personal. I've
also added domains that are common to some of our customers' regions.
This should be simpler. I looked into `Valid_Email2` and I couldn't find
anything to dictate whether an email is a personal email or a business
one. I don't think the approach used in the frontend is valid here.~~
UPDATE: Using `email_provider_info` gem instead.


**Pending - Part 2 (separate PR):** Real-time company creation for new
contacts

## Type of change

- [x] New feature (non-breaking change which adds functionality)

## How Has This Been Tested?

```bash
# Run all new tests
bundle exec rspec spec/enterprise/services/companies/business_email_detector_service_spec.rb \
                   spec/enterprise/jobs/migration/company_account_batch_job_spec.rb \
                   spec/enterprise/jobs/migration/company_backfill_job_spec.rb

# Run RuboCop
bundle exec rubocop enterprise/app/services/companies/business_email_detector_service.rb \
                     enterprise/app/jobs/migration/company_account_batch_job.rb \
                     enterprise/app/jobs/migration/company_backfill_job.rb \
                     lib/tasks/companies.rake
```

**Performance optimization:**
- Uses `disposable_domain?` instead of `disposable?` to avoid DNS MX
lookups. A tcpdump analysis showed that `disposable?` made a network
call for every email, causing a 100x slowdown.
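
The difference between the two checks can be illustrated with a minimal plain-Ruby sketch. It does not use `valid_email2`; the domain list, the `CountingResolver` class, and both helper methods are hypothetical stand-ins, and the resolver only counts calls instead of touching the network.

```ruby
require 'set'

# Hypothetical sample list; a real list would come from a maintained dataset.
DISPOSABLE_DOMAINS = Set['mailinator.com', 'tempmail.example'].freeze

# Stand-in for a DNS resolver that records how many lookups it receives.
class CountingResolver
  attr_reader :lookups

  def initialize
    @lookups = 0
  end

  def mx_records(_domain)
    @lookups += 1
    [] # a real resolver would perform network I/O here
  end
end

def domain_of(email)
  email.split('@').last.to_s.downcase
end

# Domain-only check: an in-memory set lookup, no network access.
def disposable_domain_only?(email)
  DISPOSABLE_DOMAINS.include?(domain_of(email))
end

# Check that also consults MX records: every email whose domain is not
# already listed falls through to a resolver call.
def disposable_with_mx?(email, resolver)
  domain = domain_of(email)
  return true if DISPOSABLE_DOMAINS.include?(domain)

  resolver.mx_records(domain).any? { |mx| DISPOSABLE_DOMAINS.include?(mx) }
end
```

In a backfill most contacts have domains that are not on the list, so in this sketch the second variant costs close to one resolver call per contact, while the first never leaves memory.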

## Checklist:

- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my code
- [x] I have commented on my code, particularly in hard-to-understand
areas
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged and published in downstream
modules

---------

Co-authored-by: Sojan Jose <sojan@pepalo.com>
Vinay Keerthi
2025-11-03 20:03:47 +05:30
committed by GitHub
parent e771d99552
commit ef54f07d5b
12 changed files with 376 additions and 4 deletions

@@ -0,0 +1,54 @@
class Migration::CompanyAccountBatchJob < ApplicationJob
  queue_as :low

  def perform(account)
    account.contacts
           .where.not(email: nil)
           .find_in_batches(batch_size: 1000) do |contact_batch|
      process_contact_batch(contact_batch, account)
    end
  end

  private

  def process_contact_batch(contacts, account)
    contacts.each do |contact|
      next unless should_process?(contact)

      company = find_or_create_company(contact, account)
      # rubocop:disable Rails/SkipsModelValidations
      contact.update_column(:company_id, company.id) if company
      # rubocop:enable Rails/SkipsModelValidations
    end
  end

  def should_process?(contact)
    return false if contact.company_id.present?
    return false if contact.email.blank?

    Companies::BusinessEmailDetectorService.new(contact.email).perform
  end

  def find_or_create_company(contact, account)
    domain = extract_domain(contact.email)
    company_name = derive_company_name(contact, domain)
    Company.find_or_create_by!(account: account, domain: domain) do |company|
      company.name = company_name
    end
  rescue ActiveRecord::RecordNotUnique
    # Race condition: Another job created it between our check and create
    # just find the one that was created
    Company.find_by(account: account, domain: domain)
  end

  def extract_domain(email)
    email.split('@').last&.downcase
  end

  def derive_company_name(contact, domain)
    contact.additional_attributes&.dig('company_name').presence ||
      domain.split('.').first.tr('-_', ' ').titleize
  end
end
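
To make the naming fallback easier to follow, here is a minimal plain-Ruby sketch of the domain extraction and name derivation. It runs without Rails, so `simple_titleize` is a hypothetical, rougher stand-in for ActiveSupport's `titleize`, and the explicit name is passed in directly rather than read from `additional_attributes`.

```ruby
# Lowercased part after the last '@'.
def extract_domain(email)
  email.split('@').last&.downcase
end

# Rough stand-in for ActiveSupport's titleize: capitalize each word.
def simple_titleize(text)
  text.split(' ').map(&:capitalize).join(' ')
end

# Prefer an explicit company name; otherwise derive one from the first
# label of the domain, turning '-' and '_' into spaces.
def derive_company_name(explicit_name, domain)
  return explicit_name unless explicit_name.nil? || explicit_name.strip.empty?

  simple_titleize(domain.split('.').first.tr('-_', ' '))
end

puts derive_company_name(nil, extract_domain('Jane@Acme-Corp.io'))
```

Because only the first label is used, a domain such as `mail.acme.co.uk` would yield `Mail` under this approach, which may be worth keeping in mind when reviewing backfilled names.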

@@ -0,0 +1,17 @@
class Migration::CompanyBackfillJob < ApplicationJob
  queue_as :low

  def perform
    Rails.logger.info 'Starting company backfill migration...'
    account_count = 0
    Account.find_in_batches(batch_size: 100) do |accounts|
      accounts.each do |account|
        Rails.logger.info "Enqueuing company backfill for account #{account.id}"
        Migration::CompanyAccountBatchJob.perform_later(account)
        account_count += 1
      end
    end
    Rails.logger.info "Company backfill migration complete. Enqueued jobs for #{account_count} accounts."
  end
end
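
The orchestrator follows a fan-out pattern: walk the accounts in batches and enqueue one job per account. A minimal plain-Ruby sketch of that pattern is shown here; `FakeQueue` and `enqueue_backfill` are hypothetical names, and an in-memory array stands in for ActiveJob.

```ruby
# In-memory stand-in for a job queue (hypothetical; the real code uses ActiveJob).
class FakeQueue
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def enqueue(job_name, account_id)
    @jobs << [job_name, account_id]
  end
end

# Walk account ids in fixed-size batches and enqueue one job per account.
# Returns the number of jobs enqueued.
def enqueue_backfill(account_ids, queue, batch_size: 100)
  count = 0
  account_ids.each_slice(batch_size) do |batch|
    batch.each do |id|
      queue.enqueue('CompanyAccountBatchJob', id)
      count += 1
    end
  end
  count
end

queue = FakeQueue.new
puts enqueue_backfill((1..250).to_a, queue)
```

Note that the returned count, like the final log line in the job, only reports how many jobs were enqueued; the per-account work happens later on the queue.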

@@ -12,9 +12,9 @@
#
# Indexes
#
-# index_companies_on_account_id (account_id)
-# index_companies_on_domain_and_account_id (domain,account_id)
-# index_companies_on_name_and_account_id (name,account_id)
+# index_companies_on_account_and_domain (account_id,domain) UNIQUE WHERE (domain IS NOT NULL)
+# index_companies_on_account_id (account_id)
+# index_companies_on_name_and_account_id (name,account_id)
#
class Company < ApplicationRecord
include Avatarable
@@ -24,6 +24,7 @@ class Company < ApplicationRecord
with: /\A[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?(\.[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?)+\z/,
message: I18n.t('errors.companies.domain.invalid')
}
+validates :domain, uniqueness: { scope: :account_id }, if: -> { domain.present? }
validates :description, length: { maximum: Limits::COMPANY_DESCRIPTION_LENGTH_LIMIT }
belongs_to :account

@@ -0,0 +1,19 @@
class Companies::BusinessEmailDetectorService
  attr_reader :email

  def initialize(email)
    @email = email
  end

  def perform
    return false if email.blank?

    address = ValidEmail2::Address.new(email)
    return false unless address.valid?
    return false if address.disposable_domain?

    provider = EmailProviderInfo.call(email)
    provider.nil?
  end
end
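
The service applies its filters in a fixed order: blank, invalid, disposable, then known provider. The following plain-Ruby sketch mirrors that order without either gem. The sample lists and the loose `SIMPLE_EMAIL` pattern are assumptions for illustration only and do not reproduce what `valid_email2` or `email_provider_info` actually check.

```ruby
require 'set'

# Hypothetical sample lists standing in for the gems' datasets.
SAMPLE_DISPOSABLE = Set['mailinator.com'].freeze
SAMPLE_FREE_PROVIDERS = Set['gmail.com', 'yahoo.com'].freeze

# Deliberately loose format check; not the validation valid_email2 performs.
SIMPLE_EMAIL = /\A[^@\s]+@[^@\s]+\.[^@\s]+\z/

# Returns true only when the address survives every filter.
def business_email?(email)
  return false if email.nil? || email.strip.empty?
  return false unless SIMPLE_EMAIL.match?(email)

  domain = email.split('@').last.downcase
  return false if SAMPLE_DISPOSABLE.include?(domain)

  !SAMPLE_FREE_PROVIDERS.include?(domain)
end
```

Ordering the checks from cheapest to most specific keeps the common rejections (blank or malformed addresses) from reaching the list lookups at all.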