feat: Add company backfill migration for existing contacts (Part 1) (#12657)

## Description

Implements company backfill migration infrastructure for existing
contacts. This is **Part 1 of 2** for the company model production
rollout as described in
[CW-5726](https://linear.app/chatwoot/issue/CW-5726/company-model-setting-it-up-on-production).

Creates jobs and services to associate existing contacts with companies
based on their email domains, filtering out free email providers (gmail,
yahoo, etc.) and disposable addresses.
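
The association rule can be sketched in plain Ruby. This is only an illustration under stated assumptions: `ContactRow`, `CompanyRow`, `FREE_DOMAINS` and `backfill_companies` are hypothetical names, not code from this PR, and the free-provider list is a small hard-coded sample. The behavior mirrors what the specs in this PR assert: one company per domain, a stored `company_name` wins over a name derived from the domain, and contacts that already have a company are left alone.

```ruby
# Hypothetical in-memory stand-ins for the Contact and Company records.
ContactRow = Struct.new(:email, :company_name, :company, keyword_init: true)
CompanyRow = Struct.new(:domain, :name, keyword_init: true)

# Assumed sample of free providers; the PR relies on a gem for this check.
FREE_DOMAINS = %w[gmail.com yahoo.com hotmail.com uol.com.br].freeze

def backfill_companies(contacts, free_domains: FREE_DOMAINS)
  companies = {} # domain => CompanyRow, so each domain yields one company
  contacts.each do |contact|
    next if contact.company # keep existing associations untouched

    # Missing or malformed emails produce an empty domain and are skipped.
    domain = contact.email.to_s.split('@', 2)[1].to_s.downcase
    next if domain.empty? || free_domains.include?(domain)

    # Prefer a name already stored on the contact, else derive one from the domain.
    companies[domain] ||= CompanyRow.new(
      domain: domain,
      name: contact.company_name || domain.split('.').first.capitalize
    )
    contact.company = companies[domain]
  end
  companies.values
end
```

Keying the hash by domain is what keeps two contacts at `acme.com` on the same company instead of creating duplicates.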
 

**What's included:**
- Business email detector service with ValidEmail2 (uses
`disposable_domain?` to avoid DNS lookups)
- Per-account batch job to process contacts for one account
- Orchestrator job to iterate all accounts
- Rake task: `bundle exec rake companies:backfill`
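
As a rough sketch of how the two jobs divide the work, assuming an in-memory array in place of the real job queue: the orchestrator only records one unit of work per account, and the per-account step walks that account's contacts in fixed-size slices. `enqueue_backfill`, `process_account` and `BATCH_SIZE` are hypothetical names and values chosen for illustration; the actual jobs are `Migration::CompanyBackfillJob` and `Migration::CompanyAccountBatchJob`.

```ruby
BATCH_SIZE = 1000 # assumed value, not taken from the PR

# Orchestrator step: queue one per-account unit of work rather than
# processing contacts inline. The array stands in for a real job queue.
def enqueue_backfill(account_ids, queue: [])
  account_ids.each { |id| queue << [:company_account_batch, id] }
  queue
end

# Per-account step: visit contacts slice by slice so only one slice is
# handled at a time. Returns the number of slices processed.
def process_account(contacts, batch_size: BATCH_SIZE)
  batches = 0
  contacts.each_slice(batch_size) do |slice|
    slice.each { |contact| yield contact }
    batches += 1
  end
  batches
end
```

Splitting the work per account means a failure in one account's batch can be retried without redoing the others.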

~~*NOTE*: I'm using a hard-coded approach to determine if something is a
"business" email by filtering out emails that are usually personal. I've
also added domains that are common to some of our customers' regions.
This should be simpler. I looked into `ValidEmail2` and I couldn't find
anything to dictate whether an email is a personal email or a business
one. I don't think the approach used in the frontend is valid here.~~
UPDATE: Using `email_provider_info` gem instead.
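
The resulting decision order, as exercised by the service spec in this PR, can be sketched as follows. This is a minimal stand-in and not the shipped service: `BusinessEmailCheck` is a hypothetical class, the format regex is a simplification, and the two collaborators are injected as lambdas where the real service calls `ValidEmail2::Address` and `EmailProviderInfo.call`.

```ruby
# Hypothetical sketch: collaborators are plain callables so the example
# runs without the valid_email2 and email_provider_info gems.
class BusinessEmailCheck
  # Simplified format check (assumption); the real service asks ValidEmail2.
  SIMPLE_FORMAT = /\A[^@\s]+@[^@\s]+\.[^@\s]+\z/.freeze

  def initialize(free_provider:, disposable:)
    @free_provider = free_provider # returns a provider name, or nil if unknown
    @disposable = disposable       # returns true for disposable domains
  end

  def business?(email)
    return false unless email.is_a?(String) && SIMPLE_FORMAT.match?(email)
    return false if @disposable.call(email)

    # No known free provider means the address is treated as a business one.
    @free_provider.call(email).nil?
  end
end

# Stand-in lookups with a tiny hard-coded sample of domains.
free_provider = lambda do |email|
  { 'gmail.com' => 'gmail', 'uol.com.br' => 'uol' }[email.split('@').last.downcase]
end
disposable = ->(email) { email.downcase.end_with?('@mailinator.com') }

check = BusinessEmailCheck.new(free_provider: free_provider, disposable: disposable)
check.business?('user@acme.com')
```

Injecting the lookups also mirrors how the specs stub both gems, so each branch can be tested without touching real provider data.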


**Pending - Part 2 (separate PR):** Real-time company creation for new
contacts

## Type of change

- [x] New feature (non-breaking change which adds functionality)

## How Has This Been Tested?

```bash
# Run all new tests
bundle exec rspec spec/enterprise/services/companies/business_email_detector_service_spec.rb \
                   spec/enterprise/jobs/migration/company_account_batch_job_spec.rb \
                   spec/enterprise/jobs/migration/company_backfill_job_spec.rb

# Run RuboCop
bundle exec rubocop enterprise/app/services/companies/business_email_detector_service.rb \
                     enterprise/app/jobs/migration/company_account_batch_job.rb \
                     enterprise/app/jobs/migration/company_backfill_job.rb \
                     lib/tasks/companies.rake
```

**Performance optimization:**
- Uses `disposable_domain?` instead of `disposable?` to avoid DNS MX
lookups (discovered via tcpdump analysis: `disposable?` was making
network calls for every email, causing a 100x slowdown)
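
To illustrate why a list-only check is cheap, here is a hedged sketch of a purely in-memory domain lookup. It is not the gem's implementation: `DISPOSABLE_DOMAINS` and `listed_disposable_domain?` are hypothetical, and the one-entry list is a placeholder for a much larger blocklist. The point is that membership in a set involves no DNS resolution, whereas an MX-based check needs a network round trip per address.

```ruby
require 'set'

# Hypothetical blocklist with a single sample entry.
DISPOSABLE_DOMAINS = Set.new(%w[mailinator.com]).freeze

# True when the email's domain, or any parent domain, is on the list.
# Only string and set operations are used, so no network calls happen.
def listed_disposable_domain?(email, list: DISPOSABLE_DOMAINS)
  domain = email.to_s.split('@', 2)[1].to_s.downcase
  return false if domain.empty?

  labels = domain.split('.')
  # Check "sub.example.com", then "example.com"; skip the bare TLD.
  (0...labels.size - 1).any? { |i| list.include?(labels[i..-1].join('.')) }
end
```

For a backfill that touches every contact, keeping this check in memory is what makes the per-email cost negligible.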

## Checklist:

- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my code
- [x] I have commented on my code, particularly in hard-to-understand
areas
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged and published in downstream
modules

---------

Co-authored-by: Sojan Jose <sojan@pepalo.com>
Vinay Keerthi
2025-11-03 20:03:47 +05:30
committed by GitHub
parent e771d99552
commit ef54f07d5b
12 changed files with 376 additions and 4 deletions


@@ -0,0 +1,120 @@
require 'rails_helper'

RSpec.describe Migration::CompanyAccountBatchJob, type: :job do
  let(:account) { create(:account) }

  describe '#perform' do
    before do
      # Stub EmailProviderInfo to control behavior in tests
      allow(EmailProviderInfo).to receive(:call) do |email|
        domain = email.split('@').last&.downcase
        case domain
        when 'gmail.com', 'yahoo.com', 'hotmail.com', 'uol.com.br'
          'free_provider' # generic free provider name
        end
      end
    end

    context 'when contact has business email' do
      let!(:contact) { create(:contact, account: account, email: 'user@acme.com') }

      it 'creates a company and associates the contact' do
        expect do
          described_class.perform_now(account)
        end.to change(Company, :count).by(1)
        contact.reload
        expect(contact.company).to be_present
        expect(contact.company.domain).to eq('acme.com')
        expect(contact.company.name).to eq('Acme')
      end
    end

    context 'when contact has free email' do
      let!(:contact) { create(:contact, account: account, email: 'user@gmail.com') }

      it 'does not create a company' do
        expect do
          described_class.perform_now(account)
        end.not_to change(Company, :count)
        contact.reload
        expect(contact.company_id).to be_nil
      end
    end

    context 'when contact has company_name in additional_attributes' do
      let!(:contact) do
        create(:contact, account: account, email: 'user@acme.com', additional_attributes: { 'company_name' => 'Acme Corporation' })
      end

      it 'uses the saved company name' do
        described_class.perform_now(account)
        contact.reload
        expect(contact.company.name).to eq('Acme Corporation')
      end
    end

    context 'when contact already has a company' do
      let!(:existing_company) { create(:company, account: account, domain: 'existing.com') }
      let!(:contact) do
        create(:contact, account: account, email: 'user@acme.com', company: existing_company)
      end

      it 'does not change the existing company' do
        described_class.perform_now(account)
        contact.reload
        expect(contact.company_id).to eq(existing_company.id)
      end
    end

    context 'when multiple contacts have the same domain' do
      let!(:contact1) { create(:contact, account: account, email: 'user1@acme.com') }
      let!(:contact2) { create(:contact, account: account, email: 'user2@acme.com') }

      it 'creates only one company for the domain' do
        expect do
          described_class.perform_now(account)
        end.to change(Company, :count).by(1)
        contact1.reload
        contact2.reload
        expect(contact1.company_id).to eq(contact2.company_id)
        expect(contact1.company.domain).to eq('acme.com')
      end
    end

    context 'when contact has no email' do
      let!(:contact) { create(:contact, account: account, email: nil) }

      it 'skips the contact' do
        expect do
          described_class.perform_now(account)
        end.not_to change(Company, :count)
        contact.reload
        expect(contact.company_id).to be_nil
      end
    end

    context 'when processing large batch' do
      before do
        contacts_data = Array.new(2000) do |i|
          {
            account_id: account.id,
            email: "user#{i}@company#{i % 100}.com",
            name: "User #{i}",
            created_at: Time.current,
            updated_at: Time.current
          }
        end
        # rubocop:disable Rails/SkipsModelValidations
        Contact.insert_all(contacts_data)
        # rubocop:enable Rails/SkipsModelValidations
      end

      it 'processes all contacts in batches' do
        expect do
          described_class.perform_now(account)
        end.to change(Company, :count).by(100)
        expect(account.contacts.where.not(company_id: nil).count).to eq(2000)
      end
    end
  end
end


@@ -0,0 +1,31 @@
require 'rails_helper'

RSpec.describe Migration::CompanyBackfillJob, type: :job do
  describe '#perform' do
    it 'enqueues the job' do
      expect { described_class.perform_later }
        .to have_enqueued_job(described_class)
        .on_queue('low')
    end

    context 'when accounts exist' do
      let!(:account1) { create(:account) }
      let!(:account2) { create(:account) }

      it 'enqueues CompanyAccountBatchJob for each account' do
        expect do
          described_class.perform_now
        end.to have_enqueued_job(Migration::CompanyAccountBatchJob)
          .with(account1)
          .and have_enqueued_job(Migration::CompanyAccountBatchJob)
          .with(account2)
      end
    end

    context 'when no accounts exist' do
      it 'completes without error' do
        expect { described_class.perform_now }.not_to raise_error
      end
    end
  end
end


@@ -0,0 +1,99 @@
require 'rails_helper'

RSpec.describe Companies::BusinessEmailDetectorService, type: :service do
  let(:service) { described_class.new(email) }

  describe '#perform' do
    context 'when email is from a business domain' do
      let(:email) { 'user@acme.com' }
      let(:valid_email_address) { instance_double(ValidEmail2::Address, valid?: true, disposable_domain?: false) }

      before do
        allow(ValidEmail2::Address).to receive(:new).with(email).and_return(valid_email_address)
        allow(EmailProviderInfo).to receive(:call).with(email).and_return(nil)
      end

      it 'returns true' do
        expect(service.perform).to be(true)
      end
    end

    context 'when email is from gmail' do
      let(:email) { 'user@gmail.com' }
      let(:valid_email_address) { instance_double(ValidEmail2::Address, valid?: true, disposable_domain?: false) }

      before do
        allow(ValidEmail2::Address).to receive(:new).with(email).and_return(valid_email_address)
        allow(EmailProviderInfo).to receive(:call).with(email).and_return('gmail')
      end

      it 'returns false' do
        expect(service.perform).to be(false)
      end
    end

    context 'when email is from Brazilian free provider' do
      let(:email) { 'user@uol.com.br' }
      let(:valid_email_address) { instance_double(ValidEmail2::Address, valid?: true, disposable_domain?: false) }

      before do
        allow(ValidEmail2::Address).to receive(:new).with(email).and_return(valid_email_address)
        allow(EmailProviderInfo).to receive(:call).with(email).and_return('uol')
      end

      it 'returns false' do
        expect(service.perform).to be(false)
      end
    end

    context 'when email is disposable' do
      let(:email) { 'user@mailinator.com' }
      let(:disposable_email_address) { instance_double(ValidEmail2::Address, valid?: true, disposable_domain?: true) }

      it 'returns false' do
        allow(ValidEmail2::Address).to receive(:new).with(email).and_return(disposable_email_address)
        expect(service.perform).to be(false)
      end
    end

    context 'when email is invalid format' do
      let(:email) { 'invalid-email' }
      let(:invalid_email_address) { instance_double(ValidEmail2::Address, valid?: false) }

      it 'returns false' do
        allow(ValidEmail2::Address).to receive(:new).with(email).and_return(invalid_email_address)
        expect(service.perform).to be(false)
      end
    end

    context 'when email is nil' do
      let(:email) { nil }

      it 'returns false' do
        expect(service.perform).to be(false)
      end
    end

    context 'when email is empty string' do
      let(:email) { '' }

      it 'returns false' do
        expect(service.perform).to be(false)
      end
    end

    context 'when email domain is uppercase' do
      let(:email) { 'user@GMAIL.COM' }
      let(:valid_email_address) { instance_double(ValidEmail2::Address, valid?: true, disposable_domain?: false) }

      before do
        allow(ValidEmail2::Address).to receive(:new).with(email).and_return(valid_email_address)
        allow(EmailProviderInfo).to receive(:call).with(email).and_return('gmail')
      end

      it 'returns false (case insensitive)' do
        expect(service.perform).to be(false)
      end
    end
  end
end