Add context option to .import for passing custom data #996

Open
dimitrikochnev wants to merge 1 commit into toptal:master from lockvoid:context
Conversation

@dimitrikochnev

Motivation

When importing objects, the data needed for indexing (e.g., embeddings, computed attributes) is often already available in memory at the call site. However, crutch blocks are currently forced to re-fetch this data from the database (even with direct_import), resulting in redundant queries and wasted resources.

There is currently no mechanism for passing this pre-computed data from the import call site down into crutch blocks or field value procs.

This design aligns with established patterns in the Ruby ecosystem, such as graphql-ruby and ActiveModelSerializers. In those libraries, a context object is similarly passed through the resolution or serialization stack to share request-level state (like current user, auth tokens, or pre-loaded data) without relying on global state.

Solution

Add an optional context: keyword argument to import/import! that flows through the entire indexing pipeline. Context is an arbitrary hash, defaulting to {}.

# Pass pre-computed data to avoid redundant DB queries
MyIndex.import!(objects, context: { embeddings: precomputed_embeddings })

Context in crutch blocks (2nd argument)

crutch :embeddings do |collection, context|
  # Use pre-computed data if available, otherwise fetch from DB
  context[:embeddings] || load_embeddings(collection)
end

Context in field value procs (3rd argument)

field :embedding, value: ->(object, crutches, context) {
  context[:override] || crutches.embeddings[object.id]
}

Both are fully backward-compatible — existing 1-arg crutch blocks and 1-2 arg field procs continue to work unchanged via arity-based dispatch.
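
The arity-based dispatch can be sketched roughly like this (a simplified illustration, not the actual Chewy internals; `call_crutch` is a hypothetical helper name):

```ruby
# Dispatch a crutch block based on how many parameters it declares.
# One-parameter blocks keep their legacy signature; blocks declaring a
# second parameter additionally receive the context hash.
def call_crutch(block, collection, context)
  if block.arity.abs >= 2
    block.call(collection, context)
  else
    block.call(collection)
  end
end

legacy = ->(collection) { collection.map(&:to_s) }
modern = ->(collection, context) { context[:override] || collection.map(&:to_s) }

call_crutch(legacy, [1, 2], {})                   # => ["1", "2"]
call_crutch(modern, [1, 2], { override: ["x"] })  # => ["x"]
```

The same idea applies to field value procs, just with one extra leading argument.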

@dimitrikochnev dimitrikochnev requested a review from a team as a code owner January 29, 2026 11:47
@bbatsov
Member

bbatsov commented Feb 25, 2026

Master has been updated with CI fixes and compatibility changes (#998) — we now target Ruby 3.2+ and Rails 7.2+. Could you rebase this PR on top of master so CI can run properly? Thanks!

@dimitrikochnev
Author

Let me find where that is :-)

@dimitrikochnev
Author

@bbatsov yep, done

@bbatsov
Member

bbatsov commented Feb 25, 2026

Interesting feature. The design is clean — flowing context: through the pipeline with arity-based backward compatibility is the right approach.

A couple of questions:

  • Could you rebase onto current master? (I've made quite a few changes today, sorry about that!)
  • The PR is well-tested but could use a changelog entry.

Will review the code in detail after rebase.

@dimitrikochnev
Author

dimitrikochnev commented Feb 26, 2026

Done.

Here's a practical example that stores video embeddings:

class Video::EncodeSemanticProcessor < ProcessorMan::Processor
  include MediaProcessing

  option :model, :string, default: "auto"

  def process
    upstream_media, upstream_video = require_upstreams(0 => :media, 1 => :video)

    result = with_progress(0.1, 0.9) do |progress|
      Ferment::Engine.stream("/v1/video/encode-semantic", video_in: upstream_video.cook.blob.url, **options.to_params) do |event|
        progress.call(event[:progress]) if event[:type] == :progress
      end
    end

    MultiTenancy.with(network.user) do
      visual_slices, base64_embeddings = build_visual_slices(result)

      media.update!(visual_slices: visual_slices)

      Media::Embedding.bulk_upsert(media, base64_embeddings)  # <--- Store raw embeddings in the db

      Chewy.strategy(:atomic) do
        context = { embeddings: Media::Embedding.to_floats(base64_embeddings) }

        MediaIndex.delete_for(media, type: 'Media::Data::VisualSlice')
        MediaIndex.import!(Array.wrap(media.visual_slices), direct_import: true, context: context) # <--- Don't perform a huge round trip to the db, classic reindex still works since the embeddings will be fetched from the origin db
      end
    end

    succeed(extract_data(result, strip_nested: true))
  end

  private

    def build_visual_slices(result)
      embeddings = {}

      slices = result.fetch(:slices).map do |slice_data|
        slice = Media::Data::VisualSlice.new(
          id: media_cid(:visual_slice, slice_data.fetch(:position)),
          blob_id: media.id,
          blob_type: media.class.name,
          user_id: media.user_id,
          position: slice_data.fetch(:position),
          start_time: slice_data.fetch(:start_time),
          duration: slice_data.fetch(:duration),
          first_frame: slice_data.fetch(:first_frame),
          frame_count: slice_data.fetch(:frame_count),
          semantic_embedding_model: result.fetch(:model),
          semantic_embedding_dim: result.fetch(:embedding_dim),
        )

        embeddings[slice.embedding_key] = slice_data.fetch(:embedding) # <--- HUGE

        slice
      end

      [slices, embeddings]
    end
end
