Semantic search in Rails

Semantic search allows you to find data based on meaning. This is different from a traditional lexical search where you are looking to match exact-ish keywords.

Searching by meaning should improve both the accuracy of the results and the user experience. A user can focus on their intent and not on “hacking” the search tool to get decent results.

Note: it’s easy to implement a full-text lexical search in your Rails app.

To add semantic search to a Rails app, here’s what we’ll have to do:

Assign meaning to our data.
- Generate embeddings with OpenAI API.
Store meaning in Postgres.
- Use the pgvector extension.
Assign meaning to a search query.
- Generate embeddings for the query with OpenAI.
Find the most relevant data.
- Use the neighbor gem.

In this example, we’ll build a semantic search over a database of blog articles with text content.

Assigning meaning

To find an article’s meaning we’ll use artificial intelligence and one of the large language models. In this case, we are using OpenAI’s API. It is simple, cheap and easy to deploy on an existing infrastructure.

Embeddings

We’ll use their embeddings API. You can pass some text and you’ll get its embeddings back.

Langchain::LLM::OpenAI.new(api_key: ...).embed(text: content).embedding

Note: I’m using the built-in OpenAI client from langchain here. The primary use case for langchain will be chunking, discussed a little further below.

An embedding is a numerical representation of a text’s meaning. It’s a vector of decimal numbers, and the distance between two such vectors measures how related the two texts are.

In a Rails console, they look like an array with a bunch of decimal numbers.

[0.005204988,
 0.029841444,
 0.016162278,
 -0.027387716,
...]

Now, we can take two of these and measure how close they are. A common measure is cosine similarity, a single decimal number: the closer it is to 1, the more related the two texts are. Cosine distance is 1 minus the similarity, so there a smaller number means a closer match.
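To make the comparison concrete, here is a minimal sketch of cosine similarity in plain Ruby. The cosine_similarity helper and the toy three-element vectors are my own illustration, standing in for real embeddings; in the app itself the database does this calculation for us.

```ruby
# Cosine similarity of two equal-length numeric vectors.
# Returns a Float between -1.0 and 1.0; higher means more similar.
def cosine_similarity(a, b)
  raise ArgumentError, "vectors must have the same length" unless a.length == b.length

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  raise ArgumentError, "a zero vector has no direction" if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Toy vectors standing in for real embeddings.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]) # very close to 1
orthogonal     = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]) # 0.0
```

Vectors pointing the same way score close to 1 regardless of their length, while unrelated (orthogonal) vectors score 0.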

Chunking

If our articles were short and could fit into the model’s context we could move on to the next section. But usually, they are larger and we have to split the text into smaller parts – chunks.

To chunk an article up, we’ll use the langchain gem (published as langchainrb). It is a Swiss army knife for text manipulation in LLM applications.

Langchain::Chunker::RecursiveText.new(
  text_content,
  chunk_size: 1536,
  chunk_overlap: 200,
  separators: ["\n\n"]
).chunks

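A note on the numbers: 1536 is the number of dimensions in the embeddings the model returns (and the size of the vector column we create below), not its context size. The chunk_size option limits a chunk’s length in characters, and 1536 characters fits comfortably within the input limit of OpenAI’s embedding models.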
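To show what chunk_size and chunk_overlap mean, here is a simplified sketch in plain Ruby. It is not langchain’s algorithm (which splits on separators recursively); it is a plain sliding window over characters, and sliding_chunks is a hypothetical helper of my own.

```ruby
# Hypothetical helper, not part of langchain: cut text into windows of at most
# `size` characters. Each window starts `size - overlap` characters after the
# previous one, so neighbouring chunks share `overlap` characters.
def sliding_chunks(text, size:, overlap:)
  raise ArgumentError, "overlap must be smaller than size" unless overlap < size

  step = size - overlap
  chunks = []
  start = 0
  while start < text.length
    chunks << text[start, size]
    break if start + size >= text.length # the last window reached the end

    start += step
  end
  chunks
end

chunks = sliding_chunks("abcdefghij", size: 4, overlap: 2)
# chunks => ["abcd", "cdef", "efgh", "ghij"]
```

The overlap means a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks, at the cost of storing and embedding some text twice.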

Once we have chunks we can generate embeddings for each one of them through OpenAI.

chunks.map { |chunk| open_ai.embed(text: chunk.text).embedding }

Storing meaning in Postgres

Now, we need to store the chunks and link them back to our articles. We’ll create a standard ActiveRecord model. But first, we need to figure out how to store embeddings in Postgres.

For that, we’ll use the pgvector extension. We need to install it first as it doesn’t come with the default distribution. After that, we can enable the extension in a standard migration.

def change
  enable_extension "vector"
end

And with that, we can create our model to store the chunks with embeddings using a new vector datatype.

create_table :chunks do |t|
  t.references :article, null: false, foreign_key: true
  t.text :content
  t.vector :embedding, limit: 1536

  t.timestamps
end

Finding relevant data

Now that we have all the articles chunked up with embeddings we need to be able to query them.

Here, we’ll use the neighbor gem.

class Chunk < ApplicationRecord
  has_neighbors :embedding
  ...
end

After adding has_neighbors into our model we’ll be able to use the .nearest_neighbors class method to query embeddings we saved earlier.
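Conceptually, a nearest-neighbor query ranks rows by their distance to the query vector and returns the closest ones. The brute-force sketch below does that in plain Ruby over in-memory hashes. It is only an illustration: cosine_distance and nearest are hypothetical helpers, and in the real app the ranking happens inside Postgres.

```ruby
# Cosine distance = 1 - cosine similarity; smaller means more related.
def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / norm
end

# Return the `limit` rows whose :embedding is closest to `query`, closest first.
def nearest(rows, query, limit:)
  rows.sort_by { |row| cosine_distance(row[:embedding], query) }.first(limit)
end

# Toy two-dimensional embeddings standing in for stored chunks.
rows = [
  { id: 1, embedding: [1.0, 0.0] },
  { id: 2, embedding: [0.0, 1.0] },
  { id: 3, embedding: [0.9, 0.1] }
]

closest_ids = nearest(rows, [1.0, 0.0], limit: 2).map { |row| row[:id] }
# closest_ids => [1, 3]
```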

Now we can hack everything together: turn the user’s search query into embeddings, pass them into the .nearest_neighbors method and map the results back to articles.

query_embedding = open_ai.embed(text: query).embedding
ids = Chunk.nearest_neighbors(:embedding, query_embedding, distance: "cosine").first(20).map(&:article_id).uniq
Article.where(id: ids)
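One detail to watch: where(id: ids) does not promise to return rows in the order of ids, so the relevance ranking can get lost on the way back to articles. Here is a minimal sketch of restoring it in plain Ruby. ArticleRow and rank_articles are hypothetical names, with a Struct standing in for the ActiveRecord model.

```ruby
# Stand-in for an ActiveRecord Article row; hypothetical, for illustration only.
ArticleRow = Struct.new(:id, :title)

# Reorder `articles` to follow `ranked_ids` (most relevant first),
# skipping ids for which no article was loaded.
def rank_articles(articles, ranked_ids)
  by_id = articles.to_h { |article| [article.id, article] }
  ranked_ids.filter_map { |id| by_id[id] }
end

# Article ids taken from chunks ordered by distance, duplicates removed.
ranked_ids = [7, 3, 7, 9, 3].uniq # => [7, 3, 9]

# Pretend the database returned the rows in primary-key order.
unordered = [ArticleRow.new(3, "B"), ArticleRow.new(7, "A"), ArticleRow.new(9, "C")]

ordered_titles = rank_articles(unordered, ranked_ids).map(&:title)
# ordered_titles => ["A", "B", "C"]
```

In the app, the same idea would apply to the loaded articles and the ids array from the query above.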

And that’s it. We can search our article database by meaning instead of relying on keyword matching. And as a bonus, we’ve integrated AI with LLM into our Rails app. The board is excited and our company’s valuation increased 2-3x.

How much does it cost?

Not much. But it depends on the volume and the model used.

Generating embeddings for ~600 blog articles cost me under $0.20. Since I’m the only one searching the database, the cost of search queries is negligible.
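If you want a rough estimate for your own data, the arithmetic is just token count times price. The sketch below assumes the API is billed per million tokens. The embedding_cost helper and every number in it are placeholders of mine, not real pricing and not the figures behind the cost above.

```ruby
# Rough embedding cost: tokens / 1,000,000 * price per million tokens.
# Both arguments are supplied by the caller; nothing here is real pricing.
def embedding_cost(total_tokens, price_per_million_tokens)
  total_tokens / 1_000_000.0 * price_per_million_tokens
end

# Placeholder inputs, chosen only to make the arithmetic easy to follow:
# 1,000 documents of 1,000 tokens each at a made-up $1.00 per million tokens.
total_tokens = 1_000 * 1_000
cost = embedding_cost(total_tokens, 1.00)
# cost => 1.0
```

Plug in your own token counts and the current price of the model you pick.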

Setup on CircleCI

Surprisingly, the most time-consuming task was getting pgvector installed on CircleCI.

The extension needs a few additional steps to install. That’s straightforward on a VPS or your dev machine but not on CircleCI.

CircleCI has a primary container where your tasks are running. And there are supporting containers, like databases, for everything else. However, you can’t run an arbitrary script in your setup phase on a supporting container.

The best way around that is to create a Docker image with pgvector installed and use it for the supporting container.

In the end, I forked the cimg/postgres convenience image from CircleCI’s repository. Then, I added the installation steps after the main Postgres setup.

RUN cd /tmp && \
    git clone --branch v0.5.1 https://github.com/pgvector/pgvector.git && \
    cd pgvector && \
    make -j $(nproc) && \
    PATH=$PATH make install

Lastly, I published the image to a public Docker repository and used it in my .circleci/config.yml instead of the original convenience image.