openai_embeddings

Generate vector embeddings for text using OpenAI's embedding models.

Overview

This step converts text into numerical vector representations (embeddings) that capture semantic meaning. These vectors enable similarity search, clustering, recommendations, and retrieval-augmented generation (RAG) systems. OpenAI's text-embedding-3-small and text-embedding-3-large models support configurable output dimensions, so you can trade embedding quality against storage cost. You can embed a single text or batch-process an array of text chunks, then store the resulting vectors in a vector database (Pinecone, Weaviate, etc.) for semantic search.

Quick Start

steps:
- type: openai_embeddings
  api_key: ${env:openai_api_key}
  model: text-embedding-3-small

Configuration

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| api_key | string | Yes | OpenAI-compatible API key sent as a Bearer token. |
| model | string | Yes | Embedding model name (for example 'text-embedding-3-small'). |
| input_from | string | No | Dot path selecting the text or list of texts to embed. When omitted, the entire event is serialized to JSON. |
| input_key | string | No | DEPRECATED: Use 'input_from' instead. Dot path selecting text to embed. |
| output_to | string | No | Event key that receives the list of embedding vectors for each input item. Default: "embeddings" |
| output_key | string | No | DEPRECATED: Use 'output_to' instead. Event key for embedding vectors. |
| encoding_format | string | No | Optional encoding format forwarded to the API (for example 'float' or 'base64'). |
| dimensions | string | No | Requested embedding dimensionality when supported by the provider. |
| base_url | string | No | Base API URL for the embeddings endpoint. Override for proxy deployments. Default: "https://api.openai.com/v1" |
| raw_on_error | boolean | No | When True, store the raw response body under '<output_key>_raw' if JSON parsing fails after a successful request. Default: true |
| swallow_on_error | boolean | No | If True, skip injecting error details and return the original event on failures. Default: false |
| extra_headers | string | No | Additional HTTP headers merged into each request alongside the defaults (Authorization, Content-Type, Accept, User-Agent). |
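
As a sketch of how the optional parameters combine, the step below routes requests through a proxy and keeps the pipeline running when a request fails. The proxy URL and the ticket.body / ticket.embedding paths are hypothetical placeholders, not values taken from this reference.

```yaml
steps:
- type: openai_embeddings
  api_key: ${env:openai_api_key}
  model: text-embedding-3-small
  # input_from / output_to replace the deprecated input_key / output_key.
  input_from: ticket.body        # hypothetical dot path
  output_to: ticket.embedding    # hypothetical event key
  # Hypothetical proxy endpoint; the default is https://api.openai.com/v1.
  base_url: https://llm-proxy.example.com/v1
  # Keep the raw response body if JSON parsing fails (this is the default).
  raw_on_error: true
  # Return the original event on failure instead of injecting error details.
  swallow_on_error: true
```

With swallow_on_error enabled, a failed request returns the original event, so downstream steps may receive events that lack the output key.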

Examples

Basic semantic search embeddings

Generate embeddings for text using the efficient small model

type: openai_embeddings
api_key: ${env:openai_api_key}
model: text-embedding-3-small
input_from: document.text
output_to: document.embedding

High-quality embeddings with custom dimensions

Use the large model with reduced dimensions for storage efficiency

type: openai_embeddings
api_key: ${env:openai_api_key}
model: text-embedding-3-large
input_from: article.content
output_to: article.vector
dimensions: 1024
encoding_format: float

Batch embed document chunks

Generate embeddings for multiple text chunks at once

type: openai_embeddings
api_key: ${env:openai_api_key}
model: text-embedding-3-small
input_from: document.chunks
output_to: document.chunk_embeddings

Embeddings for RAG pipeline

Prepare text chunks for vector database storage

type: openai_embeddings
api_key: ${env:openai_api_key}
model: text-embedding-3-large
input_from: chunks
output_to: vectors
dimensions: 1536
include_usage: true

Advanced Options

These options are available on all steps for error handling and retry logic:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| retries | integer | 0 | Number of retry attempts (0-10). |
| backoff_seconds | number | 0 | Backoff (seconds) applied between retry attempts. |
| retry_propagate | boolean | false | If True, raise the last exception after exhausting retries; otherwise swallow it. |
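
A minimal sketch of the retry options applied to this step, assuming they sit at the same level as the step's own parameters (the table above does not show their placement):

```yaml
steps:
- type: openai_embeddings
  api_key: ${env:openai_api_key}
  model: text-embedding-3-small
  input_from: document.chunks
  output_to: document.chunk_embeddings
  retries: 3              # up to 3 retry attempts (allowed range 0-10)
  backoff_seconds: 2      # backoff applied between retry attempts
  retry_propagate: true   # raise the last exception once retries are exhausted
```

Leaving retry_propagate at its default of false swallows the final exception instead of raising it.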