step

openai_embeddings

Generate vector embeddings for text using OpenAI's embedding models.

Overview

This step converts text into numerical vectors (embeddings) that capture semantic meaning. These
vectors support similarity search, clustering, recommendations, and RAG (Retrieval-Augmented
Generation) systems. OpenAI's text-embedding-3-small and text-embedding-3-large models return
1536- and 3072-dimensional vectors by default, and both accept a smaller `dimensions` value to trade
some quality for lower storage cost. You can embed a single text or batch process a list of text
chunks. The resulting vectors can be stored in a vector database (Pinecone, Weaviate, etc.) for
semantic search applications.
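
To make "similarity search" concrete, here is a minimal Python sketch of ranking stored vectors against a query vector by cosine similarity. The helper names and the toy 3-dimensional vectors are illustrative assumptions; a vector database would normally do this work at scale.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, candidates):
    """Return (key, score) pairs sorted from most to least similar."""
    scored = [
        (key, cosine_similarity(query_vector, vector))
        for key, vector in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (hypothetical data).
query = [1.0, 0.0, 0.0]
docs = {"a": [0.9, 0.1, 0.0], "b": [0.0, 1.0, 0.0]}
ranking = rank_by_similarity(query, docs)
```

Here `ranking` lists document "a" first because its vector points in nearly the same direction as the query.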

Setup:
1. Create an OpenAI account at https://platform.openai.com/
2. Generate an API key from the API Keys section (https://platform.openai.com/api-keys)
3. Store your API key securely (e.g., as an environment variable: OPENAI_API_KEY)
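
As a rough illustration of step 3, the sketch below reads the key from the OPENAI_API_KEY environment variable and builds the Bearer authorization header that the API expects. The function name `build_auth_headers` is hypothetical, and the step itself may assemble its headers differently.

```python
import os


def build_auth_headers(api_key):
    """Build request headers carrying the API key as a Bearer token."""
    if not api_key:
        raise ValueError("API key is missing; set OPENAI_API_KEY")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


# Read the key from the environment rather than hard-coding it.
# An empty string here means the variable is not set.
api_key_from_env = os.environ.get("OPENAI_API_KEY", "")

# Placeholder key used only to show the header shape.
example_headers = build_auth_headers("sk-example")
```

Keeping the key in the environment means it stays out of pipeline files and version control.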

API Key: Required. Get your API key from https://platform.openai.com/api-keys

Examples

Basic semantic search embeddings

Generate embeddings for text using the efficient small model

type: openai_embeddings
api_key: ${env:OPENAI_API_KEY}
model: text-embedding-3-small
input_from: document.text
output_to: document.embedding

High-quality embeddings with custom dimensions

Use the large model with reduced dimensions for storage efficiency

type: openai_embeddings
api_key: ${env:OPENAI_API_KEY}
model: text-embedding-3-large
input_from: article.content
output_to: article.vector
dimensions: 1024
encoding_format: float
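
To show what reduced dimensions mean for storage, the Python sketch below estimates float32 storage per vector and, as an assumption about how shortening can be approximated client-side for vectors you already hold, truncates a vector and re-normalizes it to unit length. Both helper names are hypothetical, and this is not a description of what the step or the API does internally.

```python
import math


def float32_bytes(dimensions, count=1):
    """Bytes needed to store `count` vectors of float32 values."""
    return 4 * dimensions * count


def truncate_and_normalize(vector, dimensions):
    """Keep the first `dimensions` values and rescale to unit length."""
    if dimensions <= 0 or dimensions > len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")
    head = list(vector[:dimensions])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale
    return [x / norm for x in head]


full_size = float32_bytes(3072)     # default size of text-embedding-3-large
reduced_size = float32_bytes(1024)  # size requested in the example above

# Toy vector: truncating to 2 values leaves [3, 4], whose length is 5.
short = truncate_and_normalize([3.0, 4.0, 12.0], 2)
```

With these numbers a 1024-dimension vector needs one third of the bytes of a 3072-dimension one.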

Batch embed document chunks

Generate embeddings for multiple text chunks at once

type: openai_embeddings
api_key: ${env:OPENAI_API_KEY}
model: text-embedding-3-small
input_from: document.chunks
output_to: document.chunk_embeddings
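
For orientation, here is a hedged Python sketch of what a batch request to the embeddings endpoint can look like: the `input` field carries a list of texts, and each item in the response `data` array has an `index` that maps it back to its input. The helper names and the sample response are illustrative assumptions, not the step's actual implementation.

```python
import json


def build_embeddings_payload(model, texts, dimensions=None, encoding_format=None):
    """Assemble the JSON body for a POST to the /embeddings endpoint."""
    payload = {"model": model, "input": texts}
    if dimensions is not None:
        payload["dimensions"] = dimensions
    if encoding_format is not None:
        payload["encoding_format"] = encoding_format
    return payload


def vectors_in_input_order(response_body):
    """Return embeddings sorted by the `index` field of each data item."""
    items = sorted(response_body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


payload = build_embeddings_payload(
    "text-embedding-3-small", ["first chunk", "second chunk"]
)
body = json.dumps(payload)  # string that would be sent as the request body

# Hand-written sample response with items deliberately out of order.
sample_response = {
    "data": [
        {"index": 1, "embedding": [0.3, 0.4]},
        {"index": 0, "embedding": [0.1, 0.2]},
    ]
}
vectors = vectors_in_input_order(sample_response)
```

Sorting by `index` keeps each vector aligned with the chunk it came from, whatever order the items arrive in.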

Embeddings for RAG pipeline

Prepare text chunks for vector database storage

type: openai_embeddings
api_key: ${env:OPENAI_API_KEY}
model: text-embedding-3-large
input_from: chunks
output_to: vectors
dimensions: 1536
include_usage: true
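
After this step runs, a later stage typically pairs each chunk with its vector before writing to a vector database. The sketch below shows one way to do that pairing in Python; the record shape (`id`, `text`, `vector`) and the `build_records` helper are assumptions for illustration, since each database client defines its own upsert format.

```python
def build_records(chunks, vectors, id_prefix="chunk"):
    """Pair each text chunk with its embedding vector, in order."""
    if len(chunks) != len(vectors):
        raise ValueError("expected exactly one vector per chunk")
    return [
        {"id": f"{id_prefix}-{i}", "text": text, "vector": vector}
        for i, (text, vector) in enumerate(zip(chunks, vectors))
    ]


# Toy data standing in for the `chunks` and `vectors` event keys.
chunks = ["alpha", "beta"]
vectors = [[0.1, 0.2], [0.3, 0.4]]
records = build_records(chunks, vectors)
```

Checking the lengths up front catches a mismatch between chunks and vectors before anything is written to storage.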

Configuration

Parameter	Type	Required	Description
`api_key`	`string`	Yes	OpenAI-compatible API key sent as a Bearer token.
`model`	`string`	Yes	Embedding model name (for example 'text-embedding-3-small').
`input_from`	`string`	No	Dot path selecting the text or list of texts to embed. When omitted, the entire event is serialized to JSON.
`input_key`	`string`	No	DEPRECATED: Use 'input_from' instead. Dot path selecting text to embed.
`output_to`	`string`	No	Event key that receives the list of embedding vectors for each input item. Default: `"embeddings"`
`output_key`	`string`	No	DEPRECATED: Use 'output_to' instead. Event key for embedding vectors.
`encoding_format`	`string`	No	Optional encoding format forwarded to the API (for example 'float' or 'base64').
`dimensions`	`string`	No	Requested embedding dimensionality when supported by the provider.
`base_url`	`string`	No	Base API URL for the embeddings endpoint. Override for proxy deployments. Default: `"https://api.openai.com/v1"`
`raw_on_error`	`boolean`	No	When True, store the raw response body under '<output_key>_raw' if JSON parsing fails after a successful request. Default: `true`
`swallow_on_error`	`boolean`	No	If True, skip injecting error details and return the original event on failures. Default: `false`
`extra_headers`	`string`	No	Additional HTTP headers merged into each request alongside the defaults (Authorization, Content-Type, Accept, User-Agent).
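
Two of the parameters above are easier to picture with code. The Python sketch below shows one plausible way a dot path such as `document.text` could be resolved against an event, and how a base64-encoded embedding can be decoded, assuming the payload is a sequence of little-endian float32 values. Both helpers are hypothetical and are not taken from the step's source.

```python
import base64
import struct


def get_by_dot_path(event, path):
    """Walk nested dictionaries following a dot-separated path."""
    current = event
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            raise KeyError(f"path {path!r} not found at {key!r}")
        current = current[key]
    return current


def decode_base64_embedding(encoded):
    """Decode base64 text into floats (assumes little-endian float32)."""
    raw = base64.b64decode(encoded)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


event = {"document": {"text": "hello"}}
text = get_by_dot_path(event, "document.text")

# Round trip on toy values that float32 represents exactly.
encoded = base64.b64encode(struct.pack("<2f", 0.5, -1.0)).decode("ascii")
decoded = decode_base64_embedding(encoded)
```

The base64 format reduces response size, at the cost of a decoding step like this one before the values can be used.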

Base Configuration

These configuration options are available on all steps:

Parameter	Type	Default	Description
`name`		`null`	Optional name for this step (for documentation and debugging)
`description`		`null`	Optional description of what this step does
`retries`	`integer`	`0`	Number of retry attempts (0-10)
`backoff_seconds`	`number`	`0`	Backoff (seconds) applied between retry attempts
`retry_propagate`	`boolean`	`false`	If True, raise the last exception after exhausting retries; otherwise swallow it.
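
The interaction of the three retry options can be sketched in Python as follows. This is an interpretation of the table above under stated assumptions: `retries` counts attempts after the first one, the backoff is a fixed delay between attempts, and a swallowed failure returns `None`. The function name and the injectable `sleep` argument are hypothetical.

```python
import time


def run_with_retries(func, retries=0, backoff_seconds=0.0,
                     retry_propagate=False, sleep=time.sleep):
    """Call func, retrying on any exception up to `retries` extra times."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception as exc:
            last_error = exc
            if attempt < retries:
                sleep(backoff_seconds)  # wait before the next attempt
    if retry_propagate:
        raise last_error
    return None  # failure swallowed


calls = {"count": 0}


def flaky():
    """Fail twice, then succeed."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


waits = []  # records requested delays instead of actually sleeping
result = run_with_retries(flaky, retries=2, backoff_seconds=0.5,
                          sleep=waits.append)
```

In this run the third attempt succeeds, so two backoff delays are requested before the result is returned.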
