openai_embeddings
Generate vector embeddings for text using OpenAI's embedding models.
Overview
This step converts text into numerical vectors (embeddings) that capture semantic meaning. The vectors support similarity search, clustering, recommendations, and retrieval-augmented generation (RAG) systems. OpenAI's text-embedding-3-small and text-embedding-3-large models accept a configurable number of dimensions, so you can trade embedding quality against storage cost. You can embed a single text or batch-process a list of text chunks, then store the resulting vectors in a vector database (Pinecone, Weaviate, etc.) for semantic search.
Quick Start
steps:
  - type: openai_embeddings
    api_key: sk-live-123
    model: text-embedding-3-small
Configuration
| Parameter | Type | Required | Description |
|---|---|---|---|
| api_key | string | Yes | OpenAI-compatible API key sent as a Bearer token. |
| model | string | Yes | Embedding model name (for example 'text-embedding-3-small'). |
| input_from | string | No | Dot path selecting the text or list of texts to embed. When omitted, the entire event is serialized to JSON. |
| input_key | string | No | DEPRECATED: Use 'input_from' instead. Dot path selecting text to embed. |
| output_to | string | No | Event key that receives the list of embedding vectors for each input item. Default: "embeddings" |
| output_key | string | No | DEPRECATED: Use 'output_to' instead. Event key for embedding vectors. |
| encoding_format | string | No | Optional encoding format forwarded to the API (for example 'float' or 'base64'). |
| dimensions | string | No | Requested embedding dimensionality when supported by the provider. |
| base_url | string | No | Base API URL for the embeddings endpoint. Override for proxy deployments. Default: "https://api.openai.com/v1" |
| raw_on_error | boolean | No | When True, store the raw response body under '<output_key>_raw' if JSON parsing fails after a successful request. Default: true |
| swallow_on_error | boolean | No | If True, skip injecting error details and return the original event on failures. Default: false |
| extra_headers | string | No | Additional HTTP headers merged into each request alongside the defaults (Authorization, Content-Type, Accept, User-Agent). |
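The proxy and error-handling parameters can be combined in a single step. The following is a minimal sketch rather than a verified configuration: the proxy URL is a placeholder, and it assumes these keys sit at the same level as `type`, as in the Quick Start.

```yaml
steps:
  - type: openai_embeddings
    api_key: ${env:openai_api_key}
    model: text-embedding-3-small
    input_from: document.text
    output_to: document.embedding
    # Placeholder proxy URL (assumption); the default is https://api.openai.com/v1
    base_url: https://embeddings-proxy.example.com/v1
    # Keep the raw response body if JSON parsing fails (default: true)
    raw_on_error: true
    # Inject error details into the event on failure (default: false)
    swallow_on_error: false
```

Both boolean values shown are the documented defaults; they are written out here only to make the behavior explicit.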
Examples
Basic semantic search embeddings
Generate embeddings for text using the efficient small model
type: openai_embeddings
api_key: ${env:openai_api_key}
model: text-embedding-3-small
input_from: document.text
output_to: document.embedding
High-quality embeddings with custom dimensions
Use the large model with reduced dimensions for storage efficiency
type: openai_embeddings
api_key: ${env:openai_api_key}
model: text-embedding-3-large
input_from: article.content
output_to: article.vector
dimensions: 1024
encoding_format: float
Batch embed document chunks
Generate embeddings for multiple text chunks at once
type: openai_embeddings
api_key: ${env:openai_api_key}
model: text-embedding-3-small
input_from: document.chunks
output_to: document.chunk_embeddings
Embeddings for RAG pipeline
Prepare text chunks for vector database storage
type: openai_embeddings
api_key: ${env:openai_api_key}
model: text-embedding-3-large
input_from: chunks
output_to: vectors
dimensions: 1536
include_usage: true
Advanced Options
These options are available on all steps for error handling and retry logic:
| Parameter | Type | Default | Description |
|---|---|---|---|
| retries | integer | 0 | Number of retry attempts (0-10) |
| backoff_seconds | number | 0 | Backoff (seconds) applied between retry attempts |
| retry_propagate | boolean | false | If True, raise last exception after exhausting retries; otherwise swallow. |
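As an illustration, the retry options could be attached to an embeddings step as shown here. This is a sketch that assumes the options are written alongside the step's other parameters; the values are arbitrary.

```yaml
steps:
  - type: openai_embeddings
    api_key: ${env:openai_api_key}
    model: text-embedding-3-small
    input_from: document.chunks
    output_to: document.chunk_embeddings
    retries: 3             # up to 3 retry attempts (allowed range: 0-10)
    backoff_seconds: 2     # backoff, in seconds, applied between attempts
    retry_propagate: true  # raise the last exception once retries are exhausted
```

With `retry_propagate` left at its default of false, a failure that survives all retries would be swallowed instead of raised.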