Embeddings convert text into numbers (vectors) that capture meaning.
Think of it like a GPS coordinate, but with far more dimensions (the values below are made up for illustration):
- "dog" → [0.2, 0.8, 0.1, ...] (1536 numbers)
- "cat" → [0.3, 0.7, 0.2, ...] (close to "dog"!)
- "car" → [0.9, 0.1, 0.8, ...] (far from "dog")
Similar meanings = similar vectors (close together in vector space)!
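"Close" and "far" are usually measured with cosine similarity, which is the cosine of the angle between two vectors. Here is a minimal sketch in plain Python. It uses only the first three toy numbers from the list above, which are illustrative and not real model output. The `cosine_similarity` helper is written out for illustration.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = perpendicular."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-number vectors from the illustration above (real ones have 1536 numbers)
dog = [0.2, 0.8, 0.1]
cat = [0.3, 0.7, 0.2]
car = [0.9, 0.1, 0.8]

print(f"dog vs cat: {cosine_similarity(dog, cat):.2f}")  # dog vs cat: 0.98
print(f"dog vs car: {cosine_similarity(dog, car):.2f}")  # dog vs car: 0.34
```

The higher score for "cat" is what "close to dog" means in practice.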
OpenAI Embeddings
from langchain_openai import OpenAIEmbeddings
# Initialize (expects your API key in the OPENAI_API_KEY environment variable)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Create embedding for a query
query = "What is machine learning?"
query_vector = embeddings.embed_query(query)
print(f"Query: {query}")
print(f"Vector dimensions: {len(query_vector)}")
print(f"First 5 values: {query_vector[:5]}")
# Embed multiple documents
docs = [
    "Machine learning is a subset of AI",
    "Deep learning uses neural networks",
    "The weather is sunny today",
]
doc_vectors = embeddings.embed_documents(docs)
print(f"\nEmbedded {len(doc_vectors)} documents")
Output:
Query: What is machine learning?
Vector dimensions: 1536
First 5 values: [-0.002476818859577179, -0.012755980715155602, -0.006645360495895147, -0.03157883137464523, 0.028759293258190155]
Embedded 3 documents
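Once the query and documents are embedded, the usual next step is to score every document against the query and sort the results. The following is a minimal sketch in plain Python that assumes cosine similarity as the metric. `rank_by_similarity` is a hypothetical helper. The short vectors are made up so the sketch runs without an API key. With real embeddings, you would instead pass the `query_vector`, `doc_vectors` and `docs` produced by the OpenAI code above.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rank_by_similarity(query_vector, doc_vectors, docs):
    """Hypothetical helper: (score, doc) pairs, most similar to the query first."""
    scored = [(cosine_similarity(query_vector, v), d) for v, d in zip(doc_vectors, docs)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

docs = [
    "Machine learning is a subset of AI",
    "Deep learning uses neural networks",
    "The weather is sunny today",
]
# Made-up 3-number vectors standing in for real 1536-number embeddings
doc_vectors = [[0.9, 0.1, 0.0], [0.7, 0.3, 0.1], [0.0, 0.2, 0.9]]
query_vector = [0.8, 0.2, 0.0]

for score, doc in rank_by_similarity(query_vector, doc_vectors, docs):
    print(f"{score:.3f}  {doc}")
```

With these toy vectors, the weather sentence ends up last, which is the behavior you would hope to see for the query "What is machine learning?".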
Google Gemini Embeddings
from langchain_google_genai import GoogleGenerativeAIEmbeddings
For an up-to-date ranking of the top embedding models, see the MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard
Models marked "Unknown" on the leaderboard are not open source.