Upsert (Insert/Update)
Most vector stores use upsert — insert if new, update if the ID already exists. This is the primary way to add data. Batch upserts (hundreds or thousands at once) are much faster than individual inserts.
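Batching is mostly a matter of slicing your parallel lists of ids, embeddings, and documents into fixed-size chunks before each call. A minimal sketch of that slicing logic, where `upsert_batch` is a placeholder standing in for your store's actual upsert call (an assumption, not a real API):

```python
def batched_upsert(ids, embeddings, documents, upsert_batch, batch_size=500):
    """Split parallel lists into fixed-size batches and hand each to the store.

    upsert_batch is a placeholder for your vector store's upsert call
    (e.g. a Chroma collection's upsert) -- an assumption for this sketch.
    """
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        upsert_batch(
            ids=ids[start:end],
            embeddings=embeddings[start:end],
            documents=documents[start:end],
        )

# Example: 1,250 items at batch_size=500 produce three calls of 500/500/250
calls = []
batched_upsert(
    ids=[f"chunk_{i}" for i in range(1250)],
    embeddings=[[0.0]] * 1250,
    documents=["text"] * 1250,
    upsert_batch=lambda **kw: calls.append(len(kw["ids"])),
)
print(calls)  # [500, 500, 250]
```

Three round trips instead of 1,250 is where the speedup comes from.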
Delete
Delete by ID or by metadata filter. When a source document changes, delete all chunks from the old version and upsert the new chunks. Some stores support namespaces or collections to isolate different datasets.
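The delete-then-upsert pattern for a changed document can be sketched with an in-memory dict standing in for the store (the dict shape and helper name here are illustrative, not a real API):

```python
def replace_document(index, source, new_chunks):
    """Delete every chunk whose metadata matches `source`, then insert new chunks.

    `index` maps chunk id -> {"embedding": ..., "metadata": ...};
    it is a toy stand-in for a real vector store.
    """
    # Delete by metadata filter: all chunks from the old version
    stale = [cid for cid, rec in index.items()
             if rec["metadata"]["source"] == source]
    for cid in stale:
        del index[cid]
    # Upsert the new version's chunks
    for cid, rec in new_chunks.items():
        index[cid] = rec

index = {
    "doc_v1_0": {"embedding": [0.1], "metadata": {"source": "doc.pdf"}},
    "doc_v1_1": {"embedding": [0.2], "metadata": {"source": "doc.pdf"}},
    "other_0":  {"embedding": [0.3], "metadata": {"source": "other.pdf"}},
}
replace_document(index, "doc.pdf", {
    "doc_v2_0": {"embedding": [0.4], "metadata": {"source": "doc.pdf"}},
})
print(sorted(index))  # ['doc_v2_0', 'other_0']
```

Deleting before upserting matters when the new version has fewer chunks than the old one: orphaned chunks from the old version would otherwise keep showing up in results.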
Re-indexing
If you change your embedding model or chunking strategy, you must re-embed and re-index everything. Vectors from different models are incompatible — you cannot mix them in the same index. Plan for this by keeping your original text accessible.
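Because the original text is kept, re-indexing reduces to one loop over the stored texts with the new model. A sketch, where `old_embed` and `new_embed` are toy placeholder embedding functions (real models would be API or library calls):

```python
def reindex(text_store, embed):
    """Re-embed every stored text and build a fresh index from scratch.

    `text_store` maps chunk id -> original text; `embed` is a placeholder
    for the embedding model. Never merge the result into the old index --
    replace it wholesale, since vectors from different models are incompatible.
    """
    return {cid: embed(text) for cid, text in text_store.items()}

# Toy "models" with different dimensionality, to show why mixing fails
old_embed = lambda t: [float(len(t))]          # 1-dimensional
new_embed = lambda t: [float(len(t)), 1.0]     # 2-dimensional

texts = {"chunk_1": "text 1", "chunk_2": "longer text 2"}
old_index = reindex(texts, old_embed)
new_index = reindex(texts, new_embed)
print(len(old_index["chunk_1"]), len(new_index["chunk_1"]))  # 1 2
```

Even when two models happen to share a dimensionality, their vector spaces are unrelated, so distances between vectors from different models are meaningless.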
# Chroma — full workflow
import chromadb

client = chromadb.Client()
collection = client.create_collection("my_docs")

# Upsert: insert new IDs, overwrite existing ones
collection.upsert(
    ids=["chunk_1", "chunk_2"],
    embeddings=[vec1, vec2],
    documents=["text 1", "text 2"],
    metadatas=[
        {"source": "doc.pdf", "page": 1},
        {"source": "doc.pdf", "page": 2},
    ],
)

# Query: nearest neighbours for a query embedding
results = collection.query(
    query_embeddings=[query_vec],
    n_results=5,
)

# Delete by ID
collection.delete(ids=["chunk_1"])
Always store the original text alongside the vector. You need it for the LLM prompt (the generation step), for debugging retrieval, and for re-indexing if you change models. Some stores (Chroma, Weaviate) store text natively; others (FAISS) require a separate text store.
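With a vectors-only index in the FAISS style, the separate text store is just a mapping from the index's positions (or ids) back to the original text. A minimal sketch, with a brute-force dot-product search standing in for the index (all names here are illustrative):

```python
def search(vectors, query, k):
    """Brute-force nearest neighbours by dot product -- a stand-in for a
    vectors-only index like FAISS, which returns positions, not text."""
    scores = [(sum(a * b for a, b in zip(vec, query)), pos)
              for pos, vec in enumerate(vectors)]
    scores.sort(reverse=True)
    return [pos for _, pos in scores[:k]]

vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
# Separate text store: position -> original chunk text
texts = {0: "text about cats", 1: "text about dogs", 2: "text about pets"}

hits = search(vectors, query=[1.0, 0.1], k=2)
retrieved = [texts[pos] for pos in hits]  # map positions back to text
print(retrieved)  # ['text about cats', 'text about pets']
```

The `retrieved` strings are what actually goes into the LLM prompt; without the text store, the search would return only opaque positions.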