Astra DB Vector Store
This page provides a quickstart for using Astra DB as a Vector Store.
DataStax Astra DB is a serverless vector-capable database built on Apache Cassandra® and made conveniently available through an easy-to-use JSON API.
Setup
The integration requires the langchain-astradb partner package:
pip install -qU "langchain-astradb>=0.3.3"
Credentials
To use the Astra DB vector store, first go to the Astra DB website, create an account, and create a new database. Initialization might take a few minutes.
Once the database has been initialized, you should create an application token and save it for later use.
You will also want to copy the API Endpoint from the Database Details and store it in the ASTRA_DB_API_ENDPOINT variable.
You may optionally provide a namespace, which you can manage from the Data Explorer tab of your database dashboard. If you don't wish to set a namespace, you can leave the getpass prompt for ASTRA_DB_NAMESPACE empty.
import getpass

ASTRA_DB_API_ENDPOINT = getpass.getpass("ASTRA_DB_API_ENDPOINT = ")
ASTRA_DB_APPLICATION_TOKEN = getpass.getpass("ASTRA_DB_APPLICATION_TOKEN = ")

desired_namespace = getpass.getpass("ASTRA_DB_NAMESPACE = ")
if desired_namespace:
    ASTRA_DB_NAMESPACE = desired_namespace
else:
    ASTRA_DB_NAMESPACE = None
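If you prefer not to be prompted on every run, one option is to look for each value in the environment first and prompt only when it is missing. The following is a minimal sketch of that idea. The helper name read_setting is hypothetical and is not part of langchain-astradb. It also folds in the rule from the snippet above: an empty answer becomes None.

```python
import os
from typing import Callable, Mapping, Optional


def read_setting(
    name: str,
    env: Mapping[str, str] = os.environ,
    prompt: Optional[Callable[[str], str]] = None,
) -> Optional[str]:
    """Return the setting `name`, or None if it is unset or empty.

    Hypothetical helper (an assumption of this sketch): the value is taken
    from `env` when present; otherwise `prompt` is called, if one was given.
    """
    value = env.get(name, "")
    if not value and prompt is not None:
        # Fall back to asking the user, e.g. with getpass.getpass.
        value = prompt(f"{name} = ")
    value = value.strip()
    # An empty answer means "not set", mirroring the if/else above.
    return value or None


# Possible usage (commented out so that nothing prompts on import):
# import getpass
# ASTRA_DB_API_ENDPOINT = read_setting("ASTRA_DB_API_ENDPOINT", prompt=getpass.getpass)
# ASTRA_DB_APPLICATION_TOKEN = read_setting("ASTRA_DB_APPLICATION_TOKEN", prompt=getpass.getpass)
# ASTRA_DB_NAMESPACE = read_setting("ASTRA_DB_NAMESPACE", prompt=getpass.getpass)
```

Passing the environment and the prompt function as parameters keeps the helper easy to exercise without real credentials.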
If you want best-in-class automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:
# import os
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
Initialization
There are two ways to create an Astra DB vector store, which differ in how the embeddings are computed.
Method 1: Explicit embeddings
You can separately instantiate a langchain_core.embeddings.Embeddings class and pass it to the AstraDBVectorStore constructor, just like with most other LangChain vector stores.
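To make "an Embeddings class" concrete: the interface boils down to two methods, embed_documents (a list of texts in, a list of vectors out) and embed_query (one text in, one vector out). The sketch below is a toy, deterministic stand-in meant only to show that shape. The class name HashEmbeddings and its hashing scheme are assumptions of this example, and its vectors carry no semantic meaning, so it is unsuitable for real similarity search.

```python
import hashlib
from typing import List

try:
    # Use the real base class when langchain-core is installed.
    from langchain_core.embeddings import Embeddings
except ImportError:
    # Fallback so the sketch still runs standalone (assumption of this example).
    Embeddings = object


class HashEmbeddings(Embeddings):
    """Toy embeddings: maps the SHA-256 digest of a text to floats in [-1, 1]."""

    def __init__(self, size: int = 8) -> None:
        self.size = size

    def _embed(self, text: str) -> List[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # The digest has 32 bytes; cycle through them if size exceeds 32.
        return [digest[i % len(digest)] / 255.0 * 2.0 - 1.0 for i in range(self.size)]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # One vector per input text, in the same order.
        return [self._embed(text) for text in texts]

    def embed_query(self, text: str) -> List[float]:
        return self._embed(text)


embeddings = HashEmbeddings(size=8)
vectors = embeddings.embed_documents(["first text", "second text"])
query_vector = embeddings.embed_query("first text")
```

With a real model the same two methods are what the vector store calls, which is why any Embeddings implementation can be swapped in.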
Method 2: Integrated embedding computation
Alternatively, you can use the Vectorize feature of Astra DB and simply specify the name of a supported embedding model when creating the store. The embedding computations are entirely handled within the database. (To proceed with this method, you must have enabled the desired embedding integration for your database, as described in the docs.)
Explicit Embedding Initialization
Below, we instantiate our vector store using an explicit embedding class. First, create the embeddings object with one of the following options:
- OpenAI
pip install -qU langchain-openai
import getpass
import os

from langchain_openai import OpenAIEmbeddings

os.environ["OPENAI_API_KEY"] = getpass.getpass()
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
- HuggingFace
pip install -qU langchain-huggingface
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model="sentence-transformers/all-mpnet-base-v2")
- Fake Embedding
pip install -qU langchain-core
from langchain_core.embeddings import FakeEmbeddings

embeddings = FakeEmbeddings(size=4096)
Then pass the embeddings object to the vector store constructor:
from langchain_astradb import AstraDBVectorStore

vector_store = AstraDBVectorStore(
    collection_name="astra_vector_langchain",
    embedding=embeddings,
    api_endpoint=ASTRA_DB_API_ENDPOINT,
    token=ASTRA_DB_APPLICATION_TOKEN,
    namespace=ASTRA_DB_NAMESPACE,
)
Integrated Embedding Initialization
Here it is assumed that you have:
- Enabled the OpenAI integration in your Astra DB organization,
- Added an API Key named "OPENAI_API_KEY" to the integration, and scoped it to the database you are using.
For more details on how to do this, please consult the documentation.
from astrapy.info import CollectionVectorServiceOptions

openai_vectorize_options = CollectionVectorServiceOptions(
    provider="openai",
    model_name="text-embedding-3-small",
    authentication={
        "providerKey": "OPENAI_API_KEY",
    },
)

vector_store_integrated = AstraDBVectorStore(
    collection_name="astra_vector_langchain_integrated",
    api_endpoint=ASTRA_DB_API_ENDPOINT,
    token=ASTRA_DB_APPLICATION_TOKEN,
    namespace=ASTRA_DB_NAMESPACE,
    collection_vector_service_options=openai_vectorize_options,
)
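The two constructor calls on this page differ in a single argument: Method 1 passes embedding, while Method 2 passes collection_vector_service_options. The sketch below captures that choice in one place. The helper build_store_kwargs is hypothetical, and its "exactly one of the two" check is an assumption of this example, based on the page presenting the methods as alternatives, rather than documented library behavior.

```python
from typing import Any, Dict, Optional


def build_store_kwargs(
    collection_name: str,
    api_endpoint: str,
    token: str,
    namespace: Optional[str] = None,
    embedding: Any = None,
    vectorize_options: Any = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for one of the two initialization methods.

    Hypothetical helper: the keys match the constructor arguments used on
    this page; the exclusivity check is this sketch's own rule.
    """
    if (embedding is None) == (vectorize_options is None):
        raise ValueError("Provide exactly one of 'embedding' or 'vectorize_options'.")
    kwargs: Dict[str, Any] = {
        "collection_name": collection_name,
        "api_endpoint": api_endpoint,
        "token": token,
        "namespace": namespace,
    }
    if embedding is not None:
        # Method 1: embeddings are computed client-side.
        kwargs["embedding"] = embedding
    else:
        # Method 2: embeddings are computed inside the database.
        kwargs["collection_vector_service_options"] = vectorize_options
    return kwargs


# Possible usage (requires langchain-astradb and real credentials):
# vector_store = AstraDBVectorStore(**build_store_kwargs(
#     "astra_vector_langchain",
#     ASTRA_DB_API_ENDPOINT,
#     ASTRA_DB_APPLICATION_TOKEN,
#     namespace=ASTRA_DB_NAMESPACE,
#     embedding=embeddings,
# ))
```

Failing early on an ambiguous configuration keeps the mistake close to where the store is configured.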