Managing Collections

Organize vectors into uniquely named groups

Collections are the basic organizational unit of the vector store. Each collection groups vectors that share an index type and a dimension. Once a collection is created, its name and dimension cannot be changed.

To get started with vector storage, create a collection in the UI or via the SDK.


Creating collections

Create a new collection with the following call:

from qwak.vector_store import VectorStoreClient

client = VectorStoreClient()
catalog_collection = client.create_collection(
    name="product_catalog",
    description="Indexing a product catalog of fashion items",
    dimension=384,
    metric="cosine",
    vectorizer="<optional>",  # optional; see the Vectorizer section
    multi_tenant=False,  # set to True to enable multi-tenancy
)

Name

The collection name must be unique across your account.

Dimension

This parameter sets the length of the vectors that the collection accepts. The dimension cannot be changed after the collection is created.
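
Because the dimension is fixed, every vector written to the collection needs to have exactly that length. A client-side check along the following lines can catch mismatches early. This is only a sketch: `validate_dimension` is a hypothetical helper introduced here for illustration and is not part of the SDK.

```python
from typing import Sequence


def validate_dimension(vector: Sequence[float], dimension: int) -> None:
    """Raise ValueError if the vector length differs from the collection dimension.

    Hypothetical helper for illustration; not part of the SDK.
    """
    if len(vector) != dimension:
        raise ValueError(
            f"expected a vector of length {dimension}, got {len(vector)}"
        )


# A collection created with dimension=384 accepts only 384-element vectors.
validate_dimension([0.0] * 384, dimension=384)  # passes silently
```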

Metric type

Choose the metric type that fits your search and retrieval requirements:

  • Cosine similarity - ideal for semantic and conceptual similarity searches, where only the direction of the vectors matters.
  • L2 distance - best for geometric distance calculations involving both direction and magnitude.
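
The difference between the two metrics can be seen with a small standard-library sketch. The functions below are plain textbook definitions written for illustration; they are not how the vector store computes distances internally.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; ignores vector magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between a and b; sensitive to magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = [1.0, 0.0]
b = [3.0, 0.0]  # same direction as a, three times the magnitude

print(cosine_similarity(a, b))  # 1.0: the directions are identical
print(l2_distance(a, b))        # 2.0: the magnitudes differ
```

Two vectors that point the same way are a perfect match under cosine similarity even when their lengths differ, while L2 distance still separates them.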

Vectorizer

A vectorizer is an embedding model deployed on JFrog ML. It is an optional feature of a collection.

🚧

To use a vectorizer, you must deploy an embedding model as a real-time model.

Use this example model, or make sure that your model supports the following structure, so that the vector store can convert inputs to vectors automatically.

from pandas import DataFrame

# The model receives a dataframe with a single key named "input"
input_ = DataFrame(
    [{
        "input": "Why does it matter if a Central Bank has a negative rather than 0% interest rate?"
    }]
).to_json()

# The model returns a list of objects with a single key named "embeddings"
expected_output = [
    {
        "embeddings": [
            0.0517567955,
            0.0195365045
        ]
    }
]
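
Before attaching a model as a vectorizer, it may help to check its responses against the structure above. The sketch below does that with plain Python. `extract_embeddings` is a hypothetical helper, and the check that each embedding's length equals the collection dimension is an assumption based on the Dimension section rather than documented SDK behavior.

```python
from typing import Any, List


def extract_embeddings(response: Any, dimension: int) -> List[List[float]]:
    """Validate a vectorizer response and return its embedding vectors.

    Hypothetical helper. The expected shape is a list of objects, each
    with a single "embeddings" key holding a list of numbers.
    """
    if not isinstance(response, list):
        raise TypeError("response must be a list of objects")
    vectors = []
    for item in response:
        if not isinstance(item, dict) or set(item) != {"embeddings"}:
            raise ValueError('each object must have exactly one key, "embeddings"')
        vector = item["embeddings"]
        # Assumption: embeddings must match the collection dimension.
        if len(vector) != dimension:
            raise ValueError(f"expected {dimension} values, got {len(vector)}")
        vectors.append([float(value) for value in vector])
    return vectors


response = [{"embeddings": [0.0517567955, 0.0195365045]}]
print(extract_embeddings(response, dimension=2))
```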

Multi-tenancy

JFrog ML Vector Store supports multi-tenancy at the collection level: you can store the data of multiple business entities (tenants) in a single collection while keeping them completely separated logically (namespacing). Upserts, searches, and deletes are scoped to a tenant efficiently, without relying on filters.

📘

When would you use multi-tenancy?

Multi-tenancy is beneficial when resource efficiency and low latency are key concerns. Because multiple tenants share a common infrastructure, resources are used more efficiently and latency when fetching data is significantly reduced.

Multi-tenancy is recommended when the data can be split into disjoint sets, with each set representing a distinct tenant. In such cases, every business question falls within the scope of a single set, which enables faster search times.

Additionally, for users with stringent data isolation requirements, the multi-tenancy model ensures that each tenant's data remains securely and logically separated, preserving privacy and preventing unintended access or interference.

To enable multi-tenancy, add multi_tenant=True to the collection creation call (defaults to False).

❗️

Note: Multi-tenancy cannot be enabled or disabled after a collection is created. Changing this setting requires re-creating the collection.
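
The logical separation described in this section can be pictured with a toy in-memory model. The sketch below is purely illustrative: `ToyMultiTenantCollection` and its methods are hypothetical names, and nothing here reflects the actual JFrog ML implementation or SDK. It only shows why scoping each operation to one tenant's namespace avoids scanning and filtering other tenants' data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ToyMultiTenantCollection:
    """Toy model of tenant namespacing; not the JFrog ML implementation."""

    def __init__(self) -> None:
        # One independent namespace (vector id -> vector) per tenant.
        self._namespaces: Dict[str, Dict[str, List[float]]] = defaultdict(dict)

    def upsert(self, tenant_id: str, vector_id: str, vector: List[float]) -> None:
        self._namespaces[tenant_id][vector_id] = vector

    def search(
        self, tenant_id: str, query: List[float], top: int = 1
    ) -> List[Tuple[str, float]]:
        # Only the tenant's own namespace is scanned; no filter is applied.
        candidates = self._namespaces.get(tenant_id, {})
        scored = [
            (vid, sum((q - v) ** 2 for q, v in zip(query, vec)) ** 0.5)
            for vid, vec in candidates.items()
        ]
        return sorted(scored, key=lambda pair: pair[1])[:top]

    def delete(self, tenant_id: str, vector_id: str) -> None:
        self._namespaces.get(tenant_id, {}).pop(vector_id, None)


store = ToyMultiTenantCollection()
store.upsert("tenant_a", "shirt", [1.0, 0.0])
store.upsert("tenant_b", "shoe", [1.0, 0.0])
print(store.search("tenant_a", [1.0, 0.0]))  # only tenant_a's vectors are considered
```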

Deleting collections

Deleting a collection will delete all the data and vectors associated with the collection. Deleting a collection is irreversible, and deleted data cannot be restored.

Deleting a collection is done by using the collection name, which is unique across the account.

from qwak.vector_store import VectorStoreClient

client = VectorStoreClient()
client.delete_collection(name="product_catalog")

Getting all collections

To get a list of all the collections in your account, use the following code snippet:

from qwak.vector_store import VectorStoreClient

client = VectorStoreClient()
collections = client.list_collections()