Qdrant
Module: implementation.datasources.db
QdrantCreateCollection
Create empty collection with given parameters. Subclass of Action. Type of Action[Dict[str, Any], bool]
__init__
Arguments:
client (QdrantClient): Qdrant client to use
name (Optional[str], optional): Name for identification. If None, the class name is used. Defaults to None.
default_key (str, optional): Default key used for results that are not of type Dict. Defaults to "output".
execute
Arguments:
input_data (Dict[str, Any]): Expected keys:
"collection_name": Name of the collection to create.
"vectors_config": Configuration of the vector storage. VectorParams holds the size and distance for a vector storage. If a dict is passed, the service creates a vector storage for each key in the dict. If a single VectorParams is passed, the service creates a single anonymous vector storage.
"sparse_vectors_config": Configuration of the sparse vector storage. The service creates a sparse vector storage for each key in the dict.
"shard_number": Number of shards in the collection. Default is 1, minimum is 1.
"sharding_method": Strategy for shard creation. The auto option (default) creates the defined number of shards and distributes data between them automatically. After creation, shards can be replicated further, but new shards cannot be created. The custom option lets you create shards manually; each shard must be created with a unique shard_key. Data is then distributed between shards based on the shard_key value.
"replication_factor": Replication factor for the collection. Default is 1, minimum is 1. Defines how many copies of each shard are created. Takes effect only in distributed mode.
"write_consistency_factor": Write consistency factor for the collection. Default is 1, minimum is 1. Defines how many replicas must apply an operation for it to be considered successful. Increasing this number makes the collection more resilient to inconsistencies, but operations fail if not enough replicas are available. Does not have any performance impact. Takes effect only in distributed mode.
"on_disk_payload": If true, a point's payload is not stored in memory and is read from disk every time it is requested. This setting saves RAM at the cost of a (slightly) longer response time. Note: payload values that are indexed and involved in filtering remain in RAM.
"hnsw_config": Params for the HNSW index.
"optimizers_config": Params for the optimizer.
"wal_config": Params for the Write-Ahead-Log.
"quantization_config": Params for quantization. If None, quantization is disabled.
"init_from": Use data stored in another collection to initialize this collection.
"timeout": Timeout in seconds to wait for the operation commit. If the timeout is reached, the request returns with a service error.
Returns:
bool: Operation result.
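As a rough illustration of the input_data shape described above, the following sketch assembles the dictionary and checks the documented minimums before it would be handed to execute. The helper name build_create_collection_input and the plain-dict vectors_config placeholder are assumptions made for this example only; in practice vectors_config would be a VectorParams object or a dict of them.

```python
from typing import Any, Dict, Optional


def build_create_collection_input(
    collection_name: str,
    vectors_config: Any,
    shard_number: int = 1,
    replication_factor: int = 1,
    write_consistency_factor: int = 1,
    on_disk_payload: Optional[bool] = None,
    timeout: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble the dict expected by execute()."""
    # The documentation states a minimum of 1 for each of these settings.
    minimums = {
        "shard_number": shard_number,
        "replication_factor": replication_factor,
        "write_consistency_factor": write_consistency_factor,
    }
    for key, value in minimums.items():
        if value < 1:
            raise ValueError(f"{key} must be at least 1, got {value}")

    input_data: Dict[str, Any] = {
        "collection_name": collection_name,
        "vectors_config": vectors_config,
        **minimums,
    }
    # Optional keys are only included when the caller sets them.
    if on_disk_payload is not None:
        input_data["on_disk_payload"] = on_disk_payload
    if timeout is not None:
        input_data["timeout"] = timeout
    return input_data


# Placeholder standing in for a VectorParams object (size and distance).
example_vectors_config = {"size": 384, "distance": "Cosine"}
example_input = build_create_collection_input(
    "articles", example_vectors_config, shard_number=2
)
```

The resulting dictionary would then be passed along the lines of QdrantCreateCollection(client).execute(example_input), assuming an already configured QdrantClient.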
QdrantRecreateCollection
Delete and create empty collection with given parameters. Subclass of Action. Type of Action[Dict[str, Any], bool]
__init__
Arguments:
client (QdrantClient): Qdrant client to use
name (Optional[str], optional): Name for identification. If None, the class name is used. Defaults to None.
default_key (str, optional): Default key used for results that are not of type Dict. Defaults to "output".
execute
Arguments:
input_data (Dict[str, Any]): Expected keys:
"collection_name": Name of the collection to recreate.
"vectors_config": Configuration of the vector storage. VectorParams holds the size and distance for a vector storage. If a dict is passed, the service creates a vector storage for each key in the dict. If a single VectorParams is passed, the service creates a single anonymous vector storage.
"sparse_vectors_config": Configuration of the sparse vector storage. The service creates a sparse vector storage for each key in the dict.
"shard_number": Number of shards in the collection. Default is 1, minimum is 1.
"sharding_method": Strategy for shard creation. The auto option (default) creates the defined number of shards and distributes data between them automatically. After creation, shards can be replicated further, but new shards cannot be created. The custom option lets you create shards manually; each shard must be created with a unique shard_key. Data is then distributed between shards based on the shard_key value.
"replication_factor": Replication factor for the collection. Default is 1, minimum is 1. Defines how many copies of each shard are created. Takes effect only in distributed mode.
"write_consistency_factor": Write consistency factor for the collection. Default is 1, minimum is 1. Defines how many replicas must apply an operation for it to be considered successful. Increasing this number makes the collection more resilient to inconsistencies, but operations fail if not enough replicas are available. Does not have any performance impact. Takes effect only in distributed mode.
"on_disk_payload": If true, a point's payload is not stored in memory and is read from disk every time it is requested. This setting saves RAM at the cost of a (slightly) longer response time. Note: payload values that are indexed and involved in filtering remain in RAM.
"hnsw_config": Params for the HNSW index.
"optimizers_config": Params for the optimizer.
"wal_config": Params for the Write-Ahead-Log.
"quantization_config": Params for quantization. If None, quantization is disabled.
"init_from": Use data stored in another collection to initialize this collection.
"timeout": Timeout in seconds to wait for the operation commit. If the timeout is reached, the request returns with a service error.
Returns:
bool: Operation result.
QdrantDeleteCollection
Remove a collection and all its data. Subclass of Action. Type of Action[Dict[str, Any], bool]
__init__
Arguments:
client (QdrantClient): Qdrant client to use
name (Optional[str], optional): Name for identification. If None, the class name is used. Defaults to None.
default_key (str, optional): Default key used for results that are not of type Dict. Defaults to "output".
execute
Arguments:
input_data (Dict[str, Any]): Expected keys:
"collection_name": Name of the collection to delete.
"timeout": Timeout in seconds to wait for the operation commit. If the timeout is reached, the request returns with a service error.
"kwargs": Extra parameters.
Returns:
bool: Operation result.
QdrantAdd
Adds text documents to a Qdrant collection. If the collection does not exist, it is created with default parameters. Metadata, together with the documents, is added as payload. Documents are embedded using the specified embedding model. Subclass of Action. Type of Action[Dict[str, Any], List[Union[str, int]]]
__init__
Arguments:
client (QdrantClient): Qdrant client to use
name (Optional[str], optional): Name for identification. If None, the class name is used. Defaults to None.
default_key (str, optional): Default key used for results that are not of type Dict. Defaults to "output".
execute
Arguments:
input_data (Dict[str, Any]): Expected keys:
"collection_name" (str): Name of the collection to add documents to.
"documents" (Iterable[str]): List of documents to embed and add to the collection.
"metadata" (Iterable[Dict[str, Any]], optional): List of metadata dicts. Defaults to None.
"ids" (Iterable[models.ExtendedPointId], optional): List of ids to assign to documents. If not specified, UUIDs will be generated. Defaults to None.
"batch_size" (int, optional): How many documents to embed and upload in single request. Defaults to 32.
"parallel" (Optional[int], optional): How many parallel workers to use for embedding. Defaults to None. If number is specified, data-parallel process will be used.
Returns:
List[Union[str, int]]: List of IDs of the added documents. If no ids are provided, UUIDs are generated randomly on the client side.
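To make the roles of "ids" and "batch_size" more concrete, here is a minimal sketch of how documents could be paired with ids and metadata and grouped into upload batches. It is not the action's actual implementation; the helper name iter_batches and the tuple layout are assumptions chosen for illustration, and the embedding step is left out entirely.

```python
import uuid
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple, Union

PointId = Union[str, int]
Batch = List[Tuple[PointId, str, Dict[str, Any]]]


def iter_batches(
    documents: Iterable[str],
    metadata: Optional[Iterable[Dict[str, Any]]] = None,
    ids: Optional[Iterable[PointId]] = None,
    batch_size: int = 32,
) -> Iterator[Batch]:
    """Hypothetical helper: group (id, document, metadata) triples into batches."""
    docs = list(documents)
    metas = list(metadata) if metadata is not None else [{} for _ in docs]
    # When no ids are supplied, generate a random UUID per document.
    point_ids = list(ids) if ids is not None else [str(uuid.uuid4()) for _ in docs]
    if not (len(docs) == len(metas) == len(point_ids)):
        raise ValueError("documents, metadata and ids must have the same length")

    rows = iter(zip(point_ids, docs, metas))
    while True:
        # Take at most batch_size rows per request-sized chunk.
        batch = list(islice(rows, batch_size))
        if not batch:
            return
        yield batch


batches = list(iter_batches([f"doc {i}" for i in range(70)], batch_size=32))
```

With 70 documents and the default batch size of 32, the sketch yields two full batches and one partial batch.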
QdrantUpsert
Update or insert a new point into the collection. If point with given ID already exists - it will be overwritten. Subclass of Action. Type of Action[Dict[str, Any], UpdateResult]
__init__
Arguments:
client (QdrantClient): Qdrant client to use
name (Optional[str], optional): Name for identification. If None, the class name is used. Defaults to None.
default_key (str, optional): Default key used for results that are not of type Dict. Defaults to "output".
execute
Arguments:
input_data (Dict[str, Any]): Expected keys:
"collection_name" (str): Name of the collection to insert into.
"points" (Point): Batch or list of points to insert.
"wait" (bool): Whether to wait for the changes to be processed.
- If True, the result is returned only after all changes are applied.
- If False, the result is returned immediately after receipt is confirmed.
"ordering" (Optional[WriteOrdering]): Defines the ordering strategy for the operation. Possible values:
- `weak` (default): write operations may be reordered; works faster.
- `medium`: write operations go through a dynamically selected leader; may be inconsistent for a short period of time if the leader changes.
- `strong`: write operations go through the permanent leader; consistent, but may be unavailable if the leader is down.
"shard_key_selector": Defines the shard groups that should be used to write updates into. If multiple shard_keys are provided, the update will be written to each of them. Only works for collections with custom sharding method.
Returns:
UpdateResult: Operation result.
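The overwrite behaviour described above can be pictured with a toy in-memory model. This is only an analogy under simplifying assumptions (a plain dict as the store, points as dicts with "id", "vector" and "payload" keys); it does not call Qdrant and the function name upsert is hypothetical.

```python
from typing import Any, Dict, List, Union

PointId = Union[str, int]


def upsert(store: Dict[PointId, Dict[str, Any]], points: List[Dict[str, Any]]) -> None:
    """Toy model: insert each point, overwriting any point with the same id."""
    for point in points:
        # In this toy model the stored record is replaced as a whole.
        store[point["id"]] = {
            "vector": point["vector"],
            "payload": point.get("payload", {}),
        }


store: Dict[PointId, Dict[str, Any]] = {}
upsert(store, [{"id": 1, "vector": [0.1, 0.2], "payload": {"lang": "en"}}])
# Point 1 already exists, so it is overwritten; point 2 is newly inserted.
upsert(store, [{"id": 1, "vector": [0.9, 0.8]}, {"id": 2, "vector": [0.5, 0.5]}])
```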
QdrantDelete
Delete selected points from the collection. Subclass of Action. Type of Action[Dict[str, Any], UpdateResult]
__init__
Arguments:
client (QdrantClient): Qdrant client to use
name (Optional[str], optional): Name for identification. If None, the class name is used. Defaults to None.
default_key (str, optional): Default key used for results that are not of type Dict. Defaults to "output".
execute
Arguments:
input_data (Dict[str, Any]): Expected keys:
"collection_name" (str): Name of the collection to delete points from.
"points_selector": Selects points based on a list of IDs or a filter. Examples:
- points=[1, 2, 3, "cd3b53f0-11a7-449f-bc50-d06310e7ed90"]
- points=Filter(must=[FieldCondition(key='rand_number', range=Range(gte=0.7))])
"wait" (bool): Whether to wait for the changes to be processed.
- If True, the result is returned only after all changes are applied.
- If False, the result is returned immediately after receipt is confirmed.
"ordering" (Optional[WriteOrdering]): Defines the ordering strategy for the operation. Possible values:
- `weak` (default): write operations may be reordered; works faster.
- `medium`: write operations go through a dynamically selected leader; may be inconsistent for a short period of time if the leader changes.
- `strong`: write operations go through the permanent leader; consistent, but may be unavailable if the leader is down.
"shard_key_selector": Defines the shard groups that should be used to write updates into. If multiple shard_keys are provided, the update will be written to each of them. Only works for collections with custom sharding method.
Returns:
UpdateResult: Operation result.
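The two selector styles listed for "points_selector" (an explicit list of IDs, or a filter over payload fields) can be sketched with a toy in-memory store. This is an illustration under stated assumptions, not the Qdrant API: the store is a plain dict, the filter is modelled as a Python predicate, and delete_points is a hypothetical name.

```python
from typing import Any, Callable, Dict, List, Union

PointId = Union[str, int]
Payload = Dict[str, Any]
Selector = Union[List[PointId], Callable[[Payload], bool]]


def delete_points(store: Dict[PointId, Payload], points_selector: Selector) -> int:
    """Toy model: remove points matched by an id list or by a payload predicate."""
    if callable(points_selector):
        # Filter-style selector: every point whose payload matches is removed.
        doomed = [pid for pid, payload in store.items() if points_selector(payload)]
    else:
        # Id-list selector: unknown ids are ignored.
        doomed = [pid for pid in set(points_selector) if pid in store]
    for pid in doomed:
        del store[pid]
    return len(doomed)


store: Dict[PointId, Payload] = {
    1: {"rand_number": 0.9},
    2: {"rand_number": 0.3},
    "a": {"rand_number": 0.75},
}
# Mirrors the range condition gte=0.7 on the rand_number field.
removed_by_filter = delete_points(store, lambda p: p.get("rand_number", 0) >= 0.7)
removed_by_ids = delete_points(store, [2, 99])
```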
QdrantQuery
Search for documents in a collection. This action automatically embeds the query text using the specified embedding model. If you want to use your own query vector, use QdrantSearch instead. Subclass of Action. Type of Action[Dict[str, Any], List[QueryResponse]]
__init__
Arguments:
client (QdrantClient): Qdrant client to use
name (Optional[str], optional): Name for identification. If None, the class name is used. Defaults to None.
default_key (str, optional): Default key used for results that are not of type Dict. Defaults to "output".
execute
Arguments:
input_data (Dict[str, Any]): Expected keys:
"collection_name": Collection to search in.
"query_text": Text to search for. The text is embedded using the specified embedding model and then used as the query vector.
"query_filter": Excludes vectors that do not fit the given conditions. If None, search among all vectors.
"limit": Maximum number of results to return.
"kwargs": Additional search parameters.
Returns:
List[QueryResponse]: List of scored points.
QdrantSearch
Search for closest vectors in collection taking into account filtering conditions. Subclass of Action. Type of Action[Dict[str, Any], List[ScoredPoint]]
__init__
Arguments:
client (QdrantClient): Qdrant client to use
name (Optional[str], optional): Name for identification. If None, the class name is used. Defaults to None.
default_key (str, optional): Default key used for results that are not of type Dict. Defaults to "output".
execute
Arguments:
input_data (Dict[str, Any]): Expected keys:
"collection_name": Collection to search in.
"query_vector": Search for vectors closest to this. Can be either a vector itself, or a named vector, or a named sparse vector, or a tuple of vector name and vector itself.
"query_filter": Excludes vectors that do not fit the given conditions. If None, search among all vectors.
"search_params": Additional search params.
"limit": Maximum number of results to return.
"offset": Offset of the first result to return. May be used to paginate results. Note: large offset values may cause performance issues.
"with_payload": Specifies which stored payload to attach to the result.
- True: attach all payload.
- False: do not attach any payload.
- List of strings: include only the specified fields.
- PayloadSelector: use explicit rules.
"with_vectors": Specifies whether stored vectors are attached to the search result. Defaults to False.
- True: attach the stored vector.
- False: do not attach the vector.
- List of strings: include only the specified fields.
"score_threshold": Minimal score threshold for the result. If defined, less similar results are not returned. The score of a returned result may be higher or lower than the threshold depending on the Distance function used. E.g. for cosine similarity only higher scores are returned.
"consistency": Read consistency of the search. Defines how many replicas are queried before the result is returned. Values:
- int: number of replicas to query; values must be present in all queried replicas.
- 'majority': query all replicas, but return values present in the majority of replicas.
- 'quorum': query the majority of replicas, return values present in all of them.
- 'all': query all replicas, and return values present in all replicas.
"shard_key_selector": Specifies which shards to query. If None, all shards are queried. Only works for collections with custom sharding method.
"timeout": Overrides the global timeout for this search. Unit is seconds.
Returns:
List[ScoredPoint]: List of found close points with similarity scores.
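The interplay of "limit", "offset" and "score_threshold" can be illustrated with a brute-force cosine-similarity search over a handful of vectors. This is a toy stand-in written for this page, not how Qdrant performs the search; the names cosine and toy_search are hypothetical, and treating a score equal to the threshold as a match is an assumption of the sketch.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_search(
    vectors: Dict[int, Sequence[float]],
    query_vector: Sequence[float],
    limit: int = 10,
    offset: int = 0,
    score_threshold: Optional[float] = None,
) -> List[Tuple[int, float]]:
    """Brute-force stand-in for a vector search; returns (id, score) pairs."""
    scored = [(pid, cosine(vec, query_vector)) for pid, vec in vectors.items()]
    if score_threshold is not None:
        # With cosine similarity, higher is better, so lower scores are dropped.
        scored = [item for item in scored if item[1] >= score_threshold]
    scored.sort(key=lambda item: item[1], reverse=True)
    # offset skips the best-ranked hits; limit caps the page size.
    return scored[offset : offset + limit]


vectors = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0], 4: [-1.0, 0.0]}
first_page = toy_search(vectors, [1.0, 0.0], limit=2)
second_page = toy_search(vectors, [1.0, 0.0], limit=2, offset=2)
filtered = toy_search(vectors, [1.0, 0.0], score_threshold=0.5)
```

Here first_page and second_page together walk through all four points in descending score order, while filtered keeps only the points scoring at least 0.5.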