# Examples of how to configure models
Configuring a model consists of two parts: downloading the model weights and deploying the worker. The examples below show how to configure both parts for particular models. Make sure to specify tolerations according to your node configuration if required.
## Pharia-1 with 256-dimensional embedding head
We download both the base model and the adapter to the same volume:
```yaml
models:
  - name: models-pharia-1-embedding-256-control
    pvcSize: 20Gi
    weights:
      - repository:
        fileName: Pharia-1-Embedding-256-control.tar
        targetDirectory: pharia-1-embedding-256-control
      - repository:
        fileName: Pharia-1-Embedding-256-control-adapter.tar
        targetDirectory: pharia-1-embedding-256-control-adapter
```
The worker checkpoint exposes the embedding adapter for 256-dimensional embeddings:
```yaml
checkpoints:
  - generator:
      type: luminous
      tokenizer_path: pharia-1-embedding-256-control/vocab.json
      pipeline_parallel_size: 1
      tensor_parallel_size: 1
      weight_set_directories:
        - pharia-1-embedding-256-control
        - pharia-1-embedding-256-control-adapter
      cuda_graph_caching: true
      memory_safety_margin: 0.1
      task_returning: true
    queue: pharia-1-embedding-256-control
    tags: []
    replicas: 1
    version: 0
    modelVolumeClaim: models-pharia-1-embedding-256-control
    models:
      pharia-1-embedding-256-control:
        experimental: false
        multimodal_enabled: false
        completion_type: none
        embedding_type: instructable
        maximum_completion_tokens: 0
        adapter_name: embed-256
        bias_name: null
        softprompt_name: null
        description: Pharia-1-Embedding-256-control. Fine-tuned for instructable embeddings. Has an extra down projection layer to provide 256-dimensional embeddings.
        aligned: false
        chat_template: null
        worker_type: luminous
        prompt_template: |-
          {% promptrange instruction %}{{instruction}}{% endpromptrange %}
          {% if input %}
          {% promptrange input %}{{input}}{% endpromptrange %}
          {% endif %}
        embedding_head: pooling_only
```
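The `pooling_only` embedding head combined with the adapter's extra down projection layer can be sketched numerically. This is an illustrative sketch only; the hidden size, sequence length, and weight values below are made-up stand-ins, not the model's real shapes or parameters:

```python
# Sketch of the embedding head described above: mean pooling over the
# per-token hidden states, then the adapter's down projection to 256 dims.
# All sizes and values are assumptions for illustration only.
hidden_dim = 8     # stand-in for the model's true hidden size
target_dim = 256   # dimensionality provided by the embed-256 adapter
seq_len = 3

# fake per-token hidden states: seq_len rows of hidden_dim values
token_states = [[float(i + j) for j in range(hidden_dim)] for i in range(seq_len)]

# pooling_only: average the hidden states across the sequence
pooled = [sum(col) / seq_len for col in zip(*token_states)]

# down projection: multiply the pooled vector by a (hidden_dim x target_dim) matrix
w_down = [[0.01] * target_dim for _ in range(hidden_dim)]
embedding = [sum(p * w for p, w in zip(pooled, col)) for col in zip(*w_down)]

print(len(embedding))  # 256
```

The only role of the adapter weight set here is the final projection: without it, the pooled vector would keep the model's full hidden dimensionality.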