# Examples of how to configure models
Configuring a model consists of two parts: downloading the model weights and deploying the worker. The examples below show how to configure both parts for particular models. Make sure to specify tolerations according to your node configuration if required.
## Pharia-1 with 256-dimensional embedding head
We download both the base model and the adapter to the same volume:
```yaml
models:
  - name: models-pharia-1-embedding-256-control
    pvcSize: 20Gi
    weights:
      - repository:
        fileName: Pharia-1-Embedding-256-control.tar
        targetDirectory: pharia-1-embedding-256-control
      - repository:
        fileName: Pharia-1-Embedding-256-control-adapter.tar
        targetDirectory: pharia-1-embedding-256-control-adapter
```
The worker checkpoint exposes the embedding adapter for 256-dimensional embeddings:
```yaml
checkpoints:
  - generator:
      type: luminous
      tokenizer_path: pharia-1-embedding-256-control/vocab.json
      pipeline_parallel_size: 1
      tensor_parallel_size: 1
      weight_set_directories:
        - pharia-1-embedding-256-control
        - pharia-1-embedding-256-control-adapter
      cuda_graph_caching: true
      memory_safety_margin: 0.1
      task_returning: true
    queue: pharia-1-embedding-256-control
    tags: []
    replicas: 1
    version: 0
    modelVolumeClaim: models-pharia-1-embedding-256-control
    models:
      pharia-1-embedding-256-control:
        experimental: false
        multimodal_enabled: false
        completion_type: none
        embedding_type: instructable
        maximum_completion_tokens: 0
        adapter_name: embed-256
        bias_name: null
        softprompt_name: null
        description: Pharia-1-Embedding-256-control. Fine-tuned for instructable embeddings. Has an extra down projection layer to provide 256-dimensional embeddings.
        aligned: false
        chat_template: null
        worker_type: luminous
        prompt_template: |-
          {% promptrange instruction %}{{instruction}}{% endpromptrange %}
          {% if input %}
          {% promptrange input %}{{input}}{% endpromptrange %}
          {% endif %}
        embedding_head: pooling_only
```
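The `pooling_only` embedding head combined with the adapter's extra down projection layer can be sketched numerically. This is an illustrative sketch only; the hidden size, sequence length, and weight values below are made-up stand-ins, not the model's real shapes or parameters:

```python
# Sketch of the embedding head described above: mean pooling over the
# per-token hidden states, then the adapter's down projection to 256 dims.
# All sizes and values are assumptions for illustration only.
hidden_dim = 8     # stand-in for the model's true hidden size
target_dim = 256   # dimensionality provided by the embed-256 adapter
seq_len = 3

# fake per-token hidden states: seq_len rows of hidden_dim values
token_states = [[float(i + j) for j in range(hidden_dim)] for i in range(seq_len)]

# pooling_only: average the hidden states across the sequence
pooled = [sum(col) / seq_len for col in zip(*token_states)]

# down projection: multiply the pooled vector by a (hidden_dim x target_dim) matrix
w_down = [[0.01] * target_dim for _ in range(hidden_dim)]
embedding = [sum(p * w for p, w in zip(pooled, col)) for col in zip(*w_down)]

print(len(embedding))  # 256
```

The only role of the adapter weight set here is the final projection: without it, the pooled vector would keep the model's full hidden dimensionality.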