Trait Definition
#[async_trait]
pub trait LlmService: Send + Sync {
    async fn chat(&self, request: ChatRequest) -> Result<ChatResponse>;
    async fn generate(&self, prompt: &str, config: GenerationConfig) -> Result<String>;
    async fn health_check(&self) -> Result<()>;
}
Module: magicaf_core::llm
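Any type that implements these three methods can be used wherever an LlmService is expected. Below is a minimal sketch of a test double, assuming the DTO types live in magicaf_core::llm and that the crate exposes its Result alias at magicaf_core::Result (adjust paths as needed):
use async_trait::async_trait;
use magicaf_core::llm::{
    ChatChoice, ChatMessage, ChatRequest, ChatResponse, ChatRole, GenerationConfig, LlmService,
};
use magicaf_core::Result;

/// A trivial backend that echoes input back; handy in unit tests.
struct EchoLlm;

#[async_trait]
impl LlmService for EchoLlm {
    async fn chat(&self, request: ChatRequest) -> Result<ChatResponse> {
        // Echo the most recent message back as the assistant reply.
        let content = request
            .messages
            .last()
            .map(|m| m.content.clone())
            .unwrap_or_default();
        Ok(ChatResponse {
            id: "echo-0".into(),
            object: "chat.completion".into(),
            created: 0,
            model: request.model,
            choices: vec![ChatChoice {
                index: 0,
                message: ChatMessage { role: ChatRole::Assistant, content },
                finish_reason: Some("stop".into()),
            }],
            usage: None,
        })
    }

    async fn generate(&self, prompt: &str, _config: GenerationConfig) -> Result<String> {
        Ok(prompt.to_string())
    }

    async fn health_check(&self) -> Result<()> {
        Ok(())
    }
}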
Methods
chat
async fn chat(&self, request: ChatRequest) -> Result<ChatResponse>
Send a structured chat completion request.
| Parameter | Type | Description |
|---|---|---|
| request | ChatRequest | The chat completion request payload |
Returns: ChatResponse containing the model’s reply, token usage statistics (when the server reports them), and response metadata.
Errors: MagicError::LlmError, MagicError::HttpError, MagicError::SerializationError
generate
async fn generate(&self, prompt: &str, config: GenerationConfig) -> Result<String>
A high-level convenience method that turns a raw prompt into generated text.
| Parameter | Type | Description |
|---|---|---|
| prompt | &str | The input prompt |
| config | GenerationConfig | Generation parameters |
Returns: String — the generated text.
health_check
async fn health_check(&self) -> Result<()>
Verify the upstream LLM server is reachable.
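A minimal sketch of a startup readiness check, assuming an async context, a surrounding function that returns the crate’s Result type, and a Display impl on MagicError:
// Fail fast if the backing LLM server cannot be reached.
if let Err(e) = llm.health_check().await {
    eprintln!("LLM backend unreachable: {e}");
    return Err(e);
}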
DTO Types
ChatRole
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub enum ChatRole {
    System,
    User,
    Assistant,
    Other,
}
ChatMessage
pub struct ChatMessage {
    pub role: ChatRole,
    pub content: String,
}
Convenience constructors:
ChatMessage::system("You are a helpful assistant.")
ChatMessage::user("What is MagicAF?")
ChatMessage::assistant("MagicAF is...")
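Assuming the constructors do nothing more than populate role and content, the two forms below are equivalent:
let from_constructor = ChatMessage::user("What is MagicAF?");
let by_hand = ChatMessage {
    role: ChatRole::User,
    content: "What is MagicAF?".to_string(),
};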
ChatRequest
pub struct ChatRequest {
    pub model: String,
    pub messages: Vec<ChatMessage>,
    pub temperature: Option<f32>,
    pub top_p: Option<f32>,
    pub max_tokens: Option<u32>,
    pub stop: Option<Vec<String>>,
}
Follows the OpenAI /v1/chat/completions request schema. The model field is auto-filled by LocalLlmService if left empty.
ChatResponse
pub struct ChatResponse {
    pub id: String,
    pub object: String,
    pub created: u64,
    pub model: String,
    pub choices: Vec<ChatChoice>,
    pub usage: Option<Usage>,
}
Helper method:
// Extract the text of the first choice
let text: Option<&str> = response.first_content();
ChatChoice
pub struct ChatChoice {
    pub index: u32,
    pub message: ChatMessage,
    pub finish_reason: Option<String>,
}
Usage
pub struct Usage {
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
    pub total_tokens: u32,
}
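Taken together, a response can be unpacked with first_content() for the common case, and the choices and usage fields for anything more detailed. A short sketch using only the fields documented above (the import path assumes the DTOs live in magicaf_core::llm):
use magicaf_core::llm::ChatResponse;

fn log_response(response: &ChatResponse) {
    // Text of the first choice, if any.
    if let Some(text) = response.first_content() {
        println!("assistant: {text}");
    }
    // Why each choice stopped generating.
    for choice in &response.choices {
        println!("choice {} finished: {:?}", choice.index, choice.finish_reason);
    }
    // Token accounting, when the server reports it.
    if let Some(usage) = &response.usage {
        println!(
            "tokens: {} prompt + {} completion = {} total",
            usage.prompt_tokens, usage.completion_tokens, usage.total_tokens
        );
    }
}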
LocalLlmService
Crate: magicaf-local-llm
HTTP client that calls any server exposing an OpenAI-compatible /v1/chat/completions endpoint.
Constructor and Accessors
impl LocalLlmService {
    pub fn new(config: LlmConfig) -> Result<Self>
    pub fn model_name(&self) -> &str
}
Compatible Servers
| Server | Command / Notes |
|---|---|
| vLLM | python -m vllm.entrypoints.openai.api_server --model ... --port 8000 |
| llama.cpp | ./llama-server -m model.gguf --port 8000 |
| TGI | Text Generation Inference with OpenAI-compatible mode |
| LocalAI | Drop-in OpenAI replacement |
| Ollama | With OpenAI compatibility layer enabled |
| Custom | Any HTTP server mirroring the OpenAI Chat Completions schema |
Example
use magicaf_core::config::LlmConfig;
use magicaf_core::llm::{ChatMessage, ChatRequest, GenerationConfig, LlmService};
use magicaf_local_llm::LocalLlmService;

// Runs inside an async context that can propagate errors with `?`
// (e.g. an async fn returning the crate's Result type).
let llm = LocalLlmService::new(LlmConfig {
    base_url: "http://localhost:8000/v1".into(),
    model_name: "mistral-7b".into(),
    api_key: None,
    timeout_secs: 120,
})?;

// Structured chat
let response = llm.chat(ChatRequest {
    model: String::new(), // Auto-filled from config
    messages: vec![
        ChatMessage::system("You are helpful."),
        ChatMessage::user("Explain RAG."),
    ],
    temperature: Some(0.3),
    top_p: None,
    max_tokens: Some(1024),
    stop: None,
}).await?;
println!("{}", response.first_content().unwrap_or(""));

// Simple generation
let text = llm.generate("Explain RAG in one sentence.", GenerationConfig::default()).await?;
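Because the example above only calls methods from the LlmService trait, application code can stay agnostic to the concrete backend. A sketch of a provider-neutral helper follows; the summarize function and its prompt are illustrative, not part of the crate, and the Result import assumes a crate-level alias:
use magicaf_core::llm::{GenerationConfig, LlmService};
use magicaf_core::Result;

// Accepts LocalLlmService or any other LlmService implementation.
async fn summarize(llm: &dyn LlmService, text: &str) -> Result<String> {
    let prompt = format!("Summarize the following in one sentence:\n\n{text}");
    llm.generate(&prompt, GenerationConfig::default()).await
}

// Usage: let summary = summarize(&llm, article_text).await?;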