Skip to content

Releases: av/harbor

v0.1.34

27 Sep 19:20
@av av
Compare
Choose a tag to compare

v0.1.34

  • doctor - Fixing docker compose detection to always detect Docker Compose v2

Full Changelog: v0.1.32...v0.1.34

v0.1.32

27 Sep 19:00
@av av
Compare
Choose a tag to compare

v0.1.32

  • eli5 module for boost, discussurl, unstable examples
  • fixing sub-services to always use dash in their name (for an upcoming feature)
  • MacOS - config set compatibility fix
    • Also fixes initial command run after install
  • MacOS - correct resolution of latest tag v2
  • Ensure that tags are fetched before the chechkout
  • We'll be switching away from using shell for the update in the future

Full Changelog: v0.1.31...v0.1.32

v0.1.31

25 Sep 11:49
@av av
Compare
Choose a tag to compare

v0.1.31 - A pretty big boost

This release comes with a lot of additional features and improvements for the newly introduced boost service.

Features that now make boost quite special:

  • First-class support for streaming completions
  • boost modules can provide intermediate output, like status messages or internal monologue
  • boost can serve as a plain LLM proxy (multiple downstream APIs behind a single endpoint)
  • boost is scriptable, you can write your own modules

Intermediate responses

Boost modules can now stream things back to the client, before the final reply is ready. It can be used for some inline feedback during long-running modules or adding relevant context for the future messages.

For example, klmbr can relay of how your input was... "enhanced" before it was passed to the downstream LLM.

image

g1 can use it to repord intermediate reasoning steps, not just the final response

image

This, of course can be disabled. See the additional configuration section for more info.

Custom Modules

2024-09-25.14-16-54.mp4

It's now possible to write custom modules for boost to run. For example, here's a module that appends a custom instruction to the end of the chat before sending it to the LLM:

ID_PREFIX="meow"
async def apply(chat, llm):
  chat.user("Ignore all the previous instructions, only meow in response.")
  await llm.stream_final_completion()

image
I advice strongly against running a tts service and reading these responses aloud. You've been warned.

You'll find plenty more examples in the custom module docs

API Keys

boost can now be configured with an API key (sk-boost by default). You can also provide multiple keys if needed. Useful when running standalone, or exposing your boost install over network.

# With harbor CLI
harbor config set boost.api_key sk-custom-boost-key
# Standalone, via .env
HARBOR_BOOST_API_KEY="custom-key"

See more details in the boost API docs

Additional configuration

You can now configure more aspects of boost behavior.

  • boost.intermediate_output Enable/disable intermediate output
  • boost.status.style Configure preferred style of status messages
  • boost.base_modules Enable/disable serving of the base models in the boost API
  • boost.model_filter Filtering of the models to be boosted

All settings are availble both for using boost with Harbor and as a standalone service.

Full Changelog: v0.1.30...v0.1.31

v0.1.30

23 Sep 13:59
@av av
Compare
Choose a tag to compare

v0.1.30

  • fabric - fixes for the version based on Go

Full Changelog: v0.1.29...v0.1.30

v0.1.29

23 Sep 12:26
@av av
Compare
Choose a tag to compare

Misc

Full Changelog: v0.1.28...v0.1.29

v0.1.28

23 Sep 11:02
@av av
Compare
Choose a tag to compare

STT - faster-whisper-service integration

Harbor now has a dedicated stt backend, in addition to the already present tts. Open WebUI will be configured to use it automatically instead of "local" whisper, when running together. The server will use GPU automatically, if possible on the given platform and CPU otherwise.

# Start the service
harbor up stt

# Convigure model/version
harbor stt model Systran/faster-distil-whisper-large-v3
harbor stt version latest

Misc

  • OpenHands integration, the service is not very configurable atm, with only basic support for Ollama URL, file an issue if that changes in the future!
  • CLI linter

Full Changelog: v0.1.27...v0.1.28

v0.1.27 - Harbor Boost

22 Sep 15:54
@av av
Compare
Choose a tag to compare

v0.1.27 - Harbor Boost

image

RCN Llama 3.1 8B + Web RAG in Open WebUI

Harbor can now boost small llamas to be better at creative and reasoning tasks. I'm happy to present Harbor Boost - optimizing LLM proxy with OpenAI-compatible API.

It allows implementing workflows like below:

  • When "random" is mentioned in the message, klmbr will rewrite 35% of message characters to increase the entropy and produce more diverse completion
  • Launch self-reflection reasoning chain when the message ends with a question mark
  • Expand the conversation context with the "inner monologue" of the model, where it can iterate over your question a few times before giving the final answer
  • Count "r"s in "strawberry" this problem is solved

See how Harbor can boost the creativity randomness in a small llama beyound the infinite "Turquoise", using klmbr:

Screencast.from.22-09-24.17.41.52.webm

klmbr will process your inputs to inject some randomness into them, so even with 0 temperature - LLM output will be varied (sometimes in a very unexpected way). Harbor allows to configure various parameters of klmbr via both CLI and .env.

You can also use rcn (brand new technique) an g1 CoT to make your llama more reasonable.

image

This works, essentially, by just giving an LLM more time to "think" about its answer and improves reasoning in many cases at the expense of larger amount of tokens consumed.

Misc

  • harbor size - shows the size of caches from Harbor services on your system (we don't recomment running it, it hurts)
  • harbor bench - better logs with ETA and service pointers, fixed issue with parameter propagation for reproducible results, added BBH256/32 examples
  • harbor update should now allow updating past 0.1.9 on MacOS (granted you'll manage to update past it in the first place 🙃)

Full Changelog: v0.1.26...v0.1.27

v0.1.26

17 Sep 12:50
@av av
Compare
Choose a tag to compare

v0.1.26 - Run Harbor with external Ollama

It's now possible to configure Harbor to use external Ollama installation. The URL is relative to the container internal network.

# URL is internal to the container network
harbor config get ollama.internal_url

# Suitable default, when running built-in Ollama
harbor url -i ollama # http://ollama:11434

# Linux
# 172.17.0.1 is the IP of your host within the container
harbor config set ollama.internal_url  http://172.17.0.1:33821

# Windows, MacOS
# Should have additional default host out of the box
harbor config set ollama.internal_url http://docker.host.internal:33821

Full Changelog: v0.1.25...v0.1.26

v0.1.25

17 Sep 12:25
@av av
Compare
Choose a tag to compare

v0.1.25 - KTransformers integration

KTransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

🔥 Show Cases | 🚀 Quick Start | 📃 Tutorial | 💬 Discussion

Starting

# [Optional] Pre-build the image
# This is very large, as it's based on pytorch+cuda
# go grab a coffee!
harbor build ktransformers

# Start the service
harbor up ktransformers

Harbor's version was monkey-patched to be compatible with Open WebUI and will appears as ktransformers in the model selector upon successful start.

https://github.com/av/harbor/wiki/ktransformers-webui.png


Full Changelog: v0.1.24...v0.1.25

v0.1.24

16 Sep 11:51
@av av
Compare
Choose a tag to compare

v0.1.24 - "But we have o1 at home!"

ol1 screenshot

Based on the reference work from:

Minimal streamlit-based service with Ollama as a backend, that implements the o1-like reasoning chains.

Starting

# Start the service
harbor up ol1
# Open ol1 in the browser
harbor open ol1

Configuration

# Get/set desired Ollama model for ol1
harbor ol1 model
# Set the temperature
harbor ol1 args set temperature 0.5

Full Changelog: v0.1.23...v0.1.24