Network of Things: 04/01/2024

Saturday, April 20, 2024

Llama 3 - More ways to run it, but still nothing new

Llama 3 is out, and getting to it can be a challenge. The URL in Meta's approval email expires in 24 hours, and the download can take 8 hours. Once it is downloaded from Meta, though, it can be used locally in text-generation-webui. This time there are also hosted versions on HuggingChat and on Meta's own site. The model claims its training data stops in 2021, so it still thinks Boris Johnson is the UK Prime Minister. It does, however, believe it is more conversational.

When I asked how many parameters it has, it first said 1.5B. When I asked again, it changed its answer.

Running Llama 3 with Ollama, I get better answers.
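For scripting against Ollama instead of using its interactive prompt, here is a minimal sketch that talks to Ollama's local REST API from Python's standard library. It assumes the server is running on its default port 11434 and that the model was pulled under the tag `llama3` (for example with `ollama pull llama3`). The helper names `build_generate_request` and `ask_llama3` are hypothetical; only the `/api/generate` endpoint and its `model`, `prompt` and `stream` fields come from Ollama.

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint; change it if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_llama3(prompt: str, model: str = "llama3", timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (needs a running Ollama server)."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, Ollama replies with a single JSON object whose
    # "response" field holds the whole completion.
    return body["response"]
```

With the server up, `print(ask_llama3("Who is the UK Prime Minister?"))` is a quick way to repeat the knowledge-cutoff test above. Setting `stream` to false trades token-by-token output for a simpler single JSON reply.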

In text-generation-webui, the model loads only when Transformers is selected as the loader, and even then the chat is not fully functional.

After converting to GGUF,

LM Studio is the best one of these for now.

Thursday, April 04, 2024

LLM - Not everything can be learned - so let's realign it to our preferences

When I first started researching LLMs, it seemed like the technology could simply keep learning until it became a self-learning artificial lifeform (AGI). Now, 8 months after my last post, it looks like the initial push to teach an LLM everything is not delivering the returns researchers expected. Terms originally used, such as "emergent behavior", are now being replaced with "hallucinations" and "catastrophic degradation".

The jack-of-all-trades LLM is not what we really wanted; what we want is precise control over the completions (answers). To get there, we are now seeing new avenues of research collectively called fine-tuning. Fine-tuning is not a run-time performance optimization; rather, it changes the model's weights to reflect preferences. A new alphabet soup of acronyms has appeared: DPO (Direct Preference Optimization), IPO (Identity Preference Optimization) and KTO (Kahneman-Tversky Optimization). All are optimization methods that introduce new labeled datasets and, under supervision, get a generic pre-trained model to answer the "money questions".
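To make the DPO/IPO idea concrete, here is a minimal sketch of the per-pair losses. It assumes you already have the summed token log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") answer, both under the model being fine-tuned and under a frozen copy of the pre-trained reference model. The function names and the default `beta`/`tau` values are illustrative assumptions, not taken from any particular library. KTO is left out because it trains on unpaired good/bad examples rather than on pairs.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response (chosen or
    rejected) under the policy being tuned or under the frozen reference model.
    """
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    # The loss falls as the policy raises the chosen answer's likelihood
    # relative to the rejected one, measured against the reference model.
    return -log_sigmoid(beta * (chosen_logratio - rejected_logratio))


def ipo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, tau: float = 0.1) -> float:
    """IPO loss: regress the same log-ratio gap toward 1/(2*tau) instead of
    pushing it ever larger, which is meant to curb overfitting to the labels."""
    h = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return (h - 1.0 / (2.0 * tau)) ** 2
```

When the policy still matches the reference model, the DPO loss is log 2 (about 0.69). Raising the chosen answer's log-probability by 2 nats, as in `dpo_loss(-8.0, -12.0, -10.0, -12.0)`, lowers it to about 0.60. No reward model or sampling loop is needed, which is the main appeal of these methods over classic RL.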

If you have been exposed to ML/AI for long, you already know we have seen this before, when it was called "reinforcement learning". Today they add HF (human feedback) to it, and it is now called RLHF. Once again, we are back to using likelihoods (read: probabilities) and rewards (biases) to get an AI to produce answers that add economic value.
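The "likelihoods and rewards" in RLHF can be sketched in a few lines. This shows two commonly used pieces: a Bradley-Terry reward-model loss trained on human-labeled pairs, and the KL-penalized reward that a PPO-style RL step then maximizes. The function names and the default `beta` are hypothetical, and the scores and log-probabilities are assumed to come from models you already have.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def preference_probability(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry model: P(human prefers chosen) = sigmoid(r_chosen - r_rejected)."""
    return sigmoid(reward_chosen - reward_rejected)


def reward_model_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood the reward model minimizes on one labeled pair,
    i.e. -log(sigmoid(d)) computed stably."""
    d = reward_chosen - reward_rejected
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))


def kl_penalized_reward(reward: float, policy_logprob: float,
                        ref_logprob: float, beta: float = 0.1) -> float:
    """Reward used in the RL step: the reward model's score minus beta times
    the policy/reference log-ratio (a one-sample estimate of the KL penalty)."""
    return reward - beta * (policy_logprob - ref_logprob)
```

The KL term is the "bias" that keeps the tuned model from drifting too far from the pre-trained one while it chases reward. Without it, the policy tends to exploit quirks of the reward model.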
