Thursday, 23 January 2025

Unlock the Power of Ollama and Hugging Face

https://www.youtube.com/watch?v=LQJVz-B_mZI&t=32s

https://www.youtube.com/watch?v=jK_PZqeQ7BE

Simple Steps

  1. Download Anaconda from anaconda.com for your OS
  2. conda create -n hf_1 python=3.11
  3. conda activate hf_1
  4. pip install -U "huggingface_hub[cli]"
  5. pip install "huggingface_hub[hf_transfer]"
  6. Add the environment variable HF_HUB_ENABLE_HF_TRANSFER=1 (on Windows: Properties - Advanced system settings - Environment Variables - New)
  7. On Hugging Face, go to Models, filter by GGUF, search for TheBloke, and copy the model name along with the GGUF file name
  8. huggingface-cli download TheBloke/MistralLite-7B-GGUF mistrallite.Q4_K_M.gguf --local-dir downloads --local-dir-use-symlinks False
  9. In Notepad, create a file named Modelfile containing the single line: FROM downloads\mistrallite.Q4_K_M.gguf
  10. ollama create aurmc -f Modelfile
  11. ollama list
  12. ollama run aurmc (if the model needs more RAM than is free, Ollama stops with an error such as "model requires more system memory (6.0 GiB) than is available (4.4 GiB)")
Or, skip the manual download and try running a GGUF repo straight from the Hub:

ollama run hf.co/arcee-ai/SuperNova-Medius-GGUF
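
Once the model is created, it can also be called from Python through Ollama's local REST API (it listens on port 11434 by default). A minimal sketch, assuming the requests package is installed and the model was named aurmc as in step 10:

import requests

# Ollama exposes a local HTTP API; /api/generate returns the completion.
payload = {
    "model": "aurmc",                       # the name given to "ollama create"
    "prompt": "Explain in one line what a GGUF file is.",
    "stream": False,                        # return a single JSON object, not a token stream
}
resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["response"])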

Glossary:

Here are 10 important glossary terms for both LLMs (Large Language Models) and Hugging Face:


Glossary Terms for LLM (Large Language Models):

  1. Transformer:
    A neural network architecture designed to process sequential data using self-attention mechanisms, powering most modern LLMs.

  2. Self-Attention:
    A mechanism that allows models to focus on relevant parts of the input sequence while processing it, crucial for understanding context.

  3. Pre-training:
    The process of training an LLM on a large dataset to learn general language representations before fine-tuning on specific tasks.

  4. Fine-tuning:
    Adapting a pre-trained model to perform specific tasks by training it further on a smaller, task-specific dataset.

  5. Tokenization:
    The process of breaking text into smaller units (tokens) like words, subwords, or characters, which are fed into the model (see the example after this list).

  6. Context Window:
    The maximum sequence length or number of tokens that a model can process at once.

  7. Zero-shot Learning:
    The ability of an LLM to perform tasks it hasn't explicitly been trained on by leveraging its general understanding of language.

  8. Few-shot Learning:
    Using a few examples to prompt an LLM to perform a task without extensive retraining or fine-tuning.

  9. Generative Pre-trained Transformer (GPT):
    A family of transformer-based LLMs designed for generating text, with models like GPT-3 and GPT-4.

  10. Inference:
    The process of using a trained model to make predictions or generate outputs from new inputs.
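
The Tokenization and Context Window entries are easiest to see in code. A small sketch using the transformers AutoTokenizer (gpt2 is used purely as an example model):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Unlock the power of Ollama and Hugging Face"
tokens = tokenizer.tokenize(text)     # text -> subword tokens
ids = tokenizer.encode(text)          # tokens -> integer IDs the model actually consumes

print(tokens)
print(ids)
print("Context window (max tokens):", tokenizer.model_max_length)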


Glossary Terms for Hugging Face:

  1. Pipeline:
    A high-level API in Hugging Face that simplifies access to pre-trained models for various tasks like sentiment analysis, translation, etc. (see the example after this list).

  2. Model Hub:
    An online repository by Hugging Face where users can find and share pre-trained models for different tasks.

  3. Transformers Library:
    Hugging Face’s open-source library providing tools for working with transformer models.

  4. Datasets:
    A library by Hugging Face for loading, preprocessing, and working with datasets for machine learning.

  5. Tokenizer:
    A Hugging Face module that converts text into tokens that models can process.

  6. Encoder-Decoder Architecture:
    A framework used in models like T5 and BART, where the encoder processes input and the decoder generates output.

  7. AutoModel:
    A class in Hugging Face that automatically selects the appropriate model architecture based on a pre-trained model name.

  8. Trainer:
    A utility in Hugging Face that simplifies training and fine-tuning of transformer models.

  9. Attention Mask:
    A mechanism used to specify which parts of an input sequence should be ignored (e.g., padding tokens).

  10. Hugging Face Hub:
    A platform for hosting and collaborating on machine learning models, datasets, and pipelines.
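
The Pipeline entry above maps directly onto a few lines of code. A minimal sketch with the transformers pipeline API (the named sentiment model is just an example; it is downloaded from the Model Hub on first use):

from transformers import pipeline

# pipeline() wires together tokenizer + model + post-processing for a task.
classifier = pipeline("sentiment-analysis",
                      model="distilbert-base-uncased-finetuned-sst-2-english")

print(classifier("Ollama plus Hugging Face makes local LLMs easy."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99}]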


Drill Baby Drill #1

Download a model and save it locally for future use as an offline LLM (no internet), in two parts.

# Prerequisites: pip install torch transformers

# pip list   (to check whether transformers is installed)

First, download the model and save it to local disk:


from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "facebook/blenderbot-400M-distill"

# Pull the tokenizer and model from the Hub once, then write them to local folders.
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.save_pretrained("./tokenizer")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model.save_pretrained("./model")

Second, load the saved files from local disk and use them as a local LLM:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load from the local folders written in part one, so no internet connection is needed.
tokenizer = AutoTokenizer.from_pretrained("./tokenizer")
model = AutoModelForSeq2SeqLM.from_pretrained("./model")

def chatbot_response(user_input):
    inputs = tokenizer([user_input], return_tensors="pt")
    reply_ids = model.generate(**inputs)
    return tokenizer.batch_decode(reply_ids, skip_special_tokens=True)[0]

while True:
    user_input = input("Enter for response: ")
    if user_input == "Exit":
        break
    print(chatbot_response(user_input))

Response:

Enter for response: How are you?
I'm doing well, thank you. How about yourself? Do you have any plans for the weekend?

Drill Baby Drill #2

inference.py using an HF token

# pip install huggingface_hub
# export HF_TOKEN="hf_..."   (use your own Hugging Face access token; never publish the real value)


from huggingface_hub import InferenceClient
import json

repo_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

llm_client = InferenceClient(
    model=repo_id,
    timeout=120,
)

def call_llm(inference_client: InferenceClient, prompt: str):
    # POST a raw text-generation request to the model's Inference API endpoint.
    response = inference_client.post(
        json={
            "inputs": prompt,
            "parameters": {"max_new_tokens": 200},
            "task": "text-generation",
        },
    )

    return json.loads(response.decode())[0]["generated_text"]

response = call_llm(llm_client, "write me a crazy joke")
print(response)
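
Newer huggingface_hub releases also expose task-specific helpers on InferenceClient, so the raw post call above can be replaced with text_generation; a minimal sketch, assuming the same repo_id and that HF_TOKEN is set in the environment:

from huggingface_hub import InferenceClient

client = InferenceClient(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", timeout=120)

# text_generation handles the request/response plumbing and returns plain text.
reply = client.text_generation("write me a crazy joke", max_new_tokens=200)
print(reply)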
