Thursday, 23 January 2025

Unlock the Power of Ollama and Hugging Face

https://www.youtube.com/watch?v=LQJVz-B_mZI&t=32s

https://www.youtube.com/watch?v=jK_PZqeQ7BE

Simple Steps

  1. Download Anaconda from anaconda.com for your OS
  2. conda create -n hf_1 python=3.11
  3. conda activate hf_1
  4. pip install -U "huggingface_hub[cli]"
  5. pip install "huggingface_hub[hf_transfer]"
  6. Add the environment variable HF_HUB_ENABLE_HF_TRANSFER=1 (on Windows: Properties - Advanced system settings - Environment Variables - New)
  7. On Hugging Face, go to Models, filter by GGUF, search for TheBloke, and copy the model name along with the GGUF file name
  8. huggingface-cli download TheBloke/MistralLite-7B-GGUF mistrallite.Q4_K_M.gguf --local-dir downloads --local-dir-use-symlinks False
  9. In Notepad, create a file named Modelfile containing the single line: FROM downloads\mistrallite.Q4_K_M.gguf
  10. ollama create aurmc -f Modelfile
  11. ollama list
  12. ollama run aurmc (if the model needs more RAM than is free, Ollama stops with an error such as "model requires more system memory (6.0 GiB) than is available (4.4 GiB)")
Or, skip the manual download and try running a GGUF repo straight from the Hub:

ollama run hf.co/arcee-ai/SuperNova-Medius-GGUF
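
Once the model is created, it can also be called from Python through Ollama's local REST API (it listens on port 11434 by default). A minimal sketch, assuming the requests package is installed and the model was named aurmc as in step 10:

import requests

# Ollama exposes a local HTTP API; /api/generate returns the completion.
payload = {
    "model": "aurmc",                       # the name given to "ollama create"
    "prompt": "Explain in one line what a GGUF file is.",
    "stream": False,                        # return a single JSON object, not a token stream
}
resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["response"])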

Glossary:

Here are 10 important glossary terms for both LLMs (Large Language Models) and Hugging Face:


Glossary Terms for LLM (Large Language Models):

  1. Transformer:
    A neural network architecture designed to process sequential data using self-attention mechanisms, powering most modern LLMs.

  2. Self-Attention:
    A mechanism that allows models to focus on relevant parts of the input sequence while processing it, crucial for understanding context.

  3. Pre-training:
    The process of training an LLM on a large dataset to learn general language representations before fine-tuning on specific tasks.

  4. Fine-tuning:
    Adapting a pre-trained model to perform specific tasks by training it further on a smaller, task-specific dataset.

  5. Tokenization:
    The process of breaking text into smaller units (tokens) like words, subwords, or characters, which are fed into the model (see the example after this list).

  6. Context Window:
    The maximum sequence length or number of tokens that a model can process at once.

  7. Zero-shot Learning:
    The ability of an LLM to perform tasks it hasn't explicitly been trained on by leveraging its general understanding of language.

  8. Few-shot Learning:
    Using a few examples to prompt an LLM to perform a task without extensive retraining or fine-tuning.

  9. Generative Pre-trained Transformer (GPT):
    A family of transformer-based LLMs designed for generating text, with models like GPT-3 and GPT-4.

  10. Inference:
    The process of using a trained model to make predictions or generate outputs from new inputs.
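
The Tokenization and Context Window entries are easiest to see in code. A small sketch using the transformers AutoTokenizer (gpt2 is used purely as an example model):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Unlock the power of Ollama and Hugging Face"
tokens = tokenizer.tokenize(text)     # text -> subword tokens
ids = tokenizer.encode(text)          # tokens -> integer IDs the model actually consumes

print(tokens)
print(ids)
print("Context window (max tokens):", tokenizer.model_max_length)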


Glossary Terms for Hugging Face:

  1. Pipeline:
    A high-level API in Hugging Face that simplifies access to pre-trained models for various tasks like sentiment analysis, translation, etc. (see the example after this list).

  2. Model Hub:
    An online repository by Hugging Face where users can find and share pre-trained models for different tasks.

  3. Transformers Library:
    Hugging Face’s open-source library providing tools for working with transformer models.

  4. Datasets:
    A library by Hugging Face for loading, preprocessing, and working with datasets for machine learning.

  5. Tokenizer:
    A Hugging Face module that converts text into tokens that models can process.

  6. Encoder-Decoder Architecture:
    A framework used in models like T5 and BART, where the encoder processes input and the decoder generates output.

  7. AutoModel:
    A class in Hugging Face that automatically selects the appropriate model architecture based on a pre-trained model name.

  8. Trainer:
    A utility in Hugging Face that simplifies training and fine-tuning of transformer models.

  9. Attention Mask:
    A mechanism used to specify which parts of an input sequence should be ignored (e.g., padding tokens).

  10. Hugging Face Hub:
    A platform for hosting and collaborating on machine learning models, datasets, and pipelines.
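
The Pipeline entry above maps directly onto a few lines of code. A minimal sketch with the transformers pipeline API (the named sentiment model is just an example; it is downloaded from the Model Hub on first use):

from transformers import pipeline

# pipeline() wires together tokenizer + model + post-processing for a task.
classifier = pipeline("sentiment-analysis",
                      model="distilbert-base-uncased-finetuned-sst-2-english")

print(classifier("Ollama plus Hugging Face makes local LLMs easy."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99}]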


Drill Baby Drill #1

Download a model and save it locally for future use as an offline LLM (no internet), in two parts.

# Prerequisites: pip install torch transformers

# pip list   (to check whether transformers is installed)

First, download the model and save it to local disk:


from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "facebook/blenderbot-400M-distill"

# Pull the tokenizer and model from the Hub once, then write them to local folders.
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.save_pretrained("./tokenizer")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model.save_pretrained("./model")

Second, load the saved files from local disk and use them as a local LLM:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load from the local folders written in part one, so no internet connection is needed.
tokenizer = AutoTokenizer.from_pretrained("./tokenizer")
model = AutoModelForSeq2SeqLM.from_pretrained("./model")

def chatbot_response(user_input):
    inputs = tokenizer([user_input], return_tensors="pt")
    reply_ids = model.generate(**inputs)
    return tokenizer.batch_decode(reply_ids, skip_special_tokens=True)[0]

while True:
    user_input = input("Enter for response: ")
    if user_input == "Exit":
        break
    print(chatbot_response(user_input))

Response:

Enter for response: How are you?
I'm doing well, thank you. How about yourself? Do you have any plans for the weekend?

Drill Baby Drill #2

inference.py using an HF token

# pip install huggingface_hub
# export HF_TOKEN="hf_..."   (use your own Hugging Face access token; never publish the real value)


from huggingface_hub import InferenceClient
import json

repo_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

llm_client = InferenceClient(
    model=repo_id,
    timeout=120,
)

def call_llm(inference_client: InferenceClient, prompt: str):
    # POST a raw text-generation request to the model's Inference API endpoint.
    response = inference_client.post(
        json={
            "inputs": prompt,
            "parameters": {"max_new_tokens": 200},
            "task": "text-generation",
        },
    )

    return json.loads(response.decode())[0]["generated_text"]

response = call_llm(llm_client, "write me a crazy joke")
print(response)
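
Newer huggingface_hub releases also expose task-specific helpers on InferenceClient, so the raw post call above can be replaced with text_generation; a minimal sketch, assuming the same repo_id and that HF_TOKEN is set in the environment:

from huggingface_hub import InferenceClient

client = InferenceClient(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", timeout=120)

# text_generation handles the request/response plumbing and returns plain text.
reply = client.text_generation("write me a crazy joke", max_new_tokens=200)
print(reply)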
