https://www.youtube.com/watch?v=LQJVz-B_mZI&t=32s
https://www.youtube.com/watch?v=jK_PZqeQ7BE
Simple Steps
- Download Anaconda from anaconda.com for your OS
- conda create -n hf_1 python=3.11
- conda activate hf_1
- pip install -U "huggingface_hub[cli]"
- pip install -U "huggingface_hub[hf_transfer]"
- Add the environment variable HF_HUB_ENABLE_HF_TRANSFER=1: on Windows, open Properties > Advanced system settings > Environment Variables > New, enter HF_HUB_ENABLE_HF_TRANSFER as the name and 1 as the value, then OK
- On Hugging Face, go to Models, filter by GGUF, search for TheBloke, and copy the model name ending in GGUF
- huggingface-cli download TheBloke/MistralLite-7B-GGUF mistrallite.Q4_K_M.gguf --local-dir downloads --local-dir-use-symlinks False (a Python alternative is sketched after these steps)
- In Notepad, create a file named Modelfile containing the single line: FROM downloads\mistrallite.Q4_K_M.gguf
- ollama create aurmc -f Modelfile
- ollama list
- ollama run aurmc (fails with an error such as "model requires more system memory (6.0 GiB) than is available (4.4 GiB)" if the machine has too little RAM)
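For reference, the download step above can also be done from Python with the huggingface_hub library instead of the CLI. This is only a sketch: the repo and file names are the ones used above, and the downloads folder matches the --local-dir used earlier.

# Python alternative to the huggingface-cli download step (sketch)
from huggingface_hub import hf_hub_download

gguf_path = hf_hub_download(
    repo_id="TheBloke/MistralLite-7B-GGUF",
    filename="mistrallite.Q4_K_M.gguf",
    local_dir="downloads",
)
print(gguf_path)  # prints the local path of the downloaded GGUF file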
Here are 10 important glossary terms each for LLMs (Large Language Models) and Hugging Face:
Glossary Terms for LLMs (Large Language Models):
- Transformer: A neural network architecture designed to process sequential data using self-attention mechanisms, powering most modern LLMs.
- Self-Attention: A mechanism that allows models to focus on relevant parts of the input sequence while processing it, crucial for understanding context.
- Pre-training: The process of training an LLM on a large dataset to learn general language representations before fine-tuning on specific tasks.
- Fine-tuning: Adapting a pre-trained model to perform specific tasks by training it further on a smaller, task-specific dataset.
- Tokenization: The process of breaking text into smaller units (tokens) like words, subwords, or characters, which are fed into the model (see the sketch after this list).
- Context Window: The maximum sequence length or number of tokens that a model can process at once.
- Zero-shot Learning: The ability of an LLM to perform tasks it hasn't explicitly been trained on by leveraging its general understanding of language.
- Few-shot Learning: Using a few examples to prompt an LLM to perform a task without extensive retraining or fine-tuning.
- Generative Pre-trained Transformer (GPT): A family of transformer-based LLMs designed for generating text, with models like GPT-3 and GPT-4.
- Inference: The process of using a trained model to make predictions or generate outputs from new inputs.
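To make the Tokenization and Context Window entries concrete, here is a minimal sketch; the gpt2 checkpoint is only an example and is not part of the steps above.

# Tokenization sketch (assumes transformers is installed; gpt2 is an example checkpoint)
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
text = "Hugging Face makes local LLMs easy."
print(tokenizer.tokenize(text))        # subword tokens
print(tokenizer.encode(text))          # token IDs the model actually sees
print(tokenizer.model_max_length)      # context window size in tokens (1024 for gpt2)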
Glossary Terms for Hugging Face:
- Pipeline: A high-level API in Hugging Face that simplifies access to pre-trained models for various tasks like sentiment analysis, translation, etc. (see the sketch after this list).
- Model Hub: An online repository by Hugging Face where users can find and share pre-trained models for different tasks.
- Transformers Library: Hugging Face’s open-source library providing tools for working with transformer models.
- Datasets: A library by Hugging Face for loading, preprocessing, and working with datasets for machine learning.
- Tokenizer: A Hugging Face module that converts text into tokens that models can process.
- Encoder-Decoder Architecture: A framework used in models like T5 and BART, where the encoder processes input and the decoder generates output.
- AutoModel: A class in Hugging Face that automatically selects the appropriate model architecture based on a pre-trained model name.
- Trainer: A utility in Hugging Face that simplifies training and fine-tuning of transformer models.
- Attention Mask: A mechanism used to specify which parts of an input sequence should be ignored (e.g., padding tokens).
- Hugging Face Hub: A platform for hosting and collaborating on machine learning models, datasets, and pipelines.
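As a quick illustration of the Pipeline entry above, here is a minimal sketch; the sentiment-analysis task downloads a default checkpoint chosen by the library, so an internet connection is needed the first time it runs.

# Pipeline sketch (assumes transformers plus a backend such as torch are installed)
from transformers import pipeline

classifier = pipeline("sentiment-analysis")   # uses the library's default model for this task
print(classifier("Running a local LLM with Ollama was easier than expected."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]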
Drill Baby Drill #1
Download a model and save it locally so it can later be used as an LLM without internet access, in two parts.
# Prerequisites: pip install torch transformers
# pip list --> to check whether transformers is installed or not
Part 1: download the model and save it to local disk.
Part 2: load it from local disk and use it as a local LLM.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
Response:
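A fuller sketch of both parts could look like the following; the google/flan-t5-small checkpoint and the local_model folder are assumptions chosen for illustration, not necessarily the model used in the original drill.

# Part 1: download from the Hub and save to local disk (run once, with internet)
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-small"                  # example checkpoint (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer.save_pretrained("local_model")
model.save_pretrained("local_model")

# Part 2: load from local disk and use as a local LLM (no internet needed)
tokenizer = AutoTokenizer.from_pretrained("local_model")
model = AutoModelForSeq2SeqLM.from_pretrained("local_model")
inputs = tokenizer("Translate English to German: How are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))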
Drill Baby Drill #2
Inference.py using an HF key (access token)
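A hedged sketch of what Inference.py might contain, using the InferenceClient from huggingface_hub; the model name and the HF_TOKEN environment variable are assumptions, and the access token itself comes from your Hugging Face account settings.

# Inference.py - call a hosted model with a Hugging Face access token (sketch)
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="mistralai/Mistral-7B-Instruct-v0.2",   # example model (assumption)
    token=os.environ["HF_TOKEN"],                 # assumes the token is exported as HF_TOKEN
)
print(client.text_generation("Explain what a GGUF file is.", max_new_tokens=100))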