Wednesday, 22 January 2025

Hugging Face 🤝 Hands-On

Hugging Face Hands-On 12 x 5 Minutes Quick Course Lecture Schedule


Hugging Face Terminology
There are some terms you'll need to know to get the most out of working with Hugging Face.

Pretrained model: A model that has been trained on a large dataset for a specific task before being made available for use.

Inference: Inference is the process of using a trained model to make predictions or draw conclusions about new, unseen data based on the learned patterns from the training data.

Transformers: Transformers are models that can handle text-based tasks, such as translation, summarization, and text generation. They use a special architecture that relies on attention mechanisms to capture the relationships between words and sentences.

Tokenizer: A tokenizer is a tool that breaks text down into smaller units called tokens. Tokens are usually words or subwords that serve as the model's input for natural language processing (NLP) tasks.
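
A quick way to see tokens in practice is to run a tokenizer directly. A minimal sketch, assuming the bert-base-uncased tokenizer (any Hugging Face tokenizer behaves the same way):

      from transformers import AutoTokenizer

      # Load a pretrained tokenizer and inspect the tokens it produces.
      tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
      print(tokenizer.tokenize("Tokenizers split text into subwords!"))
      # Prints a list of word/subword strings; pieces of a split word
      # are marked with "##", e.g. ['token', '##izer', ...]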


Lecture 1: Introduction to Hugging Face

  • Objective: Understand what Hugging Face is and its ecosystem.
    • Tasks, models, datasets, Spaces, pretrained models & the GGUF format
  • Content:
    • Overview of Hugging Face tools (Transformers, Datasets, Accelerate).
    • Importance in NLP and beyond.
    • Installation: !pip install transformers
    • Code Example:

      from transformers import pipeline
      print("Hugging Face is ready!")

    • Real-Life Example: Automating customer support systems using pre-trained models for FAQs.
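
    • A taste of the FAQ idea, as a hedged sketch: the model is whatever the question-answering pipeline downloads by default, and the context text is invented for illustration.

      from transformers import pipeline

      qa = pipeline("question-answering")  # default extractive QA model
      faq = "Refunds are processed within 5 business days of receiving the return."
      print(qa(question="How long do refunds take?", context=faq))
      # Returns a dict with 'answer', 'score', 'start', 'end'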

Lecture 2: What is a Pipeline?

  • Objective: Learn the concept and usage of pipeline.
  • Definition: A pipeline in Hugging Face is a simple interface for performing a variety of tasks (e.g., sentiment analysis, text generation) using pre-trained models.
  • Content:
    • Simplified API for NLP tasks.
    • Tasks: Sentiment analysis, text generation, etc.
    • Code Example:

      from transformers import pipeline
      sentiment_analysis = pipeline("sentiment-analysis")
      print(sentiment_analysis("I love Hugging Face!"))
      # Output: [{'label': 'POSITIVE', 'score': 0.9999439716339111}]

    • Real-Life Example: Analyzing social media posts for sentiment trends.

Lecture 3: Transformers Overview

  • Objective: Understand the role of transformers in NLP.
  • Definition: A Transformer is a neural network architecture designed to process sequential data, leveraging self-attention mechanisms to understand context and relationships within the data.
  • Content:
    • Self-attention mechanism.
    • Pre-trained models like BERT, GPT.
    • Code Example:
      from transformers import AutoModel, AutoTokenizer
      model_name = "bert-base-uncased"
      tokenizer = AutoTokenizer.from_pretrained(model_name)
      model = AutoModel.from_pretrained(model_name)
      inputs = tokenizer("Hello, world!", return_tensors="pt")
      outputs = model(**inputs)
      print(outputs.last_hidden_state.shape)

# Output: torch.Size([1, 6, 768])  # 6 tokens: [CLS] hello , world ! [SEP]

    • Real-Life Example: Building a document classifier.

Lecture 4: Encoders

  • Objective: Learn about encoder architectures and applications.
  • Definition: Encoders are components of a Transformer model that process and convert input data into contextualized representations.
  • Content:
    • Encoding input data.
    • Examples: BERT, RoBERTa.
    • Code Example:

      from transformers import BertTokenizer, BertModel
      tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
      model = BertModel.from_pretrained("bert-base-uncased")
      inputs = tokenizer("Encode this text.", return_tensors="pt")
      outputs = model(**inputs)
      print(outputs.last_hidden_state.shape)

# Output: torch.Size([1, 6, 768])  # 6 tokens: [CLS] encode this text . [SEP]


    • Real-Life Example: Semantic search for information retrieval.
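
    • To connect the encoder output to semantic search, a minimal sketch (an illustration, not course code): mean-pool the last hidden state into one vector per text, then rank by cosine similarity. Dedicated sentence-embedding models work better in practice.

      import torch
      from transformers import BertTokenizer, BertModel

      tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
      model = BertModel.from_pretrained("bert-base-uncased")

      def embed(text):
          # Average the token vectors into a single sentence vector.
          inputs = tokenizer(text, return_tensors="pt")
          with torch.no_grad():
              outputs = model(**inputs)
          return outputs.last_hidden_state.mean(dim=1)  # shape [1, 768]

      query = embed("How do I reset my password?")
      doc = embed("Steps to recover a forgotten account password.")
      print(torch.nn.functional.cosine_similarity(query, doc).item())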

Lecture 5: Decoders

  • Objective: Explore decoder architectures and use cases.
  • Definition: Decoders are components of a Transformer model responsible for generating sequences based on encoded data, often used in tasks like text generation.
  • Content:
    • Generating output sequences.
    • Examples: GPT, GPT-2.
    • Code Example:

      from transformers import GPT2Tokenizer, GPT2LMHeadModel
      tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
      model = GPT2LMHeadModel.from_pretrained("gpt2")
      inputs = tokenizer("Complete this sentence:", return_tensors="pt")
      outputs = model.generate(inputs["input_ids"], max_length=20)
      print(tokenizer.decode(outputs[0], skip_special_tokens=True))

      # Warning: "Please pass your input's `attention_mask` to obtain
      # reliable results." (see the sketch after this lecture)
      # Output: The United States has a long history
      #         of supporting the Syrian government and

    • Real-Life Example: Generating creative writing prompts.
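
    • To silence the attention-mask warning noted above, pass the mask explicitly and give GPT-2 a pad token (it has none by default); a sketch:

      from transformers import GPT2Tokenizer, GPT2LMHeadModel

      tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
      model = GPT2LMHeadModel.from_pretrained("gpt2")
      inputs = tokenizer("Complete this sentence:", return_tensors="pt")
      outputs = model.generate(inputs["input_ids"],
                               attention_mask=inputs["attention_mask"],
                               pad_token_id=tokenizer.eos_token_id,
                               max_length=20)
      print(tokenizer.decode(outputs[0], skip_special_tokens=True))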

Lecture 6: Sequence to Sequence (Seq2Seq) Models

  • Objective: Understand Seq2Seq models and their tasks.
  • Content:
    • Translation, summarization, etc.
    • Examples: T5, BART.
    • Code Example:

      from transformers import T5Tokenizer, T5ForConditionalGeneration
      tokenizer = T5Tokenizer.from_pretrained("t5-small")
      model = T5ForConditionalGeneration.from_pretrained("t5-small")
      inputs = tokenizer("translate English to French: Hugging Face", return_tensors="pt")
      outputs = model.generate(inputs["input_ids"], max_length=20)
      print(tokenizer.decode(outputs[0], skip_special_tokens=True))

      # Output: 'Face à l'écrasement'
    • Real-Life Example: Translating user-generated content for multilingual websites.
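
    • Summarization uses the same interface; a sketch with BART (facebook/bart-large-cnn is an assumed model choice, not specified in the course):

      from transformers import pipeline

      summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
      text = ("Hugging Face hosts thousands of pretrained models for translation, "
              "summarization, and question answering, along with datasets and "
              "tools for training and deploying them.")
      print(summarizer(text, max_length=30, min_length=10)[0]["summary_text"])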

Lecture 7: Sentiment Analysis (Pipeline Task)

  • Objective: Apply sentiment analysis using Hugging Face.
  • Content:
    • Basics of pipeline tasks.
    • Code Example:
      from transformers import pipeline
      sentiment_analysis = pipeline("sentiment-analysis")
      result = sentiment_analysis(["I love coding.", "I hate bugs."])
      print(result)

      # Output: [{'label': 'POSITIVE', 'score': 0.9996552467346191}, {'label': 'NEGATIVE', 'score': 0.9967179894447327}]

    • Real-Life Example: Sentiment monitoring for customer feedback.
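
    • Calling pipeline("sentiment-analysis") with no model prints a warning and may change behaviour across releases; pinning an explicit checkpoint (the usual English default, shown here as an assumption) keeps runs reproducible:

      from transformers import pipeline

      sentiment = pipeline("sentiment-analysis",
                           model="distilbert-base-uncased-finetuned-sst-2-english")
      print(sentiment("The update finally fixed my login issue."))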

Lecture 8: Text Generation with Transformers

  • Objective: Generate meaningful text using GPT.
  • Content:
    • Understand the process of generating human-like text.
    • Code Example:
      from transformers import pipeline
      generator = pipeline("text-generation", model="gpt2")
      print(generator("Hugging Face is", max_length=30,
                      num_return_sequences=1))

      # Output: [{'generated_text': 'Hugging Face is a leading tech company
      #           based in New York City. It has been around for over 10
      #           years and has a strong team of engineers and
      #           researchers.'}]

    • Real-Life Example: Automating email drafts for marketing.
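
    • Generation is steered by sampling parameters; a sketch with illustrative values (tune them for your use case):

      from transformers import pipeline

      generator = pipeline("text-generation", model="gpt2")
      print(generator("Hugging Face is", max_length=30,
                      do_sample=True,   # sample instead of greedy decoding
                      temperature=0.8,  # below 1.0 sharpens the distribution
                      top_k=50,         # keep only the 50 most likely tokens
                      num_return_sequences=1))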

Lecture 9: Named Entity Recognition (NER)

  • Objective: Learn how to identify entities in text.
  • Content:
    • NER with Hugging Face.
    • Code Example:

      from transformers import pipeline
      ner = pipeline("ner", grouped_entities=True)
      print(ner("Hugging Face is based in New York."))

      # Output: [{'entity_group': 'ORG', 'score': 0.8876679, 'word': 'Hugging Face', 'start': 0, 'end': 12},
      #          {'entity_group': 'LOC', 'score': 0.9985268, 'word': 'New York', 'start': 25, 'end': 33}]


    • Real-Life Example: Extracting key information from resumes.

Lecture 10: Fine-Tuning a Model

  • Objective: Fine-tune a pre-trained model on custom data.
  • Content:
    • Dataset preparation and training.
    • Code Example:
      from transformers import Trainer, TrainingArguments
      # Add fine-tuning steps using the datasets library here
      # (see the sketch after this lecture).
      
    • Real-Life Example: Customizing sentiment analysis for a specific domain.
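
    • A minimal end-to-end sketch (assumptions: a 1,000-example IMDB subset and distilbert-base-uncased; adjust the data, model, and hyperparameters for real use):

      from datasets import load_dataset
      from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                                Trainer, TrainingArguments)

      dataset = load_dataset("imdb")
      tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

      def tokenize(batch):
          return tokenizer(batch["text"], truncation=True, padding="max_length")

      # Small subset so the sketch runs quickly; use the full split for real training.
      train_data = (dataset["train"].shuffle(seed=42).select(range(1000))
                    .map(tokenize, batched=True))

      model = AutoModelForSequenceClassification.from_pretrained(
          "distilbert-base-uncased", num_labels=2)
      args = TrainingArguments(output_dir="./results", num_train_epochs=1,
                               per_device_train_batch_size=8)
      trainer = Trainer(model=model, args=args, train_dataset=train_data)
      trainer.train()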

Lecture 11: Using Datasets

  • Objective: Work with Hugging Face datasets.
  • Content:
    • Loading and preprocessing.
    • Code Example:
      from datasets import load_dataset
      dataset = load_dataset("imdb")
      print(dataset["train"][0])

      #output """{'text': 'I rented I AM CURIOUS-YELLOW from my
      # video store because of all the controversy that surrounded
      # it when it was first released in 1967. I also heard that at
      # first it was seized by U.S. customs if it ever tried to
      # enter this country, therefore being a fan of films
      # considered "controversial" I really had to see this for
      # myself.<br /><br />The plot is centered around a young
      # Swedish drama student named Lena who wants to learn everything
      # she can about life. In particular she wants to focus her
      # attentions to making some sort of documentary  ...
      # But really, this film doesn\'t have much of a plot.',
      # 'label': 0}"""

    • Real-Life Example: Using movie reviews for training sentiment models.
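
    • "Loading and preprocessing" usually means tokenizing the text column; a sketch with map() (batched for speed):

      from datasets import load_dataset
      from transformers import AutoTokenizer

      dataset = load_dataset("imdb")
      tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
      tokenized = dataset["train"].map(
          lambda batch: tokenizer(batch["text"], truncation=True),
          batched=True)  # adds input_ids and attention_mask columns
      print(tokenized.column_names)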

Lecture 12: Deploying a Hugging Face Model

  • Objective: Deploy and serve models in production.
  • Content:
    • Model saving and loading (to be completed).
    • Deployment options (Hugging Face Hub, APIs).
    • Code Example:

      from transformers import pipeline
      model = pipeline("text-generation", model="gpt2")
      model.save_pretrained("./gpt2_model")

      #?! :-> Non-default generation parameters:
      #       {'max_length': 50, 'do_sample': True}

    • Real-Life Example: Hosting a text generation API for customer interaction.
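
    • Loading the saved model back, as a sketch (the pipeline's save_pretrained writes the tokenizer to the same folder, so both reload from it):

      from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM

      tokenizer = AutoTokenizer.from_pretrained("./gpt2_model")
      model = AutoModelForCausalLM.from_pretrained("./gpt2_model")
      generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
      print(generator("Hello from the reloaded model:", max_length=20))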

https://gamma.app/docs/Hugging-Face-Hands-On-12-x-5-Minutes-Course-ja4euwzws6f1mzr


HF Demo

import streamlit as st
#pip install transformers
import torch
import tensorflow as tf
import flax
from transformers import pipeline
from transformers import BertTokenizer, BertModel

with st.sidebar:
    st.title("Hugging Face Hands on")
    st.image("logo.jpg")
    choice = st.radio("Select the HF Concept", ["Pipeline", "Tokenization", "Generation"])

if choice == "Pipeline":
    task_name = st.selectbox("Choose a task", ["sentiment-analysis", "text-classification", "question-answering", "translation", "fill-mask"])    
    model_name = st.selectbox("Choose a model", ["distilbert-base-uncased", "bert-base-uncased", "roberta-base", "gpt2", "ctrl"])  
    model = pipeline(task_name, model_name)    
    input_text = st.text_area("Enter your text here")
    result = model(input_text)[0]
    st.success(f"Sentiment: {result['label']}, Score: {result['score']:.2f}")

if choice == "Tokenization":
 
    # Tokenization with BERT
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")
 
    input_text = st.text_area("Enter text here to encode")
    inputs = tokenizer(input_text, return_tensors="pt")
    outputs = model(**inputs)
    st.success(outputs.last_hidden_state.shape)

    # Output: torch.Size([1, 1, 768])

if choice == "Generation":
    model_name = st.selectbox("Choose a model", ["gpt2", "ctrl"])
    model = pipeline("text-generation", model=model_name)
    input_text= "hugging Face is"
    input_text = st.text_area("Enter your text here")
    generator = pipeline("text-generation", model=model_name)
    num_return_sequences = st.slider('Sequence No', min_value=10, max_value=100, value=5, step=1)
    num_tokens_to_generate = st.slider('No of Tokens', min_value=10, max_value=100, value=5, step=1)
    result = model(input_text, max_length=int(num_tokens_to_generate),
                   num_return_sequences=int(num_return_sequences))[0]
    st.success(generator(input_text, max_length=30,
                num_return_sequences=1))

   


Work In Progress 😆...

