Wednesday, 22 January 2025

Hugging Face 🤝 Hands-On

Hugging Face Hands-On 12 x 5 Minutes Quick Course Lecture Schedule


Hugging Face Terminology
There are some terms you'll need to know to get the most out of working with Hugging Face.

Pretrained model: A model that has been trained on a large dataset for a specific task before being made available for use.

Inference: Inference is the process of using a trained model to make predictions or draw conclusions about new, unseen data based on the learned patterns from the training data.

Transformers: Transformers are models that can handle text-based tasks, such as translation, summarization, and text generation. They use a special architecture that relies on attention mechanisms to capture the relationships between words and sentences.

Tokenizer: A tokenizer is a tool that breaks text down into smaller units called tokens. Tokens are usually words or subwords that serve as the model's input for natural language processing (NLP) tasks.
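
A quick way to see tokens in practice is to run a tokenizer directly. A minimal sketch, assuming the bert-base-uncased tokenizer (any Hugging Face tokenizer behaves the same way):

      from transformers import AutoTokenizer

      # Load a pretrained tokenizer and inspect the tokens it produces.
      tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
      print(tokenizer.tokenize("Tokenizers split text into subwords!"))
      # Prints a list of word/subword strings; pieces of a split word
      # are marked with "##", e.g. ['token', '##izer', ...]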


Lecture 1: Introduction to Hugging Face

  • Objective: Understand what Hugging Face is and its ecosystem.
    • Tasks, models, datasets, Spaces, pretrained models & the GGUF format
  • Content:
    • Overview of Hugging Face tools (Transformers, Datasets, Accelerate).
    • Importance in NLP and beyond.
    • Installation: !pip install transformers
    • Code Example:

      from transformers import pipeline
      print("Hugging Face is ready!")

    • Real-Life Example: Automating customer support systems using pre-trained models for FAQs.
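
    • A taste of the FAQ idea, as a hedged sketch: the model is whatever the question-answering pipeline downloads by default, and the context text is invented for illustration.

      from transformers import pipeline

      qa = pipeline("question-answering")  # default extractive QA model
      faq = "Refunds are processed within 5 business days of receiving the return."
      print(qa(question="How long do refunds take?", context=faq))
      # Returns a dict with 'answer', 'score', 'start', 'end'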

Lecture 2: What is a Pipeline?

  • Objective: Learn the concept and usage of pipeline.
  • Definition: A pipeline in Hugging Face is a simple interface for performing a variety of tasks (e.g., sentiment analysis, text generation) using pre-trained models.
  • Content:
    • Simplified API for NLP tasks.
    • Tasks: Sentiment analysis, text generation, etc.
    • Code Example:

      from transformers import pipeline
      sentiment_analysis = pipeline("sentiment-analysis")
      print(sentiment_analysis("I love Hugging Face!"))
      # Output: [{'label': 'POSITIVE', 'score': 0.9999439716339111}]

    • Real-Life Example: Analyzing social media posts for sentiment trends.

Lecture 3: Transformers Overview

  • Objective: Understand the role of transformers in NLP.
  • Definition: A Transformer is a neural network architecture designed to process sequential data, leveraging self-attention mechanisms to understand context and relationships within the data.
  • Content:
    • Self-attention mechanism.
    • Pre-trained models like BERT, GPT.
    • Code Example:
      from transformers import AutoModel, AutoTokenizer
      model_name = "bert-base-uncased"
      tokenizer = AutoTokenizer.from_pretrained(model_name)
      model = AutoModel.from_pretrained(model_name)
      inputs = tokenizer("Hello, world!", return_tensors="pt")
      outputs = model(**inputs)
      print(outputs.last_hidden_state.shape)

# Output: torch.Size([1, 6, 768])  # 6 tokens: [CLS] hello , world ! [SEP]

    • Real-Life Example: Building a document classifier.

Lecture 4: Encoders

  • Objective: Learn about encoder architectures and applications.
  • Definition: Encoders are components of a Transformer model that process and convert input data into contextualized representations.
  • Content:
    • Encoding input data.
    • Examples: BERT, RoBERTa.
    • Code Example:

      from transformers import BertTokenizer, BertModel
      tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
      model = BertModel.from_pretrained("bert-base-uncased")
      inputs = tokenizer("Encode this text.", return_tensors="pt")
      outputs = model(**inputs)
      print(outputs.last_hidden_state.shape)

# Output: torch.Size([1, 6, 768])  # 6 tokens: [CLS] encode this text . [SEP]


    • Real-Life Example: Semantic search for information retrieval.
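
    • To connect the encoder output to semantic search, a minimal sketch (an illustration, not course code): mean-pool the last hidden state into one vector per text, then rank by cosine similarity. Dedicated sentence-embedding models work better in practice.

      import torch
      from transformers import BertTokenizer, BertModel

      tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
      model = BertModel.from_pretrained("bert-base-uncased")

      def embed(text):
          # Average the token vectors into a single sentence vector.
          inputs = tokenizer(text, return_tensors="pt")
          with torch.no_grad():
              outputs = model(**inputs)
          return outputs.last_hidden_state.mean(dim=1)  # shape [1, 768]

      query = embed("How do I reset my password?")
      doc = embed("Steps to recover a forgotten account password.")
      print(torch.nn.functional.cosine_similarity(query, doc).item())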

Lecture 5: Decoders

  • Objective: Explore decoder architectures and use cases.
  • Definition: Decoders are components of a Transformer model responsible for generating sequences based on encoded data, often used in tasks like text generation.
  • Content:
    • Generating output sequences.
    • Examples: GPT, GPT-2.
    • Code Example:

      from transformers import GPT2Tokenizer, GPT2LMHeadModel
      tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
      model = GPT2LMHeadModel.from_pretrained("gpt2")
      inputs = tokenizer("Complete this sentence:", return_tensors="pt")
      outputs = model.generate(inputs["input_ids"], max_length=20)
      print(tokenizer.decode(outputs[0], skip_special_tokens=True))

      # Warning: "Please pass your input's `attention_mask` to obtain
      # reliable results." (see the sketch after this lecture)
      # Output: The United States has a long history
      #         of supporting the Syrian government and

    • Real-Life Example: Generating creative writing prompts.
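
    • To silence the attention-mask warning noted above, pass the mask explicitly and give GPT-2 a pad token (it has none by default); a sketch:

      from transformers import GPT2Tokenizer, GPT2LMHeadModel

      tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
      model = GPT2LMHeadModel.from_pretrained("gpt2")
      inputs = tokenizer("Complete this sentence:", return_tensors="pt")
      outputs = model.generate(inputs["input_ids"],
                               attention_mask=inputs["attention_mask"],
                               pad_token_id=tokenizer.eos_token_id,
                               max_length=20)
      print(tokenizer.decode(outputs[0], skip_special_tokens=True))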

Lecture 6: Sequence to Sequence (Seq2Seq) Models

  • Objective: Understand Seq2Seq models and their tasks.
  • Content:
    • Translation, summarization, etc.
    • Examples: T5, BART.
    • Code Example:

      from transformers import T5Tokenizer, T5ForConditionalGeneration
      tokenizer = T5Tokenizer.from_pretrained("t5-small")
      model = T5ForConditionalGeneration.from_pretrained("t5-small")
      inputs = tokenizer("translate English to French: Hugging Face", return_tensors="pt")
      outputs = model.generate(inputs["input_ids"], max_length=20)
      print(tokenizer.decode(outputs[0], skip_special_tokens=True))

      # Output: 'Face à l'écrasement'
    • Real-Life Example: Translating user-generated content for multilingual websites.
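
    • Summarization uses the same interface; a sketch with BART (facebook/bart-large-cnn is an assumed model choice, not specified in the course):

      from transformers import pipeline

      summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
      text = ("Hugging Face hosts thousands of pretrained models for translation, "
              "summarization, and question answering, along with datasets and "
              "tools for training and deploying them.")
      print(summarizer(text, max_length=30, min_length=10)[0]["summary_text"])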

Lecture 7: Sentiment Analysis (Pipeline Task)

  • Objective: Apply sentiment analysis using Hugging Face.
  • Content:
    • Basics of pipeline tasks.
    • Code Example:
      from transformers import pipeline
      sentiment_analysis = pipeline("sentiment-analysis")
      result = sentiment_analysis(["I love coding.", "I hate bugs."])
      print(result)

      # Output: [{'label': 'POSITIVE', 'score': 0.9996552467346191}, {'label': 'NEGATIVE', 'score': 0.9967179894447327}]

    • Real-Life Example: Sentiment monitoring for customer feedback.
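
    • Calling pipeline("sentiment-analysis") with no model prints a warning and may change behaviour across releases; pinning an explicit checkpoint (the usual English default, shown here as an assumption) keeps runs reproducible:

      from transformers import pipeline

      sentiment = pipeline("sentiment-analysis",
                           model="distilbert-base-uncased-finetuned-sst-2-english")
      print(sentiment("The update finally fixed my login issue."))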

Lecture 8: Text Generation with Transformers

  • Objective: Generate meaningful text using GPT.
  • Content:
    • Understand the process of generating human-like text.
    • Code Example:
      from transformers import pipeline
      generator = pipeline("text-generation", model="gpt2")
      print(generator("Hugging Face is", max_length=30,
                      num_return_sequences=1))

      # Output: [{'generated_text': 'Hugging Face is a leading tech company
      #           based in New York City. It has been around for over 10
      #           years and has a strong team of engineers and
      #           researchers.'}]

    • Real-Life Example: Automating email drafts for marketing.
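
    • Generation is steered by sampling parameters; a sketch with illustrative values (tune them for your use case):

      from transformers import pipeline

      generator = pipeline("text-generation", model="gpt2")
      print(generator("Hugging Face is", max_length=30,
                      do_sample=True,   # sample instead of greedy decoding
                      temperature=0.8,  # below 1.0 sharpens the distribution
                      top_k=50,         # keep only the 50 most likely tokens
                      num_return_sequences=1))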

Lecture 9: Named Entity Recognition (NER)

  • Objective: Learn how to identify entities in text.
  • Content:
    • NER with Hugging Face.
    • Code Example:

      from transformers import pipeline
      ner = pipeline("ner", grouped_entities=True)
      print(ner("Hugging Face is based in New York."))

      # Output: [{'entity_group': 'ORG', 'score': 0.8876679, 'word': 'Hugging Face', 'start': 0, 'end': 12},
      #          {'entity_group': 'LOC', 'score': 0.9985268, 'word': 'New York', 'start': 25, 'end': 33}]


    • Real-Life Example: Extracting key information from resumes.

Lecture 10: Fine-Tuning a Model

  • Objective: Fine-tune a pre-trained model on custom data.
  • Content:
    • Dataset preparation and training.
    • Code Example:
      from transformers import Trainer, TrainingArguments
      # Add fine-tuning steps using the datasets library here
      # (see the sketch after this lecture).
      
    • Real-Life Example: Customizing sentiment analysis for a specific domain.
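
    • A minimal end-to-end sketch (assumptions: a 1,000-example IMDB subset and distilbert-base-uncased; adjust the data, model, and hyperparameters for real use):

      from datasets import load_dataset
      from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                                Trainer, TrainingArguments)

      dataset = load_dataset("imdb")
      tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

      def tokenize(batch):
          return tokenizer(batch["text"], truncation=True, padding="max_length")

      # Small subset so the sketch runs quickly; use the full split for real training.
      train_data = (dataset["train"].shuffle(seed=42).select(range(1000))
                    .map(tokenize, batched=True))

      model = AutoModelForSequenceClassification.from_pretrained(
          "distilbert-base-uncased", num_labels=2)
      args = TrainingArguments(output_dir="./results", num_train_epochs=1,
                               per_device_train_batch_size=8)
      trainer = Trainer(model=model, args=args, train_dataset=train_data)
      trainer.train()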

Lecture 11: Using Datasets

  • Objective: Work with Hugging Face datasets.
  • Content:
    • Loading and preprocessing.
    • Code Example:
      from datasets import load_dataset
      dataset = load_dataset("imdb")
      print(dataset["train"][0])

      #output """{'text': 'I rented I AM CURIOUS-YELLOW from my
      # video store because of all the controversy that surrounded
      # it when it was first released in 1967. I also heard that at
      # first it was seized by U.S. customs if it ever tried to
      # enter this country, therefore being a fan of films
      # considered "controversial" I really had to see this for
      # myself.<br /><br />The plot is centered around a young
      # Swedish drama student named Lena who wants to learn everything
      # she can about life. In particular she wants to focus her
      # attentions to making some sort of documentary  ...
      # But really, this film doesn\'t have much of a plot.',
      # 'label': 0}"""

    • Real-Life Example: Using movie reviews for training sentiment models.
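
    • "Loading and preprocessing" usually means tokenizing the text column; a sketch with map() (batched for speed):

      from datasets import load_dataset
      from transformers import AutoTokenizer

      dataset = load_dataset("imdb")
      tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
      tokenized = dataset["train"].map(
          lambda batch: tokenizer(batch["text"], truncation=True),
          batched=True)  # adds input_ids and attention_mask columns
      print(tokenized.column_names)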

Lecture 12: Deploying a Hugging Face Model

  • Objective: Deploy and serve models in production.
  • Content:
    • Model saving and loading (to be completed).
    • Deployment options (Hugging Face Hub, APIs).
    • Code Example:

      from transformers import pipeline
      model = pipeline("text-generation", model="gpt2")
      model.save_pretrained("./gpt2_model")

      #?! :-> Non-default generation parameters:
      #       {'max_length': 50, 'do_sample': True}

    • Real-Life Example: Hosting a text generation API for customer interaction.
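
    • Loading the saved model back, as a sketch (the pipeline's save_pretrained writes the tokenizer to the same folder, so both reload from it):

      from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM

      tokenizer = AutoTokenizer.from_pretrained("./gpt2_model")
      model = AutoModelForCausalLM.from_pretrained("./gpt2_model")
      generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
      print(generator("Hello from the reloaded model:", max_length=20))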

https://gamma.app/docs/Hugging-Face-Hands-On-12-x-5-Minutes-Course-ja4euwzws6f1mzr


HF Demo

import streamlit as st
#pip install transformers
import torch
import tensorflow as tf
import flax
from transformers import pipeline
from transformers import BertTokenizer, BertModel

with st.sidebar:
    st.title("Hugging Face Hands on")
    st.image("logo.jpg")
    choice = st.radio("Select the HF Concept", ["Pipeline", "Tokenization", "Generation"])

if choice == "Pipeline":
    task_name = st.selectbox("Choose a task", ["sentiment-analysis", "text-classification", "question-answering", "translation", "fill-mask"])    
    model_name = st.selectbox("Choose a model", ["distilbert-base-uncased", "bert-base-uncased", "roberta-base", "gpt2", "ctrl"])  
    model = pipeline(task_name, model_name)    
    input_text = st.text_area("Enter your text here")
    result = model(input_text)[0]
    st.success(f"Sentiment: {result['label']}, Score: {result['score']:.2f}")

if choice == "Tokenization":
 
    # Tokenization with BERT
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")
 
    input_text = st.text_area("Enter text here to encode")
    inputs = tokenizer(input_text, return_tensors="pt")
    outputs = model(**inputs)
    st.success(outputs.last_hidden_state.shape)

    # Output: torch.Size([1, 1, 768])

if choice == "Generation":
    model_name = st.selectbox("Choose a model", ["gpt2", "ctrl"])
    model = pipeline("text-generation", model=model_name)
    input_text= "hugging Face is"
    input_text = st.text_area("Enter your text here")
    generator = pipeline("text-generation", model=model_name)
    num_return_sequences = st.slider('Sequence No', min_value=10, max_value=100, value=5, step=1)
    num_tokens_to_generate = st.slider('No of Tokens', min_value=10, max_value=100, value=5, step=1)
    result = model(input_text, max_length=int(num_tokens_to_generate),
                   num_return_sequences=int(num_return_sequences))[0]
    st.success(generator(input_text, max_length=30,
                num_return_sequences=1))

   


Work In Progress 😆...

