Hugging Face Hands-On 12 x 5 Minutes Quick Course Lecture Schedule
Hugging Face Terminology
There are some terms you'll need to know to get the most out of working with Hugging Face.
Pretrained model: A model that has been trained on a large dataset for a specific task before being made available for use.
Inference: Inference is the process of using a trained model to make predictions or draw conclusions about new, unseen data based on the learned patterns from the training data.
Transformers: Transformers are models that can handle text-based tasks, such as translation, summarization, and text generation. They use a special architecture that relies on attention mechanisms to capture the relationships between words and sentences.
Tokenizer: A tokenizer is a tool that breaks text down into smaller units called tokens. Tokens are usually words or subwords that can be used for natural language processing (NLP) tasks.
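For example, here is a minimal tokenization sketch (it assumes the bert-base-uncased checkpoint; the exact token split shown in the comment is illustrative and may differ):
from transformers import AutoTokenizer
# Load the WordPiece tokenizer that ships with bert-base-uncased.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
tokens = tokenizer.tokenize("Tokenization splits text into subwords!")
print(tokens)
# e.g. ['token', '##ization', 'splits', 'text', 'into', 'sub', '##words', '!']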
Lecture 1: Introduction to Hugging Face
- Objective: Understand what Hugging Face is and its ecosystem.
- Content:
- Tasks, models, datasets, Spaces, pretrained models & GGUF.
- Overview of Hugging Face tools (Transformers, Datasets, Accelerate).
- Importance in NLP and beyond.
- Installation: !pip install transformers
- Code Example:
from transformers import pipeline
print("Hugging Face is ready!")
- Real-Life Example: Automating customer support systems using pre-trained models for FAQs.
Lecture 2: What is a Pipeline?
- Objective: Learn the concept and usage of pipeline.
- Definition: A pipeline in Hugging Face is a simple interface for performing a variety of tasks (e.g., sentiment analysis, text generation) using pre-trained models.
- Content:
- Simplified API for NLP tasks.
- Tasks: Sentiment analysis, text generation, etc.
- Code Example:
from transformers import pipeline
sentiment_analysis = pipeline("sentiment-analysis")
print(sentiment_analysis("I love Hugging Face!"))
# Output: [{'label': 'POSITIVE', 'score': 0.9999439716339111}]
- Real-Life Example: Analyzing social media posts for sentiment trends.
Lecture 3: Transformers Overview
- Objective: Understand the role of transformers in NLP.
- Definition: A Transformer is a neural network architecture designed to process sequential data, leveraging self-attention mechanisms to understand context and relationships within the data.
- Content:
- Self-attention mechanism.
- Pre-trained models like BERT, GPT.
- Code Example:
from transformers import AutoModel, AutoTokenizer
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
# Output: torch.Size([1, 6, 768])  (batch, sequence_length, hidden_size)
- Real-Life Example: Building a document classifier.
Lecture 4: Encoders
- Objective: Learn about encoder architectures and applications.
- Definition: Encoders are components of a Transformer model that process and convert input data into contextualized representations.
- Content:
- Encoding input data.
- Examples: BERT, RoBERTa.
- Code Example:
from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("Encode this text.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
# Output: torch.Size([1, 6, 768])  (batch, sequence_length, hidden_size)
- Real-Life Example: Semantic search for information retrieval.
Lecture 5: Decoders
- Objective: Explore decoder architectures and use cases.
- Definition: Decoders are components of a Transformer model responsible for generating sequences based on encoded data, often used in tasks like text generation.
- Content:
- Generating output sequences.
- Examples: GPT, GPT-2.
- Code Example:
from transformers import GPT2Tokenizer, GPT2LMHeadModel
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
inputs = tokenizer("Complete this sentence:", return_tensors="pt")
outputs = model.generate(inputs["input_ids"], max_length=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Warning: "Please pass your input's `attention_mask` to obtain reliable results."
# Output (continuation): The United States has a long history of supporting the Syrian government and
- Real-Life Example: Generating creative writing prompts.
Lecture 6: Sequence to Sequence (Seq2Seq) Models
- Objective: Understand Seq2Seq models and their tasks.
- Content:
- Translation, summarization, etc.
- Examples: T5, BART.
- Code Example:
from transformers import T5Tokenizer, T5ForConditionalGeneration
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
inputs = tokenizer("translate English to French: Hugging Face", return_tensors="pt")
outputs = model.generate(inputs["input_ids"], max_length=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Output: "Face à l'écrasement"
- Real-Life Example: Translating user-generated content for multilingual websites.
Lecture 7: Sentiment Analysis (Pipeline Task)
- Objective: Apply sentiment analysis using Hugging Face.
- Content:
- Basics of pipeline tasks.
- Code Example:
from transformers import pipeline
sentiment_analysis = pipeline("sentiment-analysis")
result = sentiment_analysis(["I love coding.", "I hate bugs."])
print(result)
# Output: [{'label': 'POSITIVE', 'score': 0.9996552467346191}, {'label': 'NEGATIVE', 'score': 0.9967179894447327}]
- Real-Life Example: Sentiment monitoring for customer feedback.
Lecture 8: Text Generation with Transformers
- Objective: Generate meaningful text using GPT.
- Content:
- Understand the process of generating human-like text.
- Code Example:
from transformers import pipeline
generator = pipeline("text-generation", model="gpt2")
print(generator("Hugging Face is", max_length=30, num_return_sequences=1))
# Output: [{'generated_text': 'Hugging Face is a leading tech company based in New York City. It has been around for over 10 years and has a strong team of engineers and researchers.'}]
- Real-Life Example: Automating email drafts for marketing.
Lecture 9: Named Entity Recognition (NER)
- Objective: Learn how to identify entities in text.
- Content:
- NER with Hugging Face.
- Code Example:
from transformers import pipeline
ner = pipeline("ner", grouped_entities=True)
print(ner("Hugging Face is based in New York."))
# Output: [{'entity_group': 'ORG', 'score': 0.8876679, 'word': 'Hugging Face', 'start': 0, 'end': 12}, {'entity_group': 'LOC', 'score': 0.9985268, 'word': 'New York', 'start': 25, 'end': 33}]
- Real-Life Example: Extracting key information from resumes.
Lecture 10: Fine-Tuning a Model
- Objective: Fine-tune a pre-trained model on custom data.
- Content:
- Dataset preparation and training.
- Code Example:
from transformers import Trainer, TrainingArguments
# Fine-tuning steps using a dataset are sketched below.
- Real-Life Example: Customizing sentiment analysis for a specific domain.
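The stub above can be fleshed out roughly as follows. This is a minimal sketch, not the lecture's official solution; the imdb dataset and the distilbert-base-uncased checkpoint are illustrative choices only.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Load and tokenize a labelled dataset (imdb is used here as an example).
dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)

# Classification head on top of the pre-trained encoder (2 labels: positive/negative).
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

# Small subsets keep the demo fast; drop .select() for a real run.
args = TrainingArguments(output_dir="./results", num_train_epochs=1,
                         per_device_train_batch_size=8)
trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"].shuffle(seed=42).select(range(1000)),
                  eval_dataset=tokenized["test"].shuffle(seed=42).select(range(500)))
trainer.train()
Even with the small subsets this is slow on CPU, so a GPU runtime (e.g. Colab) is recommended.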
Lecture 11: Using Datasets
- Objective: Work with Hugging Face datasets.
- Content:
- Loading and preprocessing (preprocessing is sketched below).
- Code Example:
from datasets import load_dataset
dataset = load_dataset("imdb")
print(dataset["train"][0])
# Output (truncated): {'text': 'I rented I AM CURIOUS-YELLOW from my video store because of all the controversy that surrounded it when it was first released in 1967. I also heard that at first it was seized by U.S. customs if it ever tried to enter this country, therefore being a fan of films considered "controversial" I really had to see this for myself.<br /><br />The plot is centered around a young Swedish drama student named Lena who wants to learn everything she can about life. In particular she wants to focus her attentions to making some sort of documentary ... But really, this film doesn\'t have much of a plot.', 'label': 0}
- Real-Life Example: Using movie reviews for training sentiment models.
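Preprocessing typically happens with Dataset.map. Here is a minimal sketch, again assuming the imdb dataset and the bert-base-uncased tokenizer purely as illustrative choices:
from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Truncate long reviews to the model's maximum input length.
    return tokenizer(batch["text"], truncation=True)

# map() applies the function to the whole dataset in batches and caches the result.
tokenized = dataset.map(tokenize, batched=True)
print(tokenized["train"].column_names)
# e.g. ['text', 'label', 'input_ids', 'token_type_ids', 'attention_mask']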
Lecture 12: Deploying a Hugging Face Model
- Objective: Deploy and serve models in production.
- Content:
- Model saving and loading (see the loading sketch below).
- Deployment options (Hugging Face Hub, APIs).
- Code Example:
from transformers import pipeline
model = pipeline("text-generation", model="gpt2")
model.save_pretrained("./gpt2_model")
# Warning: Non-default generation parameters: {'max_length': 50, 'do_sample': True}
- Real-Life Example: Hosting a text generation API for customer interaction.
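To complete the loop, here is a minimal sketch of reloading the saved pipeline and optionally publishing it to the Hugging Face Hub. The ./gpt2_model directory comes from the code example above (pipeline.save_pretrained writes both model and tokenizer there); the repo id your-username/gpt2-demo is a placeholder, and pushing requires huggingface-cli login first.
from transformers import pipeline

# Reload the pipeline from the local directory written by save_pretrained().
generator = pipeline("text-generation", model="./gpt2_model")
print(generator("Hugging Face is", max_length=20))

# Optionally publish to the Hugging Face Hub (placeholder repo id).
# generator.model.push_to_hub("your-username/gpt2-demo")
# generator.tokenizer.push_to_hub("your-username/gpt2-demo")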
https://gamma.app/docs/Hugging-Face-Hands-On-12-x-5-Minutes-Course-ja4euwzws6f1mzr
HF Demo
import streamlit as st
#pip install transformers
import torch
import tensorflow as tf
import flax
from transformers import pipeline
from transformers import BertTokenizer, BertModel
with st.sidebar:
    st.title("Hugging Face Hands on")
    st.image("logo.jpg")
    choice = st.radio("Select the HF Concept", ["Pipeline", "Tokenization", "Generation"])

if choice == "Pipeline":
    task_name = st.selectbox("Choose a task", ["sentiment-analysis", "text-classification", "question-answering", "translation", "fill-mask"])
    model_name = st.selectbox("Choose a model", ["distilbert-base-uncased", "bert-base-uncased", "roberta-base", "gpt2", "ctrl"])
    model = pipeline(task_name, model_name)
    input_text = st.text_area("Enter your text here")
    if input_text:
        # The label/score keys apply to the classification-style tasks.
        result = model(input_text)[0]
        st.success(f"Label: {result['label']}, Score: {result['score']:.2f}")

if choice == "Tokenization":
    # Tokenization and encoding with BERT
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")
    input_text = st.text_area("Enter text here to encode")
    if input_text:
        inputs = tokenizer(input_text, return_tensors="pt")
        outputs = model(**inputs)
        st.success(outputs.last_hidden_state.shape)
        # e.g. torch.Size([1, sequence_length, 768])

if choice == "Generation":
    model_name = st.selectbox("Choose a model", ["gpt2", "ctrl"])
    generator = pipeline("text-generation", model=model_name)
    input_text = st.text_area("Enter your text here", value="Hugging Face is")
    num_return_sequences = st.slider("Number of sequences", min_value=1, max_value=5, value=1, step=1)
    num_tokens_to_generate = st.slider("Number of tokens", min_value=10, max_value=100, value=30, step=1)
    if input_text:
        # do_sample=True allows multiple distinct return sequences.
        results = generator(input_text, max_length=int(num_tokens_to_generate),
                            num_return_sequences=int(num_return_sequences),
                            do_sample=True)
        st.success(results)
Work In Progress 😆...