1-line programs for fine-tuning and inference
Clone this repo and then say:
```sh
pip install -r requirements/requirements.txt
```
Make sure gft_predict is in your PATH. Then set up the following environment variables with:
```sh
g=`which gft_predict`
export gft=`dirname $g`
```
Unfortunately, there are a number of incompatibilities between adapters, paddlespeech, and the latest version of HuggingFace transformers. There are several versions of requirements.txt in the requirements directory. We recommend setting up several different virtual environments to work around some of these incompatibilities.
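For example, a minimal sketch of one such environment (the environment name is illustrative; pick the requirements file that matches the features you need):
```sh
# Illustrative only: one virtual environment per requirements file
python -m venv ~/venvs/gft-hf          # hypothetical environment name
source ~/venvs/gft-hf/bin/activate
pip install -r requirements/requirements.txt
```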
The scripts in the examples directory will create results under $gft_checkpoints. Please set that variable to some place where you have plenty of free disk space. The results are large because most fine-tuning examples copy a pre-trained model. Given there are many dozens of such examples, there will be many dozens of copies of large models.
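For example (the path is illustrative; any directory with plenty of free space will do):
```sh
export gft_checkpoints=/big/disk/gft_checkpoints   # illustrative path
mkdir -p $gft_checkpoints
```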
WARNING: Some of the fine-tuning scripts take a long time, and not all examples are working (yet).
The table below shows a 3-step recipe, which has become standard in the literature on deep nets.
| Step | gft Support | Description | Time | Hardware |
|---|---|---|---|---|
| 1 | | Pre-Training | Days/Weeks | Large GPU Cluster |
| 2 | gft_fit | Fine-Tuning | Hours/Days | 1+ GPUs |
| 3 | gft_predict | Inference | Seconds/Minutes | 0+ GPUs |
This repo provides support for step 2 (gft_fit) and step 3 (gft_predict). Most gft_fit and gft_predict programs are short (1-line), much shorter than examples such as these, which are typically a few hundred lines of python. With gft, users should not need to read or modify any python code for steps 2 and 3 in the table above.
Step 1, pre-training, is beyond the scope of this work. We recommend starting with models from HuggingFace and PaddleHub/PaddleNLP hubs, as illustrated in the examples below.
Most gft programs are short (1-liners). While gft supports most arguments in most HuggingFace and PaddleNLP examples, most gft programs specify 4 arguments:
```sh
gft_fit --model H:bert-base-cased \
    --data H:glue,qqp \
    --metric H:glue,qqp \
    --output_dir $outdir \
    --eqn 'classify: label ~ question1 + question2'
```
This gft program fine-tunes a pretrained model (bert-base-cased from HuggingFace) with a dataset (the qqp subset of glue from HuggingFace).
One of the design goals of gft is to make fine-tuning as accessible to a broad audience as possible. It should be as easy to fine-tune a deep net as it is to fit a regression model.
gft equations are similar to glm (generalized linear model) equations in regression packages such as R.
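For comparison, a logistic regression on the same columns in R might look like `glm(label ~ question1 + question2, family = binomial)`; gft borrows this formula notation, with the lhs naming the output field and the rhs naming the input fields.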
Another design goal is to make it easy to port from one supplier to another. The gft program below is similar to the one above, except this one uses data from PaddleNLP instead of HuggingFace. With gft, it should be possible to mix and match datasets and models from different suppliers.
The variables in the equations refer to columns in the datasets. The equations are slightly different in gft programs above and below because different suppliers use different names for columns.
```sh
gft_fit --model H:bert-base-cased \
    --data P:glue,qqp \
    --metric H:glue,qqp \
    --output_dir $outdir \
    --eqn 'classify: labels ~ sentence1 + sentence2'
```
As mentioned above, there are about 30k models and 3k datasets in HuggingFace, and there are more in PaddleHub/PaddleNLP and AdapterHub. Going forward, there will be even more. These numbers are about 3x larger than they were a year ago. How can we find the good stuff?
Some datasets are more popular than others (top 1000):
Top 50 HuggingFace datasets by two metrics (bold if member of both sets):
- # of models: glue, super_glue, common_voice, librispeech_asr, wikitext, squad, imdb, blimp, paws, wmt16, tweet_eval, trec, red_caps, xnli, ag_news, tweets_hate_speech_detection, squad_v2, stsb_multi_mt, cfq, wikiann, anli, conll2003, snli, schema_guided_dstc8, hellaswag, xsum, code_search_net, winogrande, cnn_dailymail, tab_fact, piqa, adversarial_qa, banking77, race, amazon_polarity, emotion, americas_nli, hans, daily_dialog, wino_bias, oscar, amazon_reviews_multi, klue, mc4, universal_dependencies, rotten_tomatoes, samsum, cosmos_qa, c4, xtreme
- # of downloads: wikipedia, common_voice, bookcorpus, c4, glue, squad, conll2003, oscar, librispeech_asr, tweet_eval, squad_v2, emotion, masakhaner, cnn_dailymail, amazon_reviews_multi, imdb, wmt16, mc4, xsum, superb, universal_dependencies, multi_nli, xtreme, vctk, covost2, wikiann, samsum, openslr, timit_asr, xnli, snli, multilingual_librispeech, wmt14, klue, gigaword, clinc_oos, cc100, wmt19, natural_questions, nli_tr, mlsum, code_search_net, bigscience/P3, kde4, xquad, wikisql, race, go_emotions, conll2002, wnut_17
HuggingFace provides tools to make it easier to work with large numbers of models and datasets. There are some short python programs in $gft/huggingface_hub that use these tools to list models and datasets (and useful combinations of the two), as well as some text files that were created by these python programs.
There is also a python program, models_for_dataset.py, that outputs a list of models for a particular dataset. On average, there are about 10 models per dataset, though obviously, some datasets are more popular than others. Here are a couple of models for each of a few of the more popular datasets.
```sh
python $gft/huggingface_hub/models_for_dataset.py common_voice | sed 3q
# dataset: common_voice --> 524 models
common_voice facebook/wav2vec2-large-xlsr-53
common_voice facebook/wav2vec2-xls-r-300m

python $gft/huggingface_hub/models_for_dataset.py glue | sed 3q
# dataset: glue --> 188 models
glue Alireza1044/albert-base-v2-sst2
glue DeepPavlov/xlm-roberta-large-en-ru-mnli

python $gft/huggingface_hub/models_for_dataset.py emotion | sed 3q
# dataset: emotion --> 59 models
emotion bhadresh-savani/distilbert-base-uncased-emotion
emotion nateraw/bert-base-uncased-emotion
```
Here are some possibly useful links for finding datasets and models.
| | HuggingFace | PaddleHub/PaddleNLP | AdapterHub |
|---|---|---|---|
| Datasets | text file explorer | datasets | |
| Models | large text file explorer | models | small text file explorer |
The program, gft_dataset, makes it easy to look at datasets from different suppliers. The following example downloads the qqp subset of glue from HuggingFace and PaddleNLP. The "2>/dev/null" removes messages sent to stderr. Piping the results to "sed 1q" terminates output after the first line (for expository convenience). Note that the HuggingFace version (H:) names the fields: question1, question2, label, and the PaddleNLP version (P:) names the fields: sentence1, sentence2, labels.
```sh
# Go to HuggingFace, and output the val split of the qqp task in glue to stdout
gft_dataset --data 'H:glue,qqp' --split val 2>/dev/null | sed 1q
question1|Why are African-Americans so beautiful? question2|Why are hispanics so beautiful? label|0 idx|0

# Same as above, but replace HuggingFace with PaddleNLP
gft_dataset --data 'P:glue,qqp' --split val 2>/dev/null | sed 1q
sentence1|Why are African-Americans so beautiful? sentence2|Why are hispanics so beautiful? labels|0
```
If the optional eqn argument is provided, then gft_dataset uses the equation to extract the appropriate fields. The first column of the output corresponds to the rhs (right hand side) of the equation, and the second column corresponds to the lhs (left hand side) of the equation.
```sh
# Same as above, but use the equation to select fields of interest
gft_dataset --eqn 'classify: label ~ question1 + question2' --data 'H:glue,qqp' --split val 2>/dev/null | sed 1q
Why are African-Americans so beautiful?|Why are hispanics so beautiful? 0
```
The following example illustrates a custom dataset (C:), where the data are in csv files on the local filesystem. Normally, the lhs of a regression is a single real value, but in this case, it is a vector in R^3.
```sh
# without eqn arg
gft_dataset --data "C:$gft/datasets/VAD/VAD" --split val 2>/dev/null | sed 3q
# Word|abandonment Valence|0.128 Arousal|0.43 Dominance|0.202
# Word|abbey Valence|0.58 Arousal|0.367 Dominance|0.444
# Word|abbreviation Valence|0.469 Arousal|0.306 Dominance|0.345

# with eqn arg
gft_dataset --eqn 'regress: Valence + Arousal + Dominance ~ Word' --data "C:$gft/datasets/VAD/VAD" --split val 2>/dev/null | sed 3q
# abandonment 0.128|0.43|0.202
# abbey 0.58|0.367|0.444
# abbreviation 0.469|0.306|0.345
```
The following example shows that gft_dataset can also be applied to speech datasets. Common voice is available in English (en), Chinese (zh-CN), as well as a number of other languages. The raw data includes the waveform as an array, but with the eqn argument, we can extract a few useful fields such as the filename and the transcription.
```sh
# without eqn arg
gft_dataset --data H:common_voice,en 2>/dev/null | sed 1q | tr '\t' '\n'
# client_id|a07b17f8234ded5e847443ea6f423cef745cbbc7537fb637d58326000aa751e829a21c4fd0a35fc17fb833aa7e95ebafce5e...
# path|common_voice_en_100363.mp3
# audio|{'path': 'cv-corpus-6.1-2020-12-11/en/clips/common_voice_en_100363.mp3', 'array': array([0.0000000e+...
# sentence|It was the time of day when all of Spain slept during the summer.
# up_votes|2
# down_votes|1
# age|
# gender|
# accent|
# locale|en
# segment|''

# with eqn arg (English)
gft_dataset --eqn 'ctc:sentence ~ path' --data H:common_voice,en 2>/dev/null | sed 3q
# common_voice_en_100363.mp3 It was the time of day when all of Spain slept during the summer.
# common_voice_en_100540.mp3 Same way you did.
# common_voice_en_100546.mp3 Sarah told him that she was there to see her brother.

# with eqn arg (Chinese)
gft_dataset --eqn 'ctc:sentence ~ path' --data H:common_voice,zh-CN 2>/dev/null | sed 3q
# common_voice_zh-CN_18524189.mp3 正巧母亲往外探头
# common_voice_zh-CN_18532640.mp3 至今为止,元气火箭总共发行了两张专辑。
# common_voice_zh-CN_18532644.mp3 失业率降到十七年来的新低点
```
gft_labels.py outputs the set of labels for datasets and/or models.
```sh
./gft_labels.py --data H:emotion 2>/dev/null
# H:emotion sadness joy love anger fear surprise

./gft_labels.py --model H:AdapterHub/bert-base-uncased-pf-emotion 2>/dev/null
# H:AdapterHub/bert-base-uncased-pf-emotion sadness joy love anger fear surprise

./gft_labels.py --task image-classification --model H:nateraw/vit-base-cats-vs-dogs 2>/dev/null
# H:nateraw/vit-base-cats-vs-dogs cat dog

# The default model for text-classification has 2 classes
gft_labels.py --model H:distilbert-base-uncased-finetuned-sst-2-english
# H:distilbert-base-uncased-finetuned-sst-2-english NEGATIVE POSITIVE

# The default model for image-classification has 1k classes
gft_labels.py --model H:google/vit-base-patch16-224 --task image-classification |
    tr '\t' '\n' | wc -l
# 1000

gft_labels.py --model H:google/vit-base-patch16-224 --task image-classification | tr '\t' '\n' | head
# Afghan hound, Afghan
# African chameleon, Chamaeleo chamaeleon
# African crocodile, Nile crocodile, Crocodylus niloticus
# African elephant, Loxodonta africana
# African grey, African gray, Psittacus erithacus
# African hunting dog, hyena dog, Cape hunting dog, Lycaon pictus
# Airedale, Airedale terrier
# American Staffordshire terrier, Staffordshire terrier, American pit bull terrier, pit bull terrier
# American alligator, Alligator mississipiensis
```
Note that gft_labels.py uses a set of heuristics; sometimes, these heuristics fail to find the names of the labels.
See $gft/huggingface_hub/huggingface_models_with_labels.txt for labels for about 1000 text-classifier models.
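Since that file is plain text, with one model per line followed by its labels, ordinary shell tools can search it; for example (the search term here is illustrative):
```sh
# Illustrative: find text classifiers whose labels mention "emotion"
egrep -i emotion $gft/huggingface_hub/huggingface_models_with_labels.txt | sed 3q
```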
The examples above (and below) make use of equations such as:
```sh
--eqn 'classify: label ~ question1 + question2'
```
The keyword, classify, is distinguished from other keywords such as regress, classify_tokens, classify_spans, ctc, etc. In the classification case, for each input (a pair of questions), there is a single label (semantically similar or not). Classify also generalizes from binary classification to multiclass classification (for tasks such as emotion classification). Equations start with a number of different keywords:
- classify: lhs denotes a set of classes
- regress : lhs denotes a point in a vector space
- classify_tokens : there is a classification task for each token on the rhs
- classify_spans : used for SQuAD-like tasks where the output should be a span (substring) of the rhs
- ctc: used in speech recognition where the input is audio and the output is text
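For instance, here is a sketch of a regress equation with a multi-column lhs, based on the VAD example shown elsewhere in this document (as with the other gft_fit examples, $outdir is a directory for results; --metric is omitted here):
```sh
# A sketch: fine-tune a regression model whose lhs is a point in R^3
gft_fit --model H:bert-base-cased \
    --data "C:$gft/datasets/VAD/VAD" \
    --output_dir $outdir \
    --eqn 'regress: Valence + Arousal + Dominance ~ Word'
```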
There are a number of examples of equations in the table below.
| Dataset | Subset | Data Argument | Equation | Pipeline Task |
|---|---|---|---|---|
| GLUE | COLA | H:glue,cola | classify: label ~ sentence | text-classification |
| GLUE | SST2 | H:glue,sst2 | classify: label ~ sentence | text-classification |
| GLUE | WNLI | H:glue,wnli | classify: label ~ sentence1 + sentence2 | text-classification |
| GLUE | MRPC | H:glue,mrpc | classify: label ~ sentence1 + sentence2 | text-classification |
| GLUE | QNLI | H:glue,qnli | classify: label ~ question + sentence | text-classification |
| GLUE | QQP | H:glue,qqp | classify: label ~ question1 + question2 | text-classification |
| GLUE | STSB | H:glue,stsb | regress: label ~ sentence1 + sentence2 | |
| GLUE | MNLI | H:glue,mnli | classify: label ~ premise + hypothesis | text-classification |
| SQuAD 1.0 | | H:squad | classify_spans: answers ~ question + context | question-answering |
| SQuAD 2.0 | | H:squad_v2 | classify_spans: answers ~ question + context | question-answering |
| CONLL2003 | POS | H:conll2003 | classify_tokens: pos_tags ~ tokens | token-classification |
| CONLL2003 | NER | H:conll2003 | classify_tokens: ner_tags ~ tokens | token-classification |
| CONLL2003 | Chunking | H:conll2003 | classify_tokens: chunk_tags ~ tokens | token-classification |
| TIMIT | | H:timit_asr | ctc: text ~ audio | automatic-speech-recognition |
| LibriSpeech | | H:librispeech_asr | ctc: text ~ audio | automatic-speech-recognition |
| Amazon Reviews | | H:amazon_reviews_multi | classify: stars ~ review_title + review_body | text-classification |
| VAD | | C:$gft/datasets/VAD/VAD | regress: Valence + Arousal + Dominance ~ Word | |
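Each row of the table should be usable with gft_fit; for example, here is a sketch based on the CONLL2003 NER row (--metric and hyperparameters omitted):
```sh
# A sketch: fine-tune a token classifier for NER, following the table row above
gft_fit --model H:bert-base-cased \
    --data H:conll2003 \
    --output_dir $outdir \
    --eqn 'classify_tokens: ner_tags ~ tokens'
```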
gft_predict reads input from stdin, and writes predictions to stdout. To run a model on data from a standard dataset, pipe gft_dataset into gft_predict.
```sh
# extract examples from dataset (and stop after 2nd one, for expository purposes)
gft_dataset --data H:emotion --eqn 'classify: label ~ text' 2>/dev/null | sed 2q
# im feeling rather rotten so im not very ambitious right now 0
# im updating my blog because i feel shitty 0

# same as above, but pipe results into gft_predict
# gft_predict appends predicted labels and scores to each input line
gft_dataset --data H:emotion --eqn 'classify: label ~ text' 2>/dev/null | sed 2q |
    gft_predict --task H:text-classification 2>/dev/null
# im feeling rather rotten so im not very ambitious right now 0 NEGATIVE 0.9998108744621277
# im updating my blog because i feel shitty 0 NEGATIVE 0.9994602799415588
```
gft_predict will be discussed in more detail below. It reads from stdin and applies almost any input to almost any model, and it supports most of the tasks in HuggingFace pipelines.
Here are some examples of inference (see below for more details):
```sh
# text-classification: sentiment analysis
echo 'I love you.' | gft_predict --task H:text-classification
# I love you. POSITIVE 0.9998705387115479

# text-classification: emotion classification
model=H:AdapterHub/bert-base-uncased-pf-emotion
echo 'I love you.' | gft_predict --model $model --task H:text-classification
# I love you. love 0.6005669236183167

# token-classification: NER (Named Entity Recognition)
echo 'I love New York.' | gft_predict --task H:token-classification
# I love New York. New/I-LOC:0.9989 York/I-LOC:0.9974

# fill-mask: guess the masked word
echo 'I <mask> you.' | gft_predict --task H:fill-mask
# I <mask> you. salute|0.241 miss|0.177 love|0.147 thank|0.060 applaud|0.047

# text-generation
echo 'I love ' | gft_predict --task H:text-generation
# I love you and I will never be forgotten and thank you." I was also
# inspired by all of the students who walked onto campus wearing these
# teddy I love the idea that you can be anything people ask for you

# translation
echo 'I love you.' | gft_predict --task H:translation --model H:Helsinki-NLP/opus-mt-en-fr
# I love you. Je t'aime.
echo 'I love you.' | gft_predict --task H:translation --model H:Helsinki-NLP/opus-mt-en-zh
# I love you. 我爱你
```

```sh
# Run a half-dozen fake-news classifiers on "I love you."
lab=$gft/huggingface_hub/huggingface_models_with_labels.txt
for model in `egrep fake $lab | cut -f1`
do
    out=`echo 'I love you.' | gft_predict --task H:text-classification --model $model 2>/dev/null`
    echo $model $out
done
# elozano/bert-base-cased-fake-news I love you. Fake 0.9996728897094727
# Narrativaai/fake-news-detection-spanish I love you. FAKE 0.9591125845909119
# dtam/autonlp-covid-fake-news-36839110 I love you. 1 0.999913215637207
# Qinghui/autonlp-fake-covid-news-36769078 I love you. 1 0.9999946355819702
# Qiaozhen/fake-news-detector I love you. fake 0.9575745463371277
# yaoyinnan/roberta-fakeddit I love you. Fake 0.9857746362686157
```

```sh
# Run a half-dozen sentiment classifiers on "I love you."
lab=$gft/huggingface_hub/huggingface_models_with_labels.txt
# Since there are so many (196) classifiers, take a random sample of 6
egrep -ci positive $lab
# 196
models=`awk 'NF < 5' $lab | egrep -i positive |
    awk '{print rand() "\t" $0}' | sort | cut -f2- | sed 6q | cut -f1`
for model in $models
do
    out=`echo 'I love you.' | gft_predict --task H:text-classification --model $model 2>/dev/null`
    echo $model $out
done
# rohansingh/autonlp-Fake-news-detection-system-29906863 I love you. positive 0.6512816548347473
# gchhablani/fnet-base-finetuned-sst2 I love you. positive 0.9974162578582764
# cointegrated/rubert-tiny-sentiment-balanced I love you. positive 0.9445993304252625
# SetFit/deberta-v3-large__sst2__train-8-5 I love you. positive 0.8264811635017395
# bowipawan/bert-sentimental I love you. positive 0.7457774877548218
# m3tafl0ps/autonlp-NLPIsFun-251844 I love you. positive 0.9010641574859619
```

**I love you is positive**
| Predicted Label | Score | Model | Labels for Model |
|---|---|---|---|
| positive | 0.512 | SetFit/deberta-v3-large__sst2__train-16-7 | negative, positive |
| POSITIVE | 0.871 | ayameRushia/roberta-base-indonesian-sentiment-analysis-smsa | POSITIVE, NEUTRAL, NEGATIVE |
| positive | 0.807 | SetFit/distilbert-base-uncased__sst2__train-32-2 | negative, positive |
| positive | 0.999 | AdapterHub/bert-base-uncased-pf-sst2 | negative, positive |
| positive | 0.917 | SetFit/deberta-v3-large__sst2__train-32-1 | negative, positive |
| positive | 0.999 | moshew/tiny-bert-aug-sst2-distilled | negative, positive |
| positive | 0.651 | rohansingh/autonlp-Fake-news-detection-system-29906863 | negative, positive |
| 5 stars | 0.872 | tomato/sentiment_analysis | 1 star, 2 stars, 3 stars, 4 stars, 5 stars |
| 5 stars | 0.424 | cmarkea/distilcamembert-base-sentiment | 1 star, 2 stars, 3 stars, 4 stars, 5 stars |
| 5 stars | 0.872 | nlptown/bert-base-multilingual-uncased-sentiment | 1 star, 2 stars, 3 stars, 4 stars, 5 stars |
**I love you is love and/or joy**
| Predicted Label | Score | Model | Labels for Model |
|---|---|---|---|
| joy | 0.826 | philschmid/deberta-v3-xsmall-emotion | anger, fear, joy, love, sadness, surprise |
| love | 0.681 | AdapterHub/roberta-base-pf-emotion | sadness, joy, love, anger, fear, surprise |
| joy | 0.786 | philschmid/MiniLMv2-L6-H384-emotion | sadness, joy, love, anger, fear, surprise |
| love | 0.649 | bhadresh-savani/roberta-base-emotion | sadness, joy, love, anger, fear, surprise |
| love | 0.935 | bhadresh-savani/albert-base-v2-emotion | anger, fear, joy, love, sadness, surprise |
| love | 0.960 | marcelcastrobr/sagemaker-distilbert-emotion-2 | sadness, joy, love, anger, fear, surprise |
**I love you is fake news**
| Predicted Label | Score | Model | Labels for Model |
|---|---|---|---|
| Fake | 0.998 | yaoyinnan/bert-base-chinese-covid19 | Neutral, Fake, Real |
| Fake | 0.986 | yaoyinnan/roberta-fakeddit | Fake, Real |
| fake | 0.958 | Qiaozhen/fake-news-detector | real, fake |
| FAKE | 0.959 | Narrativaai/fake-news-detection-spanish | REAL, FAKE |
**I love you is both spam and ham**
| Predicted Label | Score | Model | Labels for Model |
|---|---|---|---|
| spam | 0.826 | SetFit/distilbert-base-uncased__enron_spam__all-train | ham, spam |
| not spam | 1.000 | sureshs/distilbert-large-sms-spam | not spam, spam |
**I love you is (mostly) not hateful/offensive**
| Predicted Label | Score | Model | Labels for Model |
|---|---|---|---|
| not-hate | 0.974 | aXhyra/demo_hate_1234567 | not-hate, hate |
| neither | 0.349 | SetFit/distilbert-base-uncased__hate_speech_offensive__train-16-9 | hate speech, offensive language, neither |
| not-hate | 0.990 | aXhyra/presentation_hate_31415 | not-hate, hate |
| no hate speech | 0.885 | SetFit/distilbert-base-uncased__ethos_binary__all-train | no hate speech, hate speech |
| not-hate | 0.995 | aXhyra/hate_trained_42 | not-hate, hate |
| hateful | 0.040 | pysentimiento/robertuito-hate-speech | hateful, targeted, aggressive |
| offensive language | 0.336 | SetFit/distilbert-base-uncased__hate_speech_offensive__train-32-1 | hate speech, offensive language, neither |
| offensive | 1.000 | simjo/model1_test | not offensive, offensive |
| OFFENSIVE | 0.546 | seanbenhur/tanglish-offensive-language-identification | NOT-OFFENSIVE, OFFENSIVE |
| neither | 0.365 | SetFit/distilbert-base-uncased__hate_speech_offensive__train-8-6 | hate speech, offensive language, neither |
| offensive language | 1.000 | simjo/dummy-model | not offensive, offensive |
| hate speech | 0.350 | SetFit/distilbert-base-uncased__hate_speech_offensive__train-8-7 | hate speech, offensive language, neither |
Design goals/benefits of higher level languages such as gft:
- Hide complexity: gft programs should be short (1-line) and easy to read.
- Avoid special cases (especially in code that is exposed to users): Standard examples such as these and these are longer than gft programs. In many cases, 500 lines of pytorch code can be reduced to a single line of gft code. Those 500 lines contain many details that users do not need to know about, such as data loading, gradient descent training, and much more. Many of these examples are very similar to one another; avoid duplication in code that is exposed to large numbers of users.
- Code re-use: The standard examples are full of opportunities for code reuse. Users are expected to fork the code in these examples and modify it as needed if they want the examples to work on slightly different tasks, or slightly different datasets. When users modify the 500 lines of code, they will introduce bugs. Code reuse is safer than editing examples. Since the gft tools are based closely on these examples, they should produce similar results, with similar computational resources (space and time), since both solutions are basically running the same algorithms (and much of the same code).
- Flexibility/Generality: Support most datasets and models published on hubs (HuggingFace, PaddleNLP). The prefixes, H, P and C, refer to HuggingFace, PaddleNLP and custom (local filesystem). You should be able to mix and match models and datasets from different sources (HuggingFace, PaddleHub/PaddleNLP, Adapter Hub, etc.) There are currently about 30k models and 3k datasets on these hubs. gft hides complexities such as different formats for models from different suppliers, and different types of auto classes for different purposes. For example, users should not need to know about adapter models, and how they are different from other types of models.
gft_predict reads from stdin and applies almost any input to almost any model.
See the documentation on HuggingFace pipelines and PaddleNLP taskflow for more information on the --task argument.
The supported inference tasks include:
- text-classification : The left hand side (lhs) of the equation is a single variable over classes.
- token-classification : The lhs has a class variable for each token.
- translation : Machine Translation
- fill-mask : Replace `<mask>` with words.
- question-answering : Example: SQuAD. The answer is a span (substring) of the input. The lhs has two class variables for each position, indicating the start and end of answer spans.
- image-classification: Like text-classification, except the rhs is a picture (as opposed to text).
- automatic-speech-recognition: ASR
- text-generation: Input prompt and output completion.
```sh
# text classification
# example with --task argument (HuggingFace pipelines do different things with different task arguments)
echo 'I love you.' | gft_predict --model H:AdapterHub/bert-base-uncased-pf-emotion --task H:text-classification 2>/dev/null
# I love you. love 0.6005669236183167
```
If you don't specify a model, one will be chosen for you (remove the 2>/dev/null bits to see that distilbert-base-uncased-finetuned-sst-2-english is the default model). Different models produce different classifications. The default model produces positive and negative labels (sentiment).
```sh
echo 'I love you.' | gft_predict --task H:text-classification 2>/dev/null
# I love you. POSITIVE 0.9998705387115479

echo 'I hate you.' | gft_predict --task H:text-classification 2>/dev/null
# I hate you. NEGATIVE 0.9992952346801758
```
If you don't specify a --task, the class labels will be numeric, and the last field will be a list of logits. The class label is the argmax of the logits.
```sh
# default arguments: input is assigned to class 2 of 6 (the number of classes is part of the model, which was fine-tuned on data with 6 classes)
echo 'I love you.' | gft_predict --model H:AdapterHub/bert-base-uncased-pf-emotion 2>/dev/null
# I love you. 2 -0.2438915|4.8194537|5.235088|-1.7891347|-4.2359033|-5.1401916

echo 'I love you.' | gft_predict --model H:distilbert-base-uncased-finetuned-sst-2-english
# I love you. 1 -4.294976|4.6575847
echo 'I hate you.' | gft_predict --model H:distilbert-base-uncased-finetuned-sst-2-english
# I hate you 0 3.8723779|-3.1543205
```

```sh
echo 'I love New York.' | gft_predict --task H:token-classification --model H:vblagoje/bert-english-uncased-finetuned-pos 2>/dev/null
# I love New York. i/PRON:0.9995 love/VERB:0.9989 new/PROPN:0.9986 york/PROPN:0.9988 ./PUNCT:0.9997
echo 'I love New York.' | gft_predict --task H:token-classification 2>/dev/null
# I love New York. New/I-LOC:0.9989 York/I-LOC:0.9974
```

```sh
echo 'I love you.' | gft_predict --task H:translation --model H:Helsinki-NLP/opus-mt-en-fr 2>/dev/null
# I love you. Je t'aime.
echo 'I love you.' | gft_predict --task H:translation --model H:Helsinki-NLP/opus-mt-en-zh 2>/dev/null
# I love you. 我爱你
```
## Fill Mask

```sh
# fill mask: replace <mask> with n-best words
echo 'I <mask> you.' | gft_predict --task H:fill-mask 2>/dev/null
# I <mask> you. salute|0.241 miss|0.177 love|0.147 thank|0.060 applaud|0.047
```
```sh
# Question Answering (SQuAD)
# Extract one example from the SQuAD dataset
gft_dataset --data H:squad --eqn 'classify_spans: answers ~ question + context' --split val | sed 1q > /tmp/x

# Run inference on this example with the default model (and show the first 150 characters of each field on separate lines)
gft_predict --task H:question-answering < /tmp/x | tr '\t' '\n' | cut -c1-150
# Which NFL team represented the AFC at Super Bowl 50?|Super Bowl 50 was an American football game to determine the champion of the National Football Le
# {'text': ['Denver Broncos', 'Denver Broncos', 'Denver Broncos'], 'answer_start': [177, 177, 177]}
# answer: Denver Broncos
```
```sh
# image classification
echo https://images.all-free-download.com/images/graphicwebp/funny_cat_194619.webp |
    gft_predict --task H:image-classification 2>/dev/null
# https://images.all-free-download.com/images/graphicwebp/funny_cat_194619.webp Egyptian cat|0.736 tiger cat|0.039 tabby, tabby cat|0.031 lynx, catamount|0.024 Persian cat|0.023

echo https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg |
    gft_predict --task H:image-classification 2>/dev/null
# https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg lynx, catamount|0.433 cougar, puma, catamount, mountain lion, painter, panther, Felis concolor|0.035 snow leopard, ounce, Panthera uncia|0.032 Egyptian cat|0.024 tiger cat|0.023
```

```sh
gft_dataset --eqn 'classify: labels ~ file' --data H:nateraw/auto-cats-and-dogs --split train | head > /tmp/x
cat /tmp/x | gft_predict --task H:image-classification 2>/dev/null | awk -F/ '{print $NF}'
# 0.jpg 0 Egyptian cat|0.327 tiger cat|0.097 tabby, tabby cat|0.057 space heater|0.053 laptop, laptop computer|0.029
# 1.jpg 0 tabby, tabby cat|0.612 Egyptian cat|0.284 tiger cat|0.094 lynx, catamount|0.003 Siamese cat, Siamese|0.000
# 10.jpg 0 tabby, tabby cat|0.435 Egyptian cat|0.251 tiger cat|0.085 Persian cat|0.081 lynx, catamount|0.060
# 100.jpg 0 Egyptian cat|0.441 tabby, tabby cat|0.396 tiger cat|0.086 lynx, catamount|0.022 Persian cat|0.012
# 1000.jpg 0 Egyptian cat|0.369 tabby, tabby cat|0.129 tiger cat|0.107 Angora, Angora rabbit|0.053 Persian cat|0.042
# 10000.jpg 0 Egyptian cat|0.753 tabby, tabby cat|0.203 tiger cat|0.040 lynx, catamount|0.002 Siamese cat, Siamese|0.000
# 10001.jpg 0 Egyptian cat|0.321 tabby, tabby cat|0.061 Persian cat|0.053 tiger cat|0.033 bucket, pail|0.025
# 10002.jpg 0 Egyptian cat|0.581 tabby, tabby cat|0.222 tiger cat|0.096 lynx, catamount|0.044 Persian cat|0.019
# 10003.jpg 0 Egyptian cat|0.765 tabby, tabby cat|0.118 tiger cat|0.084 Siamese cat, Siamese|0.002 carton|0.002
# 10004.jpg 0 tabby, tabby cat|0.476 Egyptian cat|0.298 tiger cat|0.218 lynx, catamount|0.001 Siamese cat, Siamese|0.000
```
The results are much better if we replace the default model with a more appropriate model.
```sh
model=H:nateraw/vit-base-cats-vs-dogs
cat /tmp/x | gft_predict --model=$model --task H:image-classification | awk -F/ '{print $NF}'
# 0.jpg 0 cat|0.999 dog|0.001
# 1.jpg 0 cat|1.000 dog|0.000
# 10.jpg 0 cat|1.000 dog|0.000
# 100.jpg 0 cat|1.000 dog|0.000
# 1000.jpg 0 cat|1.000 dog|0.000
# 10000.jpg 0 cat|1.000 dog|0.000
# 10001.jpg 0 cat|1.000 dog|0.000
# 10002.jpg 0 cat|1.000 dog|0.000
# 10003.jpg 0 cat|0.999 dog|0.001
# 10004.jpg 0 cat|1.000 dog|0.000
```

```sh
gft_dataset --eqn 'classify: labels ~ image_file_path' --data H:beans | head > /tmp/x
cat /tmp/x | gft_predict --task H:image-classification 2>/dev/null | awk -F/ '{print $NF}'
# healthy_test.21.jpg 2 fig|0.696 cucumber, cuke|0.013 pot, flowerpot|0.009 custard apple|0.007 leaf beetle, chrysomelid|0.005
# healthy_test.35.jpg 2 bell pepper|0.094 leaf beetle, chrysomelid|0.065 cucumber, cuke|0.058 head cabbage|0.049 ant, emmet, pismire|0.022
# healthy_test.34.jpg 2 cucumber, cuke|0.156 head cabbage|0.074 pot, flowerpot|0.022 ear, spike, capitulum|0.021 corn|0.010
# healthy_test.20.jpg 2 fig|0.848 pot, flowerpot|0.005 custard apple|0.003 jackfruit, jak, jack|0.002 cucumber, cuke|0.002
# healthy_test.36.jpg 2 custard apple|0.072 pot, flowerpot|0.057 fig|0.033 wool, woolen, woollen|0.025 necklace|0.013
# healthy_test.22.jpg 2 pick, plectrum, plectron|0.030 shower cap|0.012 leaf beetle, chrysomelid|0.011 head cabbage|0.008 spatula|0.007
# healthy_test.23.jpg 2 leaf beetle, chrysomelid|0.172 cucumber, cuke|0.049 ladybug, ladybeetle, lady beetle, ladybird, ladybird beetle|0.043 corn|0.031 bell pepper|0.029
# healthy_test.37.jpg 2 cucumber, cuke|0.235 head cabbage|0.021 zucchini, courgette|0.015 fig|0.014 corn|0.012
# healthy_test.8.jpg 2 leaf beetle, chrysomelid|0.051 cucumber, cuke|0.045 head cabbage|0.023 ladybug, ladybeetle, lady beetle, ladybird, ladybird beetle|0.010 fig|0.009
# healthy_test.33.jpg 2 cucumber, cuke|0.152 leaf beetle, chrysomelid|0.111 lacewing, lacewing fly|0.031 zucchini, courgette|0.030 fig|0.022
```
```sh
# speech recognition
gft_dataset --eqn 'ctc: text ~ file' --data H:timit_asr | head > /tmp/x
cat /tmp/x | gft_predict --task H:automatic-speech-recognition 2>/dev/null | awk -F/ '{print $NF}'
# SX139.WAV The bungalow was pleasantly situated near the shore. THE BUNGALOW WAS PLEASANTLY SITUATED NEAR THE SHORE
# SA2.WAV Don't ask me to carry an oily rag like that. DON'T ASK ME TO CARRY AN OILY RAG LIKE THAT
# SX229.WAV Are you looking for employment? ARE YOU LOOKING FOR EMPLOYMENT
# SA1.WAV She had your dark suit in greasy wash water all year. SHE HAD YOUR DARK SUIT AND GREASY WASHWATER ALL YEAR
# SX49.WAV At twilight on the twelfth day we'll have Chablis. AT TWILIGHT ON THE TWELFTH DAY WE'LL HAVE CHABLI
# SX409.WAV Eating spinach nightly increases strength miraculously. EATING SPINACH NIGHTLY INCREASES STRENGTH MIRACULOUSLY
# SI1759.WAV Got a heck of a buy on this, dirt cheap. GOT A HECK OF A BY ON THIS DIRT CHEAP
# SI499.WAV The scalloped edge is particularly appealing. THE SCALLOPED EDGE IS PARTICULARLY APPEALING
# SX319.WAV A big goat idly ambled through the farmyard. A BIG GOAT IDLY AMBLED THROUGH THE FARMYARD
# SI1129.WAV This group is secularist and their program tends to be technological. THIS GROUP IS SECULARIST AND THEIR PROGRAMM TENDS TO BE TECHNOLOGICAL
```

```sh
# text generation
echo 'A robin is a' | gft_predict --task H:text-generation --max_length 15 --num_return_sequences 1 2>/dev/null
# A robin is a A robin is a cat or dog that has trouble keeping up with its

# NOTE: non-determinism; same prompt --> different completions
echo 'A robin is a' | gft_predict --task H:text-generation --max_length 15 --num_return_sequences 1 2>/dev/null
# A robin is a A robin is a small bird that runs at a range where its legs
```
More examples of inference are here. Lots of examples on GLUE are here.
Example of usage (of fine-tuning):
```sh
export datasets=$gft/datasets
outdir=/tmp/cola/cpkt
sh $gft/examples/fine_tuning_examples/model.HuggingFace/language/data.HuggingFace/glue/cola.sh $outdir
```
All of the shell scripts under fine_tuning_examples take a single argument (a directory for the results).
The shell scripts under model.HuggingFace use models from HuggingFace, and shell scripts under model.PaddleHub use models from PaddleHub and/or PaddleNLP. Similarly, shell scripts under data.HuggingFace use datasets from HuggingFace, and shell scripts under data.PaddleHub use datasets from PaddleHub and/or PaddleNLP.
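For example, to browse the examples for one such combination (this path matches the cola.sh example above; other directories follow the same naming pattern):
```sh
# Illustrative: list the HuggingFace-model / HuggingFace-data GLUE examples
ls $gft/examples/fine_tuning_examples/model.HuggingFace/language/data.HuggingFace/glue/
```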
To run all fine-tuning examples:
```sh
# run all examples
cd $gft/examples/fine_tuning_examples
find . -name '*.sh' |
while read f
do
    b=$gft_checkpoints/`dirname $f`/`basename $f .sh`
    sh $f $b/ckpt
done
```
Paper (draft) is here.