1-line programs for fine-tuning and inference
Clone this repo and then say:
```sh
pip install -r requirements/requirements.txt
```
Make sure gft_predict is in your PATH. Then set up the following environment variables with:
```sh
g=`which gft_predict`
export gft=`dirname $g`
```
Unfortunately, there are a number of incompatibilities between adapters, paddlespeech, and the latest version of HuggingFace transformers. There are several versions of requirements.txt in the requirements directory. We recommend setting up several different virtual environments to work around some of these incompatibilities.
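For example, a minimal sketch of one such environment (the environment name is illustrative; pick the requirements file that matches the features you need):
```sh
# Illustrative only: one virtual environment per requirements file
python -m venv ~/venvs/gft-hf          # hypothetical environment name
source ~/venvs/gft-hf/bin/activate
pip install -r requirements/requirements.txt
```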
The scripts in the examples directory will create results under $gft_checkpoints. Please set that variable to some place where you have plenty of free disk space. The results are large because most fine-tuning examples copy a pre-trained model. Given there are many dozens of such examples, there will be many dozens of copies of large models.
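For example (the path is illustrative; any directory with plenty of free space will do):
```sh
export gft_checkpoints=/big/disk/gft_checkpoints   # illustrative path
mkdir -p $gft_checkpoints
```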
WARNING: Some of the fine-tuning scripts take a long time, and not all examples are working (yet).
The table below shows a 3-step recipe, which has become standard in the literature on deep nets.
| Step | gft Support | Description | Time | Hardware |
|---|---|---|---|---|
| 1 | | Pre-Training | Days/Weeks | Large GPU Cluster |
| 2 | gft_fit | Fine-Tuning | Hours/Days | 1+ GPUs |
| 3 | gft_predict | Inference | Seconds/Minutes | 0+ GPUs |
This repo provides support for step 2 (gft_fit) and step 3 (gft_predict). Most gft_fit and gft_predict programs are short (1-line), much shorter than examples such as these, which are typically a few hundred lines of python. With gft, users should not need to read or modify any python code for steps 2 and 3 in the table above.
Step 1, pre-training, is beyond the scope of this work. We recommend starting with models from HuggingFace and PaddleHub/PaddleNLP hubs, as illustrated in the examples below.
Most gft programs are short (1-liners). While gft supports most arguments in most HuggingFace and PaddleNLP examples, most gft programs specify 4 arguments:
```sh
gft_fit --model H:bert-base-cased \
    --data H:glue,qqp \
    --metric H:glue,qqp \
    --output_dir $outdir \
    --eqn 'classify: label ~ question1 + question2'
```
This gft program fine-tunes a pretrained model (bert-base-cased from HuggingFace) with a dataset (the qqp subset of glue from HuggingFace).
One of the design goals of gft is to make fine-tuning as accessible to a broad audience as possible. It should be as easy to fine-tune a deep net as it is to fit a regression model.
gft equations are similar to glm (generalized linear model) equations in regression packages such as R.
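For comparison, a logistic regression on the same columns in R might look like `glm(label ~ question1 + question2, family = binomial)`; gft borrows this formula notation, with the lhs naming the output field and the rhs naming the input fields.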
Another design goal is to make it easy to port from one supplier to another. The gft program below is similar to the one above, except this one uses data from PaddleNLP instead of HuggingFace. With gft, it should be possible to mix and match datasets and models from different suppliers.
The variables in the equations refer to columns in the datasets. The equations are slightly different in gft programs above and below because different suppliers use different names for columns.
```sh
gft_fit --model H:bert-base-cased \
    --data P:glue,qqp \
    --metric H:glue,qqp \
    --output_dir $outdir \
    --eqn 'classify: labels ~ sentence1 + sentence2'
```
As mentioned above, there are about 30k models and 3k datasets in HuggingFace, and there are more in PaddleHub/PaddleNLP and AdapterHub. Going forward, there will be even more. These numbers are about 3x larger than they were a year ago. How can we find the good stuff?
Some datasets are more popular than others (top 1000):
Top 50 HuggingFace datasets by two metrics (bold if member of both sets):
- # of models: glue, super_glue, common_voice, librispeech_asr, wikitext, squad, imdb, blimp, paws, wmt16, tweet_eval, trec, red_caps, xnli, ag_news, tweets_hate_speech_detection, squad_v2, stsb_multi_mt, cfq, wikiann, anli, conll2003, snli, schema_guided_dstc8, hellaswag, xsum, code_search_net, winogrande, cnn_dailymail, tab_fact, piqa, adversarial_qa, banking77, race, amazon_polarity, emotion, americas_nli, hans, daily_dialog, wino_bias, oscar, amazon_reviews_multi, klue, mc4, universal_dependencies, rotten_tomatoes, samsum, cosmos_qa, c4, xtreme
- # of downloads: wikipedia, common_voice, bookcorpus, c4, glue, squad, conll2003, oscar, librispeech_asr, tweet_eval, squad_v2, emotion, masakhaner, cnn_dailymail, amazon_reviews_multi, imdb, wmt16, mc4, xsum, superb, universal_dependencies, multi_nli, xtreme, vctk, covost2, wikiann, samsum, openslr, timit_asr, xnli, snli, multilingual_librispeech, wmt14, klue, gigaword, clinc_oos, cc100, wmt19, natural_questions, nli_tr, mlsum, code_search_net, bigscience/P3, kde4, xquad, wikisql, race, go_emotions, conll2002, wnut_17
HuggingFace provides tools to make it easier to work with large numbers of models and datasets. There are some short python programs in $gft/huggingface_hub that use these tools to list models and datasets (and useful combinations of the two), as well as some text files that were created by these python programs.
There is also a python program, models_for_dataset.py, that outputs a list of models for a particular dataset. On average, there are about 10 models per dataset, though obviously, some datasets are more popular than others. Here are a couple of models for each of a few of the more popular datasets.
```sh
python $gft/huggingface_hub/models_for_dataset.py common_voice | sed 3q
# dataset: common_voice --> 524 models
common_voice facebook/wav2vec2-large-xlsr-53
common_voice facebook/wav2vec2-xls-r-300m

python $gft/huggingface_hub/models_for_dataset.py glue | sed 3q
# dataset: glue --> 188 models
glue Alireza1044/albert-base-v2-sst2
glue DeepPavlov/xlm-roberta-large-en-ru-mnli

python $gft/huggingface_hub/models_for_dataset.py emotion | sed 3q
# dataset: emotion --> 59 models
emotion bhadresh-savani/distilbert-base-uncased-emotion
emotion nateraw/bert-base-uncased-emotion
```
Here are some possibly useful links for finding datasets and models.
| | HuggingFace | PaddleHub/PaddleNLP | AdapterHub |
|---|---|---|---|
| Datasets | text file explorer | datasets | |
| Models | large text file explorer | models | small text file explorer |
The program, gft_dataset, makes it easy to look at datasets from different suppliers. The following example downloads the qqp subset of glue from HuggingFace and PaddleNLP. The "2>/dev/null" removes messages sent to stderr. Piping the results to "sed 1q" terminates output after the first line (for expository convenience). Note that the HuggingFace version (H:) names the fields: question1, question2, label, and the PaddleNLP version (P:) names the fields: sentence1, sentence2, labels.
```sh
# Go to HuggingFace, and output the val split of the qqp task in glue to stdout
gft_dataset --data 'H:glue,qqp' --split val 2>/dev/null | sed 1q
question1|Why are African-Americans so beautiful? question2|Why are hispanics so beautiful? label|0 idx|0

# Same as above, but replace HuggingFace with PaddleNLP
gft_dataset --data 'P:glue,qqp' --split val 2>/dev/null | sed 1q
sentence1|Why are African-Americans so beautiful? sentence2|Why are hispanics so beautiful? labels|0
```
If the optional eqn argument is provided, then gft_dataset uses the equation to extract the appropriate fields. The first column of the output corresponds to the rhs (right hand side) of the equation, and the second column corresponds to the lhs (left hand side) of the equation.
```sh
# Same as above, but use the equation to select fields of interest
gft_dataset --eqn 'classify: label ~ question1 + question2' --data 'H:glue,qqp' --split val 2>/dev/null | sed 1q
Why are African-Americans so beautiful?|Why are hispanics so beautiful? 0
```
The following example illustrates a custom dataset (C:), where the data are in csv files on the local filesystem. Normally, the lhs of a regression is a single real value, but in this case, it is a vector in R^3.
```sh
# without eqn arg
gft_dataset --data "C:$gft/datasets/VAD/VAD" --split val 2>/dev/null | sed 3q
# Word|abandonment Valence|0.128 Arousal|0.43 Dominance|0.202
# Word|abbey Valence|0.58 Arousal|0.367 Dominance|0.444
# Word|abbreviation Valence|0.469 Arousal|0.306 Dominance|0.345

# with eqn arg
gft_dataset --eqn 'regress: Valence + Arousal + Dominance ~ Word' --data "C:$gft/datasets/VAD/VAD" --split val 2>/dev/null | sed 3q
# abandonment 0.128|0.43|0.202
# abbey 0.58|0.367|0.444
# abbreviation 0.469|0.306|0.345
```
The following example shows that gft_dataset can also be applied to speech datasets. Common voice is available in English (en), Chinese (zh-CN), as well as a number of other languages. The raw data includes the waveform as an array, but with the eqn argument, we can extract a few useful fields such as the filename and the transcription.
```sh
# without eqn arg
gft_dataset --data H:common_voice,en 2>/dev/null | sed 1q | tr '\t' '\n'
# client_id|a07b17f8234ded5e847443ea6f423cef745cbbc7537fb637d58326000aa751e829a21c4fd0a35fc17fb833aa7e95ebafce5e...
# path|common_voice_en_100363.mp3
# audio|{'path': 'cv-corpus-6.1-2020-12-11/en/clips/common_voice_en_100363.mp3', 'array': array([0.0000000e+...
# sentence|It was the time of day when all of Spain slept during the summer.
# up_votes|2
# down_votes|1
# age|
# gender|
# accent|
# locale|en
# segment|''

# with eqn arg (English)
gft_dataset --eqn 'ctc:sentence ~ path' --data H:common_voice,en 2>/dev/null | sed 3q
# common_voice_en_100363.mp3 It was the time of day when all of Spain slept during the summer.
# common_voice_en_100540.mp3 Same way you did.
# common_voice_en_100546.mp3 Sarah told him that she was there to see her brother.

# with eqn arg (Chinese)
gft_dataset --eqn 'ctc:sentence ~ path' --data H:common_voice,zh-CN 2>/dev/null | sed 3q
# common_voice_zh-CN_18524189.mp3 正巧母亲往外探头
# common_voice_zh-CN_18532640.mp3 至今为止,元气火箭总共发行了两张专辑。
# common_voice_zh-CN_18532644.mp3 失业率降到十七年来的新低点
```
gft_labels.py outputs the set of labels for datasets and/or models.
```sh
./gft_labels.py --data H:emotion 2>/dev/null
# H:emotion sadness joy love anger fear surprise

./gft_labels.py --model H:AdapterHub/bert-base-uncased-pf-emotion 2>/dev/null
# H:AdapterHub/bert-base-uncased-pf-emotion sadness joy love anger fear surprise

./gft_labels.py --task image-classification --model H:nateraw/vit-base-cats-vs-dogs 2>/dev/null
# H:nateraw/vit-base-cats-vs-dogs cat dog

# The default model for text-classification has 2 classes
gft_labels.py --model H:distilbert-base-uncased-finetuned-sst-2-english
# H:distilbert-base-uncased-finetuned-sst-2-english NEGATIVE POSITIVE

# The default model for image-classification has 1k classes
gft_labels.py --model H:google/vit-base-patch16-224 --task image-classification |
    tr '\t' '\n' | wc -l
# 1000

gft_labels.py --model H:google/vit-base-patch16-224 --task image-classification | tr '\t' '\n' | head
# Afghan hound, Afghan
# African chameleon, Chamaeleo chamaeleon
# African crocodile, Nile crocodile, Crocodylus niloticus
# African elephant, Loxodonta africana
# African grey, African gray, Psittacus erithacus
# African hunting dog, hyena dog, Cape hunting dog, Lycaon pictus
# Airedale, Airedale terrier
# American Staffordshire terrier, Staffordshire terrier, American pit bull terrier, pit bull terrier
# American alligator, Alligator mississipiensis
```
Note that gft_labels.py uses a set of heuristics; sometimes, these heuristics fail to find the names of the labels.
See $gft/huggingface_hub/huggingface_models_with_labels.txt for labels for about 1000 text-classifier models.
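Since that file is plain text, with one model per line followed by its labels, ordinary shell tools can search it; for example (the search term here is illustrative):
```sh
# Illustrative: find text classifiers whose labels mention "emotion"
egrep -i emotion $gft/huggingface_hub/huggingface_models_with_labels.txt | sed 3q
```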
The examples above (and below) make use of equations such as:
```sh
--eqn 'classify: label ~ question1 + question2'
```
The keyword, classify, is distinguished from other keywords such as regress, classify_tokens, classify_spans, ctc, etc. In the classification case, for each input (a pair of questions), there is a single label (semantically similar or not). Classify also generalizes from binary classification to multiclass classification (for tasks such as emotion classification). Equations start with a number of different keywords:
- classify: lhs denotes a set of classes
- regress : lhs denotes a point in a vector space
- classify_tokens : there is a classification task for each token on the rhs
- classify_spans : used for SQuAD-like tasks where the output should be a span (substring) of the rhs
- ctc: used in speech recognition where the input is audio and the output is text
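For instance, here is a sketch of a regress equation with a multi-column lhs, based on the VAD example shown elsewhere in this document (as with the other gft_fit examples, $outdir is a directory for results; --metric is omitted here):
```sh
# A sketch: fine-tune a regression model whose lhs is a point in R^3
gft_fit --model H:bert-base-cased \
    --data "C:$gft/datasets/VAD/VAD" \
    --output_dir $outdir \
    --eqn 'regress: Valence + Arousal + Dominance ~ Word'
```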
There are a number of examples of equations in the table below.
| Dataset | Subset | Data Argument | Equation | Pipeline Task |
|---|---|---|---|---|
| GLUE | COLA | H:glue,cola | classify: label ~ sentence | text-classification |
| GLUE | SST2 | H:glue,sst2 | classify: label ~ sentence | text-classification |
| GLUE | WNLI | H:glue,wnli | classify: label ~ sentence1 + sentence2 | text-classification |
| GLUE | MRPC | H:glue,mrpc | classify: label ~ sentence1 + sentence2 | text-classification |
| GLUE | QNLI | H:glue,qnli | classify: label ~ question + sentence | text-classification |
| GLUE | QQP | H:glue,qqp | classify: label ~ question1 + question2 | text-classification |
| GLUE | STSB | H:glue,stsb | regress: label ~ sentence1 + sentence2 | |
| GLUE | MNLI | H:glue,mnli | classify: label ~ premise + hypothesis | text-classification |
| SQuAD 1.0 | | H:squad | classify_spans: answers ~ question + context | question-answering |
| SQuAD 2.0 | | H:squad_v2 | classify_spans: answers ~ question + context | question-answering |
| CONLL2003 | POS | H:conll2003 | classify_tokens: pos_tags ~ tokens | token-classification |
| CONLL2003 | NER | H:conll2003 | classify_tokens: ner_tags ~ tokens | token-classification |
| CONLL2003 | Chunking | H:conll2003 | classify_tokens: chunk_tags ~ tokens | token-classification |
| TIMIT | | H:timit_asr | ctc: text ~ audio | automatic-speech-recognition |
| LibriSpeech | | H:librispeech_asr | ctc: text ~ audio | automatic-speech-recognition |
| Amazon Reviews | | H:amazon_reviews_multi | classify: stars ~ review_title + review_body | text-classification |
| VAD | | C:$gft/datasets/VAD/VAD | regress: Valence + Arousal + Dominance ~ Word | |
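Each row of the table should be usable with gft_fit; for example, here is a sketch based on the CONLL2003 NER row (--metric and hyperparameters omitted):
```sh
# A sketch: fine-tune a token classifier for NER, following the table row above
gft_fit --model H:bert-base-cased \
    --data H:conll2003 \
    --output_dir $outdir \
    --eqn 'classify_tokens: ner_tags ~ tokens'
```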
gft_predict reads input from stdin, and writes predictions to stdout. To run a model on data from a standard dataset, pipe gft_dataset into gft_predict.
```sh
# extract examples from dataset (and stop after 2nd one, for expository purposes)
gft_dataset --data H:emotion --eqn 'classify: label ~ text' 2>/dev/null | sed 2q
# im feeling rather rotten so im not very ambitious right now 0
# im updating my blog because i feel shitty 0

# same as above, but pipe results into gft_predict
# gft_predict appends predicted labels and scores to each input line
gft_dataset --data H:emotion --eqn 'classify: label ~ text' 2>/dev/null | sed 2q |
    gft_predict --task H:text-classification 2>/dev/null
# im feeling rather rotten so im not very ambitious right now 0 NEGATIVE 0.9998108744621277
# im updating my blog because i feel shitty 0 NEGATIVE 0.9994602799415588
```
gft_predict will be discussed in more detail below. It reads from stdin and applies almost any input to almost any model, and it supports most of the tasks in HuggingFace pipelines.
Here are some examples of inference (see below for more details):
```sh
# text-classification: sentiment analysis
echo 'I love you.' | gft_predict --task H:text-classification
# I love you. POSITIVE 0.9998705387115479

# text-classification: emotion classification
model=H:AdapterHub/bert-base-uncased-pf-emotion
echo 'I love you.' | gft_predict --model $model --task H:text-classification
# I love you. love 0.6005669236183167

# token-classification: NER (Named Entity Recognition)
echo 'I love New York.' | gft_predict --task H:token-classification
# I love New York. New/I-LOC:0.9989 York/I-LOC:0.9974

# fill-mask: guess the masked word
echo 'I <mask> you.' | gft_predict --task H:fill-mask
# I <mask> you. salute|0.241 miss|0.177 love|0.147 thank|0.060 applaud|0.047

# text-generation
echo 'I love ' | gft_predict --task H:text-generation
# I love you and I will never be forgotten and thank you." I was also
# inspired by all of the students who walked onto campus wearing these
# teddy I love the idea that you can be anything people ask for you

# translation
echo 'I love you.' | gft_predict --task H:translation --model H:Helsinki-NLP/opus-mt-en-fr
# I love you. Je t'aime.
echo 'I love you.' | gft_predict --task H:translation --model H:Helsinki-NLP/opus-mt-en-zh
# I love you. 我爱你
```

```sh
# Run a half-dozen fake-news classifiers on "I love you."
lab=$gft/huggingface_hub/huggingface_models_with_labels.txt
for model in `egrep fake $lab | cut -f1`
do
    out=`echo 'I love you.' | gft_predict --task H:text-classification --model $model 2>/dev/null`
    echo $model $out
done
# elozano/bert-base-cased-fake-news I love you. Fake 0.9996728897094727
# Narrativaai/fake-news-detection-spanish I love you. FAKE 0.9591125845909119
# dtam/autonlp-covid-fake-news-36839110 I love you. 1 0.999913215637207
# Qinghui/autonlp-fake-covid-news-36769078 I love you. 1 0.9999946355819702
# Qiaozhen/fake-news-detector I love you. fake 0.9575745463371277
# yaoyinnan/roberta-fakeddit I love you. Fake 0.9857746362686157
```

```sh
# Run a half-dozen sentiment classifiers on "I love you."
lab=$gft/huggingface_hub/huggingface_models_with_labels.txt
# Since there are so many (196) classifiers, take a random sample of 6
egrep -ci positive $lab
# 196
models=`awk 'NF < 5' $lab | egrep -i positive |
    awk '{print rand() "\t" $0}' | sort | cut -f2- | sed 6q | cut -f1`
for model in $models
do
    out=`echo 'I love you.' | gft_predict --task H:text-classification --model $model 2>/dev/null`
    echo $model $out
done
# rohansingh/autonlp-Fake-news-detection-system-29906863 I love you. positive 0.6512816548347473
# gchhablani/fnet-base-finetuned-sst2 I love you. positive 0.9974162578582764
# cointegrated/rubert-tiny-sentiment-balanced I love you. positive 0.9445993304252625
# SetFit/deberta-v3-large__sst2__train-8-5 I love you. positive 0.8264811635017395
# bowipawan/bert-sentimental I love you. positive 0.7457774877548218
# m3tafl0ps/autonlp-NLPIsFun-251844 I love you. positive 0.9010641574859619
```

**I love you is positive**
| Predicted Label | Score | Model | Labels for Model |
|---|---|---|---|
| positive | 0.512 | SetFit/deberta-v3-large__sst2__train-16-7 | negative, positive |
| POSITIVE | 0.871 | ayameRushia/roberta-base-indonesian-sentiment-analysis-smsa | POSITIVE, NEUTRAL, NEGATIVE |
| positive | 0.807 | SetFit/distilbert-base-uncased__sst2__train-32-2 | negative, positive |
| positive | 0.999 | AdapterHub/bert-base-uncased-pf-sst2 | negative, positive |
| positive | 0.917 | SetFit/deberta-v3-large__sst2__train-32-1 | negative, positive |
| positive | 0.999 | moshew/tiny-bert-aug-sst2-distilled | negative, positive |
| positive | 0.651 | rohansingh/autonlp-Fake-news-detection-system-29906863 | negative, positive |
| 5 stars | 0.872 | tomato/sentiment_analysis | 1 star, 2 stars, 3 stars, 4 stars, 5 stars |
| 5 stars | 0.424 | cmarkea/distilcamembert-base-sentiment | 1 star, 2 stars, 3 stars, 4 stars, 5 stars |
| 5 stars | 0.872 | nlptown/bert-base-multilingual-uncased-sentiment | 1 star, 2 stars, 3 stars, 4 stars, 5 stars |
**I love you is love and/or joy**
| Predicted Label | Score | Model | Labels for Model |
|---|---|---|---|
| joy | 0.826 | philschmid/deberta-v3-xsmall-emotion | anger, fear, joy, love, sadness, surprise |
| love | 0.681 | AdapterHub/roberta-base-pf-emotion | sadness, joy, love, anger, fear, surprise |
| joy | 0.786 | philschmid/MiniLMv2-L6-H384-emotion | sadness, joy, love, anger, fear, surprise |
| love | 0.649 | bhadresh-savani/roberta-base-emotion | sadness, joy, love, anger, fear, surprise |
| love | 0.935 | bhadresh-savani/albert-base-v2-emotion | anger, fear, joy, love, sadness, surprise |
| love | 0.960 | marcelcastrobr/sagemaker-distilbert-emotion-2 | sadness, joy, love, anger, fear, surprise |
**I love you is fake news**
| Predicted Label | Score | Model | Labels for Model |
|---|---|---|---|
| Fake | 0.998 | yaoyinnan/bert-base-chinese-covid19 | Neutral, Fake, Real |
| Fake | 0.986 | yaoyinnan/roberta-fakeddit | Fake, Real |
| fake | 0.958 | Qiaozhen/fake-news-detector | real, fake |
| FAKE | 0.959 | Narrativaai/fake-news-detection-spanish | REAL, FAKE |
**I love you is both spam and ham**
| Predicted Label | Score | Model | Labels for Model |
|---|---|---|---|
| spam | 0.826 | SetFit/distilbert-base-uncased__enron_spam__all-train | ham, spam |
| not spam | 1.000 | sureshs/distilbert-large-sms-spam | not spam, spam |
**I love you is (mostly) not hateful/offensive**
| Predicted Label | Score | Model | Labels for Model |
|---|---|---|---|
| not-hate | 0.974 | aXhyra/demo_hate_1234567 | not-hate, hate |
| neither | 0.349 | SetFit/distilbert-base-uncased__hate_speech_offensive__train-16-9 | hate speech, offensive language, neither |
| not-hate | 0.990 | aXhyra/presentation_hate_31415 | not-hate, hate |
| no hate speech | 0.885 | SetFit/distilbert-base-uncased__ethos_binary__all-train | no hate speech, hate speech |
| not-hate | 0.995 | aXhyra/hate_trained_42 | not-hate, hate |
| hateful | 0.040 | pysentimiento/robertuito-hate-speech | hateful, targeted, aggressive |
| offensive language | 0.336 | SetFit/distilbert-base-uncased__hate_speech_offensive__train-32-1 | hate speech, offensive language, neither |
| offensive | 1.000 | simjo/model1_test | not offensive, offensive |
| OFFENSIVE | 0.546 | seanbenhur/tanglish-offensive-language-identification | NOT-OFFENSIVE, OFFENSIVE |
| neither | 0.365 | SetFit/distilbert-base-uncased__hate_speech_offensive__train-8-6 | hate speech, offensive language, neither |
| offensive language | 1.000 | simjo/dummy-model | not offensive, offensive |
| hate speech | 0.350 | SetFit/distilbert-base-uncased__hate_speech_offensive__train-8-7 | hate speech, offensive language, neither |
Design goals/benefits of higher level languages such as gft:
- Hide complexity: gft programs should be short (1-line) and easy to read.
- Avoid special cases (especially in code that is exposed to users): Standard examples such as these and these are longer than gft programs. In many cases, 500 lines of pytorch code can be reduced to a single line of gft code. Those 500 lines contain many details that users do not need to know about, such as data loading, gradient descent training, and much more. Many of these examples are very similar to one another; avoid duplication in code that is exposed to large numbers of users.
- Code re-use: The standard examples are full of opportunities for code reuse. Users are expected to fork the code in these examples and modify it as needed if they want the examples to work on slightly different tasks, or slightly different datasets. When users modify the 500 lines of code, they will introduce bugs. Code reuse is safer than editing examples. Since the gft tools are based closely on these examples, they should produce similar results, with similar computational resources (space and time), since both solutions are basically running the same algorithms (and much of the same code).
- Flexibility/Generality: Support most datasets and models published on hubs (HuggingFace, PaddleNLP). The prefixes, H, P and C, refer to HuggingFace, PaddleNLP and custom (local filesystem). You should be able to mix and match models and datasets from different sources (HuggingFace, PaddleHub/PaddleNLP, Adapter Hub, etc.) There are currently about 30k models and 3k datasets on these hubs. gft hides complexities such as different formats for models from different suppliers, and different types of auto classes for different purposes. For example, users should not need to know about adapter models, and how they are different from other types of models.
gft_predict reads from stdin and applies almost any input to almost any model.
See the documentation on HuggingFace pipelines and PaddleNLP taskflow for more information on the --task argument.
The supported inference tasks include:
- text-classification : The left hand side (lhs) of the equation is a single variable over classes.
- token-classification : The lhs has a class variable for each token.
- translation : Machine Translation
- fill-mask : Replace `<mask>` with words.
- question-answering : Example: SQuAD. The answer is a span (substring) of the input. The lhs has two class variables for each position, indicating the start and end of answer spans.
- image-classification: Like text-classification, except the rhs is a picture (as opposed to text).
- automatic-speech-recognition: ASR
- text-generation: Input prompt and output completion.
```sh
# text classification
# example with --task argument (HuggingFace pipelines do different things with different task arguments)
echo 'I love you.' | gft_predict --model H:AdapterHub/bert-base-uncased-pf-emotion --task H:text-classification 2>/dev/null
# I love you. love 0.6005669236183167
```
If you don't specify a model, one will be chosen for you (remove the 2>/dev/null bits to see that distilbert-base-uncased-finetuned-sst-2-english is the default model). Different models produce different classifications. The default model produces positive and negative labels (sentiment).
```sh
echo 'I love you.' | gft_predict --task H:text-classification 2>/dev/null
# I love you. POSITIVE 0.9998705387115479

echo 'I hate you.' | gft_predict --task H:text-classification 2>/dev/null
# I hate you. NEGATIVE 0.9992952346801758
```
If you don't specify a --task, the class labels will be numeric, and the last field will be a list of logits. The class label is the argmax of the logits.
```sh
# default arguments: input is assigned to class 2 of 6 (the number of classes is part of the model, which was fine-tuned on data with 6 classes)
echo 'I love you.' | gft_predict --model H:AdapterHub/bert-base-uncased-pf-emotion 2>/dev/null
# I love you. 2 -0.2438915|4.8194537|5.235088|-1.7891347|-4.2359033|-5.1401916

echo 'I love you.' | gft_predict --model H:distilbert-base-uncased-finetuned-sst-2-english
# I love you. 1 -4.294976|4.6575847
echo 'I hate you.' | gft_predict --model H:distilbert-base-uncased-finetuned-sst-2-english
# I hate you 0 3.8723779|-3.1543205
```

```sh
echo 'I love New York.' | gft_predict --task H:token-classification --model H:vblagoje/bert-english-uncased-finetuned-pos 2>/dev/null
# I love New York. i/PRON:0.9995 love/VERB:0.9989 new/PROPN:0.9986 york/PROPN:0.9988 ./PUNCT:0.9997
echo 'I love New York.' | gft_predict --task H:token-classification 2>/dev/null
# I love New York. New/I-LOC:0.9989 York/I-LOC:0.9974
```

```sh
echo 'I love you.' | gft_predict --task H:translation --model H:Helsinki-NLP/opus-mt-en-fr 2>/dev/null
# I love you. Je t'aime.
echo 'I love you.' | gft_predict --task H:translation --model H:Helsinki-NLP/opus-mt-en-zh 2>/dev/null
# I love you. 我爱你
```
## Fill Mask

```sh
# fill mask: replace <mask> with n-best words
echo 'I <mask> you.' | gft_predict --task H:fill-mask 2>/dev/null
# I <mask> you. salute|0.241 miss|0.177 love|0.147 thank|0.060 applaud|0.047
```
```sh
# Question Answering (SQuAD)
# Extract one example from the SQuAD dataset
gft_dataset --data H:squad --eqn 'classify_spans: answers ~ question + context' --split val | sed 1q > /tmp/x

# Run inference on this example with the default model (and show the first 150 characters of each field on separate lines)
gft_predict --task H:question-answering < /tmp/x | tr '\t' '\n' | cut -c1-150
# Which NFL team represented the AFC at Super Bowl 50?|Super Bowl 50 was an American football game to determine the champion of the National Football Le
# {'text': ['Denver Broncos', 'Denver Broncos', 'Denver Broncos'], 'answer_start': [177, 177, 177]}
# answer: Denver Broncos
```
```sh
# image classification
echo https://images.all-free-download.com/images/graphicwebp/funny_cat_194619.webp |
    gft_predict --task H:image-classification 2>/dev/null
# https://images.all-free-download.com/images/graphicwebp/funny_cat_194619.webp Egyptian cat|0.736 tiger cat|0.039 tabby, tabby cat|0.031 lynx, catamount|0.024 Persian cat|0.023

echo https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg |
    gft_predict --task H:image-classification 2>/dev/null
# https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg lynx, catamount|0.433 cougar, puma, catamount, mountain lion, painter, panther, Felis concolor|0.035 snow leopard, ounce, Panthera uncia|0.032 Egyptian cat|0.024 tiger cat|0.023
```

```sh
gft_dataset --eqn 'classify: labels ~ file' --data H:nateraw/auto-cats-and-dogs --split train | head > /tmp/x
cat /tmp/x | gft_predict --task H:image-classification 2>/dev/null | awk -F/ '{print $NF}'
# 0.jpg 0 Egyptian cat|0.327 tiger cat|0.097 tabby, tabby cat|0.057 space heater|0.053 laptop, laptop computer|0.029
# 1.jpg 0 tabby, tabby cat|0.612 Egyptian cat|0.284 tiger cat|0.094 lynx, catamount|0.003 Siamese cat, Siamese|0.000
# 10.jpg 0 tabby, tabby cat|0.435 Egyptian cat|0.251 tiger cat|0.085 Persian cat|0.081 lynx, catamount|0.060
# 100.jpg 0 Egyptian cat|0.441 tabby, tabby cat|0.396 tiger cat|0.086 lynx, catamount|0.022 Persian cat|0.012
# 1000.jpg 0 Egyptian cat|0.369 tabby, tabby cat|0.129 tiger cat|0.107 Angora, Angora rabbit|0.053 Persian cat|0.042
# 10000.jpg 0 Egyptian cat|0.753 tabby, tabby cat|0.203 tiger cat|0.040 lynx, catamount|0.002 Siamese cat, Siamese|0.000
# 10001.jpg 0 Egyptian cat|0.321 tabby, tabby cat|0.061 Persian cat|0.053 tiger cat|0.033 bucket, pail|0.025
# 10002.jpg 0 Egyptian cat|0.581 tabby, tabby cat|0.222 tiger cat|0.096 lynx, catamount|0.044 Persian cat|0.019
# 10003.jpg 0 Egyptian cat|0.765 tabby, tabby cat|0.118 tiger cat|0.084 Siamese cat, Siamese|0.002 carton|0.002
# 10004.jpg 0 tabby, tabby cat|0.476 Egyptian cat|0.298 tiger cat|0.218 lynx, catamount|0.001 Siamese cat, Siamese|0.000
```
The results are much better if we replace the default model with a more appropriate model.
```sh
model=H:nateraw/vit-base-cats-vs-dogs
cat /tmp/x | gft_predict --model=$model --task H:image-classification | awk -F/ '{print $NF}'
# 0.jpg 0 cat|0.999 dog|0.001
# 1.jpg 0 cat|1.000 dog|0.000
# 10.jpg 0 cat|1.000 dog|0.000
# 100.jpg 0 cat|1.000 dog|0.000
# 1000.jpg 0 cat|1.000 dog|0.000
# 10000.jpg 0 cat|1.000 dog|0.000
# 10001.jpg 0 cat|1.000 dog|0.000
# 10002.jpg 0 cat|1.000 dog|0.000
# 10003.jpg 0 cat|0.999 dog|0.001
# 10004.jpg 0 cat|1.000 dog|0.000
```

```sh
gft_dataset --eqn 'classify: labels ~ image_file_path' --data H:beans | head > /tmp/x
cat /tmp/x | gft_predict --task H:image-classification 2>/dev/null | awk -F/ '{print $NF}'
# healthy_test.21.jpg 2 fig|0.696 cucumber, cuke|0.013 pot, flowerpot|0.009 custard apple|0.007 leaf beetle, chrysomelid|0.005
# healthy_test.35.jpg 2 bell pepper|0.094 leaf beetle, chrysomelid|0.065 cucumber, cuke|0.058 head cabbage|0.049 ant, emmet, pismire|0.022
# healthy_test.34.jpg 2 cucumber, cuke|0.156 head cabbage|0.074 pot, flowerpot|0.022 ear, spike, capitulum|0.021 corn|0.010
# healthy_test.20.jpg 2 fig|0.848 pot, flowerpot|0.005 custard apple|0.003 jackfruit, jak, jack|0.002 cucumber, cuke|0.002
# healthy_test.36.jpg 2 custard apple|0.072 pot, flowerpot|0.057 fig|0.033 wool, woolen, woollen|0.025 necklace|0.013
# healthy_test.22.jpg 2 pick, plectrum, plectron|0.030 shower cap|0.012 leaf beetle, chrysomelid|0.011 head cabbage|0.008 spatula|0.007
# healthy_test.23.jpg 2 leaf beetle, chrysomelid|0.172 cucumber, cuke|0.049 ladybug, ladybeetle, lady beetle, ladybird, ladybird beetle|0.043 corn|0.031 bell pepper|0.029
# healthy_test.37.jpg 2 cucumber, cuke|0.235 head cabbage|0.021 zucchini, courgette|0.015 fig|0.014 corn|0.012
# healthy_test.8.jpg 2 leaf beetle, chrysomelid|0.051 cucumber, cuke|0.045 head cabbage|0.023 ladybug, ladybeetle, lady beetle, ladybird, ladybird beetle|0.010 fig|0.009
# healthy_test.33.jpg 2 cucumber, cuke|0.152 leaf beetle, chrysomelid|0.111 lacewing, lacewing fly|0.031 zucchini, courgette|0.030 fig|0.022
```
```sh
# speech recognition
gft_dataset --eqn 'ctc: text ~ file' --data H:timit_asr | head > /tmp/x
cat /tmp/x | gft_predict --task H:automatic-speech-recognition 2>/dev/null | awk -F/ '{print $NF}'
# SX139.WAV The bungalow was pleasantly situated near the shore. THE BUNGALOW WAS PLEASANTLY SITUATED NEAR THE SHORE
# SA2.WAV Don't ask me to carry an oily rag like that. DON'T ASK ME TO CARRY AN OILY RAG LIKE THAT
# SX229.WAV Are you looking for employment? ARE YOU LOOKING FOR EMPLOYMENT
# SA1.WAV She had your dark suit in greasy wash water all year. SHE HAD YOUR DARK SUIT AND GREASY WASHWATER ALL YEAR
# SX49.WAV At twilight on the twelfth day we'll have Chablis. AT TWILIGHT ON THE TWELFTH DAY WE'LL HAVE CHABLI
# SX409.WAV Eating spinach nightly increases strength miraculously. EATING SPINACH NIGHTLY INCREASES STRENGTH MIRACULOUSLY
# SI1759.WAV Got a heck of a buy on this, dirt cheap. GOT A HECK OF A BY ON THIS DIRT CHEAP
# SI499.WAV The scalloped edge is particularly appealing. THE SCALLOPED EDGE IS PARTICULARLY APPEALING
# SX319.WAV A big goat idly ambled through the farmyard. A BIG GOAT IDLY AMBLED THROUGH THE FARMYARD
# SI1129.WAV This group is secularist and their program tends to be technological. THIS GROUP IS SECULARIST AND THEIR PROGRAMM TENDS TO BE TECHNOLOGICAL
```

```sh
# text generation
echo 'A robin is a' | gft_predict --task H:text-generation --max_length 15 --num_return_sequences 1 2>/dev/null
# A robin is a A robin is a cat or dog that has trouble keeping up with its

# NOTE: non-determinism; same prompt --> different completions
echo 'A robin is a' | gft_predict --task H:text-generation --max_length 15 --num_return_sequences 1 2>/dev/null
# A robin is a A robin is a small bird that runs at a range where its legs
```
More examples of inference are here. Lots of examples on GLUE are here.
Example of usage (of fine-tuning):
```sh
export datasets=$gft/datasets
outdir=/tmp/cola/cpkt
sh $gft/examples/fine_tuning_examples/model.HuggingFace/language/data.HuggingFace/glue/cola.sh $outdir
```
All of the shell scripts under fine_tuning_examples take a single argument (a directory for the results).
The shell scripts under model.HuggingFace use models from HuggingFace, and shell scripts under model.PaddleHub use models from PaddleHub and/or PaddleNLP. Similarly, shell scripts under data.HuggingFace use datasets from HuggingFace, and shell scripts under data.PaddleHub use datasets from PaddleHub and/or PaddleNLP.
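For example, to browse the examples for one such combination (this path matches the cola.sh example above; other directories follow the same naming pattern):
```sh
# Illustrative: list the HuggingFace-model / HuggingFace-data GLUE examples
ls $gft/examples/fine_tuning_examples/model.HuggingFace/language/data.HuggingFace/glue/
```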
To run all fine-tuning examples:
```sh
# run all examples
cd $gft/examples/fine_tuning_examples
find . -name '*.sh' |
while read f
do
    b=$gft_checkpoints/`dirname $f`/`basename $f .sh`
    sh $f $b/ckpt
done
```
Paper (draft) is here.