Here we provide an overview of how Airscape's phase-2 training is carried out and how it should be used.
See each subfolder for the exact dependencies it needs.
Our idea is to use rejection sampling in an iterative self-play training loop, so that the model obtained in phase-1 improves automatically under the guidance of an MoE teacher.
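The loop above can be sketched end to end. This is an illustrative outline only: function names such as `generate_prompts`, `run_inference`, and `teacher_score` are placeholders standing in for the real code in `prompts_generate`, `inference`, and `best_selection`, not the actual Airscape API.

```python
import random

def generate_prompts(frame: str, n: int = 8) -> list[str]:
    # Placeholder for the VLM prompt generator (see prompts_generate/).
    return [f"{frame}-prompt-{i}" for i in range(n)]

def run_inference(model: str, prompt: str, seed: int) -> dict:
    # Placeholder for phase-1 Airscape inference (see inference/).
    rng = random.Random(f"{model}/{prompt}/{seed}")
    return {"prompt": prompt, "seed": seed, "score_proxy": rng.random()}

def teacher_score(outcome: dict) -> float:
    # Placeholder for the MoE-teacher discriminator (see best_selection/).
    return outcome["score_proxy"]

def self_play_round(model: str, frames: list[str], seeds=(0, 1, 2, 3)) -> list[dict]:
    """One rejection-sampling round: generate prompts, sample outcomes,
    and keep only the teacher's top pick per frame."""
    kept = []
    for frame in frames:
        outcomes = [run_inference(model, p, s)
                    for p in generate_prompts(frame)
                    for s in seeds]
        kept.append(max(outcomes, key=teacher_score))
    return kept  # best sample per frame, fed back to fine-tune the model

best = self_play_round("airscape-phase1", ["frame_000", "frame_001"])
```

Running such a round repeatedly is what makes the loop "self-play": each round's selected samples train the next, stronger model.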
Use a mature VLM to generate 8 prompts from each given frame. These prompts are then used for model inference.
Open prompts_generate to see more details.
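The prompt-generation step might look like the following sketch. `query_vlm` is a hypothetical stand-in for whichever VLM `prompts_generate` actually calls, and the prompt template is invented for illustration.

```python
def query_vlm(frame_path: str, instruction: str) -> str:
    # Placeholder VLM call; returns a canned caption so the sketch runs.
    return f"caption of {frame_path}"

def prompts_for_frame(frame_path: str, n: int = 8) -> list[str]:
    """Ask the VLM for n diverse prompts describing the given frame."""
    caption = query_vlm(frame_path, "Describe this frame for video generation.")
    # Vary the phrasing per prompt so downstream inference sees n
    # distinct conditions rather than n copies of one caption.
    return [f"[variant {i}] {caption}" for i in range(n)]

prompts = prompts_for_frame("data/frames/000.png")
```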
Use the phase-1 Airscape model to generate outcomes under the different prompts and seeds. This boosts diversity, which leaves foreseeable room for capability evolution.
The details can be found in inference, which is basically the same as the phase-1 code. You can also open phase1 to learn more about the training and usage.
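Crossing the prompt set with several random seeds yields the candidate pool. A minimal sketch, assuming a hypothetical `phase1_generate` wrapper around the phase-1 sampler in `inference`:

```python
import itertools
import random

def phase1_generate(prompt: str, seed: int) -> dict:
    # Placeholder for the phase-1 Airscape sampler (see inference/).
    rng = random.Random(f"{prompt}-{seed}")
    return {"prompt": prompt, "seed": seed, "video": rng.random()}

def sample_pool(prompts: list[str], seeds) -> list[dict]:
    """Cross every prompt with every seed to build a diverse pool."""
    return [phase1_generate(p, s) for p, s in itertools.product(prompts, seeds)]

pool = sample_pool([f"p{i}" for i in range(8)], seeds=range(4))
```

With 8 prompts and 4 seeds the pool holds 32 candidates per frame, which is what gives the teacher meaningful choices to reject or keep.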
The discriminator acts as an MoE teacher that guides the model toward stronger outputs.
Open best_selection to see more details.
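The selection step reduces to scoring each candidate and keeping the top few, which is the "rejection" in rejection sampling. In this sketch `discriminator_score` is a hypothetical stand-in for the MoE-teacher scoring in `best_selection`:

```python
def discriminator_score(candidate: dict) -> float:
    # Placeholder MoE-teacher score; the real logic lives in best_selection/.
    return candidate["quality"]

def select_best(candidates: list[dict], top_k: int = 1) -> list[dict]:
    """Keep the top_k highest-scoring candidates; reject the rest."""
    ranked = sorted(candidates, key=discriminator_score, reverse=True)
    return ranked[:top_k]

cands = [{"id": i, "quality": q} for i, q in enumerate([0.2, 0.9, 0.5])]
best = select_best(cands)
```

The kept samples then become training data for the next self-play round.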