The `llm_sft/` directory contains scripts for supervised fine-tuning (SFT) data curation.
All scripts support a rich set of command-line arguments. Here are the most important ones:
Answer Evaluation Arguments:

- `--model` (str, default: `DEPLOYMENT`): LLM model name or deployment name.
- `--model_type` (str, default: `remote`): LLM backend type, either `remote` or `local`.
- `--platform` (str, default: `VLLM`): LLM platform, e.g., VLLM, OpenAI, Azure, etc.
- `--input_path` (str, required): Path to the input dataset (JSONL format), e.g., cot100k.
- `--image_dir` (str, optional): Directory containing images for multimodal input.
- `--image_url` (str, optional, default: `None`): Accessible URL prefix for images (if needed by the backend).
- `--batch_size` (int, default: 4): Batch size for concurrent LLM calls.
- `--concurrent_tasks` (int, default: 4): Number of concurrent batches to process.
- `--checkpoint_interval` (int, default: 5): Number of new results before saving a checkpoint (see the sketch after this list).
- `--incorrect_prefix` (str, default: `data/checkpoint/incorrect_`): Prefix for checkpoint files of incorrect results.
- `--correct_prefix` (str, default: `data/checkpoint/correct_`): Prefix for checkpoint files of correct results.
- `--incorrect_final` (str, default: `data/incorrect_final.jsonl`): Final output file for all incorrect results.
- `--correct_final` (str, default: `data/correct_final.jsonl`): Final output file for all correct results.
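To make the batching and checkpointing flags concrete, here is a minimal sketch of the pattern they describe. It is an illustration only: the helper names (`call_llm`, `save_checkpoint`) and the `<prefix><index>.jsonl` checkpoint naming are assumptions, not the script's actual internals.

```python
import asyncio
import json

async def call_llm(record: dict) -> dict:
    """Placeholder: one LLM call per record (hypothetical helper)."""
    ...

def save_checkpoint(results: list, prefix: str, index: int) -> None:
    # Checkpoint naming <prefix><index>.jsonl is an assumption for illustration.
    with open(f"{prefix}{index}.jsonl", "w", encoding="utf-8") as f:
        for r in results:
            f.write(json.dumps(r, ensure_ascii=False) + "\n")

async def run(records, batch_size=4, concurrent_tasks=4,
              checkpoint_interval=5, prefix="data/checkpoint/correct_"):
    sem = asyncio.Semaphore(concurrent_tasks)            # --concurrent_tasks

    async def process_batch(batch):
        async with sem:                                  # cap in-flight batches
            return await asyncio.gather(*(call_llm(r) for r in batch))

    batches = [records[i:i + batch_size]                 # --batch_size
               for i in range(0, len(records), batch_size)]
    results, last_saved = [], 0
    for done in asyncio.as_completed([process_batch(b) for b in batches]):
        results.extend(await done)
        # Flush to disk every --checkpoint_interval new results.
        if len(results) - last_saved >= checkpoint_interval:
            last_saved = len(results)
            save_checkpoint(results, prefix, last_saved)
```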
Reflection Evaluation Arguments:

- `--model` (str, default: `DEPLOYMENT`): LLM model name or deployment name.
- `--model_type` (str, default: `remote`): LLM backend type, either `remote` or `local`.
- `--upload_image` (bool, optional): Set to `True` to upload images to a public image host or storage (e.g., SM.MS, Imgur, S3, R2, etc.). Requires an implementation of `upload_image_and_get_url` in `utils/upload_utils.py` (a sketch follows this list).
- `--image_description_path` (str, optional): Path to the image description file; needed if you do not want to use multimodal input.
- `--platform` (str, default: `VLLM`): LLM platform, e.g., VLLM, OpenAI, Azure, etc.
- `--input_path` (str, required): Path to the initial responses.
- `--image_dir` (str, optional): Directory containing images for multimodal input.
- `--image_url` (str, optional, default: `None`): Accessible URL prefix for images if you have already uploaded them to a server.
- `--batch_size` (int, default: 4): Batch size for concurrent LLM calls.
- `--concurrent_tasks` (int, default: 4): Number of concurrent batches to process.
- `--checkpoint_interval` (int, default: 10): Number of new results before saving a checkpoint.
- `--reflection_prefix` (str, default: `data/checkpoint/reflection_`): Prefix for checkpoint files of reflection results.
- `--reflection_final` (str, default: `data/reflection_final.jsonl`): Final output file for all reflection results.
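If you enable `--upload_image`, you must supply `upload_image_and_get_url` yourself. Below is a minimal sketch assuming SM.MS as the host; the endpoint, response layout, and the `SMMS_API_TOKEN` environment variable are assumptions to adapt for your own image host.

```python
# utils/upload_utils.py -- illustrative sketch only; not the repo's code.
import os
import requests

def upload_image_and_get_url(image_path: str) -> str:
    """Upload a local image and return a publicly accessible URL."""
    # SMMS_API_TOKEN is a hypothetical env var; the SM.MS endpoint and
    # response shape below are assumptions -- adjust for your host.
    token = os.environ["SMMS_API_TOKEN"]
    with open(image_path, "rb") as f:
        resp = requests.post(
            "https://sm.ms/api/v2/upload",
            headers={"Authorization": token},
            files={"smfile": f},
            timeout=30,
        )
    resp.raise_for_status()
    data = resp.json()
    if not data.get("success"):
        raise RuntimeError(f"Upload failed: {data.get('message')}")
    return data["data"]["url"]  # public URL of the uploaded image
```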
Image Description Extraction Arguments:

- `--source` (str, choices: `mulberry`, `cot100k`): Data source name (uses the default pattern).
- `--input_path` (str, optional): Path to your prepared data (overrides the source default).
- `--pattern` (str, optional): Custom regex pattern to extract descriptions (see the sketch after this list).
- `--output_path` (str, required): Output path for `image_description.jsonl`.
- `--content_key` (str, default: `content`): Key name in the JSON object where the text is stored.
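A custom `--pattern` presumably gets applied to each record's `--content_key` field. The following is a hedged sketch of that idea; the `<description>` tag pattern and the output schema are illustrative assumptions, not the script's actual defaults.

```python
# Hedged sketch of custom-pattern extraction over a JSONL file.
import json
import re

def extract_descriptions(input_path, output_path, pattern, content_key="content"):
    regex = re.compile(pattern, re.DOTALL)  # pattern must contain one capture group
    with open(input_path, encoding="utf-8") as fin, \
         open(output_path, "w", encoding="utf-8") as fout:
        for line in fin:
            record = json.loads(line)
            match = regex.search(record.get(content_key, ""))
            if match:
                fout.write(json.dumps({"image_description": match.group(1)},
                                      ensure_ascii=False) + "\n")

# Example: capture text between hypothetical <description> tags.
# extract_descriptions("data.jsonl", "image_description.jsonl",
#                      r"<description>(.*?)</description>")
```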
Answer Evaluation:

```bash
python -m llm_sft.answer_eval \
    --model Qwen/Qwen2.5-VL-7B-Instruct \
    --model_type remote \
    --platform VLLM \
    --input_path /path/to/your/data.jsonl \
    --image_dir /path/to/your/images
```
Reflection Evaluation:
```bash
python -m llm_sft.reflection_eval \
    --model Qwen/Qwen2.5-VL-7B-Instruct \
    --model_type remote \
    --platform VLLM \
    --input_path /path/to/your/data.jsonl \
    --image_dir /path/to/your/images \
    --output_path /path/to/save/reflections.jsonl
```
Image Description Extraction:
```bash
python -m llm_sft.image_description \
    --input_path /path/to/your/data.jsonl \
    --source cot100k \
    --output_path /path/to/save/image_descriptions.jsonl
```
Run any script with `--help` to see the full list of supported arguments.