## SFTTrainer: Official Example

### 1. Environment Setup

Rent a 4090 on AutoDL (pick an instance whose storage can be expanded), then open JupyterLab and work there. Create a notebook. Locate the data disk path — it is expandable and large — and, in a terminal, cd into the data disk and create an `hf` directory, then point the HF home directory (where the models will be stored) at it. From then on, models are downloaded from hf-mirror.com and cached into the `hf` directory we just created.

### 2. Model Loading and Dataset Preprocessing

In the terminal, `pip install trl datasets`. Load the model and tokenizer and confirm they load successfully. Create a `data` folder, upload the train/test datasets, run the data processing, and verify that it worked.

### 3. Trainer Configuration and Training

Create the output directory; AutoPanel shows where the logs are stored. Initialize the config and the trainer and execute them: the tokenizer first processes the data, and then training can begin. You can cd into the output directory to inspect it, save the best model, and check the result.

### 4. Inference Demo

Before running inference, shut down the kernel first — that is, stop the notebook that trained the model — to release the GPU memory. Then grab the sample code from the model card on Hugging Face as a quick sanity test, after which we can load the best model we trained earlier from `/root/autodl-tmp/sft/Qwen3-0.6B/sft-full/best` (see the sketch below).
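A minimal inference sketch, following the usual Qwen model-card pattern and pointed at the checkpoint above; the example prompt and the `max_new_tokens` budget are assumptions, not from the original notes:

```python
# Hedged sketch: load the fine-tuned checkpoint and run a single prompt.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/root/autodl-tmp/sft/Qwen3-0.6B/sft-full/best"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

# Same task format as the training data: keyword extraction (example prompt assumed).
messages = [{"role": "user", "content": "抽取出文本中的关键词\n标题：...\n文本：..."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=512)  # budget assumed
# Strip the prompt tokens and decode only the newly generated part.
print(tokenizer.decode(output[0][len(inputs.input_ids[0]):], skip_special_tokens=True))
```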
## SFTTrainer: LoRA Example

First install peft — the parameter-efficient fine-tuning library.

### Training Code

The LoraConfig code comes from the SFTTrainer LoRA example under Learn on the Hugging Face site.

```python
%env HF_ENDPOINT=https://hf-mirror.com
%env HF_HOME=/root/autodl-tmp/hf

# Load model and tokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

from datasets import load_dataset

dataset_dict = load_dataset(
    "json",
    data_files={
        "train": "data/keywords_data_train.jsonl",
        "test": "data/keywords_data_test.jsonl",
    },
)

def map_func(example):
    conversation = example["conversation"]
    messages = []
    for item in conversation:
        messages.append({"role": "user", "content": item["human"]})
        messages.append({"role": "assistant", "content": item["assistant"]})
    return {"messages": messages}

dataset_dict = dataset_dict.map(
    map_func,
    batched=False,
    remove_columns=["dataset", "conversation", "category", "conversation_id"],
)

from trl import SFTConfig, SFTTrainer
from peft import LoraConfig

# Configure LoRA parameters
# r: rank dimension for LoRA update matrices (smaller = more compression)
rank_dimension = 4
# lora_alpha: scaling factor for LoRA layers (higher = stronger adaptation)
lora_alpha = 8
# lora_dropout: dropout probability for LoRA layers (helps prevent overfitting)
lora_dropout = 0.05

peft_config = LoraConfig(
    r=rank_dimension,             # Rank dimension - typically between 4-32
    lora_alpha=lora_alpha,        # LoRA scaling factor - typically 2x rank
    lora_dropout=lora_dropout,    # Dropout probability for LoRA layers
    bias="none",                  # Bias type for LoRA; the corresponding biases will be updated during training
    target_modules="all-linear",  # Which modules to apply LoRA to
    task_type="CAUSAL_LM",        # Task type for model architecture
)

# Configure trainer
training_args = SFTConfig(
    output_dir="/root/autodl-tmp/sft/Qwen3-8B/sft-lora",
    max_steps=1000,
    per_device_train_batch_size=4,
    learning_rate=5e-5,
    logging_steps=10,
    logging_dir="/root/tf-logs",
    save_steps=100,
    save_total_limit=2,
    eval_strategy="steps",
    eval_steps=100,
    load_best_model_at_end=True,
    bf16=True,
    warmup_steps=50,
)

# Initialize trainer
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset_dict["train"],
    eval_dataset=dataset_dict["test"],
    processing_class=tokenizer,
    peft_config=peft_config,  # new argument compared with full fine-tuning
)
trainer.train()
trainer.save_model("/root/autodl-tmp/sft/Qwen3-8B/sft-lora/best")
```

### Solving the OOM (Out of Memory) Problem

You can set a smaller dtype when loading the model (see the sketch below). A small trick: monitor the GPU once per second with `watch -n 1 nvidia-smi`.
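A minimal sketch of the dtype fix, assuming bfloat16; half-precision weights take roughly half the memory of the default float32, and the merge code below uses the same `dtype=` keyword:

```python
# Hedged sketch: load the weights in bfloat16 instead of the default float32.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B",
    dtype=torch.bfloat16,  # roughly halves weight memory vs. float32
)
```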
### Merging and Inference

Save the adapter, merge it into W0 (the frozen base weights), and save the complete model.

```python
%env HF_ENDPOINT=https://hf-mirror.com
%env HF_HOME=/root/autodl-tmp/hf

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

model_name = "Qwen/Qwen3-8B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype=torch.float16,
)

# Load the PEFT model with adapter
peft_model = PeftModel.from_pretrained(
    base_model,
    "/root/autodl-tmp/sft/Qwen3-8B/sft-lora/best",
    dtype=torch.float16,
)
merged_model = peft_model.merge_and_unload()

# prepare the model input
prompt = "抽取出文本中的关键词\n标题：人工神经网络在猕猴桃种类识别上的应用\n文本：在猕猴桃介电特性研究的基础上,将人工神经网络技术应用于猕猴桃的种类识别.该种类识别属于模式识别,其关键在于提取样品的特征参数,在获得特征参数的基础上,选取合适的网络通过训练来进行识别.猕猴桃种类识别的研究为自动化识别果品的种类、品种和新鲜等级等提供了一种新方法,为进一步研究果品介电特性与其内在品质的关系提供了一定的理论与实践基础."
messages = [
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # Switches between thinking and non-thinking modes. Default is True.
)
print(text)
model_inputs = tokenizer([text], return_tensors="pt").to(merged_model.device)

# conduct text completion
generated_ids = merged_model.generate(
    **model_inputs,
    max_new_tokens=32768,
)
# cut off the part identical to the input, keeping only the newly generated tokens
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)

merged_model.save_pretrained("/root/autodl-tmp/sft/Qwen3-8B/sft-lora/merged")
tokenizer.save_pretrained("/root/autodl-tmp/sft/Qwen3-8B/sft-lora/merged")
```

## Parameter-Efficient Fine-Tuning: QLoRA

### Code

Install the quantization library (bitsandbytes) first. The core code change, on top of the LoRA version:

```python
# Load model and tokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer

# add a quantization step
import torch
from transformers import BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_name = "Qwen/Qwen3-8B"
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=config)
# prepare the quantized model for training
model = prepare_model_for_kbit_training(model)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

You can run `nvidia-smi` to check the GPU memory situation.

### Inference and Merging

When the adapter we trained with QLoRA is finally merged, it still has to be merged with the base model, not with the quantized model. The code is the same as the LoRA merge-and-inference code — just change the directories (a sketch follows).
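A minimal sketch of that QLoRA merge; the `sft-qlora` directory names are assumptions:

```python
# Hedged sketch: merge a QLoRA adapter back into the full-precision base model.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model in fp16 -- deliberately WITHOUT quantization_config.
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", dtype=torch.float16)

# Attach the adapter trained under 4-bit quantization, then fold it into the weights.
peft_model = PeftModel.from_pretrained(base_model, "/root/autodl-tmp/sft/Qwen3-8B/sft-qlora/best")
merged_model = peft_model.merge_and_unload()
merged_model.save_pretrained("/root/autodl-tmp/sft/Qwen3-8B/sft-qlora/merged")
```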
## Parameter-Efficient Fine-Tuning: Unsloth

Official site: https://unsloth.ai/. It is essentially a standalone tool, though it can also be found in Hugging Face's TRL library.

### Training

Install unsloth first. Compared with the LoRA run, the concrete changes are the model creation at the beginning and, later on, deleting the `peft_config` argument from the trainer (the LoraConfig that followed can be deleted too). Up front, the data processing also needs an extra map with a `formatting_func`; the code comes from the Unsloth site: https://unsloth.ai/docs/get-started/fine-tuning-llms-guide/datasets-guide

```python
%env HF_ENDPOINT=https://hf-mirror.com
%env HF_HOME=/root/autodl-tmp/hf

# Load model and tokenizer
from unsloth import FastLanguageModel

model_name = "Qwen/Qwen3-8B"
max_length = 2048  # Supports automatic RoPE Scaling, so choose any number

# Load model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_length,
    dtype=None,         # For auto-detection. Float16 for Tesla T4, V100; Bfloat16 for Ampere
    load_in_4bit=True,  # Use 4bit quantization to reduce memory usage. Can be False
)

# Do model patching and add fast LoRA weights
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,  # Dropout = 0 is currently optimized
    bias="none",     # Bias = "none" is currently optimized
    use_gradient_checkpointing=True,
    random_state=3407,
)

from datasets import load_dataset
from unsloth.chat_templates import get_chat_template

dataset_dict = load_dataset(
    "json",
    data_files={
        "train": "data/keywords_data_train.jsonl",
        "test": "data/keywords_data_test.jsonl",
    },
)

# Convert the data to the standard (OpenAI-style) conversation format
def map_func(example):
    conversation = example["conversation"]
    messages = []
    for item in conversation:
        messages.append({"role": "user", "content": item["human"]})
        messages.append({"role": "assistant", "content": item["assistant"]})
    return {"messages": messages}

dataset_dict = dataset_dict.map(
    map_func,
    batched=False,
    remove_columns=["dataset", "conversation", "category", "conversation_id"],
)

# Render the conversation-format data into strings with the chat template
tokenizer = get_chat_template(
    tokenizer,
    chat_template="qwen3",  # change this to the right chat_template name
)

def formatting_prompts_func(examples):
    convos = examples["messages"]
    texts = [
        tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False)
        for convo in convos
    ]
    return {"text": texts}

dataset_dict = dataset_dict.map(formatting_prompts_func, batched=True, remove_columns=["messages"])

from trl import SFTConfig, SFTTrainer

# Configure trainer
training_args = SFTConfig(
    output_dir="/root/autodl-tmp/sft/Qwen3-8B/sft-unsloth",
    max_steps=1000,
    per_device_train_batch_size=4,
    learning_rate=5e-5,
    logging_steps=10,
    logging_dir="/root/tf-logs",
    save_steps=100,
    save_total_limit=2,
    eval_strategy="steps",
    eval_steps=100,
    load_best_model_at_end=True,
    bf16=True,
    warmup_steps=50,
)

# Initialize trainer
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset_dict["train"],
    eval_dataset=dataset_dict["test"],
    processing_class=tokenizer,
)
trainer.train()
trainer.save_model("/root/autodl-tmp/sft/Qwen3-8B/sft-unsloth/best")
```

Note that even though `model_name` is Qwen3-8B, Unsloth will still download its own copy of the model — probably a quantized one.

### Merging

```python
model.save_pretrained_merged(
    "/root/autodl-tmp/sft/Qwen3-8B/sft-unsloth/merged",
    tokenizer,
    save_method="merged_16bit",
)
```

## Distributed Training: Accelerate

### Usage Notes

- https://hf-mirror.com/docs/trl/deepspeed_integration
- https://hf-mirror.com/docs/accelerate

Pick a multi-GPU instance this time.

### Running the Script

First, as before, write a full-parameter fine-tuning `train.py` (see the sketch after this section). For parameter configuration, first pip install accelerate and deepspeed; `accelerate config` then sets up the common options through a question-and-answer flow. If resources are plentiful, start from ZeRO-1; if they are tight, start from ZeRO-3. Once the config is done, run it.
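A minimal sketch of what that `train.py` might look like — essentially the earlier notebook code with the LoRA pieces removed; the output directory name is an assumption:

```python
# train.py -- hedged sketch of a full-parameter SFT script, for `accelerate launch train.py`.
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

model_name = "Qwen/Qwen3-8B"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

dataset_dict = load_dataset("json", data_files={
    "train": "data/keywords_data_train.jsonl",
    "test": "data/keywords_data_test.jsonl",
})

def map_func(example):
    # Convert the raw conversation records into chat-format messages.
    messages = []
    for item in example["conversation"]:
        messages.append({"role": "user", "content": item["human"]})
        messages.append({"role": "assistant", "content": item["assistant"]})
    return {"messages": messages}

dataset_dict = dataset_dict.map(
    map_func, batched=False,
    remove_columns=["dataset", "conversation", "category", "conversation_id"],
)

training_args = SFTConfig(
    output_dir="/root/autodl-tmp/sft/Qwen3-8B/sft-full-ds",  # assumed directory name
    max_steps=1000,
    per_device_train_batch_size=4,
    learning_rate=5e-5,
    logging_steps=10,
    bf16=True,
)
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset_dict["train"],
    eval_dataset=dataset_dict["test"],
    processing_class=tokenizer,
)
trainer.train()
trainer.save_model("/root/autodl-tmp/sft/Qwen3-8B/sft-full-ds/best")
```

After `accelerate config`, a script like this would be launched with `accelerate launch train.py`.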
### Inference

Inference is much the same as the earlier full fine-tuning inference; you can also add streaming output. Here is a usage example.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

inputs = tokenizer(["The secret to baking a good cake is "], return_tensors="pt")
streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=20)
```

## LLaMA Factory

https://llamafactory.readthedocs.io/en/latest/ — go to the site and find the installation section.

### Installation

This also seems to work, though I haven't tried it: `pip install -e .[torch,metrics]`. AutoDL has an academic acceleration feature that can speed up downloads. If you run into environment conflicts, create a fresh conda environment.

### WebUI

Download MobaXterm, go to Tunnelling, and add a new SSH tunnel. One way to picture it: the left side is your own machine; on the right, the lower part is the rented server and the upper part is LLaMA Factory. What we want is to reach LLaMA Factory through the SSH client. Once the tunnel is created, start it and enter the password, then open http://localhost:7860/.

The datasets it lists are the ones under LLaMA Factory's `data` directory, i.e. `/root/autodl-tmp/LLaMA-Factory/data`, so our own dataset has to be put there.

### Data Processing

First generate a `keywords.jsonl` file under `data` (this file lives under `/root/data`), then move it into LLaMA Factory's `data` directory.

```python
from datasets import load_dataset

dataset_dict = load_dataset(
    "json",
    data_files={
        "train": "data/keywords_data_train.jsonl",
        "test": "data/keywords_data_test.jsonl",
    },
)

def map_func(example):
    conversation = example["conversation"]
    messages = []
    for item in conversation:
        messages.append({"role": "user", "content": item["human"]})
        messages.append({"role": "assistant", "content": item["assistant"]})
    return {"messages": messages}

dataset_dict = dataset_dict.map(
    map_func,
    batched=False,
    remove_columns=["dataset", "conversation", "category", "conversation_id"],
)
dataset_dict["train"].to_json("data/keywords.jsonl")
```

Then edit the `file_name` field there to point at our file and give the dataset a name; for the exact fields, see the OpenAI-format section of the LLaMA Factory data preparation docs: https://llamafactory.readthedocs.io/en/latest/getting_started/data_preparation.html#openai. Once that is configured, the dataset shows up on the page.

### Training Configuration

Before opening the UI, set the environment variables: one for the mirror, and one that allows setting extra parameters. Once the training parameters are configured you can save them and load them again next time. Then start training.

### Testing

LLaMA Factory has a `saves` directory; follow it down to find the saved output. The output directory is the same one displayed inside LLaMA Factory.

### Merging and Export

In the checkpoint field, select the adapter we trained so that it gets picked up automatically, then export the model.

## vLLM Model Deployment
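The notes for this section end here. As a placeholder, here is a minimal sketch of offline inference on the merged checkpoint with vLLM's Python API; the model path and sampling settings are assumptions:

```python
# Hedged sketch: offline inference on the merged model with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="/root/autodl-tmp/sft/Qwen3-8B/sft-lora/merged")  # assumed path
sampling_params = SamplingParams(temperature=0.7, max_tokens=512)

# For a chat model, the prompt should normally be rendered through the chat
# template first; a raw prompt is used here only to keep the sketch short.
outputs = llm.generate(["抽取出文本中的关键词\n..."], sampling_params)
print(outputs[0].outputs[0].text)
```

Alternatively, `vllm serve <model_path>` exposes the same model over an OpenAI-compatible API.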