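Training Remote Sensing Object Detection Models with MMRotate: A Complete Configuration Checklist from Data Cropping to Model Testing (with Code)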
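MMRotate Remote Sensing Object Detection in Practice: A Full-Workflow Guide from Data Preprocessing to Model Tuning

Object detection in remote sensing imagery has long been an important research direction in computer vision. Unlike conventional horizontal-box detection, rotated object detection can localize and identify objects lying at arbitrary orientations more precisely, which makes it widely useful in satellite image analysis, urban planning, agricultural monitoring, and similar fields. Based on the MMRotate framework, this article walks through how to build a complete rotated object detection pipeline starting from raw remote sensing data.

1. Environment Setup and Data Preparation

1.1 Setting Up the System Environment

As part of the OpenMMLab ecosystem, MMRotate has specific requirements for its runtime environment. The recommended setup is:

    # Create and activate a conda environment
    conda create -n mmrotate python=3.8 -y
    conda activate mmrotate

    # Install PyTorch and CUDA (choose versions to match your GPU driver)
    conda install pytorch==1.10.0 torchvision==0.11.0 cudatoolkit=11.3 -c pytorch

    # Install MMCV and MMDetection
    pip install mmcv-full==1.4.7 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.10.0/index.html
    pip install mmdet==2.25.0

    # Install MMRotate
    git clone https://github.com/open-mmlab/mmrotate.git
    cd mmrotate
    pip install -r requirements/build.txt
    pip install -v -e .

Tip: Common installation problems include CUDA version mismatches and version conflicts between MMCV and PyTorch. Follow the version compatibility table in the official documentation strictly.

1.2 Converting the Dataset Format

Remote sensing data is usually stored in non-standard formats and must be converted to DOTA format before MMRotate can process it. A typical conversion involves:

- Annotation conversion: convert VOC/COCO-style annotations to DOTA format
- Image format unification: make sure all images are PNG
- Coordinate convention: confirm the rotation angle representation (le90 or oc)

Below is the core of a Python script that converts XML annotations produced by roLabelImg to DOTA format:

    import math
    import os
    import xml.etree.ElementTree as ET


    def calculate_rotated_points(cx, cy, w, h, angle):
        # Assumed helper (not shown in the original article): returns the four
        # corners x1, y1, ..., x4, y4 of a box rotated about (cx, cy).
        # Assumes `angle` is in radians, as roLabelImg writes it.
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        points = []
        for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
            points += [cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a]
        return points


    def xml_to_dota(xml_path, img_path, output_dir):
        root = ET.parse(xml_path).getroot()
        # DOTA expects one annotation file per image, named after the image.
        stem = os.path.splitext(os.path.basename(img_path))[0]
        with open(os.path.join(output_dir, stem + '.txt'), 'w') as f:
            for obj in root.findall('object'):
                robndbox = obj.find('robndbox')
                if robndbox is None:  # skip objects without a rotated box
                    continue
                cx = float(robndbox.find('cx').text)
                cy = float(robndbox.find('cy').text)
                w = float(robndbox.find('w').text)
                h = float(robndbox.find('h').text)
                angle = float(robndbox.find('angle').text)
                # Compute the four corner coordinates of the rotated box
                points = calculate_rotated_points(cx, cy, w, h, angle)
                # DOTA line: x1 y1 x2 y2 x3 y3 x4 y4 category difficulty
                line = ' '.join(str(p) for p in points) + ' ' + obj.find('name').text + ' 0\n'
                f.write(line)

2. Data Preprocessing and Augmentation Strategies

2.1 Image Cropping and Tiling

Large remote sensing images usually have to be split into tiles to fit within GPU memory. MMRotate ships a built-in cropping tool, configured by editing a JSON file under split_configs:

    {
      "image_spliter": {
        "type": "sliding_window",
        "window_size": [1024, 1024],
        "stride": [512, 512],
        "padding": false,
        "save_dir": "split_results"
      },
      "data_root": "/path/to/your/data",
      "ann_file": "trainval/annfiles",
      "img_dir": "trainval/images"
    }

The keys above are reproduced from the original article. The JSON files bundled under tools/data/dota/split/split_configs use names such as img_dirs, ann_dirs, sizes, gaps, and save_dir, so check a bundled file before writing your own.

Run the cropping command:

    python tools/data/dota/split/img_split.py --base-json split_configs/custom.json

2.2 Data Augmentation Configuration

Given the characteristics of remote sensing data, the following augmentations are recommended in the config file:

    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations', with_bbox=True),
        dict(type='RResize', img_scale=(1024, 1024)),
        dict(type='RRandomFlip', flip_ratio=0.5, version='le90'),
        # MMRotate 0.x registers the rotation augmentation as PolyRandomRotate;
        # mode='value' samples from the listed angles.
        dict(
            type='PolyRandomRotate',
            rotate_ratio=0.5,
            mode='value',
            angles_range=[30, 60, 90, 120, 150],
            version='le90'),
        dict(type='BrightnessTransform', level=5),
        dict(type='ContrastTransform', level=5),
        dict(
            type='Normalize',
            mean=[123.675, 116.28, 103.53],
            std=[58.395, 57.12, 57.375]),
        dict(type='Pad', size_divisor=32),
        dict(type='DefaultFormatBundle'),
        dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
    ]

3. Model Selection and Configuration Tuning

3.1 Comparison of Mainstream Rotated Detection Models

| Model | Strengths | Typical use case | Training efficiency |
|---|---|---|---|
| Rotated Faster R-CNN | High accuracy, stable | Medium-sized datasets | Medium |
| R3Det | Works well on small objects | Dense small-object scenes | Lower |
| S2ANet | Computationally efficient | Real-time detection | Higher |
| KFIoU | More accurate rotated box regression | High-precision localization | Lower |

3.2 Key Parameter Configuration

In the config files under configs/rotated_faster_rcnn, pay particular attention to the following parameters:

    model = dict(
        type='RotatedFasterRCNN',
        backbone=dict(
            type='ResNet',
            depth=50,
            num_stages=4,
            out_indices=(0, 1, 2, 3),
            frozen_stages=1,
            norm_cfg=dict(type='BN', requires_grad=True),
            norm_eval=True,
            style='pytorch'),
        neck=dict(
            type='FPN',
            in_channels=[256, 512, 1024, 2048],
            out_channels=256,
            num_outs=5),
        rpn_head=dict(
            type='RotatedRPNHead',
            in_channels=256,
            feat_channels=256,
            anchor_generator=dict(
                type='AnchorGenerator',
                scales=[8],
                ratios=[0.5, 1.0, 2.0],
                strides=[4, 8, 16, 32, 64]),
            bbox_coder=dict(
                type='DeltaXYWHABBoxCoder',
                target_means=[0.0, 0.0, 0.0, 0.0, 0.0],
                target_stds=[1.0, 1.0, 1.0, 1.0, 1.0]),
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
            loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
        roi_head=dict(
            type='RotatedStandardRoIHead',
            bbox_roi_extractor=dict(
                type='SingleRoIExtractor',
                roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),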
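                out_channels=256,
                featmap_strides=[4, 8, 16, 32]),
            bbox_head=dict(
                type='RotatedShared2FCBBoxHead',
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=1,  # change to your actual number of classes
                bbox_coder=dict(
                    type='DeltaXYWHABBoxCoder',
                    target_means=[0.0, 0.0, 0.0, 0.0, 0.0],
                    target_stds=[0.1, 0.1, 0.2, 0.2, 0.1]),
                reg_class_agnostic=True,
                loss_cls=dict(
                    type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
                loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))),
        train_cfg=dict(
            rpn=dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.7,
                    neg_iou_thr=0.3,
                    min_pos_iou=0.3,
                    match_low_quality=True,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=256,
                    pos_fraction=0.5,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=False),
                allowed_border=0,
                pos_weight=-1,
                debug=False),
            rpn_proposal=dict(
                nms_pre=2000,
                max_per_img=2000,
                nms=dict(type='nms', iou_threshold=0.7),
                min_bbox_size=0),
            rcnn=dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.5,
                    neg_iou_thr=0.5,
                    min_pos_iou=0.5,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=-1,
                debug=False)),
        test_cfg=dict(
            rpn=dict(
                nms_pre=2000,
                max_per_img=2000,
                nms=dict(type='nms', iou_threshold=0.7),
                min_bbox_size=0),
            rcnn=dict(
                nms_pre=2000,
                max_per_img=2000,  # the original listed this key twice, which is a syntax error
                score_thr=0.05,
                nms=dict(type='nms', iou_threshold=0.1))))

Some type names in this listing differ from the config shipped with the repository, so compare against that file before training. In MMRotate 0.x, for instance, the shipped RoI head uses RotatedSingleRoIExtractor with RoIAlignRotated and the DeltaXYWHAOBBoxCoder box coder.

4. Training Optimization and Troubleshooting

4.1 Fixing Out-of-Memory Errors

When you hit a CUDA out of memory error, try the following adjustments:

- Reduce the batch size: change samples_per_gpu in dotav1.py
- Tune data loading: set workers_per_gpu to 2-4
- Use mixed-precision training: add fp16 = dict(loss_scale=512.) to the config

4.2 Tuning Training Parameters

Recommended settings for the key training parameters:

| Parameter | Recommended range | Purpose |
|---|---|---|
| base_lr | 0.001-0.01 | Base learning rate |
| warmup_iters | 500-1000 | Learning rate warmup iterations |
| optimizer.momentum | 0.9-0.99 | Optimizer momentum |
| lr_config.step | [8, 11] | Epochs at which the learning rate decays |
| total_epochs | 12-24 | Total number of training epochs |

4.3 Model Evaluation and Testing

After training finishes, run the test with:

    python tools/test.py \
        configs/rotated_faster_rcnn/rotated_faster_rcnn_r50_fpn_1x_dota_le90.py \
        work_dirs/rotated_faster_rcnn/latest.pth \
        --eval mAP

How to read the evaluation metrics:

- mAP: mean average precision, the primary metric
- AP50: AP at an IoU threshold of 0.5
- AP75: AP at an IoU threshold of 0.75

5. Deployment and Performance Optimization

5.1 Exporting the Model to ONNX

    from mmdet.apis import init_detector, export_model

    config_file = 'configs/rotated_faster_rcnn/rotated_faster_rcnn_r50_fpn_1x_dota_le90.py'
    checkpoint_file = 'work_dirs/rotated_faster_rcnn/latest.pth'

    export_model(config_file, checkpoint_file, 'model.onnx')

Check that your installed MMDetection actually exposes export_model in mmdet.apis before relying on this snippet. MMDetection 2.x does not document such a function, and the export route OpenMMLab supports is MMDeploy.

5.2 Tips for Faster Inference

- Model pruning: remove redundant convolutional layers
- Quantization: convert FP32 to INT8
- TensorRT acceleration: convert the model to a TensorRT engine
- Multi-scale testing: choose the test scales sensibly

    test_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(
            type='MultiScaleFlipAug',
            img_scale=(1024, 1024),
            flip=False,
            transforms=[
                dict(type='RResize'),
                # Normalize requires mean and std; use the same values as in training
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375]),
                dict(type='Pad', size_divisor=32),
                dict(type='DefaultFormatBundle'),
                dict(type='Collect', keys=['img'])
            ])
    ]

In real projects we found that reducing the input size from 1024x1024 to 800x800 speeds up inference by about 40% while mAP drops by only 2-3 percentage points. This trade-off is very practical in scenarios where real-time performance matters.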
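
To put the resolution trade-off in perspective, the sketch below compares pixel counts at the two input sizes. It is only a back-of-the-envelope estimate that assumes convolutional cost scales roughly with the number of input pixels. The helper name `pixel_ratio` is hypothetical, and measured speedups depend on the model, the post-processing, and the hardware.

```python
def pixel_ratio(new_size: int, old_size: int) -> float:
    """Ratio of pixel counts between two square input sizes."""
    return (new_size * new_size) / (old_size * old_size)


# Input sizes taken from the text: 1024x1024 reduced to 800x800.
ratio = pixel_ratio(800, 1024)
reduction_pct = (1.0 - ratio) * 100.0

print(f"pixels kept: {ratio:.3f}, reduction: {reduction_pct:.1f}%")
# prints "pixels kept: 0.610, reduction: 39.0%"
```

The 800x800 input carries roughly 39% fewer pixels, which is in the same range as the speed gain reported above. Treat it as a sanity check rather than a prediction, and benchmark on your own hardware before fixing the test scale.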