Stop Transcribing by Hand! Batch-Convert MIDI Files to JSON with a Python Script and Analyze Musical Structure with Ease

张建站

2026/4/27 23:20:34

10-minute read

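A new paradigm for music data analysis: automated MIDI parsing with Python in practice

Music researchers and educators often share one challenge: how to process large numbers of MIDI files efficiently and extract useful information from them. Manual transcription is slow and labor-intensive, and it cannot keep up with the volume of data that modern music analysis requires. This article presents a Python-based automated workflow that batch-converts MIDI files into structured JSON, laying the groundwork for later analysis, visualization, and machine-learning models.

1. Environment setup and core toolchain

Music data processing needs specialized tooling. We use music21 as the core library: it parses MIDI files and also provides a rich set of music-theory analysis features. The recommended setup is:

    pip install music21 python-rtmidi

Note: python-rtmidi is an optional library for real-time MIDI interfaces. If you only process files, you can skip it.

The strength of music21 lies in its musical object model, which abstracts musical elements into programmable objects:

- Note represents a single note
- Chord represents a chord
- Stream is a container that organizes musical elements
- Duration precisely controls note length

2. Batch conversion from MIDI to JSON

Batch processing is the key to efficiency. The following code walks through every MIDI file in a directory and converts it:

    import os
    import json
    from music21 import converter

    def batch_midi_to_json(input_dir, output_dir):
        # Create the output directory if it does not exist yet
        os.makedirs(output_dir, exist_ok=True)

        for filename in os.listdir(input_dir):
            # Accept .mid and .midi regardless of case
            if filename.lower().endswith(('.mid', '.midi')):
                midi_path = os.path.join(input_dir, filename)
                json_path = os.path.join(output_dir, f"{os.path.splitext(filename)[0]}.json")

                try:
                    score = converter.parse(midi_path)
                    # extract_notes_data is not shown in this article; it is expected
                    # to return the JSON-ready structure described in section 2.1
                    notes_data = extract_notes_data(score)

                    with open(json_path, 'w', encoding='utf-8') as f:
                        json.dump(notes_data, f, indent=2, ensure_ascii=False)

                    print(f"Converted: {filename}")
                except Exception as e:
                    # Report the failure and continue with the next file
                    print(f"Error while processing {filename}: {e}")

2.1 Data structure design and optimization

The structure of the converted JSON directly affects how convenient later analysis will be. We designed a format that balances readability and analytical efficiency:

    {
      "metadata": {
        "title": "Example piece",
        "tempo": 120,
        "time_signature": "4/4"
      },
      "tracks": [
        {
          "name": "Piano part",
          "notes": [
            {
              "type": "note",
              "pitch": 60,
              "name": "C4",
              "duration": 1.0,
              "velocity": 80,
              "offset": 0.0
            }
          ]
        }
      ]
    }

This structure has the following properties:

- Metadata is kept separate from the actual note data
- Music-theory information (such as note names) is preserved
- Precise time offsets are recorded
- Multiple tracks are supported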
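The JSON layout from section 2.1 can be illustrated without any MIDI library. The following is a minimal sketch, not the article's extract_notes_data (which is not shown): the helper names midi_to_name, make_note_record and build_document are hypothetical, sharps are assumed for black keys, and it relies on the same convention as the example record, where MIDI pitch 60 is C4.

```python
import json

# Pitch-class names; assumption: black keys are spelled with sharps
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_to_name(pitch):
    """Map a MIDI pitch number to a name such as 'C4' (assumes 60 == C4)."""
    if not 0 <= pitch <= 127:
        raise ValueError(f"MIDI pitch out of range: {pitch}")
    return f"{NOTE_NAMES[pitch % 12]}{pitch // 12 - 1}"


def make_note_record(pitch, duration, velocity, offset):
    """Build one note entry in the layout from section 2.1.

    duration and offset are converted to float so that fractional
    values (for example fractions.Fraction) stay JSON-serializable.
    """
    return {
        "type": "note",
        "pitch": pitch,
        "name": midi_to_name(pitch),
        "duration": float(duration),
        "velocity": velocity,
        "offset": float(offset),
    }


def build_document(title, tempo, time_signature, tracks):
    """Assemble metadata and tracks; tracks maps a track name to its notes."""
    return {
        "metadata": {
            "title": title,
            "tempo": tempo,
            "time_signature": time_signature,
        },
        "tracks": [
            # Keep each track's notes ordered by onset time
            {"name": name, "notes": sorted(notes, key=lambda n: n["offset"])}
            for name, notes in tracks.items()
        ],
    }


doc = build_document(
    "Example piece", 120, "4/4",
    {"Piano part": [make_note_record(60, 1.0, 80, 0.0)]},
)
print(json.dumps(doc, indent=2))
```

Keeping the record builder separate from the parsing step means the same layout can be produced from any source of pitch, duration, velocity and offset values, which makes the format easy to unit-test.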
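3. Advanced musical feature extraction

Converting a file into a plain note list is only the first step. The real value lies in extracting meaningful musical features from the data.

3.1 Chord recognition and analysis

    from music21 import chord, harmony

    def analyze_chords(score):
        chord_results = []

        for part in score.parts:
            # Collapse simultaneous notes in the part into chords
            chords = part.chordify()

            for c in chords.recurse().getElementsByClass(chord.Chord):
                chord_symbol = harmony.chordSymbolFromChord(c)
                chord_results.append({
                    # Offset from the start of the part, not from the start of the measure
                    'offset': float(c.getOffsetInHierarchy(chords)),
                    'duration': float(c.duration.quarterLength),
                    # .figure is the readable chord label, e.g. 'Cm'
                    'chord': chord_symbol.figure,
                    'pitches': [p.midi for p in c.pitches]
                })

        return chord_results

3.2 Rhythm pattern statistics

Rhythm is one of the defining features of music. The following code counts how often each note duration occurs in a piece; notes_data here is a flat list of note records:

    def analyze_rhythm_patterns(notes_data):
        duration_counts = {}

        for note in notes_data:
            dur = note['duration']
            duration_counts[dur] = duration_counts.get(dur, 0) + 1

        # Turn the counts into a list sorted by frequency, most common first
        sorted_durations = sorted(duration_counts.items(),
                                  key=lambda x: x[1],
                                  reverse=True)
        return sorted_durations

4. Data analysis and visualization

With structured music data in hand, we can run many kinds of analysis and visualization.

4.1 Pitch distribution histogram

matplotlib gives an intuitive view of which pitches a piece uses. The function is named plot_pitch_heatmap, but what it draws is a histogram:

    import matplotlib.pyplot as plt
    import numpy as np

    def plot_pitch_heatmap(notes_data):
        pitches = [n['pitch'] for n in notes_data if n['type'] == 'note']
        pitch_range = (min(pitches), max(pitches))

        plt.figure(figsize=(12, 6))
        # One bin per MIDI pitch, centered on the integer value;
        # the upper edge is max + 0.5 so the highest pitch is counted too
        plt.hist(pitches,
                 bins=np.arange(pitch_range[0], pitch_range[1] + 2) - 0.5,
                 edgecolor='black')
        plt.title('Pitch distribution histogram')
        plt.xlabel('MIDI pitch')
        plt.ylabel('Count')
        plt.xticks(np.arange(pitch_range[0], pitch_range[1] + 1, 5))
        plt.grid(axis='y', alpha=0.5)
        plt.show()

4.2 Musical structure analysis

Analyzing note density over time can help identify the sectional structure of a piece:

    def analyze_section_structure(notes_data, window_size=4.0):
        offsets = [n['offset'] for n in notes_data]
        max_offset = max(offsets)

        # Window edges, with enough windows to include the last note
        n_windows = int(max_offset // window_size) + 1
        windows = np.arange(n_windows + 1) * window_size
        note_counts = []

        for i in range(len(windows) - 1):
            start = windows[i]
            end = windows[i + 1]
            # Count onsets in the half-open interval [start, end)
            count = sum(1 for o in offsets if start <= o < end)
            note_counts.append(count)

        # Plot the note density curve
        plt.figure(figsize=(12, 4))
        plt.plot(windows[:-1], note_counts)
        plt.title('Note density (window size: {} beats)'.format(window_size))
        plt.xlabel('Time (beats)')
        plt.ylabel('Note count')
        plt.grid()
        plt.show()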
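The windowed note count behind the density analysis in section 4.2 can be checked in isolation, without NumPy or plotting. This is a minimal sketch under stated assumptions: note_density is a hypothetical helper, offsets are non-negative and measured in quarter notes, and an onset that falls exactly on a window boundary is assigned to the later window.

```python
from collections import Counter


def note_density(offsets, window_size=4.0):
    """Count note onsets per window of `window_size` quarter notes.

    Returns a list of (window_start, count) pairs that covers every
    window from 0 up to the one containing the last onset, including
    windows with no notes.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not offsets:
        return []

    # Floor division maps each onset to the index of its window
    counts = Counter(int(o // window_size) for o in offsets)
    last_window = max(counts)
    return [(i * window_size, counts.get(i, 0)) for i in range(last_window + 1)]


# Hypothetical onsets: a busy first bar, one note in the second, three in the third
sample_offsets = [0.0, 1.0, 2.0, 3.0, 4.0, 8.0, 8.5, 9.0]
for start, count in note_density(sample_offsets):
    print(f"window starting at beat {start}: {count} notes")
```

Empty windows are reported explicitly because a drop to zero is often the most visible cue for a boundary between sections.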
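5. Performance optimization and large-scale processing

When you process a large number of MIDI files, performance becomes a key concern. Here are a few optimization techniques.

5.1 Parallel processing

Python's multiprocessing module can speed up batch conversion. The code relies on the os, json and converter imports from section 2:

    from multiprocessing import Pool

    def process_single_file(args):
        midi_path, output_dir = args
        try:
            score = converter.parse(midi_path)
            notes_data = extract_notes_data(score)

            base_name = os.path.splitext(os.path.basename(midi_path))[0]
            output_path = os.path.join(output_dir, f"{base_name}.json")

            with open(output_path, 'w', encoding='utf-8') as f:
                json.dump(notes_data, f)

            return True
        except Exception as e:
            print(f"Error while processing {midi_path}: {e}")
            return False

    def parallel_convert_midi_to_json(file_list, output_dir, workers=4):
        if not file_list:
            print("No files to process")
            return

        os.makedirs(output_dir, exist_ok=True)

        # Each worker receives one (path, output_dir) tuple
        task_args = [(f, output_dir) for f in file_list]

        with Pool(workers) as p:
            results = p.map(process_single_file, task_args)

        success_rate = sum(results) / len(results)
        print(f"Done. Success rate: {success_rate:.1%}")

5.2 Memory optimization

For large MIDI files, you can walk the score part by part and measure by measure, handling one note at a time instead of building a full note list first. Note that converter.parse still loads the whole score into memory, so the saving applies to the intermediate data you build, not to parsing itself:

    from music21 import converter, stream

    def stream_process_midi(midi_path):
        score = converter.parse(midi_path)

        for part in score.parts:
            for measure in part.getElementsByClass(stream.Measure):
                for note in measure.notesAndRests:
                    # Handle notes one by one instead of collecting them all first;
                    # process_note is a placeholder that this article does not define
                    process_note(note)

6. Machine-learning interface design

Structured music data is a convenient input format for machine-learning models. Below is a simple feature extraction function suited to music generation or classification tasks. It expects notes_data to be a flat list of note records sorted by offset:

    import numpy as np

    def extract_features_for_ml(notes_data, window_size=16):
        features = []
        current_window = []

        for note in notes_data:
            current_window.append({
                'pitch': note['pitch'],
                'duration': note['duration'],
                'start': note['offset'],        # absolute onset
                'offset': note['offset'] % 4    # position within the bar, assuming 4/4
            })

            if len(current_window) == window_size:
                # Time covered by the window, in quarter notes
                span = (current_window[-1]['start'] + current_window[-1]['duration']
                        - current_window[0]['start'])

                # Extract per-window features
                window_features = {
                    'pitch_mean': np.mean([n['pitch'] for n in current_window]),
                    'duration_var': np.var([n['duration'] for n in current_window]),
                    # Notes per quarter note; len(window) / window_size would always be 1.0
                    'note_density': len(current_window) / span if span > 0 else 0.0
                }
                features.append(window_features)
                current_window = []

        return features

In real projects, this automated workflow has made music analysis tens of times more efficient. The benefits of batch processing and consistent data were especially clear in a comparative study of musical styles covering several hundred pieces.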
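One practical detail for the parallel converter in section 5.1: it takes a file_list, and on platforms where multiprocessing starts workers with the spawn method (Windows, and macOS by default), the code that creates the Pool has to run under an `if __name__ == "__main__":` guard. The sketch below shows one way to gather the file list using only the standard library. The helper collect_midi_files and the directory names midi_input and json_output are hypothetical and used for illustration only.

```python
from pathlib import Path

MIDI_SUFFIXES = {".mid", ".midi"}


def collect_midi_files(input_dir, recursive=True):
    """Return a sorted list of MIDI file paths (as strings) under input_dir.

    Suffix matching is case-insensitive. A missing directory yields an
    empty list instead of raising.
    """
    root = Path(input_dir)
    if not root.is_dir():
        return []
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(
        str(p) for p in candidates
        if p.is_file() and p.suffix.lower() in MIDI_SUFFIXES
    )


if __name__ == "__main__":
    # Hypothetical input directory; replace with a real path
    files = collect_midi_files("midi_input")
    print(f"Found {len(files)} MIDI files")
    # With the functions from section 5.1 defined in the same module:
    # parallel_convert_midi_to_json(files, "json_output", workers=4)
```

Sorting the paths keeps runs reproducible, which helps when comparing success rates between batches.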
