CANN/cannbot-skills: Simulator and OpExec Facts Guide
# Simulator and OpExec Facts

cannbot-skills: CANNBot is a family of agents for CANN development that aim to improve development efficiency; this repository provides reusable Skills modules for them. Project: https://gitcode.com/cann/cannbot-skills

Use this file for simulator behavior one-liners and `OpExec` call-site gotchas. Use `agent/references/code-paths.md` and `agent/references/simulator-v2.md` when you need the full implementation path.

## Simulator-behavior one-liners

- `OpExec(..., simulator=True | "v2" | "legacy")` all route to V2
- `simulator="legacy"` does not select a separate old runtime
- `bar_all()` is the only cross-pipe drain in V2; `bar_v()` / `bar_m()` are no-ops
- for kernel cycle optimization, the target is the trace makespan (the last instruction/task to finish, i.e. `max(ts + dur)` over timed trace events), not the sum of all activated-task durations
- a `wait_vec` / `wait_cube` timeout almost always means the other lane's actor thread crashed silently
- do not run multiple V2 simulator processes concurrently; thread contention can cause silent data corruption
- PyTorch does not support indexing on `float8_e5m2` / `float8_e4m3fn`; view as `torch.uint8` before indexing inside `vf`
- burst-copy ops (`gm_to_ub_pad`, `ub_to_gm_pad`, `ub_to_l1_nz`) are safe on column-sliced UB views because they use `_linear_view_from_pointer`

## `OpExec` call-site checklist

- provide `shape_bindings={...}` when two or more kernel-side scalar dimensions can take the same integer value at runtime
- `shape_bindings` belongs on the returned callable, not on the `OpExec(...)` constructor
- format: `{tensor_arg_index: [scalar_idx_for_axis_0, scalar_idx_for_axis_1, ...], ...}`
- the key is indexed among tensor args only; scalar args are skipped
- use `None` to keep an axis unbound

Example, for `kernel(x:[M,K], y:[N,K], z:[M,N], M, N, K)`: `shape_bindings={0: [0, 2], 1: [1, 2], 2: [0, 1]}`

Implementation: `easyasc/torchplugin.py:614-668`

Real references:

- `agent/example/kernels/a5/matmul_rowwise_norm_large_nk.py:137`
- `agent/example/kernels/a5/vec_cube_vec_scale2_abs_add1_matmul.py:116`

## Deeper references

- `agent/references/simulator-v2.md`
- `agent/references/code-paths.md`
- `agent/playbooks/kernel-debugging.md`

Creation statement: parts of this article were generated with AI assistance (AIGC) and are for reference only.
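The makespan rule from the simulator one-liners can be sketched in a few lines. This is a minimal illustration, assuming each timed trace event is a dict with a start timestamp `ts` and a duration `dur`; the event names and numbers below are made up, and the real trace layout may differ.

```python
# Hypothetical trace events (assumed layout: start time `ts`, duration `dur`).
events = [
    {"name": "mte2_load", "ts": 0, "dur": 40},
    {"name": "cube_mmad", "ts": 30, "dur": 50},
    {"name": "vec_add", "ts": 70, "dur": 25},
]


def makespan(trace_events):
    """Finish time of the last event to complete: max(ts + dur)."""
    timed = [e for e in trace_events if "ts" in e and "dur" in e]
    return max((e["ts"] + e["dur"] for e in timed), default=0)


def total_duration(trace_events):
    """Sum of all durations; overlapping work is counted more than once."""
    return sum(e["dur"] for e in trace_events if "dur" in e)


print(makespan(events))        # 95
print(total_duration(events))  # 115
```

Because the events overlap, the sum (115) overstates the time the kernel actually takes (95). An optimization that shortens a task off the critical path would reduce the sum without moving the makespan.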
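To make the `shape_bindings` format concrete, the sketch below reads a binding dictionary against a list of tensor shapes and reports which scalar each axis is tied to. The helper `resolve_scalar_dims` is hypothetical and written only to illustrate the format described in the checklist; it is not the logic in `easyasc/torchplugin.py`.

```python
def resolve_scalar_dims(tensor_shapes, shape_bindings, num_scalars):
    """Hypothetical helper: map each bound tensor axis to its scalar index.

    tensor_shapes  -- shapes of the tensor args only (scalar args are skipped)
    shape_bindings -- {tensor_arg_index: [scalar_idx or None per axis]}
    num_scalars    -- number of kernel-side scalar dimensions
    """
    resolved = [None] * num_scalars
    for tensor_idx, axes in shape_bindings.items():
        shape = tensor_shapes[tensor_idx]
        for axis, scalar_idx in enumerate(axes):
            if scalar_idx is None:
                continue  # None keeps this axis unbound
            size = shape[axis]
            if resolved[scalar_idx] not in (None, size):
                raise ValueError(
                    f"scalar {scalar_idx} bound to both "
                    f"{resolved[scalar_idx]} and {size}"
                )
            resolved[scalar_idx] = size
    return resolved


# kernel(x:[M,K], y:[N,K], z:[M,N], M, N, K) with M == K == 64, N == 128.
shapes = [(64, 64), (128, 64), (64, 128)]
bindings = {0: [0, 2], 1: [1, 2], 2: [0, 1]}
print(resolve_scalar_dims(shapes, bindings, 3))  # [64, 128, 64]
```

Here `M` and `K` share the value 64, so the shapes alone do not say which axis of `x` is `M` and which is `K`. The explicit bindings state that mapping directly.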