Tests

50/50 test sites; all 50 already have recheck scores.

Overall model leaderboard

playwright-recheck-graphweighted+visible-graph-audit

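Rank | Badge    | Model           | Mean | Range     | Loading/15 | Graph/35 | Articles/15 | Visual/20 | Interact/15
-----|----------|-----------------|------|-----------|------------|----------|-------------|-----------|------------
1    | GLM      | GLM 5.2         | 85.5 | 72.8-93.8 | 15         | 22.3     | 15          | 18.4      | 14.8
2    | Qwen     | Qwen3.7 Max     | 82.4 | 66.2-93.8 | 14.4       | 27.7     | 14.5        | 13.4      | 12.4
3    | Kimi     | Kimi K2.7 Code  | 80.3 | 51.6-93.8 | 14.6       | 22.3     | 14.5        | 15.4      | 13.6
4    | MiniMax  | MiniMax M3      | 77.4 | 52.6-95.8 | 14.4       | 16.7     | 14.0        | 18.8      | 13.6
5    | DeepSeek | DeepSeek V4 Pro | 67.1 | 39.2-96   | 14.6       | 16.4     | 14.5        | 9.8       | 11.9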
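The five column caps (15 + 35 + 15 + 20 + 15) add up to 100. Assuming Mean is each model's average total score across the test sites and each category column is the average for that category, Mean should equal the sum of the five category averages up to rounding, because the mean of a sum is the sum of the means. The Python sketch below checks this against the table values. The helper name `check_rows` and the 0.15-point tolerance are illustrative assumptions, not part of the scoring method named above.

```python
# Per-category caps taken from the table header; they add up to 100.
MAX_POINTS = {"Loading": 15, "Graph": 35, "Articles": 15, "Visual": 20, "Interact": 15}

# (model, reported Mean, category averages in header order), copied from the table.
ROWS = [
    ("GLM 5.2", 85.5, [15.0, 22.3, 15.0, 18.4, 14.8]),
    ("Qwen3.7 Max", 82.4, [14.4, 27.7, 14.5, 13.4, 12.4]),
    ("Kimi K2.7 Code", 80.3, [14.6, 22.3, 14.5, 15.4, 13.6]),
    ("MiniMax M3", 77.4, [14.4, 16.7, 14.0, 18.8, 13.6]),
    ("DeepSeek V4 Pro", 67.1, [14.6, 16.4, 14.5, 9.8, 11.9]),
]


def check_rows(rows, tolerance=0.15):
    """Hypothetical helper: for each row, confirm every category value is within
    its cap and that the category sum matches the reported Mean within
    `tolerance` (slack for per-column rounding). Returns (model, mean, sum)."""
    results = []
    for model, mean, parts in rows:
        for value, cap in zip(parts, MAX_POINTS.values()):
            if not 0 <= value <= cap:
                raise ValueError(f"{model}: {value} is outside 0..{cap}")
        total = round(sum(parts), 1)
        if abs(total - mean) > tolerance:
            raise ValueError(f"{model}: category sum {total} != mean {mean}")
        results.append((model, mean, total))
    return results


if __name__ == "__main__":
    print(sum(MAX_POINTS.values()))
    for model, mean, total in check_rows(ROWS):
        print(f"{model}: mean {mean}, category sum {total}")
```

Run as a script, it prints the cap total and one line per model. The 0.1-point gaps for Kimi K2.7 Code, MiniMax M3 and DeepSeek V4 Pro are consistent with each column being rounded to one decimal place independently.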

Round matrix