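Since Princeton released SWE-Bench, evaluating the software engineering capabilities of large models with real-world code repositories plus executable tests has become a near-consensus across academia and industry. Evaluation paradigms built around SWE issues have developed rapidly and have given rise to a series of SWE-family benchmarks, which, in characterizing model ...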
Static code analysis and bug detection are integral to modern software engineering, providing a systematic approach to identify defects and security vulnerabilities without executing the code. By ...
Serious security bugs in key parts of the latest Linux code have been fixed, but some small glitches have been introduced, according to a recent scan. In December, Coverity looked at version 2.6.9 of ...
A new report today from code quality testing startup SonarSource SA is warning that while the latest large language models may be getting better at passing coding benchmarks, at the same time they are ...
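Zhihu on MSN
If vibe coding is used over the long term, will there one day be a bug that cannot be solved?
Suppose there is a smart machine that can automatically build a three-story house in a single day. Can we conclude that it could build a 100-story skyscraper in a little over 30 days? This is where AI now ...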
As a graduate student, Steven Weisberg helped to develop a university campus — albeit, a virtual one. Called Virtual Silcton, the software tests spatial navigation skills, teaching people the layout ...
Google unveiled "Jules" on Wednesday, an artificial intelligence coding assistant that can autonomously fix software bugs and prepare code changes while developers sleep, marking a significant ...
Facebook doesn't have the most stellar privacy and security track record, especially given that many of its notable gaffes were avoidable. But with billions of users and a gargantuan platform to ...
Some software developers are now letting artificial intelligence help write their code. They’re finding that AI is just as flawed as humans. Last June, GitHub, a subsidiary of Microsoft that provides ...