This feels like the kind of programming work that is actually worth doing! Something for the rest of us to aspire to.
news.ycombinator.com/item?id=42465378
One of the Xbox 360's software engineers also showed up and posted:
One challenge was that while I started working on the Xbox 360 about three years before it would ship, we knew that the custom CPU would not be available until early 2005 (first chips arrived in early February). And there was only supposed to be one hardware spin before final release.
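One challenge was that I started writing code for the Xbox 360 three years before it launched (at the time this programmer was writing the bootloader, kernel and hypervisor). We knew the chip would not be available until early 2005.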
So I had no real hardware to test any of the software I was writing, and no other chips (like the Apple G5 we used as alpha kits) had the custom security hardware or boot sequence like the custom chip would have. But I still needed to provide the first stage boot loader which is stored in ROM inside the CPU weeks before first manufacture.
But I still had to deliver the bootloader code weeks before manufacturing began, so that it could be stored in ROM at production time.
I ended up writing a simulator of the CPU (instruction level), to make progress on writing the boot code. Obviously my boot code and hypervisor would run perfectly on my simulator since I wrote both!
So I ended up writing an instruction-level CPU simulator to help with my programming work. My bootloader and hypervisor ran perfectly on the simulator.
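To make "instruction-level" a bit more concrete: such a simulator models only the architectural state (registers, program counter) and executes one instruction per step, with no notion of cycles or pipelines. Below is a minimal sketch of that loop. The tiny instruction set (`li`, `add`, `addi`, `bnez`) and the `run` function are made up for illustration; this is not PowerPC and says nothing about how Sbox was actually built.

```python
# Toy instruction-level simulator. The ISA here is invented for illustration;
# it is NOT PowerPC and NOT how Sbox worked internally.

def run(program, num_regs=8, max_steps=10_000):
    """Execute `program` one instruction at a time and return the final registers."""
    regs = [0] * num_regs
    pc = 0       # index of the next instruction to execute
    steps = 0
    while pc < len(program):
        if steps >= max_steps:
            raise RuntimeError("step limit reached, possible infinite loop")
        op, *args = program[pc]
        if op == "li":        # li rd, imm       -> rd = imm
            rd, imm = args
            regs[rd] = imm
            pc += 1
        elif op == "add":     # add rd, ra, rb   -> rd = ra + rb
            rd, ra, rb = args
            regs[rd] = regs[ra] + regs[rb]
            pc += 1
        elif op == "addi":    # addi rd, ra, imm -> rd = ra + imm
            rd, ra, imm = args
            regs[rd] = regs[ra] + imm
            pc += 1
        elif op == "bnez":    # bnez ra, target  -> jump to target if ra != 0
            ra, target = args
            pc = target if regs[ra] != 0 else pc + 1
        else:
            raise ValueError(f"unknown opcode {op!r} at pc={pc}")
        steps += 1
    return regs

# Sum 5 + 4 + 3 + 2 + 1 into r2, counting r1 down to zero.
SUM_TO_FIVE = [
    ("li", 1, 5),
    ("li", 2, 0),
    ("add", 2, 2, 1),
    ("addi", 1, 1, -1),
    ("bnez", 1, 2),
]

print(run(SUM_TO_FIVE)[2])  # prints 15
```

A real simulator would decode binary instruction words and model memory and devices as well, but the fetch, decode, execute shape stays the same.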
But IBM had also had a hardware accelerated cycle-accurate simulator that I got to use. I was required to boot the entire Xbox 360 kernel in their simulator before I could release the boot ROM. What takes a few seconds on hardware to boot took over 3 hours in simulation. The POST codes would be displayed every so often to let me know that progress was still being made.
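IBM also provided a hardware-accelerated, cycle-accurate simulator. Before the ROM could be burned, I had to successfully boot the entire Xbox 360 kernel on that simulator. Booting takes only a few seconds on hardware, but took over three hours on IBM's simulator. To show that the boot was still making progress, POST codes would appear on screen every so often.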
The first CPU arrived on a Friday, by Saturday the electrical engineers flew to Austin to help get the chip on the motherboard and make sure FSB and other busses were all working. I arrived on Monday evening with a laptop containing the source code to the kernel, on Tuesday I compiled and flashed various versions, working through the typical bring-up issues. By Wednesday afternoon the kernel was running Quake, including sound output and controller input.
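The first CPU arrived on a Friday. On Saturday the hardware engineers flew to Austin to help mount it on the motherboard and make sure the buses were working. I got there Monday evening with a laptop carrying the kernel source. On Tuesday the code was compiled and flashed onto the machine, and by Wednesday afternoon it could run Quake, including sound output and controller input.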
Three years of preparation to make my contribution to hardware bring-up as short as possible, since I would bottleneck everyone else in the development team until the CPU booted the kernel.
Some fun anecdotes about the (instruction-level) simulator this guru wrote himself:
I called the simulator Sbox and it was just a simple console app. I didn't implement the GPU, so no graphics just the hypervisor and kernel and some simple non-graphics apps. I made it so that you could build the Xbox 360 kernel on your windows machine, then just run sbox.exe and it would automatically find the just built kernel image targeting the PPC64 and boot it. Then if you typed control-C it would drop into the kernel debugger as a sub process, and you could poke around at the machine state as if it were the real Xbox hardware, showing all the PPC instructions and registers. It was a lot of fun writing it, and quite useful.
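I called it Sbox. It did not emulate the GPU, so there were no graphics, only the hypervisor, the kernel and some simple text programs. The dev team could build the Xbox 360 kernel on Windows and then run sbox.exe, which would automatically find the freshly built PowerPC kernel binary and boot it inside the simulator. Typing Ctrl-C dropped you into the kernel debugger as a subprocess, where you could inspect the (simulated) machine state, including all the PowerPC instructions and registers, as if it were a real Xbox 360.
(The following is from one of the guru's coworkers)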
You should also talk about the lwarx/stwcx bug. IIRC - in the first version of the chip there was a bug in one or both of these instructions. Your code booted on SBox but didn't on the hardware. You compared the two and then figured out it was these instructions.
You filed a bug report and then dug into them and used SBox to figure out what must have been going wrong.
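You should talk about the lwarx/stwcx bug. If I remember right, the first version of the chip had this bug. Your code ran on Sbox but not on the hardware. After comparing the two, you worked out that these two instructions were the problem. You filed a bug report (with IBM).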
The chip supplier came back with a workaround and within five minutes you simulated it on SBox and said it wouldn't work, why, and then said how it should be fixed.
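IBM came back with a workaround, and within five minutes of simulating it on SBox you said it would not work and explained how it should be fixed instead. (Poster's note: as I read it, IBM only supplied a workaround rather than a new chip; he simulated that workaround in SBox and found it still failed.)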
The supplier didn't believe you as yet. And you worked out a workaround so we could be unblocked. Two weeks later they agreed with your fix...
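IBM did not believe you, so you came up with your own workaround to keep us from being blocked. Two weeks later IBM said your fix was right.
Later on there is also the story about patching the compiler, which is again related to a chip bug: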
So the PPC instruction set uses lwarx (load word and reserve indexed), and stwcx (store word conditional indexed), along with variations for word size, to implement atomic operations such as interlocked-increment and test-and-set.
PowerPC uses lwarx and stwcx, with variants for different word sizes, to implement atomic operations.
So on PPC interlocked-increment is implemented as:
loop:
lwarx r4,0,r3 # Load and reserve r4 <- (r3)
addi r4,r4,1 # Increment the value
stwcx. r4,0,r3 # Store the incremented value if still reserved
bne- loop # Loop and try again if lost reservation
For example, the interlocked-increment routine:
loop:
lwarx r4, 0, r3 # Load and reserve r4 <- (r3)
addi r4, r4, 1 # r4++
stwcx. r4, 0, r3 # Store incremented r4 if still reserved (the trailing dot is part of the mnemonic)
bne- loop # try again if lost reservation
The idea is that the lwarx places a reservation on an address that it wants to update at some later time. It doesn't prevent any other thread or processor from reading or writing to that address, or cause any sort of stall, but if an address being reserved is written to, conditional or otherwise, then the reservation is lost. The stwcx instruction will perform the store to memory if the reservation still exists and clears the NE flag, otherwise it doesn't do the write and sets the NE flag and software should just try again until it succeeds.
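Roughly speaking, lwarx "reserves" an address that we will want to update later. It does not stop other threads or CPUs from reading or writing that address, and it does not cause any CPU stall. But if a "reserved" address gets written, the reservation is lost. While the reservation is still valid, stwcx stores the register to memory and clears the NE flag; otherwise it skips the write and sets the NE flag. The bne keeps the loop running until the write succeeds.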
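The reservation semantics described above can be modeled in a few lines. The sketch below is only a toy: the `ReservedMemory` class, the `cpu` ids and the `interfere` hook are all hypothetical names for illustration, and real hardware tracks reservations per reservation granule (typically a cache line) rather than per exact address.

```python
class ReservedMemory:
    """Toy memory with load-and-reserve / store-conditional semantics."""

    def __init__(self):
        self.words = {}          # address -> value
        self.reservations = {}   # cpu id -> reserved address

    def lwarx(self, cpu, addr):
        """Load a word and place this CPU's reservation on addr."""
        self.reservations[cpu] = addr
        return self.words.get(addr, 0)

    def store(self, addr, value):
        """Plain store: any reservation on addr is lost."""
        self.words[addr] = value
        for cpu, reserved in list(self.reservations.items()):
            if reserved == addr:
                del self.reservations[cpu]

    def stwcx(self, cpu, addr, value):
        """Store only if this CPU still holds the reservation; report success."""
        if self.reservations.get(cpu) != addr:
            self.reservations.pop(cpu, None)  # a failed attempt also drops it
            return False
        self.store(addr, value)  # also clears every reservation on addr
        return True


def interlocked_increment(mem, cpu, addr, interfere=None):
    """Retry loop equivalent to the lwarx/addi/stwcx./bne sequence.

    `interfere` is a hypothetical hook that runs once, between the load and
    the first store attempt, to play the role of another CPU.
    Returns (new value, number of attempts).
    """
    attempts = 0
    while True:
        attempts += 1
        value = mem.lwarx(cpu, addr)
        if interfere is not None and attempts == 1:
            interfere()
        if mem.stwcx(cpu, addr, value + 1):
            return value + 1, attempts


mem = ReservedMemory()
mem.store(0x100, 41)
print(interlocked_increment(mem, cpu=0, addr=0x100))  # prints (42, 1)

# Another writer sneaks in after the load: the first attempt fails, the retry wins.
mem.store(0x100, 10)
print(interlocked_increment(mem, 0, 0x100,
                            interfere=lambda: mem.store(0x100, 99)))  # prints (100, 2)
```

Note that nothing ever blocks: the losing side simply notices that its reservation is gone and tries again.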
On the Xbox 360 we provided the compiler which would emit sequences like these for all atomic intrinsics, but developers could also write assembler code directly if they wanted to. We'll get back to this point in a moment.
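The Xbox 360 compiler emits instructions like these for all atomic operations, but programmers can also hand-write such assembly. We will come back to this point later.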
As the V1 version of the Xbox 360 CPU was being tested by IBM, they discovered an error with the hardware implementation of these two instructions and issued an errata for software to work around it, which we implemented. Unfortunately, after further testing IBM discovered that the errata was insufficient, so issued a second errata, which we also implemented and assumed all was well.
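IBM found that the first version of the CPU had a hardware bug involving these two instructions, so they issued an erratum telling us to work around it in software. We did. Unfortunately they later found that was not enough and issued a second erratum, which we also implemented.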
Then the V2 version of the CPU comes out and months go by. But early one morning I get a phone call from IBM letting me know that the latest errata was still insufficient and that the bug is in the final hardware. Further, Microsoft has already started final production of CPU parts, even before testing was fully complete (risk buy), so that they could have sufficient supply for the upcoming November release. I was told that they are stopping manufacturing of additional CPUs, and that I had 48 hours to figure out if there is anything software can do that could work around the hardware issue. They also casually mentioned that millions of dollars of parts would need to be discarded, a hardware fix implemented which would take weeks, then the production could resume from scratch.
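A few months after the second version of the CPU came out, IBM called me one morning to say the last erratum was still insufficient and the final hardware still had the problem. Worse, Microsoft had already started producing CPUs in volume for the November launch. IBM decided to stop manufacturing CPUs, and I had only 48 hours to find a way to work around the problem in software. They also mentioned that the alternative was to throw away the millions of dollars of hardware already built, spend several weeks on a hardware fix, and restart production from scratch.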
Bottom line is that, yes, there was a set of software changes that would work around the bug, but it required very specific sequences of instructions, the disabling of interrupts around these sequences, a change to the hypervisor, and updating the compiler to emit the new sequences. To make sure that developers didn't introduce code sequences that uses lwarx/stwcx in a way that would expose the bug (via inline assembly, for example), the loader would scan the code and refuse to load code that didn't obey the new rules.
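There really was a way to work around it in software, but it was fairly involved: very specific instruction sequences were required, interrupts had to be disabled around those sequences, the hypervisor code had to be changed, and the compiler had to be modified to emit the new sequences. To keep game programmers from using lwarx/stwcx in ways that would accidentally trigger the bug (through inline assembly, for instance), the loader scans the code and refuses to load anything that breaks these rules.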
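The post does not say what the loader's rules actually were, so the following is only a guess at the general shape of such a scan. It walks a big-endian PowerPC code blob one 32-bit word at a time and flags reservation instructions, which all share primary opcode 31 and differ in the extended opcode (lwarx 20, ldarx 84, stwcx. 150, stdcx. 214). The `loader_accepts` policy and its `approved_offsets` parameter are hypothetical stand-ins for "this instruction sits inside a sequence the compiler is known to emit".

```python
import struct

PRIMARY_OPCODE = 31  # X-form instructions, including the reservation family
RESERVATION_XO = {20: "lwarx", 84: "ldarx", 150: "stwcx.", 214: "stdcx."}


def find_reservation_instructions(code: bytes):
    """Return (offset, mnemonic) for every load-reserve / store-conditional word."""
    if len(code) % 4 != 0:
        raise ValueError("PowerPC instructions are 4 bytes each")
    hits = []
    for offset in range(0, len(code), 4):
        (word,) = struct.unpack_from(">I", code, offset)
        if word >> 26 != PRIMARY_OPCODE:
            continue
        extended = (word >> 1) & 0x3FF  # 10-bit extended opcode field
        name = RESERVATION_XO.get(extended)
        if name is not None:
            hits.append((offset, name))
    return hits


def loader_accepts(code: bytes, approved_offsets=frozenset()):
    """Hypothetical policy: reject any reservation instruction not pre-approved."""
    return all(offset in approved_offsets
               for offset, _ in find_reservation_instructions(code))


# lwarx r4,0,r3 / addi r4,r4,1 / stwcx. r4,0,r3 / bne loop
increment = bytes.fromhex("7C801828" "38840001" "7C80192D" "4082FFF4")

print(find_reservation_instructions(increment))  # prints [(0, 'lwarx'), (8, 'stwcx.')]
print(loader_accepts(increment))                 # prints False
print(loader_accepts(increment, {0, 8}))         # prints True
```

A real checker would presumably match whole instruction sequences rather than a list of offsets, but a fixed-width instruction set makes even this naive scan reliable, since every word boundary is an instruction boundary.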
Interesting fact: the hardware bug existed in every version of the Xbox 360 ever shipped, because software needed to run on any console ever shipped, there was no advantage to ever fixing the bug since software always needed to work around it anyway.
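A fun fact: this hardware bug exists in every Xbox 360 ever shipped. Because games have to run on every hardware revision, fixing the bug would have been pointless.
A real legend, absolutely hardcore..
An emu like this has to cover way too many things.
Right, this is a simulator for a production project, not a toy like the lc-3 simulator I wrote. And the PPC chip was a fairly complex one for its time.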
The XCPU, named Waternoose (later Loki) is a custom triple-core 64-bit PowerPC-based design by IBM.
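Incredible, he knew IBM's product better than IBM did
You would have to be very familiar with hardware and simulators already to pull this off; nobody doing it for the first time would dare try
The specification for a chip like this probably runs to 1000 pages; even if you don't need all of it, you would still have to read a good half.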
xenonlibrary.com/wiki/Waternoose
A quick search didn't turn up its specification, maybe because the chip was made exclusively for the Xbox 360.
#4
I found the guy: Dinarte Morais
He has accounts on LinkedIn and X, but there is pitifully little on them. Microsoft's site has a bit about him:
Dinarte is a Distinguished Engineer who has worked on the Xbox team since 2001. He’s been the key architect focused on security, software emulation for backward compatibility of games, and digital rights protection of downloaded content. He functioned as a critical liaison between the hardware engineers and the software development team, and worked closely with 3rd party vendors to help them realize the Microsoft vision of next generation gaming. Dinarte first joined Microsoft in 1992 after working as the lead developer for ON Technology and several years as an independent software consultant. He is an MIT graduate.
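He joined Microsoft back in '92 and seems to have done low-level work all along; I wonder what he worked on before joining the Xbox team.
Further down there is also the part where he used the compiler to patch the hardware, and that patch ran for the console's entire lifetime....
So that's engineering for you: if it runs, it's good enough.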
#7
Thanks, I've seen the later details now. Being able to ship a patch like that is seriously impressive.
#7
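I've translated the later part as well. Too bad the forum gives so little space; if more appends are needed, there won't be room.
Honestly I think these are the more traditional kind of programmers. People who chose programming in that era mostly had some passion for it. Once the internet took off, lots of people became programmers just for the paycheck, and there is not much interest in that.
You need that interest to go and build your own simulator. Otherwise it would just be "you don't even have a CPU, what bootloader am I supposed to write?", followed by everyone passing the buck and holding progress-alignment meetings.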
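An MIT god
He's already past thirty-five...
He's one of the high-quality humans pushing society forward
I'm just supplying raw material for genetic mutation, emo on a daily basis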
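Haha, this reminds me of a time working with a client: they gave us no API docs and told us to reverse-engineer the dynamic library ourselves to figure out the interface logic. I said I couldn't do it, and my manager dissed me for it, saying I lacked an exploratory spirit, would never improve this way, and would be a library-caller for life.
I was pretty angry at the time and nearly quit on the spot. Afterwards I also posted a rant about him on the forum.
Looking back now, compared with the guy in OP's post, I was the shallow one. No wonder he became a big shot while I'm still a rookie code monkey after all these years.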
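How is that the same? What you got was genuine "CPU" (as in being PUA'd by your boss).
The guru's project started 3 years ahead; on yours, starting 3 days ahead would be a luxury.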
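It's not that surprising; when I read the first paragraph I already guessed a simulator was coming
If you can't get the hardware, what else can you do but simulate it
Back in the day even the Wenquxing electronic dictionaries had emulators, for almost every model; they emulated the whole hardware in software and then loaded Wenquxing's own software into it
That must have been before 2005
I've done the same kind of thing..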
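Quick question: if the reverse engineering lands you in handcuffs, will your manager take responsibility?
It may look cool, but it's actually tedious; people who haven't done it see it through a rosy filter. Once you're buried in instructions chasing bugs that won't reproduce, it wears you down over time, and you often feel defeated.
He's just being hyped up
It's not actually that hard: this simulator only needs to run the bootloader and hypervisor, so it most likely doesn't need floating-point or SIMD instructions, which saves at least two thirds of the work.
I'm not a programmer, I'm just a blue-collar worker who piles up code using popular frameworks; when new tools appear, I'll quickly be made obsolete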
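This is standard practice in new system development nowadays: software and hardware are developed in parallel. If you're lucky and the hardware team has plenty of existing IP to reuse, you can get a dev board early and the software team works directly on it. If a lot of new IP has to be developed from scratch, such as a new architecture on the SoC, that's when simulators come in.
As far as I know, when ATI and NVIDIA developed GPUs in the early days, they used a large FPGA array called IKOS to emulate the whole GPU. That step was mainly for verifying logical correctness; the emulated GPU was effectively a real physical GPU running at a clock of a few kHz. Timing verification only started once tape-out produced sample chips.
Of course, today's development flow is probably more advanced, since both compute clusters and FPGA arrays are far more powerful.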
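curl and log4j deserve the praise more: decades of maintenance. The tech doesn't have to be brilliant, but they carry a lot of weight in infrastructure.
If we're talking technical skill, how about comparing with the author of qemu.
In my view technical skill really isn't that important; what matters is tireless dedication. They don't even ask for anything in return, at most a sponsorship.
Someone has to launch the satellites, and someone has to shovel the legacy code... We're all workhorses, no need to envy each other.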
#24
No need to say anything about QEMU's author, a true master, no argument.
#23
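Thanks, that's really interesting, I'll look for related material. When I read a book called The Soul of a New Machine, I thought those hardware engineers' work was brain-burning.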
#21
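Curious, since I don't know the details: it doesn't need floating-point instructions? A quick search turned up "The hypervisor uses neither floating point instructions nor floating point types." So that may well be the case.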
#19
Yeah, I've never built a real production tool, only toys, so it seems very challenging to me. Debugging this kind of thing must be agony.
#16
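True, it's probably a fairly common thing. I just feel a simulator for this kind of CPU is hard to write. IBM's simulator was presumably more advanced, being more accurate. The Wenquxing was probably easier to emulate? Of course, something used in production is never comparable to a toy.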
#14
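That's different. That was an unreasonable demand, the equivalent of tossing you an XBOX, telling you to hack it and then make a game for it. Not as complicated, of course, but it's the same idea.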
#12
I'm already past 40...
Why not hype Linus too? Who doesn't use the Linux kernel?
#10
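Haha, developing software for a new product is probably just like that, and you wouldn't do it without the interest. I do think old-school programmers were stronger on average: partly because there were few of them and you couldn't survive without being good, partly because in the '80s and '90s you often had to build low-level stuff yourself, working with assembly and C, and that built up skill.
It's different now. Very few jobs need low-level work, and there is no need to grab an ordinary person to do it, so ordinary people don't get much chance to practice. Being able to follow your own interests and turn a toy into an open-source product used by millions is really impressive.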
#33
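That guy is too famous, no hype needed...
Don't fall into the trap of thinking only hard things are worth doing. By that logic the vast majority of people should be kept like 🐖 and do nothing, since hardly any requirement can only be implemented by you.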
#36
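Thanks, it's really just personal interest. At work I do whatever I'm told...
Frameworks built by our predecessors lowered the barrier to entry. But if you had been working with bytecode, compilers and instruction sets from day one, you wouldn't necessarily be any worse than them.
Hardware, software, platforms, frameworks, applications: we're all part of the play. If any link is missing, users have a worse time.
Sadly most programmers today can only write CRUD, and most people here on V2EX are the same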