arnezami
« on: July 22, 2007, 07:59:00 AM »
I still have this idea I would like to share.

The problem:
We currently do not have a way to downgrade without a CPU key (especially when fuses are blown). From what I understand there are two main problems:
- The CB is signed/authenticated with the CPU key using an HMAC algorithm. Unless you have a dump of the CB from before upgrading, you won't be able to create this "signature" without the CPU key.
- In the decrypted CF there is a "version lockdown counter". This part of the CF is also signed/authenticated with the CPU key. Without the key you cannot change this counter to match the blown fuses.
I believe there may be a way (depending on several conditions) to downgrade without a CPU key.

The idea:
It would be foolish to try to break SHA1-HMAC. However, the output of a hash usually has to be checked against something that is stored; that's usually the point of it. This check takes (a tiny bit of) time. Many memcmp functions use a byte-wise compare: as long as the current bytes are equal, go on to the next byte, but stop as soon as a byte differs. In other words, the compare takes (a tiny fraction of a second) longer when the output matches the stored value at the beginning than when the two 16-byte values already differ at the first byte. If this time difference can be measured, you could change the first stored byte (up to 256 times) until the Xbox takes that fraction longer to detect that the 16-byte values are not identical, and then continue like this until all bytes have been found. If this technique works, you could do the following:
- Encrypt and flash the NAND with an exploitable kernel (CB/CD/CF etc.) using the 1BL key. The 16-byte auth values (in both the CB and CF) will of course not be correct.
- Add hardware that can measure time/clocks from a certain point in the boot sequence until a detectable moment after the (failed) verification of the CB auth value.
- Change the first stored byte of this CB auth value and reboot. Repeat until the Xbox takes a tiny bit longer to detect that the CB auth value is incorrect. This means you've found the first byte.
- Go on until all 16 bytes are found.
- After this do essentially the same with the auth-value in the CF (using a different trigger point of course).
- You should now be able to boot into a vulnerable kernel and extract your fuses.
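The steps above can be simulated in software to check the logic before any hardware exists. This is a minimal sketch under stated assumptions: `SimulatedBox` is a hypothetical stand-in for "flash a candidate auth value, reboot, time the check", the per-byte cost and jitter values are made up, and the jitter is deliberately bounded so that averaging always separates a hit from a miss. Rather than stopping at the first slower boot, it times every candidate and keeps the slowest one. The last byte is special: an early-exit compare examines all 16 bytes whether or not byte 16 matches, so only pass/fail can reveal it.

```python
import random

# Hypothetical timing model: none of these numbers come from real hardware.
BASE_NS = 1000.0     # fixed overhead between the start and end trigger
PER_BYTE_NS = 10.0   # extra time for every byte the compare loop examines
JITTER_NS = 4.0      # bounded noise; kept below PER_BYTE_NS / 2 on purpose


def bytes_examined(candidate: bytes, secret: bytes) -> int:
    """Loop iterations of an early-exit byte-wise compare."""
    for i, (a, b) in enumerate(zip(candidate, secret)):
        if a != b:
            return i + 1
    return len(secret)


class SimulatedBox:
    """Hypothetical stand-in for: flash candidate auth value, reboot, time the check."""

    def __init__(self, secret: bytes, seed: int = 0) -> None:
        self.secret = secret
        self.rng = random.Random(seed)
        self.boots = 0

    def boot(self, candidate: bytes) -> tuple[bool, float]:
        self.boots += 1
        n = bytes_examined(candidate, self.secret)
        noise = self.rng.uniform(-JITTER_NS, JITTER_NS)
        return candidate == self.secret, BASE_NS + PER_BYTE_NS * n + noise


def recover_auth(box: SimulatedBox, length: int = 16, samples: int = 3) -> bytes:
    """Recover the stored auth value one byte at a time."""
    known = bytearray(length)
    for pos in range(length - 1):
        best_value, best_time = 0, float("-inf")
        for value in range(256):
            known[pos] = value
            avg = sum(box.boot(bytes(known))[1] for _ in range(samples)) / samples
            if avg > best_time:  # the right byte makes the compare run longer
                best_value, best_time = value, avg
        known[pos] = best_value
    # Last byte: hit and miss take equally long, so rely on pass/fail instead.
    for value in range(256):
        known[length - 1] = value
        if box.boot(bytes(known))[0]:
            return bytes(known)
    raise RuntimeError("no candidate passed; timing data was misleading")
```

With these settings `recover_auth` needs 15 × 256 × 3 timed boots plus up to 256 pass/fail boots for the final byte.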
Assuming a reboot time of approx. 2-3 seconds, this procedure would take roughly 2-4 hours.

The conditions:
Needless to say, a lot of things have to be just right for this to work at all. Here are the conditions for this to be (remotely) possible:
- In both cases (CB auth and CF auth) the stored value has to be compared directly with another value (e.g. the HMAC output). There should not be some kind of (crypto) operation on these values before the comparison.
- The comparison should be byte-wise.
- The time difference caused by the number of identical leading bytes should be measurable, possibly using a trigger point from which to start counting/timing.
Just to be clear: I simply don't know if the conditions can be met. The code side can best be checked by those who have easy access to RE'd code. The (hardware) timing will be much more difficult. So I think we should first check how the checking/comparison algorithms are implemented before thinking about the possible hardware challenges. Just my 2 cents. Regards, arnezami

PS. Here is a compare function used in the 1BL as an example:

ROM:000056F0 # =============== S U B R O U T I N E =======================================
ROM:000056F0
ROM:000056F0
ROM:000056F0 sub_56F0:                               # CODE XREF: sub_4658+A4p
ROM:000056F0                 cmpwi   %r5, 0
ROM:000056F4                 mtctr   %r5
ROM:000056F8                 beq     loc_5720
ROM:000056FC                 lbz     %r6, 0(%r4)
ROM:00005700                 lbz     %r5, 0(%r3)
ROM:00005704                 b       loc_5710
ROM:00005708 # ---------------------------------------------------------------------------
ROM:00005708
ROM:00005708 loc_5708:                               # CODE XREF: sub_56F0+24j
ROM:00005708                 lbzu    %r6, 1(%r4)
ROM:0000570C                 lbzu    %r5, 1(%r3)
ROM:00005710
ROM:00005710 loc_5710:                               # CODE XREF: sub_56F0+14j
ROM:00005710                 cmpw    %r5, %r6
ROM:00005714                 bdnzt   eq, loc_5708
ROM:00005718                 subf    %r3, %r6, %r5
ROM:0000571C                 blr
ROM:00005720 # ---------------------------------------------------------------------------
ROM:00005720
ROM:00005720 loc_5720:                               # CODE XREF: sub_56F0+8j
ROM:00005720                 li      %r3, 0
ROM:00005724                 blr
ROM:00005724 # End of function sub_56F0
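For readers who don't read PowerPC, here is a line-by-line Python model of `sub_56F0`, assuming (per the usual PowerPC calling convention) that r3 and r4 point at the two buffers and r5 holds the length. It returns the value the routine leaves in r3 together with the number of bytes examined, which is the quantity a timing attack would try to observe. This is an illustrative model, not ROM code.

```python
def sub_56f0(a: bytes, b: bytes, n: int) -> tuple[int, int]:
    """Model of the 1BL compare: returns (r3 result, bytes examined)."""
    if n == 0:                 # cmpwi %r5,0 / beq loc_5720
        return 0, 0            # li %r3,0
    ctr = n                    # mtctr %r5
    i = 0
    while True:
        r5, r6 = a[i], b[i]    # lbz/lbzu: zero-extended byte loads
        equal = r5 == r6       # cmpw %r5,%r6
        ctr -= 1               # bdnzt eq: decrement CTR, then branch back
        i += 1
        if not (equal and ctr != 0):   # only while equal and CTR != 0
            return r5 - r6, i          # subf %r3,%r6,%r5  (r3 = r5 - r6)
```

A mismatch in the first byte stops after one iteration, while a full match and a mismatch in the last byte both run all n iterations.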
« Last Edit: August 14, 2009, 02:03:53 AM by arnezami »

xordef
« Reply #1 on: July 23, 2007, 06:05:17 AM »
Lots of security systems have been broken using this attack vector - it's pretty well known. I'd be surprised if Microsoft haven't closed it down, but it's well worth looking into if no one has already done it.

arnezami
« Reply #2 on: July 24, 2007, 04:45:50 AM »
Quote from: xordef
Lots of security systems have been broken using this attack vector - it's pretty well known. I'd be surprised if Microsoft haven't closed it down, but it's well worth looking into if no one has already done it.

Yes. It's indeed worth taking a look at, especially since it would be a permanent solution: no matter what they patch in new kernels, you would always be able to downgrade to 4532 or 4548. That's unlike new and increasingly hard to find code exploits (for newer kernels), which can and will always be revoked.

Main problems I foresee: when the compare detects a wrong auth value, the position n of the first wrong byte somehow has to be detectable and measurable on the 'outside' (= hardware). If the box goes into an infinite loop it might not be noticeable, but if it goes into some other mode (like starting mfgbootlauncher.xex or outputting an error message) it might be. And of course the granularity of timing/counting the time difference (good and reliable points to measure between, and chips fast enough to measure it) is a pain.

But all of that is moot if there isn't a byte-wise compare to begin with. I'm looking at the assembly but could use some help/hints here. Regards, arnezami
« Last Edit: July 24, 2007, 05:15:32 AM by arnezami »

robinsod
Global Moderator
Xbox Hacker
Posts: 646
Perl packed my shorts during global destruction
« Reply #3 on: July 24, 2007, 09:59:20 AM »
In order to downgrade we need to match the lockdown counter in the CF section to the fuses in the CPU; the area of the CF that holds this counter is hashed with the CPU key. The hash is (I believe) compared on a byte-by-byte basis as you illustrated.

The obvious place to look is the POST port; TMF posted details in another thread. It is an 8-bit wide port (0-1 V levels) that the processor writes various debug/status codes to as it boots; these include error codes if a check fails. The good news is that the CF hash check is also instrumented by these POST codes.

I have built hardware to monitor the POST port and read the values; it's not difficult to build a level shifter using LM339 comparators. The rest could be implemented using a CPLD/FPGA and a small micro to control the whole thing.

The difficulty I see is that the variation in time between a fail and a pass for 1 byte is going to be so small that it simply gets lost in the background noise. Maybe we could detect it by averaging over 10 attempts? The 360 seems to retry 3 times if it detects a failure before hanging permanently.

Another drawback is that every CPU key is different; this prevents the so-called "hero attack", and it's proving quite effective. If this approach worked you would need to repeat it for every box. Since the timing measurements have to be very precise, it's likely such a device would be expensive.

But the same hash checking code is used to verify that CB has not been modified. Since this is executed fairly early on, there is a reasonable chance the processor is not fully initialised and may not be running at full speed... I'm also fairly sure the POST port is used as before. The other benefit is that the same 1BL key is used in all boxes. So could we modify the CB section and brute force the correct hash as you suggest? If so, then one patched CB would/should run on all boxes and we gain control early in the boot process, potentially an even more powerful hack than the KK shader.

There are at least 4 different versions of CB in the wild, but I've never seen the CB updated in any of the dumps I have. I think the answer depends on how repeatable the timings are and how accurately you can measure them. The brute force attack is then reduced to 16 * 256 * Number Of Samples, a much more manageable number.
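To put the "16 * 256 * Number Of Samples" figure in perspective, here is a quick calculation, assuming a 16-byte auth value and counting each candidate byte value as one reflash-and-boot:

```python
def exhaustive_trials(auth_bytes: int = 16) -> int:
    """Every possible auth value: 256 choices per byte, all bytes at once."""
    return 256 ** auth_bytes


def bytewise_trials(samples: int, auth_bytes: int = 16) -> int:
    """Timing attack: each byte is found on its own, `samples` boots per guess."""
    return auth_bytes * 256 * samples


print(exhaustive_trials() == 2 ** 128)  # True
print(bytewise_trials(samples=10))      # 40960
```

Even with 10 samples per guess, the byte-wise attack needs about 4 × 10^4 boots instead of 2^128 (about 3.4 × 10^38) attempts.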

arnezami
« Reply #4 on: July 24, 2007, 10:02:42 AM »
Quote:
No, again: the key in the bootloader is the very same for each and every box.
Different in each box are the fuses. They are not used for decrypting, but they are verified during CB (POST code 0x21; on verification failure: 0x9B..0xA4). The data which is used to verify the fuses is CB:0x20..CB:0x3F. This (and nothing else) is what prevents us from swapping flash ROMs. The CB section will always be decoded correctly. You can verify that by watching the POST codes with no flash, a broken CB, or incorrect fuse data.

I've been doing some RE-ing concerning the HMAC checking in the CB. If I'm correct, this is the compare function used:

ROM:000092D0 # =============== S U B R O U T I N E =======================================
ROM:000092D0
ROM:000092D0
ROM:000092D0 sub_92D0:                               # CODE XREF: ROM:00007F14p
ROM:000092D0                 cmpwi   %r5, 0
ROM:000092D4                 mtctr   %r5
ROM:000092D8                 beq     loc_9300
ROM:000092DC                 lbz     %r6, 0(%r4)
ROM:000092E0                 lbz     %r5, 0(%r3)
ROM:000092E4                 b       loc_92F0
ROM:000092E8 # ---------------------------------------------------------------------------
ROM:000092E8
ROM:000092E8 loc_92E8:                               # CODE XREF: sub_92D0+24j
ROM:000092E8                 lbzu    %r6, 1(%r4)
ROM:000092EC                 lbzu    %r5, 1(%r3)
ROM:000092F0
ROM:000092F0 loc_92F0:                               # CODE XREF: sub_92D0+14j
ROM:000092F0                 cmpw    %r5, %r6
ROM:000092F4                 bdnzt   eq, loc_92E8
ROM:000092F8                 subf    %r3, %r6, %r5
ROM:000092FC                 blr
ROM:00009300 # ---------------------------------------------------------------------------
ROM:00009300
ROM:00009300 loc_9300:                               # CODE XREF: sub_92D0+8j
ROM:00009300                 li      %r3, 0
ROM:00009304                 blr
ROM:00009304 # End of function sub_92D0

As you can see, it is indeed a byte-wise comparison. [edit] @robinsod: we cross-posted. Will react later...

arnezami
« Reply #5 on: July 24, 2007, 10:09:26 AM »
This is the HMAC check (and the call to the byte-wise compare function):

ROM:00007ED8 Check_HMAC:                             # CODE XREF: sub_6B78+398p
ROM:00007ED8                 mflr    %r12
ROM:00007EDC                 std     %r12, -8(%sp)
ROM:00007EE0                 std     %r31, -0x10(%sp)
ROM:00007EE4                 stdu    %sp, -0x90(%sp)
ROM:00007EE8                 li      %r11, 0x14
ROM:00007EEC                 addi    %r31, %sp, 0x60
ROM:00007EF0                 stw     %r11, 0x5C(%sp)
ROM:00007EF4                 std     %r31, 0x50(%sp)
ROM:00007EF8                 bl      HMAC_SHA1
ROM:00007EFC                 lwz     %r11, 0xEC(%sp)
ROM:00007F00                 cmplwi  cr6, %r11, 0x14
ROM:00007F04                 bgt     cr6, loc_7F24
ROM:00007F08                 rldicl  %r5, %r11, 0,32
ROM:00007F0C                 ld      %r4, 0xE0(%sp)
ROM:00007F10                 addi    %r3, %sp, 0x60
ROM:00007F14                 bl      sub_92D0                 <----------- !
ROM:00007F18                 cmpwi   cr6, %r3, 0
ROM:00007F1C                 li      %r3, 1
ROM:00007F20                 beq     cr6, loc_7F28
ROM:00007F24
ROM:00007F24 loc_7F24:                               # CODE XREF: ROM:00007F04j
ROM:00007F24                 li      %r3, 0
ROM:00007F28
ROM:00007F28 loc_7F28:                               # CODE XREF: ROM:00007F20j
ROM:00007F28                 addi    %sp, %sp, 0x90
ROM:00007F2C                 ld      %r12, -8(%sp)
ROM:00007F30                 mtlr    %r12
ROM:00007F34                 ld      %r31, -0x10(%sp)
ROM:00007F38                 blr

And here is the moment the HMAC function is called and the POST error is output:

ROM:00006EBC loc_6EBC:                               # CODE XREF: sub_6B78+330j
ROM:00006EBC                 addi    %r11, %sp, arg_70
ROM:00006EC0                 ld      %r10, 0x258(%r30)
ROM:00006EC4                 rldicl  %r9, %r31, 0,32
ROM:00006EC8                 srwi    %r5, %r29, 2
ROM:00006ECC                 add     %r4, %r9, %r10
ROM:00006ED0                 addi    %r3, %sp, arg_70
ROM:00006ED4                 std     %r20, 0(%r11)
ROM:00006ED8                 std     %r20, 8(%r11)
ROM:00006EDC                 bl      sub_7FB8
ROM:00006EE0                 li      %r7, 0x10
ROM:00006EE4                 addi    %r11, %sp, arg_90
ROM:00006EE8                 li      %r10, 0x10
ROM:00006EEC                 addi    %r9, %sp, arg_70
ROM:00006EF0                 li      %r8, 0x10
ROM:00006EF4                 stw     %r7, arg_5C(%sp)
ROM:00006EF8                 li      %r6, 0x10
ROM:00006EFC                 addi    %r7, %sp, arg_80
ROM:00006F00                 std     %r11, arg_50(%sp)
ROM:00006F04                 addi    %r5, %r30, 0x10
ROM:00006F08                 li      %r4, 0x10
ROM:00006F0C                 addi    %r3, %sp, arg_60
ROM:00006F10                 bl      Check_HMAC
ROM:00006F14                 cmpwi   cr6, %r3, 0
ROM:00006F18                 bne     cr6, loc_6F2C
ROM:00006F1C                 li      %r4, 0xA4        # Error code 0xA4    <----------- !
ROM:00006F20
ROM:00006F20 loc_6F20:                               # CODE XREF: sub_6B78+2ECj
ROM:00006F20                 mr      %r3, %r21
ROM:00006F24                 bl      sub_74D0         # Output POST ERROR  <----------- !
ROM:00006F28                 bl      sub_74C0         # Infinite LOOP      <----------- !
ROM:00006F2C
ROM:00006F2C loc_6F2C:                               # CODE XREF: sub_6B78+2D8j
ROM:00006F2C                                         # sub_6B78+3A0j
ROM:00006F2C                 lhz     %r11, 6(%r30)
ROM:00006F30                 clrlwi  %r10, %r22, 16
ROM:00006F34                 or      %r11, %r11, %r10
ROM:00006F38                 sth     %r11, 6(%r30)
ROM:00006F3C                 addi    %sp, %sp, 0x170
ROM:00006F40                 b       loc_8D48
ROM:00006F40 # End of function sub_6B78

I wonder how precise (time/clock-wise) the output to the POST bus really is...
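Read together, the two listings appear to implement the logic sketched below. Assumptions: the caller stores 0x10 at arg_5C, which Check_HMAC reads back as 0xEC(%sp) after its own 0x90-byte frame, so only the first 16 of the 20 HMAC-SHA1 output bytes seem to be compared. Which key and message the CB actually feeds into `HMAC_SHA1` is not visible here, so `key` and `message` are placeholders. The constant-time variant uses Python's `hmac.compare_digest` only to show what would close this channel; nothing in the thread suggests the CB does anything like it.

```python
import hashlib
import hmac

SHA1_LEN = 0x14  # li %r11, 0x14: size of the HMAC_SHA1 output buffer


def early_exit_compare(a: bytes, b: bytes, n: int) -> int:
    """Same result as sub_92D0: a[i] - b[i] at the first difference, else 0."""
    for i in range(n):
        if a[i] != b[i]:
            return a[i] - b[i]
    return 0


def check_hmac(key: bytes, message: bytes, expected: bytes, length: int) -> bool:
    """Sketch of Check_HMAC; `key` and `message` are placeholders."""
    digest = hmac.new(key, message, hashlib.sha1).digest()  # bl HMAC_SHA1
    if length > SHA1_LEN:           # cmplwi cr6,%r11,0x14 / bgt -> return 0
        return False
    # bl sub_92D0, then r3 = 1 only if the compare returned 0
    return early_exit_compare(digest, expected, length) == 0


def check_hmac_constant_time(key: bytes, message: bytes, expected: bytes,
                             length: int) -> bool:
    """Same check, but compare_digest avoids content-based early exit."""
    digest = hmac.new(key, message, hashlib.sha1).digest()
    if length > SHA1_LEN:
        return False
    return hmac.compare_digest(digest[:length], expected[:length])
```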
« Last Edit: July 24, 2007, 10:39:56 AM by arnezami »

arnezami
« Reply #6 on: July 24, 2007, 11:00:26 AM »
To cut to the chase: we may already be capable of doing a basic PoC... Here is what happens on the POST bus during startup:
The checking of the HMAC in the CB happens just after POST code 0x21 (00100001b) has been output. When an error in the HMAC check is detected, POST code 0xA4 (10100100b) is output. If somebody has a logic analyzer, he could measure the time difference (or maybe count some external clock pulses?) between the two events. In order for this to work, several things would have to be done:
- Start with a working (exploitable) NAND.
- Change the first byte of the CB auth** (this makes sure the compare function immediately sees that something is wrong)
- Encrypt CB and flash it
- Start up the Xbox several times and measure the time (or clocks?) between 0x21 and 0xA4 (hopefully these are all very close)
- Change the first byte of the CB auth** back to what it is supposed to be, and now change the last byte of the CB auth** (this makes sure the compare function only sees that something is wrong at the last byte, so it has to check 15 more bytes)
- Encrypt CB and flash it
- Start up the Xbox several times and measure the time (or clocks?) between 0x21 and 0xA4.
- Is there a noticeable difference between the two? If yes, then we're in business. If not, we need to think about measuring by clocks, finding a better starting point, maybe slowing down the CPU, etc...

[edit] I think there could be three outcomes after such a PoC:
1) We see that the time differs even when nothing has been changed. We probably need a better starting point when this happens (e.g. after the checks normally resulting in errors 0x9B through 0xA3).
2) We see no measurable time difference at all: we need better (or simpler, less bulky) equipment, or use counters instead of timers.
3) We can measure the difference in time. We can continue.
Btw: as far as I understand DrMatrix, the CB auth is also CPU specific... Regards, arnezami

PS. Using the rising edge of bit 5 (= green, value 0x20, which is set in 0x21) and the rising edge of bit 7 (= purple, value 0x80, which is set in 0xA4) might also work and is maybe easier.

** I do not yet understand why DrMatrix talks about 32 bytes (CB:0x20...0x3F) while only the last 16 bytes seem to contain a hash (CB:0x30...0x3F). Does anyone know this??
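Once logic analyzer captures exist, the PoC boils down to one interval per boot and a comparison of two groups of boots. A minimal sketch, assuming each capture is exported as a list of (timestamp_ns, post_code) pairs (the export format is an assumption; adapt it to whatever the analyzer produces). Welch's t statistic is used here as one simple way to judge whether the two groups differ by more than the noise: a large value points at outcome 3, a value near zero means no measurable difference yet. The `start` parameter also allows trying a later trigger code, as suggested in outcome 1.

```python
from statistics import mean, stdev

POST_CB_CHECK = 0x21   # output just before the CB HMAC check
POST_CB_FAIL = 0xA4    # error code output when that check fails


def check_interval(trace, start=POST_CB_CHECK, end=POST_CB_FAIL):
    """Time from the first `start` code to the next `end` code in one boot."""
    t_start = None
    for timestamp, code in trace:
        if t_start is None and code == start:
            t_start = timestamp
        elif t_start is not None and code == end:
            return timestamp - t_start
    raise ValueError("capture does not contain the start code followed by the end code")


def welch_t(group_a, group_b):
    """Welch's t statistic for mean(group_b) - mean(group_a) (>= 2 samples each)."""
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    return (mean(group_b) - mean(group_a)) / (var_a / len(group_a) + var_b / len(group_b)) ** 0.5


# Usage (hypothetical capture lists, one per test image):
# first_wrong = [check_interval(t) for t in captures_first_byte_changed]
# last_wrong = [check_interval(t) for t in captures_last_byte_changed]
# print(mean(last_wrong) - mean(first_wrong), welch_t(first_wrong, last_wrong))
```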
« Last Edit: July 24, 2007, 11:46:25 AM by arnezami »

vax11780
« Reply #7 on: July 24, 2007, 01:40:08 PM »
Has anyone tried to slow the CPU clock down by tweaking the Cypress(?) clock generator? This would make it easier to detect the subtle timing differences.
VAX

Join my Folding@Home team! Download software from folding.stanford.edu, and join team 13356. PS3's welcome!

tmbinc
« Reply #8 on: July 24, 2007, 04:12:44 PM »
I've tried to clock down the CPU by creating my own clock. However, I believe this will fail as soon as the FSB is initialized. The CPU alone can be underclocked.
Actually, I think this is a promising thing.

Please don't copy/quote full text outside this board. Instead, summarize and link to this post. Thanks! This lets me keep information updated and doesn't pull things out of context.

arnezami
« Reply #9 on: July 25, 2007, 02:48:43 AM »
Quote from: robinsod
In order to downgrade we need to match the lockdown counter in the CF section to the fuses in the CPU; the area of the CF that holds this counter is hashed with the CPU key. The hash is (I believe) compared on a byte-by-byte basis as you illustrated.
The obvious place to look is the POST port; TMF posted details in another thread. It is an 8-bit wide port (0-1 V levels) that the processor writes various debug/status codes to as it boots; these include error codes if a check fails. The good news is that the CF hash check is also instrumented by these POST codes.
I have built hardware to monitor the POST port and read the values; it's not difficult to build a level shifter using LM339 comparators. The rest could be implemented using a CPLD/FPGA and a small micro to control the whole thing.

Yes. The POST port is probably the most convenient (and maybe only) way of measuring the tiny time difference (at least at the end, since after CB detects the difference nothing else is done but change the POST bus). We may need a better starting point (like an address bus line), since there is still a lot of stuff going on between POST 0x21 and error 0xA4. It's good to hear CF is probably also checked byte-wise.

Quote from: robinsod
The difficulty I see is that the variation in time between a fail and a pass for 1 byte is going to be so small that it simply gets lost in the background noise. Maybe we could detect it by averaging over 10 attempts? The 360 seems to retry 3 times if it detects a failure before hanging permanently.

Yes. This may be the biggest challenge. We may need some special chips for this (I believe the FSB runs at 5.4 GHz, the CPU at 3.2 GHz and other buses at 1.35 GHz/675 MHz (link)). Are there cheap counters that fast? Or maybe first a few frequency dividers? Timing is going to be crucial, but it may take some time before we find a way of getting it right. I think we are roughly talking about 10 extra CPU clocks per checked byte, so we're talking ns here... (or maybe 1 clock pulse at 400 MHz if we divide the CPU clock by 8...). Trying several times to get an average will also help in that case. Slowing down the CPU would also greatly help.

I didn't know about the 3 tries. Is this from sniffing the NAND access? Or from RE-ing code? Do you know when exactly this happens? I haven't seen this in the CB code, so I guess the CF checks may be somewhat different. If slow memory is accessed during our time measurement, this could harm our precision.

Quote from: robinsod
Another drawback is that every CPU key is different; this prevents the so-called "hero attack", and it's proving quite effective. If this approach worked you would need to repeat it for every box. Since the timing measurements have to be very precise, it's likely such a device would be expensive.

Yes. This is unavoidable though. By authenticating with the CPU key, MS made sure we have to open each box individually. But we may find ways (possibly by lowering speeds) to make it easier and less expensive.

Quote from: robinsod
But the same hash checking code is used to verify that CB has not been modified. Since this is executed fairly early on, there is a reasonable chance the processor is not fully initialised and may not be running at full speed... I'm also fairly sure the POST port is used as before. The other benefit is that the same 1BL key is used in all boxes.
So could we modify the CB section and brute force the correct hash as you suggest? If so, then one patched CB would/should run on all boxes and we gain control early in the boot process, potentially an even more powerful hack than the KK shader. There are at least 4 different versions of CB in the wild, but I've never seen the CB updated in any of the dumps I have. I think the answer depends on how repeatable the timings are and how accurately you can measure them. The brute force attack is then reduced to 16 * 256 * Number Of Samples, a much more manageable number.

No, sadly this is not possible. With RSA (which is what the CB is signed with, and what prevents it from being altered) the two values that are compared are both results of functions: one is the calculated SHA hash, the other is the same hash recovered from the RSA signature + public key. In other words, we do not have byte-wise control over what goes into the memcmp function, because changing one byte in the signature will change many of the compared bytes. Since we cannot change the contents of the CB itself (and thus cannot run unsigned code from boot), there are only a few options left:
(1) Find a new exploit in a new kernel. The nice thing is this might work for all (current) Xboxes. A major problem is that it is very, very hard to find and that it is going to be revoked. People updating their Xboxes or buying new ones will not benefit, but those that already have one will.
(2) Downgrade using the above techniques. It seems the main problems are technical and financial. It would have to be done box by box, but would work on all boxes regardless of their kernel version. If it can be made easy to do and inexpensive, this would be a really good solution. It also spares us from releasing newly found kernel exploits we don't want to waste.
(3) Find an exploit in the 1BL: this is the holy grail. If there is something exploitable in the 1BL itself, that would be fantastic because it is non-revocable. Extremely unlikely though.
Keep in mind that for this timing attack to work we "only" need good starting and end points and a simple but precise counter/timer (e.g. a 4-8 bit counter might already reveal the differences we want to detect). Regards, arnezami
« Last Edit: July 25, 2007, 03:46:13 AM by arnezami »

arnezami
« Reply #10 on: July 25, 2007, 03:14:37 AM »
Quote from: vax11780
Has anyone tried to slow the CPU clock down by tweaking the Cypress(?) clock generator? This would make it easier to detect the subtle timing differences.
VAX

Quote from: tmbinc
I've tried to clock down the CPU by creating my own clock. However, I believe this will fail as soon as the FSB is initialized. The CPU alone can be underclocked.
Actually, I think this is a promising thing.

Interesting. How exactly does lowering the clock work? Did you get any POST errors after lowering it? How low can you go without any errors at all? This may be a strange question, but: is it possible to lower the clock speed on the fly, possibly triggered by a POST code? Maybe just before the HMAC check starts... Sounds crazy, but is it possible?

[edit] Something else. I just read this: "Additionally, the CPU provides an external debug bus for extended traces; this runs at 1/4 full speed for the CPU, but lets the FSB run at full speed." Here: http://www-128.ibm.com/developerworks/power/library/pa-fpfxbox/?ca=dgr-lnxw09XBoxDesign
Is this the JTAG interface they are talking about? Do we know if it is used? Anyway, 1/4 of full CPU speed is very precise. So if we could use it somehow... maybe the POST bus also runs at this speed? Regards, arnezami
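The clock numbers in this exchange can be checked quickly. This sketch takes the rough guess of about 10 extra CPU cycles per compared byte (from Reply #9) as an assumption, not a measured value, and converts it into nanoseconds and into ticks of an external counter: at the full 3.2 GHz CPU clock, with a counter at CPU/8 (400 MHz) and with one at CPU/4 (the debug bus rate quoted from the IBM article).

```python
CYCLES_PER_BYTE = 10  # rough guess from the thread, not a measurement


def ns_per_byte(cpu_hz: float, cycles_per_byte: int = CYCLES_PER_BYTE) -> float:
    """Extra time one more matching byte adds to the compare loop."""
    return cycles_per_byte / cpu_hz * 1e9


def counter_ticks_per_byte(cpu_hz: float, counter_hz: float,
                           cycles_per_byte: int = CYCLES_PER_BYTE) -> float:
    """How many ticks of an external counter that extra time spans."""
    return cycles_per_byte / cpu_hz * counter_hz


CPU_HZ = 3.2e9
for label, counter_hz in [("CPU/8", CPU_HZ / 8), ("CPU/4", CPU_HZ / 4)]:
    print(label, f"{ns_per_byte(CPU_HZ):.3f} ns",
          f"{counter_ticks_per_byte(CPU_HZ, counter_hz):.2f} ticks")
```

At full speed one extra byte is about 3.1 ns, i.e. roughly 1.25 ticks of a 400 MHz counter or 2.5 ticks at CPU/4. Halving the CPU clock while keeping the counter fixed doubles the tick count, which is why underclocking would help so much.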
« Last Edit: July 25, 2007, 04:10:59 AM by arnezami »

TheSpecialist
« Reply #12 on: July 26, 2007, 08:52:49 AM »
I like the idea! But even if it works, I think you're a bit optimistic here:
Quote from: arnezami
Assuming a reboot time of approx. 2-3 seconds this would take roughly 2-4 hours.
You also have to write the new flash value and reboot, and, like robinsod noted, you probably have to measure a few times to get an average value, and the Xbox will retry 3 times anyway. I think you'd be looking at more than 1 day for 1 Xbox, maybe even a few days...
« Last Edit: July 26, 2007, 09:16:55 AM by TheSpecialist »

arnezami
« Reply #13 on: July 26, 2007, 09:45:20 AM »
Quote from: TheSpecialist
I like the idea! But even if it works, I think you're a bit optimistic here: "Assuming a reboot time of approx. 2-3 seconds this would take roughly 2-4 hours." You also have to write the new flash value and reboot, and, like robinsod noted, you probably have to measure a few times to get an average value, and the Xbox will retry 3 times anyway. I think you'd be looking at more than 1 day for 1 Xbox, maybe even a few days...

Yes. That was my most optimistic guesstimate. But I think many people would be happy if it could be done within, say, a week. Quite importantly, it's a one-time-only event, and (from a cost perspective) you can even share the equipment/hardware, which could reduce the cost for single users (think of schools/universities/clubs etc.). But even if it is possible, there are still many hurdles to overcome. I think a step-by-step approach is needed to rule out any "deal breakers". I will think about what can best be done right now and what next, etc. We need to be sure of a few things first. But so far it looks like it just might be possible... Regards, arnezami
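The gap between the two estimates comes mostly from the number of boots per guess. A rough calculator under stated assumptions: both the CB and the CF auth values are searched (16 bytes each), on average the right byte turns up after half of the 256 candidates, and `seconds_per_try` covers reflash plus reboot. The 4 s and 10-sample figures below are picked for illustration only.

```python
def attack_hours(seconds_per_try: float, samples: int = 1, sections: int = 2,
                 auth_bytes: int = 16, average_case: bool = True) -> float:
    """Rough wall-clock hours for the byte-by-byte timing search."""
    candidates = 128 if average_case else 256  # tries per byte before the hit
    tries = sections * auth_bytes * candidates * samples
    return tries * seconds_per_try / 3600


print(f"{attack_hours(2):.1f} to {attack_hours(3):.1f} h")  # 2.3 to 3.4 h
print(f"{attack_hours(4, samples=10):.1f} h")               # 45.5 h
```

The first line reproduces the original 2-4 hour figure; with reflashing and 10 samples per guess it grows to roughly two days, in line with the "more than 1 day" estimate.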
« Last Edit: July 26, 2007, 10:05:55 AM by arnezami »

SeventhSon
« Reply #14 on: July 26, 2007, 11:25:38 AM »
Sounds like fun, but also a lot of difficult-to-reproduce work just for a downgrade (hackers with no access to exploitable boxes might have more incentive though). All the same, I reckon this thread is tech-section worthy.

mrblack1134
Newbie
Posts: 8
« Reply #15 on: July 26, 2007, 03:21:45 PM »
Been reading this thread; really interesting, but I just thought I'd throw in my 2 cents here. While this is a great idea (and well worth exploring), I'm wondering whether the timing difference will just get lost in background noise. For example, most systems wait for PLLs to stabilize (lock) before starting execution; stabilization time in turn will vary depending on, for example, crystal temperature (i.e. the first few boot-ups might be nanoseconds longer/shorter). The CPU might also wait for other devices (RAM? DVD? GPU?) to pull up a line to say they're all go before doing anything. Also, this assumes that the number of instructions executed is *exactly* the same up until that memcmp; that sounds like a reasonable assumption unless the box keeps boot-up counters and per-boot information, but I guess you'd probably get the same amount of code executed anyway. Another thing which could introduce jitter would be timer/external interrupts firing while the CPU has code in its instruction pipeline (which it probably needs to flush first), which could introduce random minute delays from boot-up to boot-up. I don't want to bring up any showstoppers here; I just thought I'd throw in a few ideas in case those who try the experiment first don't get the expected results.

SeventhSon
« Reply #16 on: July 26, 2007, 06:13:00 PM »
Quote from: mrblack1134
While this is a great idea (and well worth exploring), I'm wondering whether the timing difference will just get lost in background noise. For example, most systems wait for PLLs to stabilize (lock) before starting execution; stabilization time in turn will vary depending on, for example, crystal temperature (i.e. the first few boot-ups might be nanoseconds longer/shorter). The CPU might also wait for other devices (RAM? DVD? GPU?) to pull up a line to say they're all go before doing anything.
Also, this assumes that the number of instructions executed is *exactly* the same up until that memcmp; that sounds like a reasonable assumption unless the box keeps boot-up counters and per-boot information, but I guess you'd probably get the same amount of code executed anyway.

It won't be necessary to time from the start of boot. We can use a particular output on the POST bus to trigger the start and end of timing. So only 'noise' between the timer start and end triggers will be a problem.

robinsod
« Reply #17 on: July 26, 2007, 08:47:03 PM »
I thought my random speculation about the CB section was a little hopeful in truth, BUT it still seems the most likely way forward. For my money we need to go after something that is common to all boxes; trying to crack the hash of the lockdown counter on a per-box basis is a "non-starter".
The per-box signature on the CF sections will be hard to brute force because by that stage the CPU is (I believe) fully initialized and running at 3.2 GHz. I'm sure there is test equipment that can operate at those speeds, but it is out of our league (even for a one-off attack: no one here has ever gained access to a SATA analyser AFAIK, and that is probably cheap by comparison), and underclocking causes the FSB to fail.
The hash checking of CB should be our target: there are only 4 versions out there, as opposed to 1 per box for the CPU key (assuming we need all 4 of them to be hacked), and it occurs when the processor may not be running at full speed, or at least may be most tolerant of underclocking. We could underclock the processor using an OCXO (oven-controlled crystal oscillator; even a rubidium clock source costs < 2000 Euro) as the source for both the box's clock and whatever is used to do the measuring; this could reduce a great deal of phase jitter. High-speed comparators are also available (or FPGAs with 1 V signaling interfaces may be available). It may be necessary to source lab-grade measurement equipment (second-hand test equipment sourced from eBay, especially when synced to one central clock, will not be overly expensive).
Erasing and rewriting a single block of NAND flash could take << 1 second. I am fairly sure Infectus could be persuaded to cooperate; we are currently working on interfacing NAND Dump Tool with their Infectus flash programmer. I mentioned that 3 attempts are made to boot before the 3RLOD because it means we get 3 samples per power cycle. This might be a nice multiple for the oversampling technique needed to detect a pass/fail decision. The 3 boot attempts take >> the time needed to reflash 1 NAND block.
The other plus point of this attack is that it is easy to scale. I have perhaps 8 360 motherboards that could be dedicated to the task; add 8 Infectus chips (and interface circuitry), 1 high-quality clock source plus a timer/counter and a controller PC, and it becomes cost effective IF the result is applicable to all boxes, not just 1. I (and I am sure others here) have the experience to integrate such a system.....
So, here are 2 questions:
1) Can the "RE gods" determine the difference between a good and bad byte comparison during hash checking in terms of number of instructions executed between POST updates? The POST port seems the only choice for monitoring code behavior....
2) Can we underclock the hardware to the point where the difference between 1 good and bad byte comparison can be easily detected?

tmbinc
« Reply #18 on: July 26, 2007, 10:04:42 PM »
We are talking about the pairing hash, right?

robinsod
« Reply #19 on: July 26, 2007, 10:23:12 PM »
Hmmm, this is where you can help. I am interested in the hash that guarantees the integrity of the CB section. The ideal would be to remove the hash check on CD from the CB code and defeat the CB integrity check; that would be all that is necessary.
It's been a while since I looked at the de/encryption of the CB section, and I always assumed the executable code is immutable, but the question is: what do we need to defeat to modify the CB section and allow execution? I really can't remember, but is it the 16 bytes of msg data (I doubt it)? Or is there a 2048-bit certificate (more likely)? That's not so scary IF we can detect a pass/fail decision when each byte is checked.