oripash

2012 MBA + GTX 660Ti@10Gbps-TB1 (Sonnet EE Pro) + Win8 [oripash]


This is a loose guide to installing Windows 8 64-bit in EFI mode (i.e. not via bootcamp) and dual-booting it with MacOSX on a 2012 Macbook Air. It's a work in progress, so I update it from time to time. The purpose of this exercise is to set up a windows-based game rig on the mac, using a thunderbolt-based eGPU, that will co-exist nicely with OSX.

FYI:

1. This is a re-post from other forums which I've moved here. This is the one I keep updated.

2. Some of the specifications of this rig are very deceptive (It has two thunderbolt channels and four PCIe 2.0 lanes. Except it doesn't fully use either.).

Please make sure you've read my short PCIe/thunderbolt Plumbing Primer to make sense of how stuff actually works.

On to the fun:

post-7708-14494993992487_thumb.gif

My experiences so far:

There are two roads to install Windows on a mac.

The road of BIOS and the road of EFI.

Older PCs only have BIOS. Windows on those PCs talks to hardware directly through the BIOS.

Macs come with EFI as the primary interface to the hardware. OSX talks to hardware directly through EFI.

Macs also come with a BIOS emulation, because through BIOS, Windows works flawlessly (with the exception of thunderbolt...). This is how bootcamp makes your windows work.

The newest Windows (Win7/64 and Win8/64 ONLY) can interface with hardware directly through EFI as well. Not all windows drivers are tested to work this way.

What happens when we connect an eGPU - the new eGPU thunderbolt device tells EFI/BIOS it exists. On a BIOS-based PC, the BIOS would enumerate it and tell the OS the device is ready. Tomshardware review of the Sonnet suggests this works flawlessly on a thunderbolt-equipped desktop motherboard.

On an EFI-based mac, things are a bit different. The thunderbolt device tells the EFI it exists. The EFI enumerates it as a PCI device and tells the OS the device is ready. That's what happens in OSX (which runs in straight EFI mode), and what happens in Win7/x64 or Win8/x64 if you installed them in straight EFI mode.

If, however, you run windows in regular BIOS mode (if you installed Windows via bootcamp, this is the case), Apple's BIOS emulation does not pass the thunderbolt enumeration event back to windows, and your thunderbolt eGPU doesn't work.

There's a way to make it work using a rain dance, where you connect the eGPU to the mac but not the AUX power plug to the GPU, turn mac on, get past the boot loader, immediately turn the GPU power on before windows completes booting, jump on one foot holding your left ear, bend over backwards twice, scream in agony, and on occasion your thunderbolt device gets recognized and appears in device manager. Even then, twice it disappeared on me while installing nVidia drivers. I gave up on trying to get thunderbolt eGPUs to work through Apple's BIOS emulation.

I decided to install windows in EFI mode. I tried windows7/64bit/EFI, ran into a pile of weirdness installing and gave up. I'm using Win8/64/EFI instead.

Setting up a dual-boot EFI on a macbook is easy:

a. NO BOOTCAMP.

b. when in OSX, fire up terminal, sudo to root and shrink your EFI OSX partition:

# diskutil resizevolume /dev/disk0s2 250G

(in this case, I have a 512GB SSD, I shrunk the partition to 250G).

c. DO NOT create windows partitions under OSX. DEFINITELY do not let boot camp do this for you - it creates MBR partitions, EFI windows won't install on that.

d. On some windows PC (or if you're like me, in your Windows7 parallels VM), create a USB boot drive of windows 8:

insert 4GB or larger USB disk. Note: below steps will wipe it. Proceed at own risk.

run command prompt as administrator

> diskpart
> list disk
(check which disk number your USB disk appears as, use it in the next commands)
> select disk your-usb-disk-number-from-previous-step
> clean
> convert gpt
> create partition primary
> select partition 1
> format quick fs=fat32
> assign
> exit

Now copy the guts of the windows 8 DVD or ISO onto this new drive.

Congrats, you now have an install drive.
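If you end up rebuilding the USB stick more than once, the diskpart session above can be saved to a text file and replayed with "diskpart /s <file>". Below is a minimal Python sketch that only generates that script text. The function name and the disk-0 safety check are my own additions, not part of the original steps, and you still have to read the disk number off "list disk" yourself.

```python
def build_diskpart_script(disk_number: int) -> str:
    """Return a diskpart script that wipes one disk and creates a single FAT32 GPT partition."""
    if disk_number < 1:
        # Assumption: disk 0 is the internal system drive, so never target it.
        raise ValueError("refusing to wipe disk 0 or a negative disk number")
    commands = [
        f"select disk {disk_number}",
        "clean",
        "convert gpt",
        "create partition primary",
        "select partition 1",
        "format quick fs=fat32",
        "assign",
        "exit",
    ]
    # diskpart scripts are plain text, one command per line.
    return "\r\n".join(commands) + "\r\n"


if __name__ == "__main__":
    # Hypothetical example: the USB stick showed up as disk 2 in "list disk".
    print(build_diskpart_script(2), end="")
```

Save the printed text to a file and pass it to diskpart from an administrator command prompt. As with the manual steps, this wipes the selected disk.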

e. Back on our macbook, I recommend installing rEFIt - install it, then open a shell, cd to /efi/refit and run:

sudo ./enable.sh

f. Reboot with the USB disk in. in the rEFIt menu, you should see two ways to boot from the USB disk - EFI and BIOS. Choose EFI.

g. Installing windows:

1. First boot: windows installation. When you get to the partitioning stage, you should have a block of empty space on your macbook SSD. Let windows create its EFI partitions in that space and tell windows to format the last, big one of these. Then proceed with the install.

It will copy files and reboot.

2. Second boot: you don't need to do anything. It will go into a black screen (this is because the GMA4000 driver breaks in EFI mode) and reboot on its own after a few minutes.

3. Third boot: It'll go into a black screen again. LEAVE IT FOR 15 MINUTES for the installer to do its thing, then, after it has presumably finished doing all the things it isn't showing you, HARD POWER-OFF.

4. Fourth boot: In the rEFIt menu, choose to boot off the USB drive again. This time go into the recovery menu and fire up a command prompt. Delete the broken intel GMA4000 driver file (causing the default VGA driver to take over). The recovery environment may give your windows volume a drive letter other than C:, so check with dir first. Once in the shell, run:

C:\> del c:\windows\system32\drivers\igdkmd64.sys

Exit the shell and let the machine reboot. Finish the windows installation and go through the formalities.

5. Timekeeping: Windows likes to think that the machine's SAVED time (what time your machine thinks it is if you boot it completely offline) reflects the local time in your timezone. OSX likes to think SAVED time reflects time in Greenwich. They'll keep fighting between them over what time it is.

Solution:

In Windows, using regedit, navigate to the TimeZoneInformation key, add a DWORD value named RealTimeIsUniversal and set it to 1:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\TimeZoneInformation\RealTimeIsUniversal

Then reboot. Now let either OS set the time, and it will remain good across both.

6. Windows works!
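To see why the two systems fight over the clock in step 5, here is a small Python sketch of the same saved hardware-clock value being read both ways. The UTC+11 offset (Melbourne in summer) and the sample time are assumed examples, and the function name is made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def clock_shown(rtc_value: datetime, rtc_is_utc: bool, utc_offset_hours: int) -> datetime:
    """Return the wall-clock time an OS would display for a naive hardware-clock reading."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    if rtc_is_utc:
        # OSX, or Windows with RealTimeIsUniversal=1: the saved time is UTC.
        return rtc_value.replace(tzinfo=timezone.utc).astimezone(local_zone)
    # Windows default: the saved time is already local time.
    return rtc_value.replace(tzinfo=local_zone)


if __name__ == "__main__":
    rtc = datetime(2012, 11, 30, 1, 0, 0)  # assume OSX last saved 01:00 UTC
    osx = clock_shown(rtc, rtc_is_utc=True, utc_offset_hours=11)
    win = clock_shown(rtc, rtc_is_utc=False, utc_offset_hours=11)
    print("OSX shows:    ", osx.strftime("%H:%M"))
    print("Windows shows:", win.strftime("%H:%M"))
    print("Disagreement: ", osx.replace(tzinfo=None) - win.replace(tzinfo=None))
```

With the registry value set, Windows takes the first branch as well and both systems agree.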

h. DO NOT TRY TO UPDATE THE DRIVER FOR THE GMA4000 onboard graphics. It will just install a new (still broken, as of the time I'm writing this post) igdkmd64.sys file, and force you to go through step 4 above again.

i. I installed Forceware 306.97 nVidia driver for Win8/64. It installed fine.

j. Go back to OSX. Fire up boot camp assistant and select "Download the latest Windows support software from Apple". Untick everything else. Save it on your USB or somewhere where windows can see it. It will create a WindowsSupport directory with drivers for all the Apple bits and an installer that installs all of them.

Boot into windows, go to the WindowsSupport folder on this USB stick and run setup.exe. This will properly install drivers for a few more things, including bluetooth. GMA4000, screen brightness controls & onboard audio will still not work.

The boot camp control panel in windows won't work - its start screen shows bootable partitions and it expects a hybrid MBR which we've very deliberately avoided setting up in our non-1980's shiny GPT partition structure (you can manually install a hybrid MBR and experiment using gdisk and the 'h' option in the recovery submenu, but that confuses the hell out of windows).

k. Things that don't work for me:

1. Screen brightness controls in Windows.

2. Sound driver. I just plugged in an external USB sound card I had lying around.

3. The GMA driver. There are four drivers you can use:

a. The GMA driver bundled with windows (or an updated WHQL one from windowsupdate).

b. The driver supplied in Apple's bootcamp driver pack.

c. The latest GMA driver downloadable from intel's website.

d. The default VGA driver in Windows.

4. Virtu (The installer fails to set up)

As of 26/11/2012, (a-c) do NOT work in EFI. This has nothing to do with the eGPU and whether it is connected or not. It has everything to do with the driver not yet being written to be compatible with windows working in straight EFI. I'm sure Intel will fix this at some point, I'm just not sure when this will happen. (a) and (b) will give you a yellow triangle in device manager, (c) will not (but still not work).

(d) works FINE (it's snappy and not laggy or anything, it doesn't feel like the good'ol "video card without a driver" in windows). It'll be 100% good for everything except gaming.

5. Boot camp control panel (to tweak behavior of apple hardware, trackpad options, what the button on your apple display does, etc). It opens up on the "partitions" tab, which it can't figure out because we have no hybrid MBR, so it bombs out.

The system tray icon still runs, and you can tweak some of the behavior via registry if you're thus inclined.

At the end of the day:

Steam works. So do all games I tried to date (Metro 2033, Borderlands, Portal 2...)

3DMark 2011 P5802 : Graphics Score: 7147, Physics Score: 3703 Combined Score: 3719

By contrast:

A retina Macbook Pro 15 with a Kepler dGPU does P2275, and an alienware M18x does P5602.

Mu-ha-ha.
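For anyone who wants the gap as a ratio rather than raw P-scores, a quick Python sketch using only the three 3DMark 2011 figures quoted above (the function name is just for illustration):

```python
SCORES = {
    "MBA 2012 + GTX 660Ti eGPU": 5802,
    "Retina Macbook Pro 15 (Kepler dGPU)": 2275,
    "Alienware M18x": 5602,
}


def relative_to(baseline: str, scores: dict) -> dict:
    """Return the baseline score divided by every other entry's score."""
    base = scores[baseline]
    return {name: base / value for name, value in scores.items() if name != baseline}


if __name__ == "__main__":
    for name, ratio in relative_to("MBA 2012 + GTX 660Ti eGPU", SCORES).items():
        print(f"{ratio:.2f}x the score of {name}")
```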

I would REALLY love to compare this rig in a benchmark that is HIGHLY influenced by PCIe constraints (such as the Dirt3min test Anand ran here: AnandTech.com - The Radeon HD 7970 Reprise: PCIe Bandwidth, Overclocking, & The State Of Anti-Aliasing) using [a] a 660Ti with 2GB, [b] a 660Ti with 3GB, [c] a 680/690 (at, say, 1080p and 2560x1600 res) and [d] Same 680/690 with 4GB.

This would show:

1. Whether having more GPU RAM results in meaningfully more on-card caching (both at the 660-level cards and 680 or 690 level cards), less need to shuttle textures over limited thunderbolt bandwidth and ultimately a meaningful performance increase.

2. Whether there's any point in putting a high-end GPU on this rig.

I don't have the required GPUs, but if anyone is in the Melbourne, Australia area and has one he can lend for the sake of this experiment, shoot me a private message and we'll try.

My kit:

Macbook Air 2012:

Dual-channel thunderbolt (Intel DSL3510L Cactus Ridge controller, details here), supposedly 2x10Gbit/direction (no more than 10Gbit/direction/GPU), 8GB RAM, 512GB SSD, CPU: Intel Core i7-3667U @ 2.00GHz/3.2GHz turbo'd (...always wanted to run a gamebox on a ULV 14Watt part :D)

Thunderbolt to PCIe: Sonnet Echo Express Pro (dual-channel thunderbolt, 2x10Gbit/direction, 1 thunderbolt channel->four PCIe 2.0 lanes)

GPU: Galaxy GTX660ti 3GB

Power supply: FSP X5, external to the Sonnet enclosure. (It's a 5.25'' 450Watt booster PSU).

I'm too lazy to pull enough 12V rails from the Sonnet's built-in 150W PSU to drive the card (and I don't want to accidentally fry it, it is an $800 part), so I'm feeding the GPU's power from an external $80 source.

Summary: to quote a vending machine, and referring directly to the number of PCIe lanes used... "FOUR HUNDRED PERCENT MORE AWESOME".

Next stuff on the agenda to test:

1. I will re-install something similar on a macbook air 2011, which only has one thunderbolt channel (and a 1.8GHz i7). I want to see the GPU performance difference between the two... and confirm if indeed the Sonnet is putting its four PCIe 2.0 lanes on one 10Gbit thunderbolt channel or teaming both thunderbolt channels to achieve this.

2. Lucid Virtu from Lucid's website doesn't install on Windows 8. However, there is a newer version on ASRock's website, and you can grab it from here: LucidLogix VIRTU 2.1.220 64-bit.rar

I want to try and install that, and see if I can get some mileage out of the GMA as well.

3. Optimus. The reason Optimus is interesting is the use of the internal monitor (meh.. it's 11''), and the compression it puts on PCIe (now THAT can be interesting). People say that it turns on when the nVidia driver shows you both GPUs in the drop-down. Mine does that. People also say it only enables when you have one PCIe 1.0 lane, and disables when you have more. I have a sneaky suspicion that compressing GIGABYTES of data in realtime - that's eight times what a single PCIe 1.0 lane carries - requires more oomph than the nVidia card has at its disposal, and the absence of said oomph is the reason it's disabled. Just a suspicion. That said, if this can be forced on somehow, I'd love to try. I haven't found any way of doing this, so this is just a background thought, not an active course of investigation I'm running with.


Awesome. You cover a couple of details that I'd otherwise miss. Now can I ask you to provide the full set of dx9, dx10 and dx11 benchmarks like shown in the DIY eGPU experiences implementations post? [Edit: I originally wrote "Then you can be included as the first to submit x4 2.0 results." It turns out it's about 10-15% faster than x2 2.0 due to the Thunderbolt downlink constricting the traffic. Disappointing.]

Awesome. You cover a couple of details that I'd otherwise miss. Now can I ask you to provide the full set of dx9, dx10 and dx11 benchmarks like shown in the DIY eGPU experiences implementations post ? Then you can be included as the first to submit x4 2.0 results.

You make a valid point about Optimus pci-e compression. They'd have to set up an RX/TX pair with the CPU on one end and the CUDA processing on the other. Sending 4 lanes of compressed data could certainly bog down a system, so very likely they chose to enable that only with one lane.

No promises on benchmarks - mainly because I only have so much time in between multiple jobs and three kids. I'll start by optimizing the system to the point where I'm happy (let the virtu line of inquiry play out) while using artificial benchmarks (3DMark11, 3DMark06), then once the system is in its final state, get some proper DX9,10,11,11.1 benchmarks done using real-world game engines. Again, no promises :)

Also.. I'm still not 100% sure I AM getting 4 x PCIe 2.0. Which is to say, I'm getting 4 x 2.0, I'm just not sure whether I'm getting the full 16000Megabits/sec that four PCIe lanes using two tb channels would allow, or capping out at 10000Megabits/sec because only one channel is being used.

What I know:

1. GPUz registers 4 x 2.0 (I'll post a screencap later)

2. Sonnet say 4 x 2.0

3. Sonnet say both thunderbolt channels are used.

4. The Sonnet has only one PCIe x4 bus feeding BOTH PCIe x16 mechanical slots. So I doubt they meant "you can put one SEPARATE device at the end of each 10Gbit channel" or "You can daisy-chain a second TB device that would put the other 10Gbit to use"

Regardless, I want to have some empiric evidence of two thunderbolt channels being used under those 4 PCIe lanes.

Running this on an almost-similar-specced macbook air 2011 (which had a single-channel thunderbolt controller) and comparing results may confirm the alternate hypothesis. I'll try running it on a high-res bench on both machines and we'll see how they stack up.
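The 16000 vs 10000 Megabits/sec figures above come from simple arithmetic, sketched below in Python. It assumes the standard PCIe signalling rates (2.5 GT/s per lane for 1.0, 5 GT/s for 2.0, both with 8b/10b encoding) and 10 Gbit per direction per thunderbolt channel. It ignores packet and protocol overhead, so real transfers will come in lower. The function names are made up for illustration.

```python
# Raw signalling rate per lane in GT/s; PCIe 1.0 and 2.0 both use 8b/10b encoding.
PCIE_GT_PER_LANE = {"1.0": 2.5, "2.0": 5.0}
ENCODING_EFFICIENCY = 8 / 10
THUNDERBOLT_CHANNEL_GBIT = 10.0  # per direction, per channel


def pcie_gbit(generation: str, lanes: int) -> float:
    """Usable PCIe bandwidth per direction in Gbit/s, before protocol overhead."""
    return PCIE_GT_PER_LANE[generation] * ENCODING_EFFICIENCY * lanes


def link_ceiling_gbit(generation: str, lanes: int, tb_channels: int) -> float:
    """The slower of the PCIe link and the thunderbolt transport carrying it."""
    return min(pcie_gbit(generation, lanes), THUNDERBOLT_CHANNEL_GBIT * tb_channels)


if __name__ == "__main__":
    print("PCIe 2.0 x4 on its own:      ", pcie_gbit("2.0", 4), "Gbit/s")
    print("...over one thunderbolt channel: ", link_ceiling_gbit("2.0", 4, 1), "Gbit/s")
    print("...over two bonded channels:     ", link_ceiling_gbit("2.0", 4, 2), "Gbit/s")
```

The two-channel case is the hypothetical one being tested here: it only applies if the Sonnet really does team both channels.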

Regardless, I want to have some empiric evidence of two thunderbolt channels being used under those 4 PCIe lanes.

Running this on an almost-similar-specced macbook air 2011 (which had a single-channel thunderbolt controller) and comparing results may confirm the alternate hypothesis. I'll try running it on a high-res bench on both machines and we'll see how they stack up.

Easily done. Just run CUDA-Z which will give you memory copy info such as shown for x1 2.0 below. Your x4 2.0 GTX660Ti should be seeing 4 times that result, 1500-1600MiB/s.



Core Information
----------------
Name: GeForce GTX 660
Compute Capability: 3.0
Clock Rate: 1084.5 MHz
PCI Location: 0:3:0
Multiprocessors: 5 (960 Cores)
Therds Per Multiproc.: 2048
Warp Size: 32
Regs Per Block: 65536
Threads Per Block: 1024
Threads Dimensions: 1024 x 1024 x 64
Grid Dimensions: 2147483647 x 65535 x 65535
Watchdog Enabled: Yes
Integrated GPU: No
Concurrent Kernels: Yes
Compute Mode: Default

Memory Information
------------------
Total Global: 2048 MiB
Bus Width: 192 bits
Clock Rate: 3004 MHz
Error Correction: No
L2 Cache Size: 48 KiB
Shared Per Block: 48 KiB
Pitch: 2048 MiB
Total Constant: 64 KiB
Texture Alignment: 512 B
Texture 1D Size: 65536
Texture 2D Size: 65536 x 65536
Texture 3D Size: 4096 x 4096 x 4096
GPU Overlap: Yes
Map Host Memory: Yes
Unified Addressing: No
Async Engine: Yes, Unidirectional

Performance Information
-----------------------
Memory Copy
Host Pinned to Device: 373.109 MiB/s
Host Pageable to Device: 360.146 MiB/s
Device to Host Pinned: 397.159 MiB/s
Device to Host Pageable: 382.978 MiB/s
Device to Device: 52.6243 GiB/s
GPU Core Performance
Single-precision Float: 1233.75 Gflop/s
Double-precision Float: 88.6771 Gflop/s
32-bit Integer: 353.296 Giop/s
24-bit Integer: 352.664 Giop/s

Generated: Tue Oct 23 23:07:17 2012
Runtime Dll Version: 4.20 (6,14,11,4020)
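Since several of these reports get compared by eye in this thread, here is a small Python sketch that pulls the "Memory Copy" figures out of a pasted CUDA-Z report and scales the x1 numbers by four, which is where the 1500-1600MiB/s expectation comes from. The parser is my own guess at the text layout based on the report above, not anything CUDA-Z ships.

```python
import re

SAMPLE_X1_REPORT = """\
Memory Copy
Host Pinned to Device: 373.109 MiB/s
Host Pageable to Device: 360.146 MiB/s
Device to Host Pinned: 397.159 MiB/s
Device to Host Pageable: 382.978 MiB/s
Device to Device: 52.6243 GiB/s
GPU Core Performance
Single-precision Float: 1233.75 Gflop/s
"""

RATE_LINE = re.compile(r"^\s*(.+?):\s*([\d.]+)\s*(MiB|GiB)/s\s*$")


def parse_memory_copy(report: str) -> dict:
    """Map each label in the 'Memory Copy' section to its rate in MiB/s."""
    rates = {}
    in_section = False
    for line in report.splitlines():
        stripped = line.strip()
        if stripped == "Memory Copy":
            in_section = True
            continue
        if stripped == "GPU Core Performance":
            in_section = False
        if in_section:
            match = RATE_LINE.match(line)
            if match:
                label, value, unit = match.groups()
                rates[label] = float(value) * (1024 if unit == "GiB" else 1)
    return rates


if __name__ == "__main__":
    x1 = parse_memory_copy(SAMPLE_X1_REPORT)
    for label in ("Host Pinned to Device", "Device to Host Pinned"):
        print(f"{label}: x1 = {x1[label]:.0f} MiB/s, naive x4 = {4 * x1[label]:.0f} MiB/s")
```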


Okay

1. Virtu install of 2.1.220/64 still requires installing 1.2.114 first. Attempting to install 1.2.114 gets me this:

post-7708-14494993986951_thumb.jpg

2. GPU-z says PCIe 2.0 x4:

post-7708-14494993987497_thumb.gif

Thunderbolt only gives me one channel:

CUDA-Z Report

=============

Version: 0.6.163 CUDA-Z

OS Version: Windows AMD64 6.2.9200

Driver Version: 310.33

Driver Dll Version: 5.0 (8.17.13.1033)

Runtime Dll Version: 4.20 (6,14,11,4020)

Core Information

----------------

Name: GeForce GTX 660 Ti

Compute Capability: 3.0

Clock Rate: 1084.5 MHz

PCI Location: 0:11:0

Multiprocessors: 7 (1344 Cores)

Therds Per Multiproc.: 2048

Warp Size: 32

Regs Per Block: 65536

Threads Per Block: 1024

Threads Dimensions: 1024 x 1024 x 64

Grid Dimensions: 2147483647 x 65535 x 65535

Watchdog Enabled: Yes

Integrated GPU: No

Concurrent Kernels: Yes

Compute Mode: Default

Memory Information

------------------

Total Global: 3072 MiB

Bus Width: 192 bits

Clock Rate: 3004 MHz

Error Correction: No

L2 Cache Size: 48 KiB

Shared Per Block: 48 KiB

Pitch: 2048 MiB

Total Constant: 64 KiB

Texture Alignment: 512 B

Texture 1D Size: 65536

Texture 2D Size: 65536 x 65536

Texture 3D Size: 4096 x 4096 x 4096

GPU Overlap: Yes

Map Host Memory: Yes

Unified Addressing: No

Async Engine: Yes, Unidirectional

Performance Information

-----------------------

Memory Copy

Host Pinned to Device: 771.347 MiB/s

Host Pageable to Device: 681.664 MiB/s

Device to Host Pinned: 891.037 MiB/s

Device to Host Pageable: 809.153 MiB/s

Device to Device: 50.3305 GiB/s

GPU Core Performance

Single-precision Float: 1773.2 Gflop/s

Double-precision Float: 128.439 Gflop/s

32-bit Integer: 510.769 Giop/s

24-bit Integer: 509.853 Giop/s

Generated: Fri Nov 30 00:06:00 2012

---

Sad times.

So while the PCIe bus hanging off the end of the thunderbolt rig is definitely PCIe 2.0 x4, it uses a transport that forms a 10Gbit bottleneck. That caps bandwidth at the equivalent of PCIe 2.0 x2.
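To put the CUDA-Z copy rates next to the link speeds, they need converting from MiB/s to Gbit/s. A minimal Python sketch, using the pinned-memory figures from the report above. The 8, 10 and 16 Gbit ceilings are the nominal PCIe 2.0 x2, single thunderbolt channel and PCIe 2.0 x4 rates, and protocol overhead is not modelled.

```python
def mib_per_s_to_gbit_per_s(mib_per_s: float) -> float:
    """Convert a transfer rate from MiB/s (2**20 bytes) to Gbit/s (10**9 bits)."""
    return mib_per_s * 1024 * 1024 * 8 / 1e9


CEILINGS_GBIT = {
    "PCIe 2.0 x2": 8.0,
    "one thunderbolt channel": 10.0,
    "PCIe 2.0 x4": 16.0,
}

if __name__ == "__main__":
    measured = {
        "Host Pinned to Device": 771.347,
        "Device to Host Pinned": 891.037,
    }
    for label, mib in measured.items():
        gbit = mib_per_s_to_gbit_per_s(mib)
        print(f"{label}: {gbit:.2f} Gbit/s")
        for name, ceiling in CEILINGS_GBIT.items():
            print(f"  {100 * gbit / ceiling:.0f}% of {name}")
```

Both figures land well under 10 Gbit/s, which fits the single-channel conclusion rather than a full x4 link.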

Still not bad for an eGPU.

What one could do (given another big pile of cash) is daisy-chain a second thunderbolt device that would use the second channel. Slap a second GPU on it, and run them in SLI :D


Apple say:

A Thunderbolt port or cable provides two 10 Gbps bidirectional links, but these two links cannot be bonded into a single channel. Host software must assign specific paths (for example, DP, PCIe, native) to each link to balance the load.

Mind you.. that only applies to bonding the links in a way transparent to a single device. You should still be able to "bond" them by running two discrete devices that can share load (read: SLI).

Okay

1. Virtu install of 2.1.220/64 still requires installing 1.2.114 first. Attempting to install 1.2.114 gets me this:

[attached image]

..

Thunderbolt only gives me one channel:

Performance Information

-----------------------

Memory Copy

Host Pinned to Device: 771.347 MiB/s

Host Pageable to Device: 681.664 MiB/s

Device to Host Pinned: 891.037 MiB/s

Device to Host Pageable: 809.153 MiB/s

Device to Device: 50.3305 GiB/s

GPU Core Performance

Single-precision Float: 1773.2 Gflop/s

Double-precision Float: 128.439 Gflop/s

32-bit Integer: 510.769 Giop/s

24-bit Integer: 509.853 Giop/s

Generated: Fri Nov 30 00:06:00 2012

---

Sad times.

So while the PCIe bus hanging off the end of the thunderbolt rig is definitely PCIe 2.0 x4, it uses a transport that forms a 10Gbit bottleneck. That caps bandwidth at the equivalent of PCIe 2.0 x2.

Still not bad for an eGPU.

What one could do (given another big pile of cash) is daisy-chain a second thunderbolt device that would use the second channel. Slap a second GPU on it, and run them in SLI :D

Indeed. You are only getting x2 2.0 performance levels due to the Thunderbolt link. Now for whatever reason, your memory copies are still a good ~10% faster than mine. I'm using TH05 + GTX660@x2.2 on a 2012 13" MBP


CUDA-Z Report
=============
Version: 0.6.163 http://cuda-z.sf.net/
OS Version: Windows AMD64 6.1.7600
Driver Version: 306.97
Driver Dll Version: 5.0 (8.17.13.0697)
Runtime Dll Version: 4.20 (6,14,11,4020)

Core Information
----------------
Name: GeForce GTX 660
Compute Capability: 3.0
Clock Rate: 1084.5 MHz
PCI Location: 0:10:0
Multiprocessors: 5 (960 Cores)
Therds Per Multiproc.: 2048
Warp Size: 32
Regs Per Block: 65536
Threads Per Block: 1024
Threads Dimensions: 1024 x 1024 x 64
Grid Dimensions: 2147483647 x 65535 x 65535
Watchdog Enabled: Yes
Integrated GPU: No
Concurrent Kernels: Yes
Compute Mode: Default

Memory Information
------------------
Total Global: 2048 MiB
Bus Width: 192 bits
Clock Rate: 3004 MHz
Error Correction: No
L2 Cache Size: 48 KiB
Shared Per Block: 48 KiB
Pitch: 2048 MiB
Total Constant: 64 KiB
Texture Alignment: 512 B
Texture 1D Size: 65536
Texture 2D Size: 65536 x 65536
Texture 3D Size: 4096 x 4096 x 4096
GPU Overlap: Yes
Map Host Memory: Yes
Unified Addressing: No
Async Engine: Yes, Unidirectional

Performance Information
-----------------------
Memory Copy
Host Pinned to Device: 695.801 MiB/s
Host Pageable to Device: 642.796 MiB/s
Device to Host Pinned: 787.19 MiB/s
Device to Host Pageable: 728.874 MiB/s
Device to Device: 52.3023 GiB/s
GPU Core Performance
Single-precision Float: 1230.13 Gflop/s
Double-precision Float: 88.6785 Gflop/s
32-bit Integer: 352.822 Giop/s
24-bit Integer: 352.426 Giop/s

Generated: Fri Nov 30 02:24:26 2012

Haven't tried to install Virtu on Win8. My instructions work perfectly with Win7. Atm I'm using a scratch HDD with MBR-installed Win7 + Setup 1.1x with a MBP 13" 2012. Needed to go back to Win7 because 3dmark06 has lower results in Win8.

Oh.. and way back someone tried to do a x1 1.0 SLI config finding the driver would only allow SLI if there was a x4 link. So no cookie.

Apparently, all the 2012 Macbooks use a DSL3510 Cactus Ridge Thunderbolt controller. Certainly, it's in my MBP 13" (pci ID 8086:1549). I'm wondering then if someone with a 13 or 15" MBPr with its dual Thunderbolt ports could mate it with a Sonnet Echo Express (x4 2.0 capable), wiring up *both* Thunderbolt ports. Would CUDA-Z (NVidia) or PCIeSpeedTest (AMD) show 1500-1600MiB/s CPU<->GPU memory transfer rate to confirm x4 2.0 operation?

Needed to go back to Win7 because 3dmark06 has lower results in Win8.

Here too. I'm only getting ~11000 on 3DMark06. Who cares? :P

Oh.. and way back someone tried to do a x1 1.0 SLI config finding the driver would only allow SLI if there was a x4 link. So no cookie.

There IS a PCIe x4 link (if that's the tickbox the nVidia driver is looking for to activate). So maybe yes cookie. Rabidly expensive cookie, and big, ugly pile of noisy bricks on desk, but a possible cookie nevertheless.

Apparently, all the 2012 Macbooks use a DSL3510 Cactus Ridge Thunderbolt controller..

Certainly, it's in my MBP 13" (pci ID 8086:1549). I'm wondering then if someone with a 13 or 15" MBPr with its dual Thunderbolt ports could mate it with a Sonnet Echo Express (x4 2.0 capable), wiring up *both* Thunderbolt ports.

I think you have it wrong.

Cactus ridge has FOUR thunderbolt channels. That's 80Gbit/sec, counting all channels and all directions.

On an rMBP, they pop out over TWO plugs, each carrying 40Gbits, in the form of two thunderbolt channels each capable of carrying 20Gbits (10 per direction).

The even more interesting thing is that the Macbook Air is (this is teardown-based fact, not assumption) equipped NOT with the "DSL3310 Cactus Ridge 2C" part required to drive two channels over one mechanical port, but with the "DSL3510 Cactus Ridge 4C" part required to drive four channels over two mechanical ports. Five bucks says the 2013 MBA will have two thunderbolt ports :P

Things to note:

[a] A single device (e.g. a Sonnet->PCIex4->GPU box) cannot utilize more than one channel.

[b] If you have a DisplayPort/DVI monitor sitting on the thunderbolt daisy-chain, that sucks up one entire thunderbolt channel.

My *theory* is that we don't need someone with two thunderbolt ports. We just need ANY 2012 rig, like mine, even with one port + ANOTHER Sonnet+GPU daisy-chained to the first. That would use the second channel over the single mechanical link, and give us access to an SLI-combined 20Gbits/direction, e.g. @4.2.

A $1200 experiment with another sonnet, cheaper with the TH05. :D

That, or wait for the 2013 model, spend $4K (again, cheaper with TH05) to set up a 4-way eGPU SLI rig working at @8.2.

Note: There does seem to be a gap in my understanding. The 2011 macbook air was equipped with a DSL2310 Eagle Ridge 1-port, 2-channel controller. It was able to drive ONLY ONE external thunderbolt display. But if it had two channels on that one port, shouldn't it have been able to drive two daisy-chained displays?

The 2012 - 2port/4-channel controller - still only has one mechanical port. Unlike its 2011 sibling, it supports TWO thunderbolt displays.

From the standpoint of the thunderbolt controller, both the one used in 2011 and the one in 2012 had two channels supported on the chip. Apple may have skimped on additional required infrastructure on the 2011 motherboard to enable this, but put that infrastructure in place on the 2012 model. That's one valid explanation supported by the evidence.

This would mean two daisy-chained sonnet+GPU will work on the 2012 model but not on the 2011 one.

Now, to stir things further, the number of TB devices isn't limited to the number of channels. You can theoretically drive seven thunderbolt devices in a daisy-chain (I saw someone somewhere on the interwebs do this). Obviously, they share the bandwidth. This /might/ mean they still only have one channel on the 2012 models, but managed to get two TB displays to share that one channel. From a bandwidth standpoint, a single PCIe channel is enough to do this.. one thunderbolt channel is more than enough. "Only one thunderbolt channel per mac port, still, and Apple using just the one thunderbolt channel to drive multiple displays" is another valid explanation, and the evidence seems to fit that too. In this case, a 2012-MBA SLI would not work.

Which of the above it is remains unclear to me.


New course of action:

1. $180 - just ordered my new TH05 :)

Useful in and out of this project's scope, because I can use it to gamify my wife's 2011 MBA as well and use that as an auxiliary gamebox :D

2. Once I get it, I'll daisy-chain it to the sonnet, and stick my old ATI HD4850 in it - then run CUDA-Z and ATI's equivalent simultaneously, and see if I can get two GPUs pumping 10Gb simultaneously.

If that works...

3. Hit the shops again and grab another GTX660Ti. My first ever SLI rig is going to be a #@!!$ing macbook air. Woo.

3. Hit the shops again and grab another GTX660Ti. My first ever SLI rig is going to be a #@!!$ing macbook air. Woo.

While both the Sonnet Echo Express and TH05 give real-world x2 2.0 bandwidth, the Sonnet will register a x4 2.0 link speed with the Thunderbolt controller and the TH05 will register x2 2.0. As you noted, the driver needs to see them both at x4 2.0 for the SLI option to be enabled.

Now for whatever reason, your memory copies are still a good ~10% faster than mine. I'm using TH05 + GTX660@x2.2 on a 2012 13" MBP

The TH05 system has a thunderbolt pipe (10Gbit) and a PCIe 2.0 x2 controller (8Gbit) hanging off it. It bottlenecks on the PCIe (at 8Gbit/s).

The Sonnet system has a thunderbolt pipe (10Gbit) and a PCIe 2.0 x4 controller (16Gbit) hanging off it. It bottlenecks on the thunderbolt (at 10Gbit/s).

Seems like the easiest way to explain the 10% speed difference.
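The "slowest hop wins" reasoning above can be written down as a tiny Python model. The 8, 10 and 16 Gbit figures are the nominal ones from this post. Note that the nominal numbers allow up to a 25% gap, so the roughly 10% seen in practice suggests other overheads eat part of the difference; that last part is my reading, not something measured here.

```python
def bottleneck_gbit(thunderbolt_gbit: float, pcie_gbit: float) -> float:
    """A chain of links can go no faster than its slowest hop."""
    return min(thunderbolt_gbit, pcie_gbit)


if __name__ == "__main__":
    th05 = bottleneck_gbit(thunderbolt_gbit=10, pcie_gbit=8)     # PCIe 2.0 x2 behind thunderbolt
    sonnet = bottleneck_gbit(thunderbolt_gbit=10, pcie_gbit=16)  # PCIe 2.0 x4 behind thunderbolt
    print(f"TH05 ceiling:   {th05} Gbit/s")
    print(f"Sonnet ceiling: {sonnet} Gbit/s")
    print(f"Nominal advantage for the Sonnet: {100 * (sonnet / th05 - 1):.0f}%")
```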


Had some concerns about the CUDA-z numbers from before because I had other stuff running in the background - unsure how much effect these had.

Just ran it again without anything running in the background:

CUDA-Z Report

=============

Version: 0.6.163 CUDA-Z

OS Version: Windows AMD64 6.2.9200

Driver Version: 310.33

Driver Dll Version: 5.0 (8.17.13.1033)

Runtime Dll Version: 4.20 (6,14,11,4020)

Core Information

----------------

Name: GeForce GTX 660 Ti

Compute Capability: 3.0

Clock Rate: 1084.5 MHz

PCI Location: 0:11:0

Multiprocessors: 7 (1344 Cores)

Therds Per Multiproc.: 2048

Warp Size: 32

Regs Per Block: 65536

Threads Per Block: 1024

Threads Dimensions: 1024 x 1024 x 64

Grid Dimensions: 2147483647 x 65535 x 65535

Watchdog Enabled: Yes

Integrated GPU: No

Concurrent Kernels: Yes

Compute Mode: Default

Memory Information

------------------

Total Global: 3072 MiB

Bus Width: 192 bits

Clock Rate: 3004 MHz

Error Correction: No

L2 Cache Size: 48 KiB

Shared Per Block: 48 KiB

Pitch: 2048 MiB

Total Constant: 64 KiB

Texture Alignment: 512 B

Texture 1D Size: 65536

Texture 2D Size: 65536 x 65536

Texture 3D Size: 4096 x 4096 x 4096

GPU Overlap: Yes

Map Host Memory: Yes

Unified Addressing: No

Async Engine: Yes, Unidirectional

Performance Information

-----------------------

Memory Copy

Host Pinned to Device: 781.77 MiB/s

Host Pageable to Device: 696.084 MiB/s

Device to Host Pinned: 887.713 MiB/s

Device to Host Pageable: 811.878 MiB/s

Device to Device: 50.3162 GiB/s

GPU Core Performance

Single-precision Float: 1772.51 Gflop/s

Double-precision Float: 128.488 Gflop/s

32-bit Integer: 510.93 Giop/s

24-bit Integer: 510.031 Giop/s

Generated: Sun Dec 02 08:52:58 2012

Nando, which of the four memory copy figures that happen over thunderbolt/PCIe were you comparing to yours?
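For reference, here is the comparison across all four figures, as a Python sketch using the clean re-run above against the TH05 + GTX660 numbers posted earlier in the thread. Keep in mind the two rigs differ in more than the thunderbolt adapter (660 Ti vs 660, MBA vs MBP, Win8 vs Win7), so this is indicative only.

```python
SONNET_GTX660TI = {
    "Host Pinned to Device": 781.77,
    "Host Pageable to Device": 696.084,
    "Device to Host Pinned": 887.713,
    "Device to Host Pageable": 811.878,
}

TH05_GTX660 = {
    "Host Pinned to Device": 695.801,
    "Host Pageable to Device": 642.796,
    "Device to Host Pinned": 787.19,
    "Device to Host Pageable": 728.874,
}


def percent_faster(a: dict, b: dict) -> dict:
    """For each label, how many percent faster rig a is than rig b."""
    return {label: (a[label] / b[label] - 1) * 100 for label in a}


if __name__ == "__main__":
    for label, pct in percent_faster(SONNET_GTX660TI, TH05_GTX660).items():
        print(f"{label}: Sonnet rig is {pct:.1f}% faster")
```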


Note: There does seem to be a gap in my understanding. The 2011 macbook air was equipped with a DSL2310 Eagle Ridge 1-port, 2-channel controller. It was able to drive ONLY ONE external thunderbolt display. But if it had two channels on that one port, shouldn't it have been able to drive two daisy-chained displays?

According to these guys, a thunderbolt display eats 7Gbits/sec.
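A back-of-the-envelope check of that figure, as a Python sketch. It assumes the Thunderbolt Display's 2560x1440 panel at 60Hz and 24 bits per pixel, and counts raw pixel data only (no blanking or link overhead), so it comes out lower than the quoted 7Gbits/sec. Either way, two such displays would not fit inside a single 10Gbit channel if these numbers hold.

```python
THUNDERBOLT_CHANNEL_GBIT = 10.0  # per direction, per channel


def raw_video_gbit(width: int, height: int, refresh_hz: int, bits_per_pixel: int = 24) -> float:
    """Raw pixel data rate in Gbit/s, ignoring blanking intervals and link overhead."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


def fits_in_one_channel(total_gbit: float) -> bool:
    """True if the given load fits within one thunderbolt channel."""
    return total_gbit <= THUNDERBOLT_CHANNEL_GBIT


if __name__ == "__main__":
    one_display = raw_video_gbit(2560, 1440, 60)
    print(f"One display, raw pixels: {one_display:.2f} Gbit/s")
    print("Two displays (raw estimate) fit in one channel:", fits_in_one_channel(2 * one_display))
    print("Two displays (quoted 7 Gbit each) fit in one channel:", fits_in_one_channel(2 * 7.0))
```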


Hey, I just wanted to throw something in about the HD4000 and sound driver issue.

I read my computer's hardware with SiSoft Sandra under EFI and under BIOS mode.

One thing that occurred to me was the following difference:

BIOS Mode:

<< SMBus (Mainboard) Deviceregister >>
<< Intel ICH SMBus (efa0) >>
< Bus 0, Device 2C, Function 0 >
00 - 0F: 00 00 00 FC 00 6D 20 00 00 00 00 00 00 00 00 00
.....

EFI Mode:

<< SMBus (Mainboard) Deviceregister >>
<< Intel ICH SMBus (a0617000) >>
Error (1200): No devices found.

Does this help in any way? As Wikipedia says, the SMBus is responsible for switching devices on/off or dimming the display. If the Intel driver can't access it, then we might have problems with display and sound. So somehow we have to get the Intel ICH SMBus back in place under EFI!?

Maybe someone more tech-savvy can use this with the mm command?


Oripash, any changes to your install guide?

Can the installation be done without rEFIt/rEFInd?

Thanks for making it so complete, how do the new bootcamp drivers work?


I was trying to do a UEFI install of Windows 8.1 on a 2012 MacBook Pro and followed these suggestions. The part I am stuck on is deleting the display driver. I do not see the file to delete, yet my screen is still black. Can someone please offer suggestions on what to do? Thanks
