bloodhawk Posted October 5, 2016 (edited)

Below we see a 32Gbps-NGFF.M2 and a 32Gbps-TB3 eGPU implementation. The NGFF.M2 interface attaches the video card directly to the southbridge, giving better performance at the cost of inconvenient under-keyboard access. The Thunderbolt controller adds latency but provides a convenient single-cable solution using the external TB3 port.

Notebook
17" Clevo P870DM-G
Intel Z170 chipset with 8Gbps (Gen3) Southbridge PCIe ports
i7-6700K CPU
64GB RAM
512GB PCIe SSD
GTX980 dGPU
Win10

eGPU Gear
PE4C V4.1 - NGFF.M2
InXtron HDK - TB3
NVidia GTX1080
EVGA 600B PSU

Hardware pictures (InXtron-HDK-TB3/PE4C V4.1-NGFF.M2)
http://imgur.com/a/BtVwJ

Final Setup (for now): InXtron HDK @32Gbps-TB3

TB3 benchmark results
http://www.3dmark.com/3dm/15193649 - Standard, GPU = 18536
http://www.3dmark.com/3dm/15191071 - Ultra, GPU = 10007
http://www.3dmark.com/3dm/15193921 - Extreme, GPU = 5206
http://www.3dmark.com/3dm/15193762 - TimeSpy, GPU = 6855

PE4C V4.1 @32Gbps-NGFF.M2

CUDA-Z:

CUDA-Z Report
=============
Version: 0.10.251 64 bit http://cuda-z.sf.net/
OS Version: Windows x86 6.2.9200
Driver Version: 372.70
Driver Dll Version: 8.0 (6.14.13.7270)
Runtime Dll Version: 6.50

Core Information
----------------
Name: GeForce GTX 1080
Compute Capability: 6.1
Clock Rate: 1809.5 MHz
PCI Location: 0:5:0
Multiprocessors: 20
Threads Per Multiproc.: 2048
Warp Size: 32
Regs Per Block: 65536
Threads Per Block: 1024
Threads Dimensions: 1024 x 1024 x 64
Grid Dimensions: 2147483647 x 65535 x 65535
Watchdog Enabled: Yes
Integrated GPU: No
Concurrent Kernels: Yes
Compute Mode: Default
Stream Priorities: Yes

Memory Information
------------------
Total Global: 8192 MiB
Bus Width: 256 bits
Clock Rate: 5005 MHz
Error Correction: No
L2 Cache Size: 48 KiB
Shared Per Block: 48 KiB
Pitch: 2048 MiB
Total Constant: 64 KiB
Texture Alignment: 512 B
Texture 1D Size: 131072
Texture 2D Size: 131072 x 65536
Texture 3D Size: 16384 x 16384 x 16384
GPU Overlap: Yes
Map Host Memory: Yes
Unified Addressing: Yes
Async Engine: Yes, Bidirectional

Performance Information
-----------------------
Memory Copy
  Host Pinned to Device: 2842.01 MiB/s
  Host Pageable to Device: 2414.87 MiB/s
  Device to Host Pinned: 2981.8 MiB/s
  Device to Host Pageable: 2559.01 MiB/s
  Device to Device: 111.991 GiB/s
GPU Core Performance
  Single-precision Float: 9246.7 Gflop/s
  Double-precision Float: 295.548 Gflop/s
  64-bit Integer: 465.641 Giop/s
  32-bit Integer: 2742.09 Giop/s
  24-bit Integer: 2084.85 Giop/s

Generated: Tue Oct 04 23:08:26 2016

32Gbps-NGFF.M2 benchmark results
http://www.3dmark.com/fs/10425255 - Standard, GPU = 23143
http://www.3dmark.com/3dm/15359522 - Extreme, GPU = 10938
http://www.3dmark.com/3dm/15250559 - Ultra, GPU = 5451
http://www.3dmark.com/fs/10398914 - Ultra, GPU = 5635
http://www.3dmark.com/3dm/15360121 - TimeSpy, GPU = 7672

Edited October 26, 2016 by bloodhawk: formatting and added intro
Tech Inferno Fan Posted October 5, 2016

27 minutes ago, bloodhawk said:
I can confirm the link speeds over the M.2 ports on the Z170 chipset (P870DM-G) using a PE4C V4.1. Will see if I can check things with my friend's GT72VR (or something).

The hwinfo64 screenshots confirm the M.2 eGPU port hosting your GTX1080 is running at 8Gbps. They don't give the link width. I assume it would be x1 3.0, but GPU-Z would tell us exactly.
bloodhawk Posted October 5, 2016

25 minutes ago, Tech Inferno Fan said:
The hwinfo64 screenshots confirm the M.2 eGPU port hosting your GTX1080 is running at 8Gbps. It doesn't give the link width. I assume it would be x1 3.0 but GPU-Z would tell us exactly.

Yep, in GPU-Z and NvInspector the link does show up as PCIe 3.0 x4. I'm not sure how accurate that is, especially since the actual limit is 8GT/s (8Gbps).
Tech Inferno Fan Posted October 5, 2016

39 minutes ago, bloodhawk said:
Yep, in GPU-Z and NvInspector the link does show up as PCIe 3.0 x4. I'm not sure how accurate that is, especially since the actual limit is 8GT/s (8Gbps).

Please run CUDA-Z on your eGPU and post the result. We know from https://www.techinferno.com/index.php?/forums/topic/5226-2013-15-macbook-pro-gt750m-gtx780ti16gbps-tb2-sonnet-iii-d-win81-squinks/

CUDA-Z Host-to-Device Bandwidth
TB2-16Gbps: 1258 MiB/s
TB1-10Gbps: 781 MiB/s
TB1-8Gbps (x2 2.0): 697 MiB/s
EC2-4Gbps (x1 2.0): 373 MiB/s

We'll quickly be able to establish its bandwidth using these reference results.
bloodhawk Posted October 5, 2016

33 minutes ago, Tech Inferno Fan said:
Please run CUDA-Z on your eGPU and post the result. We know from

CUDA-Z Host-to-Device Bandwidth
TB2-16Gbps: 1258 MiB/s
TB1-10Gbps: 781 MiB/s
TB1-8Gbps (x2 2.0): 697 MiB/s
EC2-4Gbps (x1 2.0): 373 MiB/s

We'll quickly be able to establish its bandwidth using these reference results.

Here you go, good sir. I'll try to do another test over TB3 tomorrow. This is with quite a few USB devices plugged in and the dGPU at full link speed.

CUDA-Z Report
=============
Version: 0.10.251 64 bit http://cuda-z.sf.net/
OS Version: Windows x86 6.2.9200
Driver Version: 372.70
Driver Dll Version: 8.0 (6.14.13.7270)
Runtime Dll Version: 6.50

Core Information
----------------
Name: GeForce GTX 1080
Compute Capability: 6.1
Clock Rate: 1809.5 MHz
PCI Location: 0:5:0
Multiprocessors: 20
Threads Per Multiproc.: 2048
Warp Size: 32
Regs Per Block: 65536
Threads Per Block: 1024
Threads Dimensions: 1024 x 1024 x 64
Grid Dimensions: 2147483647 x 65535 x 65535
Watchdog Enabled: Yes
Integrated GPU: No
Concurrent Kernels: Yes
Compute Mode: Default
Stream Priorities: Yes

Memory Information
------------------
Total Global: 8192 MiB
Bus Width: 256 bits
Clock Rate: 5005 MHz
Error Correction: No
L2 Cache Size: 48 KiB
Shared Per Block: 48 KiB
Pitch: 2048 MiB
Total Constant: 64 KiB
Texture Alignment: 512 B
Texture 1D Size: 131072
Texture 2D Size: 131072 x 65536
Texture 3D Size: 16384 x 16384 x 16384
GPU Overlap: Yes
Map Host Memory: Yes
Unified Addressing: Yes
Async Engine: Yes, Bidirectional

Performance Information
-----------------------
Memory Copy
  Host Pinned to Device: 2842.01 MiB/s
  Host Pageable to Device: 2414.87 MiB/s
  Device to Host Pinned: 2981.8 MiB/s
  Device to Host Pageable: 2559.01 MiB/s
  Device to Device: 111.991 GiB/s
GPU Core Performance
  Single-precision Float: 9246.7 Gflop/s
  Double-precision Float: 295.548 Gflop/s
  64-bit Integer: 465.641 Giop/s
  32-bit Integer: 2742.09 Giop/s
  24-bit Integer: 2084.85 Giop/s

Generated: Tue Oct 04 23:08:26 2016
Tech Inferno Fan Posted October 5, 2016

10 minutes ago, bloodhawk said:
Here you go, good sir. I'll try to do another test over TB3 tomorrow. This is with quite a few USB devices plugged in and the dGPU at full link speed.

You have "Host Pinned to Device: 2842.01 MiB/s", which would mean it's a x4 3.0 (32Gbps) link as reported by GPU-Z. That would be faster than TB3 since there is no additional TB3 controller latency to deal with. Rather, it's a direct electrical link to the Intel southbridge.
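The inference above (a 2842 MiB/s host-to-device copy implies x4 3.0) can be sanity-checked against the theoretical ceiling of each link type. The following is a minimal Python sketch, not something posted in the thread: it assumes the standard per-lane signalling rates and line encodings (8b/10b for Gen1/Gen2, 128b/130b for Gen3), ignores packet and protocol overhead, and the names `link_ceiling_mib_s` and `feasible_links` are hypothetical helpers introduced only for illustration.

```python
# Per-lane signalling rate in GT/s and line-encoding efficiency per PCIe generation.
# Assumption: standard values; protocol (TLP/DLLP) overhead is deliberately ignored.
PCIE_GENERATIONS = {
    "1.0": (2.5, 8 / 10),
    "2.0": (5.0, 8 / 10),
    "3.0": (8.0, 128 / 130),
}


def link_ceiling_mib_s(gen, lanes):
    """Theoretical one-direction payload ceiling of a PCIe link in MiB/s."""
    gt_per_s, efficiency = PCIE_GENERATIONS[gen]
    bytes_per_s = gt_per_s * 1e9 * efficiency * lanes / 8
    return bytes_per_s / 2**20


def feasible_links(measured_mib_s, candidates):
    """Return the candidate links whose ceiling is at least the measured rate."""
    return [
        (gen, lanes)
        for gen, lanes in candidates
        if link_ceiling_mib_s(gen, lanes) >= measured_mib_s
    ]


# Candidate link types discussed in the thread, plus x2 3.0 for completeness.
CANDIDATES = [("2.0", 1), ("2.0", 4), ("3.0", 1), ("3.0", 2), ("3.0", 4)]
MEASURED_MIB_S = 2842.01  # "Host Pinned to Device" from the CUDA-Z report

for gen, lanes in CANDIDATES:
    print(f"x{lanes} {gen}: ceiling {link_ceiling_mib_s(gen, lanes):.0f} MiB/s")
print("Links able to carry the measured rate:", feasible_links(MEASURED_MIB_S, CANDIDATES))
```

Under these assumptions only x4 3.0, with a ceiling of roughly 3756 MiB/s, can carry the measured rate; x4 2.0 tops out near 1907 MiB/s, so the measurement cannot come from a Gen2 x4 link.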
bloodhawk Posted October 5, 2016

Just now, Tech Inferno Fan said:
You have "Host Pinned to Device: 2842.01 MiB/s", which would mean it's a x4 3.0 (32Gbps) link as reported by CUDA-Z. That would be faster than TB3 since there is no additional TB3 controller latency to deal with. Rather, it's a direct electrical link to the Intel southbridge.

Gotcha. Figured as much, especially after the jump in scores compared to TB3. I'm adding content to a thread over at the other place that shall not be named. Will update the other thread with the benchmarks soon after.

Can you please link me to an existing/central thread where members talk about tweaking such setups?
Tech Inferno Fan Posted October 5, 2016

@bloodhawk, I moved the discussion content here since it was referencing your implementation, plus I've summarized your hardware as I know it in the opening post. Pictures of the actual hardware and how the PE4C V4.1 connects would bring it to life. A unique eGPU implementation, that's for sure.
bloodhawk Posted October 5, 2016 (edited)

1 hour ago, Tech Inferno Fan said:
@bloodhawk, I moved the discussion content here since it was referencing your implementation, plus I've summarized your hardware as I know it in the opening post. Pictures of the actual hardware and how the PE4C V4.1 connects would bring it to life. A unique eGPU implementation, that's for sure.

Yep, working on that. I don't have a great camera at hand right now, but the One M8 will have to do for now.

Thank you for moving the posts. I was about to PM you about the same.

Edited October 5, 2016 by bloodhawk
Tech Inferno Fan Posted October 5, 2016

11 minutes ago, bloodhawk said:
Yep, working on that. I don't have a great camera at hand right now, but the One M8 will have to do for now. Thank you for moving the posts. I was about to PM you about the same.

OK, added an EC2-4Gbps reference to the discussion. EC2 is also directly wired to the southbridge. If x1 2.0 = 373 MiB/s, then multiply by 2 to get x1 3.0, then multiply by 4 to get x4 3.0. The result is 2984 MiB/s, which is on par with your 2842 MiB/s. So it is definitely running x4 3.0.

EC2-4Gbps (x1 2.0): 373 MiB/s
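The scaling estimate above can be written out as a short calculation. This is only a sketch of the arithmetic from the post, in Python; the constant and function names are made up for illustration. It keeps the post's simplification that Gen3 doubles Gen2 per lane, although the raw ratio after line encoding is closer to 1.97, so the estimate is slightly optimistic.

```python
# Measured reference from the thread: ExpressCard (EC2) x1 2.0 host-to-device rate.
EC2_X1_GEN2_MIB_S = 373.0
# Measured "Host Pinned to Device" rate on the NGFF.M2 link.
M2_MEASURED_MIB_S = 2842.01


def scale_reference(reference_mib_s, generation_factor, lane_factor):
    """Scale a measured reference rate to a faster and/or wider link."""
    return reference_mib_s * generation_factor * lane_factor


# x1 2.0 -> x1 3.0 (assumed x2, as in the post) -> x4 3.0 (x4 lanes).
estimate = scale_reference(EC2_X1_GEN2_MIB_S, generation_factor=2, lane_factor=4)
ratio = M2_MEASURED_MIB_S / estimate

print(f"Estimated x4 3.0 rate: {estimate:.0f} MiB/s")
print(f"Measured / estimated: {ratio:.1%}")
```

The measured figure lands at about 95% of the scaled reference, which is what "on par" means here.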
bloodhawk Posted October 5, 2016 (edited)

22 minutes ago, Tech Inferno Fan said:
OK, added an EC2-4Gbps reference to the discussion. EC2 is also directly wired to the southbridge. If x1 2.0 = 373 MiB/s, then multiply by 2 to get x1 3.0, then multiply by 4 to get x4 3.0. The result is 2984 MiB/s, which is on par with your 2842 MiB/s. So it is definitely running x4 3.0. EC2-4Gbps (x1 2.0): 373 MiB/s

That makes sense. The Device to Host figure actually hovers around 29XX MiB/s when there are not too many tabs open in Chrome or I'm not doing much.

The next step is to figure out whether it's possible to create a powered extension cable/connector. These speeds made me wonder whether the proprietary AGA connector is just a fancy-looking M.2 port extension. I'd actually be willing to buy one and tear it down if someone can get me the service manual with the pinouts.

Edited October 5, 2016 by bloodhawk
Guest Posted October 12, 2016

On 10/4/2016 at 11:53 PM, bloodhawk said:
That makes sense. The Device to Host figure actually hovers around 29XX MiB/s when there are not too many tabs open in Chrome or I'm not doing much. The next step is to figure out whether it's possible to create a powered extension cable/connector. These speeds made me wonder whether the proprietary AGA connector is just a fancy-looking M.2 port extension. I'd actually be willing to buy one and tear it down if someone can get me the service manual with the pinouts.

It would be nice if you could put Alienware's eGPU to better use on an immensely more powerful machine like your P870DM.
bloodhawk Posted October 12, 2016

40 minutes ago, Mr. Fox said:
It would be nice if you could put Alienware's eGPU to better use on an immensely more powerful machine like your P870DM.

Yeah, I'm looking for a service manual right now. If I don't find one, I'll just order a good multimeter and an AGA in a week or two to test. But I'm still looking for the female end of the connector that is present on the system end of the Alienware machines.
Guest Posted October 12, 2016

Maybe you can find a dead, parts-only Alienware laptop and steal the port off it. If you can source one, you might be able to get something from Mouser to make your own connector.
bloodhawk Posted October 13, 2016 (edited)

4 hours ago, Mr. Fox said:
Maybe you can find a dead, parts-only Alienware laptop and steal the port off it. If you can source one, you might be able to get something from Mouser to make your own connector.

Gotcha. Will look around. The female port isn't that big a deal; I can always rig something up with pins.

Edited October 13, 2016 by bloodhawk
Tech Inferno Fan Posted October 31, 2016

@bloodhawk, you wouldn't still have the TB3 enclosure? We need some CUDA-Z output to confirm whether it's providing 20Gbps or 32Gbps. Without CUDA-Z output, your noticeably faster 3DMark result using NGFF.M2 compared to TB3 would suggest your TB3 interface was running at 20Gbps.

See the discussion re: TB3 bandwidth at https://www.techinferno.com/index.php?/forums/topic/10718-2016-macbook-pros-and-egpus/&do=findComment&comment=151839
bloodhawk Posted October 31, 2016 (edited)

4 hours ago, Tech Inferno Fan said:
@bloodhawk, you wouldn't still have the TB3 enclosure? We need some CUDA-Z output to confirm whether it's providing 20Gbps or 32Gbps. Without CUDA-Z output, your noticeably faster 3DMark result using NGFF.M2 compared to TB3 would suggest your TB3 interface was running at 20Gbps. See the discussion re: TB3 bandwidth at

I still do have it. I'll try to install it tonight and test it out.

Edited October 31, 2016 by bloodhawk
bloodhawk Posted November 1, 2016

@Tech Inferno Fan Here it is:
Tech Inferno Fan Posted November 1, 2016

7 minutes ago, bloodhawk said:
@Tech Inferno Fan Here it is:

Great. Looking lower, but not x4 2.0 levels. Are you using an active TB3 cable?

Under the same load conditions, can you re-run CUDA-Z on the NGFF.M2 32Gbps interface?
bloodhawk Posted November 1, 2016

1 minute ago, Tech Inferno Fan said:
Great. Looking lower, but not x4 2.0 levels. Under the same load conditions, can you re-run CUDA-Z on the NGFF.M2 32Gbps interface?

I actually did last night; it was the same as posted in the OP, +/- 75-100. Even this hit around 2200 MiB/s. I'll post a screenshot once I connect it over M.2 in about 20 minutes.
bloodhawk Posted November 1, 2016

@Tech Inferno Fan This is with M.2 NGFF.

Output:

CUDA-Z Report
=============
Version: 0.10.251 64 bit http://cuda-z.sf.net/
OS Version: Windows x86 6.2.9200
Driver Version: 375.70
Driver Dll Version: 8.0 (6.14.13.7570)
Runtime Dll Version: 6.50

Core Information
----------------
Name: GeForce GTX 1080
Compute Capability: 6.1
Clock Rate: 1733.5 MHz
PCI Location: 0:62:0
Multiprocessors: 20
Threads Per Multiproc.: 2048
Warp Size: 32
Regs Per Block: 65536
Threads Per Block: 1024
Threads Dimensions: 1024 x 1024 x 64
Grid Dimensions: 2147483647 x 65535 x 65535
Watchdog Enabled: Yes
Integrated GPU: No
Concurrent Kernels: Yes
Compute Mode: Default
Stream Priorities: Yes

Memory Information
------------------
Total Global: 8192 MiB
Bus Width: 256 bits
Clock Rate: 5005 MHz
Error Correction: No
L2 Cache Size: 48 KiB
Shared Per Block: 48 KiB
Pitch: 2048 MiB
Total Constant: 64 KiB
Texture Alignment: 512 B
Texture 1D Size: 131072
Texture 2D Size: 131072 x 65536
Texture 3D Size: 16384 x 16384 x 16384
GPU Overlap: Yes
Map Host Memory: Yes
Unified Addressing: Yes
Async Engine: Yes, Bidirectional

Performance Information
-----------------------
Memory Copy
  Host Pinned to Device: 2784.71 MiB/s
  Host Pageable to Device: 2371.13 MiB/s
  Device to Host Pinned: 2541.68 MiB/s
  Device to Host Pageable: 2224.87 MiB/s
  Device to Device: 118.4 GiB/s
GPU Core Performance
  Single-precision Float: 7892.93 Gflop/s
  Double-precision Float: 259.88 Gflop/s
  64-bit Integer: 398.933 Giop/s
  32-bit Integer: 2363.25 Giop/s
  24-bit Integer: 1794.89 Giop/s

Generated: Mon Oct 31 22:28:55 2016
Tech Inferno Fan Posted November 1, 2016

@bloodhawk, the NGFF.M2 GPU-Z output has faster fillrate, bandwidth and GPU clocks. Can you do like-for-like comparisons?

In addition, please ensure the machine has the battery installed and is using a high-performance power profile. You may also consider installing Throttlestop and disabling C1E, EIST and C6/C7, which (at least a couple of generations ago) would affect SATA SSD performance. That may help bring the NGFF.M2 and TB3 CUDA-Z bandwidth results closer.
bloodhawk Posted November 1, 2016 (edited)

26 minutes ago, Tech Inferno Fan said:
@bloodhawk, the NGFF.M2 GPU-Z output has faster fillrate, bandwidth and GPU clocks. Can you do like-for-like comparisons? In addition, please ensure the machine has the battery installed and is using a high-performance power profile. You may also consider installing Throttlestop and disabling C1E, EIST and C6/C7, which (at least a couple of generations ago) would affect SATA SSD performance. That may help bring the NGFF.M2 and TB3 CUDA-Z bandwidth results closer.

Doing another test over TB3 will take a few days, unfortunately. I have my OS NVMe SSD connected to the system over TB3 right now and renders are running. But the clocks were exactly the same in both tests, with completely bone-stock drivers.

The machine has the battery installed (always) and is using the High Performance profile (always). (I never run the system without the battery or on battery power; it's always connected to AC and on the High Performance profile with C-states disabled on the processor.)

I did try higher clocks when the GPU was connected over TB3 (2088 MHz), but the CUDA-Z output was exactly the same.

Also, I'm using the stock 50cm 40Gbps TB3 cable. I'm not really going to invest in another cable anytime soon, since I don't really need it. To be honest, I haven't even found one for sale online.

Edited November 1, 2016 by bloodhawk
Splitframe Posted November 21, 2016 (edited)

On 10/5/2016 at 6:05 AM, bloodhawk said:
TB3 benchmark results
http://www.3dmark.com/3dm/15193649 - Standard, GPU = 18536
http://www.3dmark.com/3dm/15191071 - Ultra, GPU = 10007
http://www.3dmark.com/3dm/15193921 - Extreme, GPU = 5206
http://www.3dmark.com/3dm/15193762 - TimeSpy, GPU = 6855

(You mixed up Ultra and Extreme here.)

These numbers seem to confirm that the total efficiency over TB3 goes down as FPS increases. A normal GTX 1080 has scores around these:

Normal - 21905
Extreme - 10293
Ultra - 5020
Source: http://www.guru3d.com/articles-pages/nvidia-geforce-gtx-1080-review,28.html

Note that these values are from a non-overclocked reference card; your ROG scored higher over x4.3 than a non-OC card over x16.3. It seems that over direct PCIe x4.3 there is not much of a loss, so I will take your x4.3 values from above for the calculation (23143 / 10938 / 5451).

When we look at TB3 compared to x4.3, there is a drop-off of 20% at Normal, 8.5% at Extreme and 4.5% at Ultra. The benchmark runs at around 100/60/10 FPS for Normal/Extreme/Ultra respectively.

My guess, based on these values and on experience from the old x1.2/x1.3 days, is that the performance hit over TB3 (or insufficient bandwidth in general) increases with higher FPS, and anyone looking at 144Hz gaming over TB3 should think twice. The 60 FPS mark seems to be a good spot for me personally; the 8.5% hit is bearable. I wonder whether and how this carries over to normal games rather than synthetic benchmarks, though, and how much peripherals take away in addition.

Edited November 21, 2016 by Splitframe
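The drop-off percentages quoted above can be reproduced from the scores in the opening post. The Python sketch below is illustrative only; `drop_percent` is a hypothetical helper name. It assumes, following Splitframe's correction, that the TB3 score of 10007 belongs to Extreme and 5206 to Ultra, and it uses the 5451 Ultra result for NGFF.M2 as Splitframe did (the opening post also lists a second Ultra run at 5635).

```python
# 3DMark GPU scores from the opening post.
# Assumption: TB3 Extreme/Ultra labels are swapped back per Splitframe's correction.
TB3_SCORES = {"Standard": 18536, "Extreme": 10007, "Ultra": 5206, "TimeSpy": 6855}
M2_SCORES = {"Standard": 23143, "Extreme": 10938, "Ultra": 5451, "TimeSpy": 7672}


def drop_percent(slower, faster):
    """Percentage by which the slower score falls short of the faster one."""
    return (1 - slower / faster) * 100


drops = {
    test: drop_percent(TB3_SCORES[test], M2_SCORES[test])
    for test in TB3_SCORES
}

for test, drop in drops.items():
    print(f"{test}: TB3 is {drop:.1f}% below NGFF.M2")
```

With these inputs the loss works out to roughly 19.9% for Standard, 8.5% for Extreme and 4.5% for Ultra, matching the figures in the post. Time Spy, which the post does not discuss, comes out at about 10.6%.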
bloodhawk Posted November 21, 2016

2 minutes ago, Splitframe said:
(You mixed up Ultra and Extreme here.) These numbers seem to confirm that the total efficiency over TB3 goes down as FPS increases. A normal GTX 1080 has scores around these: Normal - 21905, Extreme - 10293, Ultra - 5020. Source: http://www.guru3d.com/articles-pages/nvidia-geforce-gtx-1080-review,28.html Note that these values are from a non-overclocked reference card. It seems that over direct PCIe x4.3 there is not much of a loss, so I will take your x4.3 values from above for the calculation (23143 / 10938 / 5451). When we look at TB3 compared to x4.3, there is a drop-off of 20% at Normal, 8.5% at Extreme and 4.5% at Ultra. The benchmark runs at around 100/60/10 FPS for Normal/Extreme/Ultra respectively. My guess, based on these values and on experience from the old x1.2/x1.3 days, is that the performance hit over TB3 (or insufficient bandwidth in general) increases with higher FPS, and anyone looking at 144Hz gaming over TB3 should think twice. The 60 FPS mark seems to be a good spot for me personally; the 8.5% hit is bearable. I wonder whether and how this carries over to normal games rather than synthetic benchmarks, though, and how much peripherals take away in addition.

Definitely agree. Are you referring to peripherals taking away from TB3 bandwidth or from PCIe x4? At least on my system, I didn't have any issues with peripherals in either case. But using the Core over TB3 was a different story. The TB3 HDK did not have any peripheral ports.