17" Clevo P870DM-G + GTX1080@32Gbps-NGFF.M2 (PE4C V4.1) + Win10 [bloodhawk]

bloodhawk · October 5, 2016

Below we see a 32Gbps-NGFF.M2 and a 32Gbps-TB3 eGPU implementation. The NGFF.M2 interface attaches the video card directly to the southbridge, giving better performance at the cost of inconvenient under-keyboard access. The Thunderbolt controller adds latency that drags performance down, but provides a convenient single-cable solution using the external TB3 port.

Notebook

17" Clevo P870DM-G

Intel Z170 chipset with 8Gbps (Gen3) Southbridge PCIe ports

i7-6700K CPU

64GB RAM
512GB PCIe SSD

GTX980 dGPU

Win10

eGPU Gear

PE4C V4.1 - NGFF.M2

InXtron HDK - TB3

NVidia GTX1080

EVGA 600B PSU

Hardware pictures (InXtron-HDK-TB3/PE4C V4.1-NGFF.M2)

http://imgur.com/a/BtVwJ

Final Setup (for now) :

InXtron HDK @32Gbps-TB3

TB3 Benchmark results

http://www.3dmark.com/3dm/15193649 - Standard, GPU = 18536

http://www.3dmark.com/3dm/15191071 - Ultra, GPU = 10007

http://www.3dmark.com/3dm/15193921 - Extreme, GPU = 5206

http://www.3dmark.com/3dm/15193762 - TimeSpy, GPU = 6855


PE4C V4.1 - @32Gbps-NGFF.M2

CUDA-Z :

CUDA-Z Report
=============
Version: 0.10.251 64 bit http://cuda-z.sf.net/
OS Version: Windows x86 6.2.9200
Driver Version: 372.70
Driver Dll Version: 8.0 (6.14.13.7270)
Runtime Dll Version: 6.50

Core Information
----------------
   Name: GeForce GTX 1080
   Compute Capability: 6.1
   Clock Rate: 1809.5 MHz
   PCI Location: 0:5:0
   Multiprocessors: 20
   Threads Per Multiproc.: 2048
   Warp Size: 32
   Regs Per Block: 65536
   Threads Per Block: 1024
   Threads Dimensions: 1024 x 1024 x 64
   Grid Dimensions: 2147483647 x 65535 x 65535
   Watchdog Enabled: Yes
   Integrated GPU: No
   Concurrent Kernels: Yes
   Compute Mode: Default
   Stream Priorities: Yes

Memory Information
------------------
   Total Global: 8192 MiB
   Bus Width: 256 bits
   Clock Rate: 5005 MHz
   Error Correction: No
   L2 Cache Size: 48 KiB
   Shared Per Block: 48 KiB
   Pitch: 2048 MiB
   Total Constant: 64 KiB
   Texture Alignment: 512 B
   Texture 1D Size: 131072
   Texture 2D Size: 131072 x 65536
   Texture 3D Size: 16384 x 16384 x 16384
   GPU Overlap: Yes
   Map Host Memory: Yes
   Unified Addressing: Yes
   Async Engine: Yes, Bidirectional

Performance Information
-----------------------
Memory Copy
   Host Pinned to Device: 2842.01 MiB/s
   Host Pageable to Device: 2414.87 MiB/s
   Device to Host Pinned: 2981.8 MiB/s
   Device to Host Pageable: 2559.01 MiB/s
   Device to Device: 111.991 GiB/s
GPU Core Performance
   Single-precision Float: 9246.7 Gflop/s
   Double-precision Float: 295.548 Gflop/s
   64-bit Integer: 465.641 Giop/s
   32-bit Integer: 2742.09 Giop/s
   24-bit Integer: 2084.85 Giop/s

Generated: Tue Oct 04 23:08:26 2016

32Gbps-NGFF.M2 benchmark results

http://www.3dmark.com/fs/10425255 - Standard, GPU = 23143
http://www.3dmark.com/3dm/15359522 - Extreme, GPU = 10938
http://www.3dmark.com/3dm/15250559 - Ultra, GPU = 5451
http://www.3dmark.com/fs/10398914 - Ultra, GPU = 5635
http://www.3dmark.com/3dm/15360121 - TimeSpy, GPU = 7672


Edited October 26, 2016 by bloodhawk
formatting and added intro

Tech Inferno Fan · October 5, 2016

27 minutes ago, bloodhawk said:

I can confirm the link speeds over the M.2 ports on the Z170 chipset (P870DM-G) using a PE4C V4.1.

Will see if I can check things with my friend's GT72VR (or something).

The hwinfo64 screenshots confirm the M.2 eGPU port hosting your GTX1080 is running at 8Gbps. It doesn't give the link width. I assume it would be x1 3.0 but GPU-Z would tell us exactly.

bloodhawk · October 5, 2016

25 minutes ago, Tech Inferno Fan said:

The hwinfo64 screenshots confirm the M.2 eGPU port hosting your GTX1080 is running at 8Gbps. It doesn't give the link width. I assume it would be x1 3.0 but GPU-Z would tell us exactly.

Yep. In GPU-Z and NvInspector the link does show up as PCIe 3.0 x4. I'm not sure how accurate that is, especially since the actual limit is 8GT/s (8Gbps).

Tech Inferno Fan · October 5, 2016

39 minutes ago, bloodhawk said:

Yep. In GPU-Z and NvInspector the link does show up as PCIe 3.0 x4. I'm not sure how accurate that is, especially since the actual limit is 8GT/s (8Gbps).

Pls run CUDA-Z on your eGPU and post the result. We know from https://www.techinferno.com/index.php?/forums/topic/5226-2013-15-macbook-pro-gt750m-gtx780ti16gbps-tb2-sonnet-iii-d-win81-squinks/

CUDA-Z Host-to-Device Bandwidth

TB2-16Gbps: 1258 MiB/s

TB1-10Gbps: 781 MiB/s link

TB1-8Gbps (x2 2.0): 697 MiB/s link

EC2-4Gbps (x1 2.0): 373 MiB/s link

We'll quickly be able to establish its bandwidth using these reference Thunderbolt results.
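For anyone who wants to line up a CUDA-Z reading against the reference figures above, here is a minimal Python sketch. The function name `closest_reference` and the sample reading of 1300 MiB/s are hypothetical and only there to illustrate the lookup; the four reference rates are the ones quoted in this post.

```python
# Reference CUDA-Z host-to-device rates quoted in the post above (MiB/s).
REFERENCE_MIB_S = {
    "TB2-16Gbps": 1258,
    "TB1-10Gbps": 781,
    "TB1-8Gbps (x2 2.0)": 697,
    "EC2-4Gbps (x1 2.0)": 373,
}


def closest_reference(measured_mib_s):
    """Return (label, ratio) for the reference link nearest to the reading.

    ratio is measured / reference, so a value well above 1.0 means the
    link under test is faster than anything in the reference list.
    """
    label = min(
        REFERENCE_MIB_S,
        key=lambda name: abs(REFERENCE_MIB_S[name] - measured_mib_s),
    )
    return label, measured_mib_s / REFERENCE_MIB_S[label]


if __name__ == "__main__":
    # 1300 MiB/s is a made-up reading used only to exercise the function.
    label, ratio = closest_reference(1300)
    print(f"closest reference: {label} ({ratio:.2f}x)")
```

A ratio far above 1.0 against the fastest entry is the hint that the link is wider or newer than any of the listed Thunderbolt and ExpressCard cases.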

bloodhawk · October 5, 2016

33 minutes ago, Tech Inferno Fan said:

Pls run CUDA-Z on your eGPU and post the result. We know from

CUDA-Z Host-to-Device Bandwidth

TB2-16Gbps: 1258 MiB/s

TB1-10Gbps: 781 MiB/s link

TB1-8Gbps (x2 2.0): 697 MiB/s link

EC2-4Gbps (x1 2.0): 373 MiB/s link

We'll quickly be able to establish its bandwidth using these reference Thunderbolt results.

Here you go, good sir -

I'll try to do another test over TB3 tomorrow. This is with quite a few USB devices plugged in and the dGPU @ full link speeds.

Quote

CUDA-Z Report
=============
Version: 0.10.251 64 bit http://cuda-z.sf.net/
OS Version: Windows x86 6.2.9200
Driver Version: 372.70
Driver Dll Version: 8.0 (6.14.13.7270)
Runtime Dll Version: 6.50

Core Information
----------------
   Name: GeForce GTX 1080
   Compute Capability: 6.1
   Clock Rate: 1809.5 MHz
   PCI Location: 0:5:0
   Multiprocessors: 20
   Threads Per Multiproc.: 2048
   Warp Size: 32
   Regs Per Block: 65536
   Threads Per Block: 1024
   Threads Dimensions: 1024 x 1024 x 64
   Grid Dimensions: 2147483647 x 65535 x 65535
   Watchdog Enabled: Yes
   Integrated GPU: No
   Concurrent Kernels: Yes
   Compute Mode: Default
   Stream Priorities: Yes

Memory Information
------------------
   Total Global: 8192 MiB
   Bus Width: 256 bits
   Clock Rate: 5005 MHz
   Error Correction: No
   L2 Cache Size: 48 KiB
   Shared Per Block: 48 KiB
   Pitch: 2048 MiB
   Total Constant: 64 KiB
   Texture Alignment: 512 B
   Texture 1D Size: 131072
   Texture 2D Size: 131072 x 65536
   Texture 3D Size: 16384 x 16384 x 16384
   GPU Overlap: Yes
   Map Host Memory: Yes
   Unified Addressing: Yes
   Async Engine: Yes, Bidirectional

Performance Information
-----------------------
Memory Copy
   Host Pinned to Device: 2842.01 MiB/s
   Host Pageable to Device: 2414.87 MiB/s
   Device to Host Pinned: 2981.8 MiB/s
   Device to Host Pageable: 2559.01 MiB/s
   Device to Device: 111.991 GiB/s
GPU Core Performance
   Single-precision Float: 9246.7 Gflop/s
   Double-precision Float: 295.548 Gflop/s
   64-bit Integer: 465.641 Giop/s
   32-bit Integer: 2742.09 Giop/s
   24-bit Integer: 2084.85 Giop/s

Generated: Tue Oct 04 23:08:26 2016

Tech Inferno Fan · October 5, 2016

10 minutes ago, bloodhawk said:

Here you go, good sir -

I'll try to do another test over TB3 tomorrow. This is with quite a few USB devices plugged in and the dGPU @ full link speeds.

You have "Host Pinned to Device: 2842.01 MiB/s" which would mean it's a x4 3.0 (32Gbps) link as reported by GPU-Z. That would be faster than TB3 since there is no additional TB3 controller latency to deal with. Rather, it's a direct electrical link to the Intel southbridge.
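As a rough cross-check of that reasoning, the sketch below computes the raw ceiling of a few candidate PCIe links and compares the 2842.01 MiB/s reading against them. It uses the standard per-lane rates and line encodings (2.5 and 5 GT/s with 8b/10b, 8 GT/s with 128b/130b). The helper name `link_ceiling_mib_s` is hypothetical, and the ceilings ignore packet and protocol overhead, so real copies always land below them.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency per PCIe generation.
PCIE_GEN = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def link_ceiling_mib_s(gen, lanes):
    """Raw one-direction ceiling of a PCIe link in MiB/s (no protocol overhead)."""
    gt_per_s, efficiency = PCIE_GEN[gen]
    bits_per_s = gt_per_s * 1e9 * efficiency * lanes
    return bits_per_s / 8 / 2**20


if __name__ == "__main__":
    measured = 2842.01  # Host Pinned to Device from the CUDA-Z report
    for gen, lanes in [(3, 1), (2, 4), (3, 4)]:
        ceiling = link_ceiling_mib_s(gen, lanes)
        verdict = "fits" if measured <= ceiling else "reading exceeds this link"
        print(f"x{lanes} {gen}.0: ceiling {ceiling:7.1f} MiB/s -> {verdict}")
```

Of the three candidates, only x4 3.0 has a ceiling above the measured rate, which is consistent with what GPU-Z reports.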

bloodhawk · October 5, 2016

Just now, Tech Inferno Fan said:

You have "Host Pinned to Device: 2842.01 MiB/s" which would mean it's a x4 3.0 (32Gbps) link as reported by CUDA-Z. That would be faster than TB3 since there is no additional TB3 controller latency to deal with. Rather, it's a direct electrical link to the Intel southbridge.

Gotcha. Figured as much, especially after the jump in scores compared to TB3.

I'm adding content to a thread over at the other place that shall not be named. Will update the other thread soon after with the benchmarks.

Can you please link me to an existing/central thread where members talk about tweaking such setups?

Tech Inferno Fan · October 5, 2016

@bloodhawk, I moved the discussion content here since it was referencing your implementation, plus I've summarized your hardware as I know it in the opening post. Pictures of the actual hardware and how the PE4C V4.1 connects would bring it to life. A unique eGPU implementation that's for sure.

bloodhawk · October 5, 2016

1 hour ago, Tech Inferno Fan said:

@bloodhawk, I moved the discussion content here since it was referencing your implementation, plus I've summarized your hardware as I know it in the opening post. Pictures of the actual hardware and how the PE4C V4.1 connects would bring it to life. A unique eGPU implementation that's for sure.

Yeap, working on that. I don't have a great camera at hand right now, but the One M8 will have to do for now.

Thank you for moving the posts. Was about to PM you about the same.

Edited October 5, 2016 by bloodhawk

Tech Inferno Fan · October 5, 2016

11 minutes ago, bloodhawk said:

Yeap, working on that. I don't have a great camera at hand right now, but the One M8 will have to do for now.

Thank you for moving the posts. Was about to PM you about the same.

OK, added an EC2-4Gbps reference to the discussion. EC2 is also directly wired to the southbridge. If x1 2.0 = 373MiB/s, then multiply by 2 to get x1 3.0, then multiply by 4 to get x4 3.0. The result is 2984MiB/s, which is on par with your 2842MiB/s. So it's definitely running x4 3.0.

EC2-4Gbps (x1 2.0): 373MiB/s link
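The scaling estimate above can be written out as a few lines of Python. This is only a sketch of the rule of thumb used in this post: doubling for 2.0 to 3.0 is an approximation (the encodings differ slightly between generations), and the name `scale_reference` is made up for illustration.

```python
EC2_X1_GEN2_MIB_S = 373    # EC2-4Gbps (x1 2.0) reference from this thread
MEASURED_MIB_S = 2842.01   # Host Pinned to Device on the PE4C V4.1


def scale_reference(base_mib_s, gen_factor, lanes):
    """Scale a single-lane reference rate by a generation factor and lane count."""
    return base_mib_s * gen_factor * lanes


if __name__ == "__main__":
    # x1 2.0 -> x1 3.0 (roughly double) -> x4 3.0 (four lanes)
    estimate = scale_reference(EC2_X1_GEN2_MIB_S, gen_factor=2, lanes=4)
    print(f"estimated x4 3.0 rate: {estimate} MiB/s")
    print(f"measured / estimate:   {MEASURED_MIB_S / estimate:.1%}")
```

The measured figure comes out at roughly 95% of the scaled estimate, which is the "on par" result described above.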

bloodhawk · October 5, 2016

22 minutes ago, Tech Inferno Fan said:

OK, added an EC2-4Gbps reference to the discussion. EC2 is also directly wired to the southbridge. If x1 2.0 = 373MiB/s, then multiply by 2 to get x1 3.0, then multiply by 4 to get x4 3.0. The result is 2984MiB/s, which is on par with your 2842MiB/s. So it's definitely running x4 3.0.

EC2-4Gbps (x1 2.0): 373MiB/s link

That makes sense. The Device to Host actually hovers around 29XX MiB/s when there are not too many tabs open in Chrome or I'm not doing much.

Next step is to figure out if it's possible to create a powered extension cable/connector.

These speeds made me wonder if the proprietary AGA connector is just a fancy-looking M.2 port extension. I'd actually be willing to buy one and tear it down if someone can get me the service manual with the pinouts.

Edited October 5, 2016 by bloodhawk

Mr. Fox · October 12, 2016

On 10/4/2016 at 11:53 PM, bloodhawk said:

That makes sense. The Device to Host actually hovers around 29XX MiB/s when there are not too many tabs open in Chrome or I'm not doing much.

Next step is to figure out if it's possible to create a powered extension cable/connector.

These speeds made me wonder if the proprietary AGA connector is just a fancy-looking M.2 port extension. I'd actually be willing to buy one and tear it down if someone can get me the service manual with the pinouts.

It would be nice if you could put Alienware's eGPU to better use on an immensely more powerful machine like your P870DM.

bloodhawk · October 12, 2016

40 minutes ago, Mr. Fox said:

It would be nice if you could put Alienware's eGPU to better use on an immensely more powerful machine like your P870DM.

Yeah I'm looking for a service manual right now. If I don't find one, I'll just order a good multimeter and an AGA in a week or 2 to test.

But I'm still looking for the female end of the connector that is present on the system end of the AW's.

Mr. Fox · October 12, 2016

Maybe you can find a dead parts-old Alienware laptop to steal the port off of it. If you can source one you might be able to get something from Mouser to make your own connector.

bloodhawk · October 13, 2016

4 hours ago, Mr. Fox said:

Maybe you can find a dead parts-old Alienware laptop to steal the port off of it. If you can source one you might be able to get something from Mouser to make your own connector.

Gotcha. Will look around. The female port isn't all that much of a big deal, can always rig something up with pins.

Edited October 13, 2016 by bloodhawk

Tech Inferno Fan · October 31, 2016

@bloodhawk, you wouldn't still have the TB3 enclosure? We need some CUDA-Z output to confirm whether it's providing 20Gbps or 32Gbps. Without CUDA-Z, your noticeably faster 3DMark results using NGFF.M2 compared to TB3 would suggest your TB3 interface was 20Gbps.

See the discussion RE: TB3 bandwidth at https://www.techinferno.com/index.php?/forums/topic/10718-2016-macbook-pros-and-egpus/&do=findComment&comment=151839

bloodhawk · October 31, 2016

4 hours ago, Tech Inferno Fan said:

@bloodhawk, you wouldn't still have the TB3 enclosure? We need some CUDA-Z output to confirm whether it's providing 20Gbps or 32Gbps. Without CUDA-Z, your noticeably faster 3DMark results using NGFF.M2 compared to TB3 would suggest your TB3 interface was 20Gbps.

See the discussion RE: TB3 bandwidth at

I still do have it. I'll try to install it tonight and test it out.

Edited October 31, 2016 by bloodhawk

bloodhawk · November 1, 2016

@Tech Inferno Fan

Here it is :

Tech Inferno Fan · November 1, 2016

7 minutes ago, bloodhawk said:

@Tech Inferno Fan

Here it is :

Great. Looking lower but not x4 2.0 levels. Are you using an active TB3 cable?

Under the same load conditions, can you re-run CUDA-Z on the NGFF.M2 32Gbps interface?

bloodhawk · November 1, 2016

1 minute ago, Tech Inferno Fan said:

Great. Looking lower but not x4 2.0 levels. Under the same load conditions, can you re-run CUDA-Z on the NGFF.M2 32Gbps interface?

I actually did last night; it was the same as posted in the OP, +/- 75-100. Even this hit around 2200 MiB/s.

I'll post a screen once I connect it over M.2 in about 20 mins.

bloodhawk · November 1, 2016

@Tech Inferno Fan This is with M.2 NGFF

Output :


CUDA-Z Report
=============
Version: 0.10.251 64 bit http://cuda-z.sf.net/
OS Version: Windows x86 6.2.9200
Driver Version: 375.70
Driver Dll Version: 8.0 (6.14.13.7570)
Runtime Dll Version: 6.50

Core Information
----------------
   Name: GeForce GTX 1080
   Compute Capability: 6.1
   Clock Rate: 1733.5 MHz
   PCI Location: 0:62:0
   Multiprocessors: 20
   Threads Per Multiproc.: 2048
   Warp Size: 32
   Regs Per Block: 65536
   Threads Per Block: 1024
   Threads Dimensions: 1024 x 1024 x 64
   Grid Dimensions: 2147483647 x 65535 x 65535
   Watchdog Enabled: Yes
   Integrated GPU: No
   Concurrent Kernels: Yes
   Compute Mode: Default
   Stream Priorities: Yes

Memory Information
------------------
   Total Global: 8192 MiB
   Bus Width: 256 bits
   Clock Rate: 5005 MHz
   Error Correction: No
   L2 Cache Size: 48 KiB
   Shared Per Block: 48 KiB
   Pitch: 2048 MiB
   Total Constant: 64 KiB
   Texture Alignment: 512 B
   Texture 1D Size: 131072
   Texture 2D Size: 131072 x 65536
   Texture 3D Size: 16384 x 16384 x 16384
   GPU Overlap: Yes
   Map Host Memory: Yes
   Unified Addressing: Yes
   Async Engine: Yes, Bidirectional

Performance Information
-----------------------
Memory Copy
   Host Pinned to Device: 2784.71 MiB/s
   Host Pageable to Device: 2371.13 MiB/s
   Device to Host Pinned: 2541.68 MiB/s
   Device to Host Pageable: 2224.87 MiB/s
   Device to Device: 118.4 GiB/s
GPU Core Performance
   Single-precision Float: 7892.93 Gflop/s
   Double-precision Float: 259.88 Gflop/s
   64-bit Integer: 398.933 Giop/s
   32-bit Integer: 2363.25 Giop/s
   24-bit Integer: 1794.89 Giop/s

Generated: Mon Oct 31 22:28:55 2016
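To compare two CUDA-Z runs like-for-like, the memory-copy lines can be pulled out of the saved report text and diffed. The sketch below is one way to do that in Python. The regular expression assumes the text layout shown in the reports in this thread, the names `copy_rates` and `percent_change` are hypothetical, and the two excerpts are taken from the Oct 04 and Oct 31 NGFF.M2 reports above.

```python
import re

# Matches lines such as "   Host Pinned to Device: 2842.01 MiB/s".
# "Device to Device" is reported in GiB/s, so it is deliberately not matched.
COPY_LINE = re.compile(
    r"^[ \t]*([A-Za-z ]+):[ \t]*([\d.]+) MiB/s[ \t]*$", re.MULTILINE
)

OCT_04 = """Memory Copy
   Host Pinned to Device: 2842.01 MiB/s
   Host Pageable to Device: 2414.87 MiB/s
   Device to Host Pinned: 2981.8 MiB/s
   Device to Host Pageable: 2559.01 MiB/s
   Device to Device: 111.991 GiB/s
"""

OCT_31 = """Memory Copy
   Host Pinned to Device: 2784.71 MiB/s
   Host Pageable to Device: 2371.13 MiB/s
   Device to Host Pinned: 2541.68 MiB/s
   Device to Host Pageable: 2224.87 MiB/s
   Device to Device: 118.4 GiB/s
"""


def copy_rates(report_text):
    """Return {copy direction: MiB/s} for the host<->device lines of a report."""
    return {name: float(value) for name, value in COPY_LINE.findall(report_text)}


def percent_change(before, after):
    """Signed change from before to after, in percent."""
    return (after - before) / before * 100


if __name__ == "__main__":
    first, second = copy_rates(OCT_04), copy_rates(OCT_31)
    for name in first:
        change = percent_change(first[name], second[name])
        print(f"{name:26s} {first[name]:8.2f} -> {second[name]:8.2f} MiB/s ({change:+.1f}%)")
```

Pasting a full report into either string works too, since only the MiB/s lines are picked up.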

Tech Inferno Fan · November 1, 2016

@bloodhawk, the NGFF.M2 GPU-Z output has faster fillrate, bandwidth, GPU clocks. Can you do like-for-like comparisons?

In addition, pls ensure the machine has the battery installed and is using a high-performance power profile.

You may also consider installing ThrottleStop and disabling C1E, EIST and C6/C7, which (at least a couple of generations ago) would affect SATA SSD performance. That may help bring the NGFF.M2 and TB3 CUDA-Z bandwidth results closer.

bloodhawk · November 1, 2016

26 minutes ago, Tech Inferno Fan said:

@bloodhawk, the NGFF.M2 GPU-Z output has faster fillrate, bandwidth, GPU clocks. Can you do like-for-like comparisons?

In addition, pls ensure the machine has the battery installed and is using a high-performance power profile.

You may also consider installing ThrottleStop and disabling C1E, EIST and C6/C7, which (at least a couple of generations ago) would affect SATA SSD performance. That may help bring the NGFF.M2 and TB3 CUDA-Z bandwidth results closer.

Doing another test over TB3 will take a few days, unfortunately. I have my OS NVMe SSD connected to the system over TB3 right now and renders are running.

But the clocks were exactly the same in both tests, with completely bone-stock drivers. The machine always has the battery installed and always uses the High Performance profile (I never run the system without the battery or on battery power; it's always connected to AC, on the High Performance profile, with C-states disabled on the processor).

I did try higher clocks when the GPU was connected over TB3 (2088 MHz), but the CUDA-Z output was exactly the same.

Also, I'm using the stock 50cm 40Gbps TB3 cable. I'm not really going to invest in another cable anytime soon, since I don't really need it. To be honest, I haven't even found one for sale online.

Edited November 1, 2016 by bloodhawk

Splitframe · November 21, 2016

On 10/5/2016 at 6:05 AM, bloodhawk said:

TB3 Benchmark results

http://www.3dmark.com/3dm/15193649 -Standard, GPU = 18536

http://www.3dmark.com/3dm/15191071 - Ultra, GPU = 10007

http://www.3dmark.com/3dm/15193921 - Extreme, GPU = 5206

http://www.3dmark.com/3dm/15193762 - TimeSpy, GPU = 6855

(You mixed up ultra and extreme here)

These numbers seem to confirm that with increasing FPS the total efficiency over TB3 goes down.

A normal GTX 1080 has scores around these:

Normal - 21905

Extreme - 10293

Ultra - 5020

Source: http://www.guru3d.com/articles-pages/nvidia-geforce-gtx-1080-review,28.html
Note that these values are from a non-overclocked reference card; your
ROG scored higher over x4.3 than a non-OC card over x16.3.

It seems like over direct PCIe x4.3 there is not much of a loss so I will
take your x4.3 values from above for the calculation.

( 23143 / 10938 / 5451 )

When we look at TB3 compared to x4.3 there is a drop off
of 20% at normal, 8.5% at extreme and 4.5% at ultra.
The benchmark runs at around 100/60/10 FPS for normal/extreme/ultra respectively.

My guess, based on these values and experience from the old x1.2/x1.3 times, is
that the performance hit over TB3 (or insufficient bandwidth in general) increases with higher FPS,
and everyone who is looking at 144Hz gaming over TB3 should think twice.

The 60 FPS mark seems in a good spot for me personally, the 8.5% hit is bearable for me.

I wonder if or how this is in normal games and not synthetic benchmarks though

and how much peripherals take away in addition.
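For reference, the drop-off percentages above can be reproduced with a short Python sketch. It uses the GPU scores posted in this thread, with the TB3 Ultra and Extreme labels swapped back as noted at the top of this post; the function name `drop_percent` is just an illustrative choice.

```python
# GPU scores from this thread. The TB3 Extreme/Ultra labels are corrected here.
NGFF_M2 = {"normal": 23143, "extreme": 10938, "ultra": 5451}
TB3 = {"normal": 18536, "extreme": 10007, "ultra": 5206}


def drop_percent(direct_score, tb3_score):
    """Percentage lost going from the direct x4 3.0 link to TB3."""
    return (1 - tb3_score / direct_score) * 100


if __name__ == "__main__":
    for preset in NGFF_M2:
        loss = drop_percent(NGFF_M2[preset], TB3[preset])
        print(f"{preset:8s} {NGFF_M2[preset]:6d} -> {TB3[preset]:6d}  ({loss:.1f}% lower)")
```

The same function can be pointed at game benchmark numbers to see whether the trend holds outside 3DMark.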

Edited November 21, 2016 by Splitframe

bloodhawk · November 21, 2016


Definitely agree.

Are you referring to peripherals taking away from TB3 bandwidth or PCIe x4?

At least on my system, in either case, I didn't have any issues with peripherals.

But using the Core over TB3 was a different story. The TB3 HDK did not have any peripheral ports.
