bloodhawk

17" Clevo P870DM-G + GTX1080@32Gbps-NGFF.M2 (PE4C V4.1) + Win10 [bloodhawk]


Below are a 32Gbps-NGFF.M2 and a 32Gbps-TB3 eGPU implementation. The NGFF.M2 interface attaches the video card directly to the southbridge, giving better performance at the cost of inconvenient under-keyboard access. The Thunderbolt controller adds latency that drags performance down, but provides a convenient single-cable solution using the external TB3 port.

 

Notebook

17"  Clevo P870DM-G

Intel Z170 chipset with 8Gbps (Gen3) Southbridge PCIe ports

i7-6700K CPU

64GB RAM
512GB PCIe SSD

GTX980 dGPU

Win10

 

eGPU Gear

PE4C V4.1 - NGFF.M2

InXtron HDK - TB3

NVidia GTX1080

EVGA 600B PSU

 

Hardware pictures (InXtron-HDK-TB3/PE4C V4.1-NGFF.M2)


 

 

 

Spoiler
bJr5XMi.jpg
N3LkneO.jpg
TQkb3aO.jpg

 

Final Setup (for now) : 

4yvhtOL.jpg

 

 
InXtron HDK @32Gbps-TB3
 
74f28519c5fca2da3a4c9981e50c35d3.png 898bdde6ad783c255bc513b55151be3a.png
 
TB3 Benchmark results
 
http://www.3dmark.com/3dm/15193649 - Standard, GPU = 18536
http://www.3dmark.com/3dm/15191071 - Extreme, GPU = 10007
http://www.3dmark.com/3dm/15193921 - Ultra, GPU = 5206

http://www.3dmark.com/3dm/15193762 - TimeSpy, GPU = 6855

 

Spoiler

 

 
9423d6bbfb2a6b59f40b221846c1b4de.png
6879bbfc2b2d1a24398e0e28e3d0e280.png
 
 
69c13f3709e8defaa65e19b1324076e3.png
1b93c2220ff0951367fadeaff9aaf52c.jpg
7a839ef4e8cc2a5ecd039ae297c842aa.png

 

 

PE4C V4.1 - @32Gbps-NGFF.M2

 

 
94b64c8d49f08edb00891d57258122f0.png
 
CUDA-Z  :
Spoiler

 

CUDA-Z Report
=============
Version: 0.10.251 64 bit http://cuda-z.sf.net/
OS Version: Windows x86 6.2.9200
Driver Version: 372.70
Driver Dll Version: 8.0 (6.14.13.7270)
Runtime Dll Version: 6.50
 
Core Information
----------------
    Name: GeForce GTX 1080
    Compute Capability: 6.1
    Clock Rate: 1809.5 MHz
    PCI Location: 0:5:0
    Multiprocessors: 20
    Threads Per Multiproc.: 2048
    Warp Size: 32
    Regs Per Block: 65536
    Threads Per Block: 1024
    Threads Dimensions: 1024 x 1024 x 64
    Grid Dimensions: 2147483647 x 65535 x 65535
    Watchdog Enabled: Yes
    Integrated GPU: No
    Concurrent Kernels: Yes
    Compute Mode: Default
    Stream Priorities: Yes
 
Memory Information
------------------
    Total Global: 8192 MiB
    Bus Width: 256 bits
    Clock Rate: 5005 MHz
    Error Correction: No
    L2 Cache Size: 48 KiB
    Shared Per Block: 48 KiB
    Pitch: 2048 MiB
    Total Constant: 64 KiB
    Texture Alignment: 512 B
    Texture 1D Size: 131072
    Texture 2D Size: 131072 x 65536
    Texture 3D Size: 16384 x 16384 x 16384
    GPU Overlap: Yes
    Map Host Memory: Yes
    Unified Addressing: Yes
    Async Engine: Yes, Bidirectional
 
Performance Information
-----------------------
Memory Copy
    Host Pinned to Device: 2842.01 MiB/s
    Host Pageable to Device: 2414.87 MiB/s
    Device to Host Pinned: 2981.8 MiB/s
    Device to Host Pageable: 2559.01 MiB/s
    Device to Device: 111.991 GiB/s
GPU Core Performance
    Single-precision Float: 9246.7 Gflop/s
    Double-precision Float: 295.548 Gflop/s
    64-bit Integer: 465.641 Giop/s
    32-bit Integer: 2742.09 Giop/s
    24-bit Integer: 2084.85 Giop/s
 
Generated: Tue Oct 04 23:08:26 2016

 

32Gbps-NGFF.M2 benchmark results

 

http://www.3dmark.com/fs/10425255 - Standard, GPU = 23143
http://www.3dmark.com/3dm/15359522 - Extreme, GPU = 10938
http://www.3dmark.com/3dm/15250559 - Ultra, GPU = 5451
http://www.3dmark.com/fs/10398914 - Ultra, GPU = 5635
http://www.3dmark.com/3dm/15360121 - TimeSpy, GPU = 7672

 
Spoiler

 

fad8dccdbc35f9dd5c99235bc7ed5471.png
6ae2d5fb3a68c4604c805398d5217afd.png

 

 

Edited by bloodhawk
formatting and added intro
27 minutes ago, bloodhawk said:

I can confirm the link speeds over the M.2 ports on the Z170 chipset (P870DM-G) using a PE4C v4.1.

Will see if I can check things with my friend's GT72VR (or something).

 

The hwinfo64 screenshots confirm the M.2 eGPU port hosting your GTX1080 is running at 8Gbps. It doesn't give the link width. I assume it would be x1 3.0 but GPU-Z would tell us exactly.

25 minutes ago, Tech Inferno Fan said:

The hwinfo64 screenshots confirm the M.2 eGPU port hosting your GTX1080 is running at 8Gbps. It doesn't give the link width. I assume it would be x1 3.0 but GPU-Z would tell us exactly.

Yep. In GPU-Z and NvInspector the link does show up as PCIe 3.0 x4. I'm not sure how accurate that is, especially since the actual limit is 8GT/s (8Gbps).

 

39 minutes ago, bloodhawk said:

Yep. In GPU-Z and NvInspector the link does show up as PCIe 3.0 x4. I'm not sure how accurate that is, especially since the actual limit is 8GT/s (8Gbps).

 

 

Pls run CUDA-Z on your eGPU and post the result. We know from https://www.techinferno.com/index.php?/forums/topic/5226-2013-15-macbook-pro-gt750m-gtx780ti16gbps-tb2-sonnet-iii-d-win81-squinks/
 

CUDA-Z Host-to-Device Bandwidth

 

TB2-16Gbps: 1258 MiB/s

TB1-10Gbps: 781 MiB/s

TB1-8Gbps (x2 2.0): 697 MiB/s

EC2-4Gbps (x1 2.0): 373 MiB/s

 

We'll quickly be able to establish its bandwidth using these reference Thunderbolt results.

33 minutes ago, Tech Inferno Fan said:

 

Pls run CUDA-Z on your eGPU and post the result. We know from

CUDA-Z Host-to-Device Bandwidth

 

TB2-16Gbps: 1258 MiB/s

TB1-10Gbps: 781 MiB/s

TB1-8Gbps (x2 2.0): 697 MiB/s

EC2-4Gbps (x1 2.0): 373 MiB/s

We'll quickly be able to establish its bandwidth using these reference Thunderbolt results.

 

Here you go good sir - 

 

I'll try to do another test over TB3 tomorrow. This is with quite a few USB devices plugged in and the dGPU @ full link speeds.

Quote

 

CUDA-Z Report
=============
Version: 0.10.251 64 bit http://cuda-z.sf.net/
OS Version: Windows x86 6.2.9200 
Driver Version: 372.70
Driver Dll Version: 8.0 (6.14.13.7270)
Runtime Dll Version: 6.50

Core Information
----------------
    Name: GeForce GTX 1080
    Compute Capability: 6.1
    Clock Rate: 1809.5 MHz
    PCI Location: 0:5:0
    Multiprocessors: 20
    Threads Per Multiproc.: 2048
    Warp Size: 32
    Regs Per Block: 65536
    Threads Per Block: 1024
    Threads Dimensions: 1024 x 1024 x 64
    Grid Dimensions: 2147483647 x 65535 x 65535
    Watchdog Enabled: Yes
    Integrated GPU: No
    Concurrent Kernels: Yes
    Compute Mode: Default
    Stream Priorities: Yes

Memory Information
------------------
    Total Global: 8192 MiB
    Bus Width: 256 bits
    Clock Rate: 5005 MHz
    Error Correction: No
    L2 Cache Size: 48 KiB
    Shared Per Block: 48 KiB
    Pitch: 2048 MiB
    Total Constant: 64 KiB
    Texture Alignment: 512 B
    Texture 1D Size: 131072
    Texture 2D Size: 131072 x 65536
    Texture 3D Size: 16384 x 16384 x 16384
    GPU Overlap: Yes
    Map Host Memory: Yes
    Unified Addressing: Yes
    Async Engine: Yes, Bidirectional

Performance Information
-----------------------
Memory Copy
    Host Pinned to Device: 2842.01 MiB/s
    Host Pageable to Device: 2414.87 MiB/s
    Device to Host Pinned: 2981.8 MiB/s
    Device to Host Pageable: 2559.01 MiB/s
    Device to Device: 111.991 GiB/s
GPU Core Performance
    Single-precision Float: 9246.7 Gflop/s
    Double-precision Float: 295.548 Gflop/s
    64-bit Integer: 465.641 Giop/s
    32-bit Integer: 2742.09 Giop/s
    24-bit Integer: 2084.85 Giop/s

Generated: Tue Oct 04 23:08:26 2016

 

 

10 minutes ago, bloodhawk said:

 

Here you go good sir - 

 

I'll try to do another test over TB3 tomorrow. This is with quite a few USB devices plugged in and the dGPU @ full link speeds.

 

 

You have "Host Pinned to Device: 2842.01 MiB/s" which would mean it's a x4 3.0 (32Gbps) link as reported by GPU-Z. That would be faster than TB3 since there is no additional TB3 controller latency to deal with. Rather, it's a direct electrical link to the Intel southbridge.

Just now, Tech Inferno Fan said:

 

You have "Host Pinned to Device: 2842.01 MiB/s" which would mean it's a x4 3.0 (32Gbps) link as reported by CUDA-Z. That would be faster than TB3 since there is no additional TB3 controller latency to deal with. Rather, it's a direct electrical link to the Intel southbridge.

Gotcha. Figured as much, especially after the jump in scores compared to TB3.

I'm adding content to a thread over at the other place that shall not be named. Will update the other thread soon after with the benchmarks.

Can you please link me to an existing/central thread where members talk about tweaking such setups?


@bloodhawk, I moved the discussion content here since it was referencing your implementation, plus I've summarized your hardware as I know it in the opening post. Pictures of the actual hardware and how the PE4C V4.1 connects would bring it to life. A unique eGPU implementation that's for sure.

1 hour ago, Tech Inferno Fan said:

@bloodhawk, I moved the discussion content here since it was referencing your implementation, plus I've summarized your hardware as I know it in the opening post. Pictures of the actual hardware and how the PE4C V4.1 connects would bring it to life. A unique eGPU implementation that's for sure.

 

Yeap, working on that. I don't have a great camera at hand right now, but the One M8 will have to do for now :P

 

Thank you for moving the posts. Was about to PM you about the same.

Edited by bloodhawk
11 minutes ago, bloodhawk said:

 

Yeap, working on that. I don't have a great camera at hand right now, but the One M8 will have to do for now :P

 

Thank you for moving the posts. Was about to PM you about the same.

 

OK, added an EC2-4Gbps reference to the discussion. EC2 is also wired directly to the southbridge. If x1 2.0 = 373 MiB/s, then multiply by 2 to get x1 3.0, then multiply by 4 to get x4 3.0. The result is 2984 MiB/s, which is on par with your 2842 MiB/s. So it's definitely running x4 3.0.

EC2-4Gbps (x1 2.0): 373 MiB/s
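
The scaling rule of thumb above can be sanity-checked against the theoretical PCIe ceiling. This is only a rough sketch: it assumes the standard line rates and encodings (5 GT/s with 8b/10b for Gen2, 8 GT/s with 128b/130b for Gen3), ignores all packet and protocol overhead, and uses hypothetical function names that come from no tool mentioned in this thread.

```python
# Line rate in GT/s and line-encoding efficiency per PCIe generation.
PCIE_GEN = {
    2: (5.0, 8 / 10),     # Gen2: 5 GT/s, 8b/10b encoding
    3: (8.0, 128 / 130),  # Gen3: 8 GT/s, 128b/130b encoding
}

MIB = 1024 * 1024


def theoretical_mib_per_s(gen: int, lanes: int) -> float:
    """One-direction bandwidth after line encoding, in MiB/s (no packet overhead)."""
    gt_per_s, efficiency = PCIE_GEN[gen]
    bytes_per_s = gt_per_s * 1e9 * efficiency / 8 * lanes
    return bytes_per_s / MIB


def scale_reference(measured_x1_gen2: float, lanes: int) -> float:
    """Rule of thumb from the post: x1 2.0 result * 2 (for Gen3) * lane count."""
    return measured_x1_gen2 * 2 * lanes


if __name__ == "__main__":
    estimate = scale_reference(373, 4)     # EC2 x1 2.0 reference figure
    ceiling = theoretical_mib_per_s(3, 4)  # x4 3.0 upper bound
    measured = 2842.01                     # CUDA-Z "Host Pinned to Device"
    print(f"scaled estimate : {estimate:.0f} MiB/s")
    print(f"x4 3.0 ceiling  : {ceiling:.0f} MiB/s")
    print(f"measured/ceiling: {measured / ceiling:.0%}")
```

Both the scaled estimate and the measured figure sit well above what an x2 3.0 or x4 2.0 link could carry (half the x4 3.0 ceiling or less), which is consistent with the x4 3.0 conclusion.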

22 minutes ago, Tech Inferno Fan said:

 

OK, added an EC2-4Gbps reference to the discussion. EC2 is also wired directly to the southbridge. If x1 2.0 = 373 MiB/s, then multiply by 2 to get x1 3.0, then multiply by 4 to get x4 3.0. The result is 2984 MiB/s, which is on par with your 2842 MiB/s. So it's definitely running x4 3.0.

EC2-4Gbps (x1 2.0): 373 MiB/s

 

That makes sense. The Device to Host figure actually hovers around 29XX MiB/s when there aren't too many tabs open in Chrome or I'm not doing much.

Next step is to figure out if it's possible to create a powered extension cable/connector.

These speeds made me wonder if the proprietary AGA (Alienware Graphics Amplifier) connector is just a fancy-looking M.2 port extension. I'd actually be willing to buy one and tear it down if someone can get me the service manual with the pinouts.

Edited by bloodhawk
On 10/4/2016 at 11:53 PM, bloodhawk said:

 

That makes sense. The Device to Host figure actually hovers around 29XX MiB/s when there aren't too many tabs open in Chrome or I'm not doing much.

Next step is to figure out if it's possible to create a powered extension cable/connector.

These speeds made me wonder if the proprietary AGA (Alienware Graphics Amplifier) connector is just a fancy-looking M.2 port extension. I'd actually be willing to buy one and tear it down if someone can get me the service manual with the pinouts.

It would be nice if you could put Alienware's eGPU to better use on an immensely more powerful machine like your P870DM.

40 minutes ago, Mr. Fox said:

It would be nice if you could put Alienware's eGPU to better use on an immensely more powerful machine like your P870DM.

 

Yeah I'm looking for a service manual right now. If I don't find one, I'll just order a good multimeter and an AGA in a week or 2 to test. 

But I'm still looking for the female end of the connector that is present on the system end of the AWs.


Maybe you can find a dead, parts-only Alienware laptop to steal the port off of. If you can source one you might be able to get something from Mouser to make your own connector.

4 hours ago, Mr. Fox said:

Maybe you can find a dead, parts-only Alienware laptop to steal the port off of. If you can source one you might be able to get something from Mouser to make your own connector.

Gotcha. Will look around. The female port isn't all that much of a big deal, can always rig something up with pins. 

 

Edited by bloodhawk

@bloodhawk, you wouldn't still have the TB3 enclosure, would you? We need some CUDA-Z output to confirm whether it's providing 20Gbps or 32Gbps. Without CUDA-Z, your noticeably faster 3DMark results using NGFF.M2 compared to TB3 would suggest your TB3 interface was running at 20Gbps.

 

See the discussion RE: TB3 bandwidth at https://www.techinferno.com/index.php?/forums/topic/10718-2016-macbook-pros-and-egpus/&do=findComment&comment=151839

4 hours ago, Tech Inferno Fan said:

@bloodhawk, you wouldn't still have the TB3 enclosure, would you? We need some CUDA-Z output to confirm whether it's providing 20Gbps or 32Gbps. Without CUDA-Z, your noticeably faster 3DMark results using NGFF.M2 compared to TB3 would suggest your TB3 interface was running at 20Gbps.

 

See the discussion RE: TB3 bandwidth at

 

 

I still have it. I'll try to install it tonight and test it out.

 

 

Edited by bloodhawk

1 minute ago, Tech Inferno Fan said:

 

Great. Looking lower but not x4 2.0 levels. Under the same load conditions, can you re-run CUDA-Z on the NGFF.M2 32Gbps interface?

 

I actually did last night; it was the same as posted in the OP, +/- 75-100 MiB/s. Even this hit around 2200 MiB/s.

I'll post a screenshot once I connect it over M.2 in about 20 mins.


@Tech Inferno Fan This is with M.2 NGFF

 

fe7b19504281526927d8ca94db729dfa.png

 

 

Output : 

Spoiler

CUDA-Z Report
=============
Version: 0.10.251 64 bit http://cuda-z.sf.net/
OS Version: Windows x86 6.2.9200 
Driver Version: 375.70
Driver Dll Version: 8.0 (6.14.13.7570)
Runtime Dll Version: 6.50

Core Information
----------------
    Name: GeForce GTX 1080
    Compute Capability: 6.1
    Clock Rate: 1733.5 MHz
    PCI Location: 0:62:0
    Multiprocessors: 20
    Threads Per Multiproc.: 2048
    Warp Size: 32
    Regs Per Block: 65536
    Threads Per Block: 1024
    Threads Dimensions: 1024 x 1024 x 64
    Grid Dimensions: 2147483647 x 65535 x 65535
    Watchdog Enabled: Yes
    Integrated GPU: No
    Concurrent Kernels: Yes
    Compute Mode: Default
    Stream Priorities: Yes

Memory Information
------------------
    Total Global: 8192 MiB
    Bus Width: 256 bits
    Clock Rate: 5005 MHz
    Error Correction: No
    L2 Cache Size: 48 KiB
    Shared Per Block: 48 KiB
    Pitch: 2048 MiB
    Total Constant: 64 KiB
    Texture Alignment: 512 B
    Texture 1D Size: 131072
    Texture 2D Size: 131072 x 65536
    Texture 3D Size: 16384 x 16384 x 16384
    GPU Overlap: Yes
    Map Host Memory: Yes
    Unified Addressing: Yes
    Async Engine: Yes, Bidirectional

Performance Information
-----------------------
Memory Copy
    Host Pinned to Device: 2784.71 MiB/s
    Host Pageable to Device: 2371.13 MiB/s
    Device to Host Pinned: 2541.68 MiB/s
    Device to Host Pageable: 2224.87 MiB/s
    Device to Device: 118.4 GiB/s
GPU Core Performance
    Single-precision Float: 7892.93 Gflop/s
    Double-precision Float: 259.88 Gflop/s
    64-bit Integer: 398.933 Giop/s
    32-bit Integer: 2363.25 Giop/s
    24-bit Integer: 1794.89 Giop/s

Generated: Mon Oct 31 22:28:55 2016
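
For like-for-like comparisons between runs, the Memory Copy figures can be pulled straight out of the pasted report text. The following is only a sketch: `parse_memory_copy` is a hypothetical helper, it assumes the report layout shown in this thread, and the two sample strings are trimmed excerpts of the Oct 04 and Oct 31 NGFF.M2 reports.

```python
import re

# Matches lines such as "    Host Pinned to Device: 2842.01 MiB/s"
COPY_LINE = re.compile(r"^\s*(.+?):\s*([\d.]+)\s*(MiB/s|GiB/s)\s*$")


def parse_memory_copy(report: str) -> dict[str, float]:
    """Return the 'Memory Copy' figures of a CUDA-Z text report, in MiB/s."""
    results = {}
    in_section = False
    for line in report.splitlines():
        stripped = line.strip()
        if stripped == "Memory Copy":
            in_section = True
            continue
        if stripped == "GPU Core Performance":
            break  # end of the Memory Copy section
        if in_section:
            match = COPY_LINE.match(line)
            if match:
                name, value, unit = match.groups()
                factor = 1024 if unit == "GiB/s" else 1
                results[name] = float(value) * factor
    return results


# Trimmed excerpts of the two NGFF.M2 reports posted in this thread.
OCT_04 = """Memory Copy
    Host Pinned to Device: 2842.01 MiB/s
    Host Pageable to Device: 2414.87 MiB/s
    Device to Host Pinned: 2981.8 MiB/s
    Device to Host Pageable: 2559.01 MiB/s
    Device to Device: 111.991 GiB/s
GPU Core Performance
"""

OCT_31 = """Memory Copy
    Host Pinned to Device: 2784.71 MiB/s
    Host Pageable to Device: 2371.13 MiB/s
    Device to Host Pinned: 2541.68 MiB/s
    Device to Host Pageable: 2224.87 MiB/s
    Device to Device: 118.4 GiB/s
GPU Core Performance
"""

if __name__ == "__main__":
    first, second = parse_memory_copy(OCT_04), parse_memory_copy(OCT_31)
    for name in first:
        delta = second[name] - first[name]
        print(f"{name:26s} {first[name]:12.2f} -> {second[name]:12.2f} ({delta:+.2f} MiB/s)")
```

The same helper could be pointed at a TB3 report to line up the host-to-device numbers side by side.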

 


@bloodhawk, the NGFF.M2 GPU-Z output has faster fillrate, bandwidth, GPU clocks. Can you do like-for-like comparisons?

 

In addition, pls ensure the machine has the battery installed and is using a high-performance power profile.

 

You may also consider installing ThrottleStop and disabling C1E, EIST, and C6/C7, which (at least a couple of generations ago) would affect SATA SSD performance. That may help bring the NGFF.M2 and TB3 CUDA-Z bandwidth results closer.

26 minutes ago, Tech Inferno Fan said:

@bloodhawk, the NGFF.M2 GPU-Z output has faster fillrate, bandwidth, GPU clocks. Can you do like-for-like comparisons?

 

In addition, pls ensure the machine has the battery installed and is using a high-performance power profile.

 

You may also consider installing ThrottleStop and disabling C1E, EIST, and C6/C7, which (at least a couple of generations ago) would affect SATA SSD performance. That may help bring the NGFF.M2 and TB3 CUDA-Z bandwidth results closer.

 

Doing another test over TB3 will take a few days, unfortunately. I have my OS NVMe SSD connected to the system over TB3 right now and renders are running.

But the clocks were exactly the same in both tests, with completely bone-stock drivers. The machine has the battery installed (always) and is using the High Performance profile (always). I never run the system without the battery or on battery power; it's always connected to AC and on the High Performance profile, with C-states disabled on the processor.

I did try higher clocks when the GPU was connected over TB3 (2088 MHz), but the CUDA-Z output was exactly the same.

Also, I'm using the stock 50cm 40Gbps TB3 cable. I'm not really going to invest in another cable anytime soon, since I don't really need it. To be honest, I haven't even found one for sale online.

Edited by bloodhawk
On 10/5/2016 at 6:05 AM, bloodhawk said:
TB3 Benchmark results
 
http://www.3dmark.com/3dm/15193649 -Standard, GPU = 18536
http://www.3dmark.com/3dm/15191071 - Ultra, GPU = 10007
http://www.3dmark.com/3dm/15193921 - Extreme, GPU = 5206

http://www.3dmark.com/3dm/15193762 - TimeSpy, GPU = 6855

 

(You mixed up ultra and extreme here)

These numbers seem to confirm that with increasing FPS the total efficiency over TB3 goes down.

 A normal GTX 1080 has scores around these:

 

Normal - 21905

Extreme - 10293

Ultra - 5020

 

Source: http://www.guru3d.com/articles-pages/nvidia-geforce-gtx-1080-review,28.html
Note that these values are from a non-overclocked reference card; your
ROG scored higher over x4.3 than a non-OC card over x16.3.

 

It seems like over direct PCIe x4.3 there is not much of a loss, so I will
take your x4.3 values from above for the calculation.

(23143 / 10938 / 5451)

 

When we look at TB3 compared to x4.3 there is a drop off
of 20% at normal, 8.5% at extreme and 4.5% at ultra.
The benchmark runs at around 100/60/10 FPS for normal/extreme/ultra respectively.
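
The drop-off percentages can be reproduced from the graphics scores in the opening post. A small sketch, with a hypothetical helper name, using the TB3 labels as corrected above (Extreme = 10007, Ultra = 5206):

```python
# 3DMark Fire Strike graphics scores from the opening post.
# TB3 labels follow the correction noted above (Extreme = 10007, Ultra = 5206).
NGFF_M2 = {"Standard": 23143, "Extreme": 10938, "Ultra": 5451}
TB3 = {"Standard": 18536, "Extreme": 10007, "Ultra": 5206}


def drop_percent(baseline: int, score: int) -> float:
    """Relative loss of `score` against `baseline`, in percent."""
    return (baseline - score) / baseline * 100


if __name__ == "__main__":
    for preset, baseline in NGFF_M2.items():
        loss = drop_percent(baseline, TB3[preset])
        print(f"{preset:8s} x4 3.0 = {baseline:5d}  TB3 = {TB3[preset]:5d}  drop = {loss:.1f}%")
```

Rounded to one decimal, this gives 19.9%, 8.5% and 4.5% for Standard, Extreme and Ultra, matching the figures quoted in the post.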

 

My guess, based on these values and experience from the old x1.2/x1.3 days, is
that the performance hit over TB3 (or insufficient bandwidth in general) increases with higher FPS,
and anyone looking at 144Hz gaming over TB3 should think twice.

The 60 FPS mark seems in a good spot for me personally, the 8.5% hit is bearable for me.

 

I wonder if or how this is in normal games and not synthetic benchmarks though

and how much peripherals take away in addition.

 

Edited by Splitframe

2 minutes ago, Splitframe said:

 

(You mixed up ultra and extreme here)

These numbers seem to confirm that with increasing FPS the total efficiency over TB3 goes down.

 A normal GTX 1080 has scores around these:

 

Normal - 21905

Extreme - 10293

Ultra - 5020

 

Source: http://www.guru3d.com/articles-pages/nvidia-geforce-gtx-1080-review,28.html
Note that these values are from a non-overclocked reference card.

 

It seems like over direct PCIe x4.3 there is not much of a loss, so I will
take your x4.3 values from above for the calculation.

(23143 / 10938 / 5451)

 

When we look at TB3 compared to x4.3 there is a drop off
of 20% at normal, 8.5% at extreme and 4.5% at ultra.
The benchmark runs at around 100/60/10 FPS for normal/extreme/ultra respectively.

 

My guess, based on these values and experience from the old x1.2/x1.3 days, is
that the performance hit over TB3 (or insufficient bandwidth in general) increases with higher FPS,
and anyone looking at 144Hz gaming over TB3 should think twice.

The 60 FPS mark seems in a good spot for me personally, the 8.5% hit is bearable for me.

 

I wonder if or how this is in normal games and not synthetic benchmarks though

and how much peripherals take away in addition.

 

Definitely agree.

 

Are you referring to peripherals taking away from TB3 bandwidth or PCIe x4? 

At least on my system, in either case, I didn't have any issues with peripherals.

But using the Core over TB3 was a different story. The TB3 HDK did not have any peripheral ports. 

