
[REQ] Raise Maximum Power target -- Quadro K6000 BIOS attached


vacaloca


Hello,

Recently got my hands on a Quadro K6000 and was successful in reading/modifying/re-flashing the original BIOS with higher clocks @ P0 state (993, 1006 boost).

However, if I try higher clocks with my compute-bound CUDA code, the card throttles down to 601 MHz. I assume this is because I'm very close to hitting the 225W maximum power target (this code averages 90%+ of the power target at 993 MHz). I looked for information on raising the limit manually, since Kepler BIOS Tweaker leaves that field disabled when I load the K6000 BIOS, and KGB throws an 'unsupported card' error when I use it with the original BIOS -- it would probably work fine otherwise.

Also, if anyone thinks that what I want to accomplish is stupid, please say so. The card has two 6-pin inputs and is running on a PCI-E 3.0 compatible motherboard. If this were a PCI-E 2.0 setup, I'd already be at the max 225W. However, I've seen some references that PCI-E 3.0 can deliver more power over the bus itself. Is this true? I'm running an MSI X79A-GD45 (8D) motherboard, and my power supply is more than adequate: a 750W SeaSonic SS-750KM3 (ATX12V V2.3/EPS 12V V2.91, 80 PLUS Gold, fully modular). As I type this post, my APC UPS reports 124W of draw with one monitor and other miscellaneous items, so there's clearly enough headroom.
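For reference, the 225W figure lines up with the usual connector ratings: 75W from an x16 slot, 75W per 6-pin plug and 150W per 8-pin plug. As far as I know the slot rating is the same on PCI-E 2.0 and 3.0 boards. A minimal Python sketch of that arithmetic (the table and function names are just illustrative):

```python
# Rated power per source in watts (standard PCI Express connector ratings).
RATED_WATTS = {"slot_x16": 75, "6pin": 75, "8pin": 150}

def board_power_budget(aux_connectors):
    """Return the in-spec power budget (W) for a card in an x16 slot."""
    return RATED_WATTS["slot_x16"] + sum(RATED_WATTS[c] for c in aux_connectors)

k6000 = board_power_budget(["6pin", "6pin"])  # Quadro K6000: 2x 6-pin
titan = board_power_budget(["6pin", "8pin"])  # GTX Titan: 6-pin + 8-pin
print(f"K6000 budget: {k6000} W, GTX Titan budget: {titan} W")
```

These are rated values only; what a given connector or VRM tolerates beyond them is a separate question.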

Can anyone point me in the right direction to raise the power target, or provide a modified BIOS where I can adjust it? I don't need any external software besides nvidia-smi, because the card supports adjusting power levels there -- the maximum is just hard-locked to 225W currently.
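For anyone following along, this is roughly how the power limit can be set and monitored through nvidia-smi from a script. It is only a sketch: the helper names are made up, setting the limit needs admin rights, and some driver/card combinations report "[Not Supported]" for these fields, in which case the parser below will raise an error.

```python
import subprocess

# Query fields understood by nvidia-smi's --query-gpu option.
QUERY_FIELDS = "power.draw,power.limit,clocks.sm"

def parse_smi_line(line):
    """Parse one 'csv,noheader,nounits' line into a dict of floats."""
    power_draw, power_limit, sm_clock = (float(x) for x in line.split(","))
    return {
        "power_draw_w": power_draw,
        "power_limit_w": power_limit,
        "sm_clock_mhz": sm_clock,
    }

def query_gpu(index=0):
    """Read current power draw, power limit and SM clock for one GPU."""
    out = subprocess.run(
        ["nvidia-smi", "-i", str(index),
         f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_smi_line(out.strip().splitlines()[0])

def set_power_limit(watts, index=0):
    """Ask the driver for a new power limit (must be within the BIOS range)."""
    subprocess.run(["nvidia-smi", "-i", str(index), "-pl", str(watts)], check=True)
```

Calling `set_power_limit(225)` and then polling `query_gpu()` once a second during a run is enough to see how close the card gets to the limit before it drops clocks.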

BIOS is attached: 103A.rom.orig.txt


This should work for your needs.

Thanks, I sent a laptop fund donation your way.

With this BIOS I can indeed set power targets with nvidia-smi and then set clocks accordingly, using the initial mod as a base. However, even with the power target set to, say, 275W or 300W, once the card reaches 225W it immediately downclocks to 601MHz and stays there for the rest of the CUDA run. Further, if I leave the power target at the default of 350W, the card behaves erratically -- it reports that it's running at, say, 1150MHz, but the power usage reported by nvidia-smi is only about 70W.

That being said, I think it does make a difference, I was able to run at 1058MHz, which is decent enough, but still wanted to see if I could get performance similar to GTX Titan where I was able to overclock to about 1150 with the same code.

At this point I'm still not sure if it is even safe to try more than 225W, given this card only has 2x6pin connectors, instead of 1x6pin and 1x8pin like the GTX Titan does. I have seen nvidia-smi report up to 232W being used at one point before it downclocks.

Any ideas why it still throttles?


Well... I have to admit that I wasn't aware of the 6+6 pin power config. Still, that should easily allow more than 225W.

If you want I can edit the vbios a bit more, that might help to a certain extent. Though you may have to wait until next week, will be gone for the weekend.


SVL7 do the mods so he can smash all those cheap titan cards!

Also I don't think 8-pin helps much over 6-pin since the 8-pin just has extra GND lines and not any extra +12V supply lines. I'd be more concerned with VR cooling as power climbs.


Yes, go ahead with the edits :) No problem if it takes a bit, don't have any rush at the moment. I love that I have encouragement to 'smash Titans' lol.

For what it's worth, the CUDA bandwidth test comes up slightly lower than the GTX Titan in the same motherboard. However, at the clocks I have reached on the K6000 without throttling, I am already performing better than the GTX Titan on the CUDA computational electromagnetics code I'm benchmarking. :)

SP performance at stock clocks was higher than the GTX Titan when I tried the nbody benchmark... I want to say past 1.930 TFlops. DP nbody performance at stock clocks was slightly lower than the Titan, around 780 GFlops.

If anyone has any benchmark requests... feel free to PM and I'll do my best if I have the time to do them.


Well at matching clocks you should always beat a titan since your 15th SMX is enabled. I'm surprised none of the insane overclocker people ever got K6000s.
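To put a rough number on that: a GK110 SMX has 192 single-precision cores and 64 double-precision units, and the usual peak estimate counts 2 FLOPs per unit per clock (one fused multiply-add). A small Python sketch, with the helper name being my own and the 993 MHz clock taken from earlier in the thread:

```python
CORES_PER_SMX = 192     # single-precision CUDA cores per GK110 SMX
DP_UNITS_PER_SMX = 64   # double-precision units per GK110 SMX

def peak_gflops(smx_count, clock_mhz, units_per_smx=CORES_PER_SMX):
    """Theoretical peak in GFLOPS: 2 FLOPs (one FMA) per unit per cycle."""
    return 2 * smx_count * units_per_smx * clock_mhz / 1000.0

clock = 993  # MHz, same clock assumed for both cards
k6000_sp = peak_gflops(15, clock)  # K6000: all 15 SMX enabled
titan_sp = peak_gflops(14, clock)  # GTX Titan: 14 SMX enabled
print(f"SP peak ratio K6000/Titan at equal clocks: {k6000_sp / titan_sp:.3f}")
```

Passing `units_per_smx=DP_UNITS_PER_SMX` gives the matching double-precision estimate. These are theoretical peaks only; real codes (and nbody) land well below them, and the Titan's DP rate also depends on its driver setting.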


Probably because they don't have $4k to spend on a single niche product, haha. The TCC feature is useful to me as I work in Windows some of the time and that reduces overhead for some of the codes I work with. ECC memory is nice too, even though I don't currently use it. I know the 12GB of memory will come in handy soon though.


Yup, definitely any help to see if I can get higher clocks is appreciated!

I did a few more tests today, and it seems like I was wrong in my earlier assessments that the higher power limit BIOS helped to reach higher clocks. Even if it did help, given I saw a 232W power usage in nvidia-smi as I posted earlier, whenever the card goes over 225W it definitely throttles, even if I use the original BIOS or your modified one (both modified to higher clocks).

More specifically, with the original BIOS modified to 1071.5 and 1084.5 boost clocks, I get the same results (no throttling) as I did with the higher power limit BIOS you provided at the same clocks.

I attached two GPU-Z K6000 log files from runs in WDDM mode:

1) Run with CUDA code with the default BIOS with the clocks changed to 1071.5 and 1084.5 (no throttling, but hits up to 100% TDP) -- default_pl_1072_225W_no_throttle.txt

2) Run with CUDA code with modded pl BIOS with the clocks changed to 1150.0 and 1163.0 (throttling, with power limit set to 300W) -- pl_1163_300W_throttled.txt

In (1), I hit 250W briefly a few times, but not enough to make the card throttle, but otherwise stay in the high ~95+% TDP constantly.

In (2), I believe I also used EVGA Precision X to override the maximum voltage for the last run, but it still throttled. I have not (as of yet) tried lowering voltages to see if it throttles at higher clocks... I might try that next. Either that, or downclock the memory, because I notice that both the memory controller load percentage and the card voltage drop when it happens.

pl_1163_300W_throttled.txt

default_pl_1072_225W_no_throttle.txt
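If anyone wants to scan logs like these for the throttle point without eyeballing them, something along these lines works once the power and core clock columns have been pulled out into a list. A rough Python sketch; the function name is mine and the sample values below are made up for illustration, not taken from the attached logs:

```python
THROTTLE_CLOCK_MHZ = 601  # clock the card falls back to in this thread

def find_throttle_events(samples, floor_mhz=THROTTLE_CLOCK_MHZ):
    """samples: list of (power_watts, core_clock_mhz) in time order.

    Returns (index, power_just_before) for each drop to the floor clock.
    """
    events = []
    for i in range(1, len(samples)):
        prev_power, prev_clock = samples[i - 1]
        _, clock = samples[i]
        # A throttle event is a transition from above the floor down to it.
        if prev_clock > floor_mhz and clock <= floor_mhz:
            events.append((i, prev_power))
    return events

# Made-up samples for illustration only.
samples = [(180.0, 1163), (215.0, 1163), (232.0, 1163), (140.0, 601), (138.0, 601)]
print(find_throttle_events(samples))
```

The power reading just before each drop is the interesting number, since it shows whether the card really trips at around 225W regardless of the configured limit.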


Wanted to post a different example. Using the pl BIOS modded to 1150 & 1163 boost clocks, and using EVGA Precision X under WDDM mode to underclock the memory by 500 MHz, I am able to run the simulation longer before it throttles. I attached 2 files:

1) nvidia-smi output before and after throttling (ignore the memory usage, it's wrong -- a bug of using a GeForce card for display along with a Quadro or Tesla for compute)

2) GPU-Z log of the throttling (it is not apparent, unless you notice that the card voltage scaled back to 0.95 from 1.025)

I remembered that with the Titan I had to downclock the memory to get the highest possible core clocks. However, the Titan gave me wrong output if I pushed clocks too far, instead of throttling. The K6000 just throttles down, presumably to avoid the same thing, or at the very least to stay within its designated power usage.

Probably the sweet spot to avoid throttling with this code is around 1112-1125 clocks with downclocked memory, but I'll have to test that later...
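As a back-of-the-envelope check on why the throttled state draws so much less, dynamic power scales roughly with clock times voltage squared. A quick Python sketch of that rule of thumb, assuming the throttled state is 601 MHz at 0.95 V versus 1163 MHz at 1.025 V (it ignores leakage and memory power, so treat it as a rough estimate only):

```python
def relative_dynamic_power(clock_mhz, volts, ref_clock_mhz, ref_volts):
    """Approximate dynamic power relative to a reference point (P ~ f * V^2)."""
    return (clock_mhz / ref_clock_mhz) * (volts / ref_volts) ** 2

# Assumed operating points: throttled vs. the modded boost clock.
ratio = relative_dynamic_power(601, 0.95, 1163, 1.025)
print(f"Throttled core power is roughly {ratio:.0%} of the boost-clock value")
```

The same function can be used to guess how much headroom a small undervolt buys at a given clock.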

pl_1163_300W_memory_underclock_500_v2.txt

nvidia-smi_throttle.txt


Well the card will be using less power at lower memory clocks. Not only is the memory using less power, but the GPU core as well since it is being starved for bandwidth and going idle.


  • 3 weeks later...

Thanks, is it possible to reduce the clock rates for the P states? The problem with the Fermi Quadros (4000, 5000, 6000; not the K series) is with the virtual SVGA API-intercept for ESXi: if you don't generate a consistent load you are stuck near the max P state, which makes the card so slow to ramp up that the software renderer is better.

Is there an easy way to bop the steps so it is very aggressive except for the maximum power savings state?


  • 6 months later...
  • 3 months later...
  • 8 months later...
Hi vacaloca, do you still have the K6000?

I don't know if he still has his, but I have a Quadro K6000 with an EK-FCQ6000 installed :) I'm going to give the modded bios posted earlier in this thread a shot when I have some time.

The K6000 is in my personal workstation; I currently have it on a bench in the lab where I work... I couldn't justify this machine under any of our existing grants and I didn't feel like chasing a new grant for this system, so I bought the parts and built it myself. If I move on, the lab will either buy the system from me, or I'll take it home or maybe to my next job.

Specs on this system, named "heavenly":

4x Opteron 6386se with an EK supremacy socket G34 waterblock on each

2x HT<->PCI-e chips on the motherboard, each with Koolance water block

Supermicro H8QGL-6F motherboard

Supermicro 748TQ-R1400B case

3x 1400W hotswap PS

16x 16GiB ECC Registered DDR3 (256GiB total)

Quadro K6000 with EK waterblock

Mellanox 10GbE 2 port SFP+ NIC

Onboard 2 port Intel gigabit rj45

Various SSDs with EXT4 filesystems and 5x 5400rpm big disks in RAIDZ1 with ZFS on top

Gentoo Linux with Ubuntu amd64 in a chroot jail, with 32-bit compat libs installed so I can run Steam and other 32-bit stuff without enabling multilib support in the production Gentoo installation. Kerbal Space Program and Planetary Annihilation run well, and they are 64-bit, of course. It's just Steam itself that is 32-bit.

post-36244-14495000193628_thumb.jpg

post-36244-14495000192935_thumb.jpg

post-36244-14495000192056_thumb.jpg

post-36244-14495000192576_thumb.jpg
