
Gradual Degradation of 3DMark Fire Strike Produces Unreliable Results



 

For those who are not already aware of the issue, the folks at Futuremark seem to be struggling to keep the latest 3DMark benchmark consistent, in particular Fire Strike. Sometime around the release of Time Spy, things started getting screwy with Fire Strike, and now it seems every Fire Strike GUI update lowers benchmark scores a little further, especially in the Physics portion of the benchmark.

 

Kudos to @Papusan for noticing this months ago and asking me to have a look at it. He has been going back and forth with Futuremark about the problem, and it seems they are either ignoring him or do not view it as a high-priority issue. Or maybe, because most people running Fire Strike are not observant enough to notice, care, or ask questions, they feel they don't need to fix it.

 

Some people might say you cannot compare results across benchmark software versions, but that argument shouldn't hold water here. There is a leaderboard and a searchable database of results that basically every benching enthusiast and PC reviewer relies on, and if there is not a very high degree of consistency between GUI versions, the results in that database will become irrelevant, as will the leaderboard. The search filter has no field for GUI version, so we can expect results from the database and leaderboard to become increasingly misleading, inaccurate and unreliable over time. That is certainly not desirable for what is supposedly the current de facto standard in PC benchmarks.

 

You will notice from the examples posted below that with each new version of Fire Strike the scores get lower and lower. These examples are consecutive runs on the same day, on the same machine, with identical CPU and GPU settings. The only thing that changes is the Fire Strike version, and the results degrade with each newer one. We need Futuremark to understand and correct this.
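Since the 3DMark search can't filter by GUI version, the only way to check the "lower and lower" pattern is to keep the version next to each result yourself. Here is a minimal Python sketch of that, assuming you record each run's GUI version string and Physics score by hand (the `Run` record and the function names are hypothetical, not anything 3DMark provides). It sorts runs by GUI version numerically and reports the percentage change at each step.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Run:
    """One saved Fire Strike result (hypothetical record you keep yourself)."""
    gui_version: str    # GUI version string, e.g. "2.0.2067_64"
    physics_score: int  # Fire Strike Physics score for that run


def version_key(version: str) -> tuple[int, ...]:
    """Turn '2.0.2067_64' into (2, 0, 2067) so versions sort numerically, not as text."""
    numeric_part = version.split("_")[0]  # drop a trailing "_64" suffix if present
    return tuple(int(part) for part in numeric_part.split("."))


def step_changes(runs: list[Run]) -> list[tuple[str, str, float]]:
    """Order runs oldest-to-newest GUI version; return (older, newer, % change) per step."""
    ordered = sorted(runs, key=lambda r: version_key(r.gui_version))
    changes = []
    for prev, cur in zip(ordered, ordered[1:]):
        pct = (cur.physics_score - prev.physics_score) / prev.physics_score * 100
        changes.append((prev.gui_version, cur.gui_version, pct))
    return changes


def declines_every_version(runs: list[Run]) -> bool:
    """True if every newer GUI version scored lower than the version before it."""
    return all(pct < 0 for _, _, pct in step_changes(runs))
```

Sorting on integer tuples matters here: compared as plain text, a version like "2.2.10" would land before "2.2.9".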

 

http://www.3dmark.com/compare/fs/11047304/fs/11047179/fs/11047154

 

 

Here is a similar example from @Papusan: http://www.3dmark.com/compare/fs/11036017/fs/11035883

 

If you agree this is a problem and want it to be fixed, please complain to Futuremark and let them know they need to put the brakes on and not do anything else with 3DMark until they have this mess under control. Gimmicky features are one thing, but inconsistent benchmark results make 3DMark unreliable.

 

If you would like to do your own testing to validate the issue before contacting Futuremark, older versions of 3DMark are available for download from the TechPowerUp.com web site. 

 

In case you're not good at simple math, here is a visual aid to show what the fuss is about.

 

Incompetence.jpg

 

Update 12/13/2016:

We would like to acknowledge that a representative of Futuremark has responded promptly to this article and provided an email address for those interested in communicating with them about the issue. We appreciate the accountability and responsiveness. 

Update 12/15/2016:

We are sincerely grateful for Futuremark's responsiveness. I provided additional test results to Mr. Kokko to corroborate @Papusan's findings, and Futuremark has released an update that is expected to resolve the issue. See the message from James below for more details.

14 hours ago, Futuremark_James said:

Hello. James from Futuremark here again.


We've confirmed that there was an issue with the GUI, and we're in the process of rolling out an update (3DMark v2.2.3509) that should fix the scoring discrepancy.

With this update, overall scores increase slightly by up to 0.3%. Scores from the Physics and CPU parts of benchmark tests may improve by up to 2.5%. These changes bring the scores from 3DMark v2.2.3509 back in line with results from earlier versions that did not have the GUI issue.

 

For context, it is normal for 3DMark scores to vary by up to 3% between runs since there are some factors in a modern, multitasking operating system that cannot be completely controlled. So again, all credit to @Papusan for noticing the problem and bringing it to us.

 

To get the update, just open 3DMark and you should get a notification with the option to install it. The Steam version and Steam demo have also been updated.

On 12/13/2016 at 0:19 PM, Futuremark_James said:

Hi. James from Futuremark here.

 

We've been looking into this today, and I'd like to share what we've found.

 

The Fire Strike workload has not changed at all since 2013. This means that Fire Strike scores should not have changed across app versions either.

 

We've confirmed that running 3DMark from the command line gives consistent scores across all versions. Unfortunately, it does look like there is an issue when running recent versions from the GUI. We see the same ~2.5% difference in Physics test scores across GUI versions that @Papusan reported to us.

 

We believe we have found the bug in the GUI, but we need to run some more tests to be sure.

 

@Mr. Fox, the differences that you are seeing in your results are much larger, and it is not clear why. We would be grateful if you could contact us at info@futuremark.com so we can go through some troubleshooting steps with you.

 

Thank you, @Papusan, for bringing this to us. I am sorry that we have been slow to respond. I understand how frustrating that is.

 

I'll post here again when we have more info to share.




12 minutes ago, Lee James Wood said:

What are we talking about here, a few points? Or a difference of a few percent?

Take a quick look at the links and/or watch the details in the video, and then you tell me. ;)

 

You'll have your answer in less than 3 minutes of watching the video, or instantly by clicking the link posted directly above the video.


LOL, just click the link above the video. It shows the same thing the video does, and it's instant gratification. The only reason I made the video was for YouTube viewers and to give more exposure to an issue that warrants attention.

 

If it eases the burden, here's the link right here:  http://www.3dmark.com/compare/fs/11047304/fs/11047179/fs/11047154

 

The lowest score is the "latest and greatest" version of Fire Strike. The benchmarks were run only minutes apart on the same machine with identical CPU and GPU settings with no changes whatsoever except for the version of Fire Strike.

 

Incompetence.jpg


6 minutes ago, Brian said:

Did they alter their physics calculation algorithm or is this a case of bloat? I'd be curious to see what Futuremark says about this. 

@Papusan has been communicating with them for a while now, and since then every version has gotten progressively worse, so I thought it was time to bubble this up for attention before it gets too far out of hand. I have not had any communication with them. It's important that they pay attention to this to preserve their reputation. If nobody can trust the results to be accurate, and it becomes necessary to research the version number to draw comparisons that are not dramatically skewed, it could really hurt them. I don't want that to happen, so hopefully this article will be a wake-up call for them. The degradation of results is already significant over multiple software revisions, and that will also make it more difficult for professional reviewers to get a clear and reliable comparison of old versus new tech.


Thanks for the help bro @Mr. Fox

I have sent a new feedback email to Futuremark via Jarno Kokko for the 10th time... Long ago, Futuremark told me by email that "They have reproduced in-house and investigation is ongoing". I have sent them a lot of results for their investigation. Nothing has happened, as you can see in the pictures and links!!!

And when they finally push out a new "fixed" 3DM version after 3 months, the 3DM benchmark software is in an even worse condition...

Like last time... The new 3DM suite UI 2.2.3488 64 version came out on 9 Dec. = Fiasco!! Then they had to push out an even newer one because of trouble with the first one... 1 day later, on 10 Dec., the newest messed-up version came out <UI 2.2.3491 64>.

The same mess happened the last two times as well (I think in July and August). Futuremark has BIG problems with their 3DM Suite!!!

See the results. Both older UI versions, 2.0.2067_64 and 2.0.2809_64, give 15002 in Physics with the processor at 4.8GHz and both of the 2 latest Nvidia drivers!! Newer UI versions of the 3DM Suite give up to 400 points lower Physics in Fire Strike. All tested with the same Nvidia drivers, stock graphics and 4.8GHz on the processor.
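To put those numbers in percentage terms, here is a quick Python check that uses only the two figures from the post above (15002 Physics on the older GUI versions, up to 400 points lower on the newer ones):

```python
baseline_physics = 15002  # Physics score on older GUI versions 2.0.2067_64 and 2.0.2809_64
max_drop_points = 400     # "up to 400 points lower" on newer GUI versions

drop_pct = max_drop_points / baseline_physics * 100
print(f"Worst-case Physics drop: {drop_pct:.2f}%")  # prints: Worst-case Physics drop: 2.67%
```

That is in the same ballpark as the ~2.5% Physics difference Futuremark later reported seeing across GUI versions.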

My tests (Papusan)!!
Tested with latest Nvidia driver 375.95
http://www.3dmark.com/compare/fs/11036017/fs/11035883#

 

[Image attachment (broken link)]

 

Tested with latest Nvidia driver 376.19
http://www.3dmark.com/compare/fs/11049261/fs/11057220

[Image attachment (broken link)]



Glad to help. I think everyone that knows about the problem will want them to correct it.

 

Your images are broken. Maybe posting them on imgur or postimage.org and using the direct links to insert them here would help.


Hi. James from Futuremark here.

 

We've been looking into this today, and I'd like to share what we've found.

 

The Fire Strike workload has not changed at all since 2013. This means that Fire Strike scores should not have changed across app versions either.

 

We've confirmed that running 3DMark from the command line gives consistent scores across all versions. Unfortunately, it does look like there is an issue when running recent versions from the GUI. We see the same ~2.5% difference in Physics test scores across GUI versions that @Papusan reported to us.

 

We believe we have found the bug in the GUI, but we need to run some more tests to be sure.

 

@Mr. Fox, the differences that you are seeing in your results are much larger, and it is not clear why. We would be grateful if you could contact us at info@futuremark.com so we can go through some troubleshooting steps with you.

 

Thank you, @Papusan, for bringing this to us. I am sorry that we have been slow to respond. I understand how frustrating that is.

 

I'll post here again when we have more info to share.

 
 


Thank you for responding. Much appreciated! 

 

I will reach out to the email address provided so you can ask questions privately. The exaggerated example showing the wide difference is not incremental. Because I chose GUI versions that were further apart in time (those two versions in particular), the variance was much greater than, for example, comparing the latest GUI version to the one immediately before it. Of course, the concern is that over time it will become more difficult to compare results from the database or leaderboard, due to the gradual but growing decrease in Physics and Combined test performance.

@Futuremark_James - here is a less dramatic example from two versions released close to one another. The variance is probably a closer representation to what you have seen comparing current to last release. 

 

http://www.3dmark.com/compare/fs/11044699/fs/11044880



Thanks for taking care of this problem. I reported it in mid-August, and now it's December!! I really hope this will finally be fixed. Thanks again :)

2 hours ago, Mr. Fox said:

Glad to help. I think everyone that knows about the problem will want them to correct it.

 

Your images are broken. Maybe posting them on imgur or postimage.org and use the direct links to insert them here would help.

Sorry Fox. I posted from my small phone, so I think the pictures got screwed up :)


http://www.3dmark.com/compare/fs/7506459/fs/11073157#

I ran a benchmark with the latest Fire Strike version and compared it to an older run I had done when I got my 13 R2. While the driver versions are different, I should have seen an increase in performance nonetheless. I also made sure to minimize background tasks as much as I needed to.
- Game7a1



Hi James and welcome to Tech|Inferno! Thanks for the response and I think the enthusiast community will be waiting to hear back on the findings Futuremark has on this discrepancy and what will be done to resolve it. 


33 minutes ago, Game7a2 said:

http://www.3dmark.com/compare/fs/7506459/fs/11073157#

I ran a benchmark with the latest Fire Strike version and compared it to an older run I had done when I got my 13 R2. While the driver versions are different, I should have seen an increase in performance nonetheless. I also made sure to minimize background tasks as much as I needed to.
- Game7a1

Thank you so much for taking time to respond. We really appreciate it.

 

Your 16% variance is a great example that is similar to the one I posted. This demonstrates how much disparity there is with Fire Strike submissions over the course of revisions that have occurred since the beginning of the year. While the changes between consecutive revisions seem like they are within a small margin of error at first blush, the cumulative effect is not acceptable if the results stored in their database and respect for their leaderboard are to be deemed important and useful data.
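The "small per-revision drop, large cumulative effect" point is just compounding. A minimal Python sketch follows; the 0.5% per release in the example is a made-up illustration, not a measured figure from any result in this thread.

```python
from __future__ import annotations


def cumulative_drop(per_version_drops: list[float]) -> float:
    """Total fractional loss after each GUI release applies its own fractional drop."""
    remaining = 1.0
    for drop in per_version_drops:
        remaining *= 1.0 - drop  # each release shaves a fraction off what is left
    return 1.0 - remaining


# Hypothetical: five consecutive releases that each cost 0.5%.
print(f"{cumulative_drop([0.005] * 5):.2%}")  # prints: 2.48%
```

Each individual step would look like noise, but the total after a handful of releases is close to the ~2.5% Physics gap discussed in this thread.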



Even a small change of around 1-3% in the subtest scores between GUI versions should be easy to catch with normal testing. The change in power draw between GUI versions should also have been easy to spot, and it should have set off alarm bells.
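One reason a 1-3% shift can slip past is that Futuremark says normal run-to-run variance is up to 3%, so a single pair of runs can't tell a real ~2.5% drop from noise. Below is a minimal Python sketch of that reasoning. The 3% tolerance comes from Futuremark's post in this thread. The function names are hypothetical, and `consistently_lower` is only a crude repeat-run check, not a proper significance test.

```python
from __future__ import annotations

# "Up to 3%" run-to-run variance, as quoted by Futuremark in this thread.
RUN_TO_RUN_TOLERANCE = 0.03


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new as a fraction (-0.025 means 2.5% lower)."""
    return (new - old) / old


def beyond_single_run_noise(old: float, new: float,
                            tolerance: float = RUN_TO_RUN_TOLERANCE) -> bool:
    """True if a single old-vs-new pair differs by more than normal run-to-run spread."""
    return abs(pct_change(old, new)) > tolerance


def consistently_lower(old_runs: list[float], new_runs: list[float]) -> bool:
    """Crude repeat-run check: every run on the new version scores below every old run."""
    return max(new_runs) < min(old_runs)
```

A 400-point drop from 15002 does not clear the 3% band on its own. If several runs on the newer GUI all land below every run on the older one, though, that pattern is much harder to write off as run-to-run variance.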

Edited by Papusan

Yes, I agree @Papusan. It should have been detectable. It depends on how rigorous the testing was and whether they connected the dots that every version got worse and worse to where the cumulative effect of a drop with each GUI revision amounts to a lot over time.

 

I really hate the bloated new UI. The older version that wasn't so busy was much better. Maybe in the process of fixing this problem they can return to the older/sleeker UI. The big circle at the top with the score and the excessive amount of wasted screen space that gets hogged up by junk is unnecessary and unattractive in my personal opinion. Maybe that was to make the kiddos happy or something.



You are 100% right about the new 3DM GUI. The picture below is from the old 3DM Fire Strike GUI. As you can see, you could actually view the maximum CPU power draw during the benchmark test if you checked the box.
New is not always better!! Just look at the Windows 10 failure. The new OS looks like it was designed for 5-year-old children, with all its pastel-colored tiles, and seems more intended for handheld tablets and phones. The <new> Windows is no longer a nice OS for desktops.

oObnp8h.png

Edited by Papusan

Hello. James from Futuremark here again.


We've confirmed that there was an issue with the GUI, and we're in the process of rolling out an update (3DMark v2.2.3509) that should fix the scoring discrepancy.

With this update, overall scores increase slightly by up to 0.3%. Scores from the Physics and CPU parts of benchmark tests may improve by up to 2.5%. These changes bring the scores from 3DMark v2.2.3509 back in line with results from earlier versions that did not have the GUI issue.

 

For context, it is normal for 3DMark scores to vary by up to 3% between runs since there are some factors in a modern, multitasking operating system that cannot be completely controlled. So again, all credit to @Papusan for noticing the problem and bringing it to us.

 

To get the update, just open 3DMark and you should get a notification with the option to install it. The Steam version and Steam demo have also been updated.
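Why does a Physics change of up to 2.5% move the overall score by only up to 0.3%? As far as we understand from Futuremark's 3DMark Technical Guide, the Fire Strike overall score is a weighted harmonic mean of the Graphics, Physics and Combined scores, with Physics carrying a small weight. The Python sketch below assumes weights of 0.75 / 0.15 / 0.10. Treat those weights as an assumption to verify against the Technical Guide. The sub-scores are invented for illustration.

```python
from __future__ import annotations

# Assumed Fire Strike weights (graphics/physics/combined); verify against the 3DMark Technical Guide.
WEIGHTS = {"graphics": 0.75, "physics": 0.15, "combined": 0.10}


def overall_score(subscores: dict[str, float]) -> float:
    """Weighted harmonic mean of the three sub-scores."""
    total_weight = sum(WEIGHTS.values())
    return total_weight / sum(WEIGHTS[name] / subscores[name] for name in WEIGHTS)


# Hypothetical sub-scores, not taken from any result in this thread.
before = {"graphics": 20000.0, "physics": 14600.0, "combined": 7000.0}
after = dict(before, physics=before["physics"] * 1.025)  # raise Physics by 2.5% only

change = overall_score(after) / overall_score(before) - 1
print(f"Overall score change: {change:.2%}")
```

With these made-up numbers the overall score moves by well under 1%, a small fraction of the Physics change. The exact figure depends on the sub-scores you plug in and on how the Combined test is affected.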



Thank you so much, James. I am looking forward to testing the new release and confirming the fix. I trust @Papusan is equally appreciative. Thanks as well to Jarno Kokko (Futuremark) for his assistance.

 

I have updated the article with your response.



Thanks for the help, James. And please pass on thanks from us bench enthusiasts to Mr. Kokko at Futuremark as well. For us here, number crunching is a pleasure, as you probably know, so it's important that the benchmark tests work as intended.

We also hope Futuremark might consider making benchmark tests that put more emphasis on processor performance than sub-tests like Fire Strike in the 3DMark suite do today, more like the old 3DMark Vantage and 3DMark 11.

I am looking forward to testing the fixed 3DM suite and confirming the fix, but this will take time because my internet speed sucks :)

Once again, thank you for all your help.

 

Papusan


@Futuremark_James - preliminary testing seems to show an improvement as expected, in line with the predicted percentage. Thank you.

 

Any input on why the CPU power monitoring feature was removed? Was that necessary, or was it something Futuremark thought nobody cared about?

 

See below:

 

Previous GUI versus new GUI released today:

http://www.3dmark.com/compare/fs/11094602/fs/11085019

 

New GUI today using previous driver versus new GUI today using new @J95 driver mod:

http://www.3dmark.com/compare/fs/11094778/fs/11094602
