Miscellaneous Performance Metrics

This section looks at some of the other commonly used benchmarks representative of the performance of specific real-world applications.

3D Rendering - CINEBENCH R15

We use CINEBENCH R15 to evaluate 3D rendering performance. The program provides three benchmark modes: OpenGL, single-threaded, and multi-threaded. Evaluating the different PC configurations in all three modes gave us the following results.

3D Rendering - CINEBENCH R15 - Single Thread

3D Rendering - CINEBENCH R15 - Multiple Threads

3D Rendering - CINEBENCH R15 - OpenGL

This benchmark is generally CPU-limited, and we do not see any significant benefit from moving to the higher speed grades, at least within the set of configurations that we tested.

x265 Benchmark

Next up, we have some video encoding benchmarks using x265 v2.8. The appropriate encoder executable is chosen based on the supported CPU features. In the first case, we encode 600 1080p YUV 4:2:0 frames into a 1080p30 HEVC Main-profile compatible video stream at 1 Mbps and record the average number of frames encoded per second.

Video Encoding - x265 - 1080p

Our second test case encodes 1200 4K YUV 4:2:0 frames into a 4Kp60 HEVC Main10-profile video stream at 35 Mbps. The encoding FPS is recorded.

Video Encoding - x265 - 4K 10-bit

x265 is again a CPU-limited benchmark, and memory speed has a negligible impact on performance in our encoding tests.
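For readers who want to try something similar, the two encoding cases described above can be sketched as x265 command lines. This is a minimal sketch and not the exact invocation used for these results. The file names, the `x265` executable name, and the 3840x2160 resolution for "4K" are assumptions. The input bit depth is not stated in the text, so it is left at the encoder default.

```python
from typing import List


def x265_command(src: str, width: int, height: int, fps: int,
                 frames: int, bitrate_kbps: int, profile: str,
                 out: str, exe: str = "x265") -> List[str]:
    """Build an x265 argument list for a raw YUV 4:2:0 input file."""
    return [
        exe,                                  # assumed executable name
        "--input", src,                       # raw YUV source (hypothetical path)
        "--input-res", f"{width}x{height}",   # raw input carries no header
        "--fps", str(fps),
        "--frames", str(frames),              # number of frames to encode
        "--bitrate", str(bitrate_kbps),       # target bitrate in kbps
        "--profile", profile,                 # "main" or "main10"
        "--output", out,
    ]


def encode_fps(frames: int, elapsed_seconds: float) -> float:
    """Average frames encoded per second over the whole run."""
    return frames / elapsed_seconds


# Case 1: 600 frames, 1080p30, Main profile, 1 Mbps.
CASE_1080P = x265_command("input_1080p.yuv", 1920, 1080, 30,
                          600, 1000, "main", "out_1080p.hevc")

# Case 2: 1200 frames, 4Kp60, Main10 profile, 35 Mbps (3840x2160 assumed).
CASE_4K = x265_command("input_4k.yuv", 3840, 2160, 60,
                       1200, 35000, "main10", "out_4k.hevc")
```

Each list can be passed to `subprocess.run` and timed. Dividing the frame count by the elapsed time with `encode_fps` then gives the figure of merit used in the graphs.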

7-Zip

7-Zip is a very effective and efficient compression program, often beating out OpenCL-accelerated commercial programs in benchmarks while using only the CPU. 7-Zip has a benchmarking program that provides tons of details regarding the underlying CPU's efficiency. In this subsection, we are interested in the compression and decompression rates when utilizing all the available threads for the LZMA algorithm.

7-Zip LZMA Compression Benchmark

7-Zip LZMA Decompression Benchmark

The 7-Zip compression benchmark is probably the best real-world representative of what a good DRAM configuration can deliver. The compression rate is highly dependent on the memory latency, and we see that the DDR4-2933 configuration (which had the best latency numbers in the AIDA64 Cache and Memory Benchmark testing) comes out on top. Decompression is CPU-limited, and the memory speeds don't impact it much.

Cryptography Benchmarks

Cryptography has become an indispensable part of our interaction with computing systems. Almost all modern systems have some sort of hardware-acceleration for making cryptographic operations faster and more power efficient. In this sub-section, we look at two different real-world applications that may make use of this acceleration.

BitLocker is a Windows feature that encrypts entire disk volumes. While drives that offer encryption capabilities are dealt with using that feature, most legacy systems and external drives have to use the host system implementation. Windows has no direct benchmark for BitLocker. However, we cooked up a BitLocker operation sequence to determine the adeptness of the system at handling BitLocker operations. We start off with a 2.5GB RAM drive in which a 2GB VHD (virtual hard disk) is created. This VHD is then mounted, and BitLocker is enabled on the volume. Once the BitLocker encryption process is complete, BitLocker is disabled. This triggers a decryption process. The times taken to complete the encryption and decryption are recorded. This process is repeated 25 times, and the average of the last 20 iterations is graphed below.
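The averaging step described above (25 runs, with the first 5 discarded as warm-up) can be sketched as follows. This is only an illustration of the arithmetic. The function name and the sample timings are hypothetical and are not measured data.

```python
from statistics import mean
from typing import Sequence


def steady_state_average(times_s: Sequence[float], warmup: int = 5) -> float:
    """Average the run times that remain after dropping the warm-up runs."""
    if len(times_s) <= warmup:
        raise ValueError("need more runs than warm-up iterations")
    return mean(times_s[warmup:])


# Synthetic example: 25 runs, where the first 5 are slower warm-up runs.
sample_times = [12.0] * 5 + [10.0] * 20
average_of_last_20 = steady_state_average(sample_times)
```

Discarding the early iterations keeps one-time effects such as caches warming up out of the reported number.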

BitLocker Encryption Benchmark

BitLocker Decryption Benchmark

Due to the use of a RAM drive in this benchmark, the usually CPU-limited cryptographic operations of BitLocker seem to favor the DDR4-2933 configuration. We move on to a couple of other benchmarks to see if the RAM drive is indeed the cause of the significant gulf seen in the above graphs.

Creation of secure archives is best done through the use of AES-256 as the encryption method while password protecting ZIP files. We re-use the benchmark mode of 7-Zip to determine the AES256-CBC encryption and decryption rates using pure software as well as AES-NI. Note that the 7-Zip benchmark uses a 48KB buffer for this purpose.

7-Zip AES256-CBC Encryption Benchmark

7-Zip AES256-CBC Decryption Benchmark

Once the data is in the CPU cache, we see that the RAM has no impact on the cryptography operations.

Yet another cryptography application is secure network communication. OpenSSL can take advantage of the acceleration provided by the host system to make operations faster. It also has a benchmark mode that can use varying buffer sizes. We recorded the processing rate for an 8KB buffer using the hardware-accelerated AES256-CBC-HMAC-SHA1 feature.

OpenSSL Encryption Benchmark

OpenSSL Decryption Benchmark

As expected, this benchmark also shows that there is nothing to gain in performance by moving to SO-DIMMs with better speeds or timing characteristics.

Agisoft PhotoScan

Agisoft PhotoScan is a commercial program that converts 2D images into 3D point maps, meshes and textures. The program designers sent us a command line version in order to evaluate the efficiency of various systems that go under our review scanner. The command line version has two benchmark modes, one using the CPU and the other using both the CPU and GPU (via OpenCL). We present the results from our evaluation using the CPU mode only. The benchmark (v1.3) takes 84 photographs and does four stages of computation:

  • Stage 1: Align Photographs (capable of OpenCL acceleration)
  • Stage 2: Build Point Cloud (capable of OpenCL acceleration)
  • Stage 3: Build Mesh
  • Stage 4: Build Textures

We record the time taken for each stage. Since various elements of the software are single threaded, and others multithreaded, it is interesting to record the effects of CPU generations, speeds, number of cores, and DRAM parameters using this software.

Agisoft PhotoScan Benchmark - Stage 1

Agisoft PhotoScan Benchmark - Stage 2

Agisoft PhotoScan Benchmark - Stage 3

Agisoft PhotoScan Benchmark - Stage 4

The raw bandwidth provided by the DDR4-3066 configuration seems to work well for the Agisoft PhotoScan workload. The latency doesn't seem to be much of a factor.

Dolphin Emulator

Wrapping up our application benchmark numbers are the results from the new Dolphin Emulator (v5) benchmark mode. This is again a test of the CPU capabilities.

Dolphin Emulator Benchmark

The memory characteristics don't seem to affect the benchmark much, though we see the DDR4-2933 configuration coming up with the best performance.


25 Comments


  • cygnus1 - Wednesday, November 28, 2018

    How does anyone look at those memory benchmarks and justify buying anything other than the cheapest RAM that meets minimum spec?
  • Yuriman - Wednesday, November 28, 2018

    Pretty much agree. Good to know, though.
  • nwrigley - Wednesday, November 28, 2018

    Yep. The only difference for me is that I only buy Crucial. This comes from personal experience of AMAZING customer support from them.

    I had one of their sticks die on me once after 8 years of use. I called in and was shocked to talk to someone in the US. Since they didn't make the same RAM I had anymore, they offered to replace all 4 sticks so that I had a matching set, even though 3 of the 4 sticks were fine. And since the replacements were slower timings without heat spreaders, they offered to double the capacity to cover the difference. They upgraded me from 4x 1-gig sticks of DDR2 to 4x 2-gigs of DDR2 without me raising any fuss - this was all customer service's idea. That made me a customer for life.
  • cygnus1 - Wednesday, November 28, 2018

    Yeah, I too am a big Crucial fan because of reasons like this, from my experience as well. Not quite as generous as your story, but never any kind of trouble getting support for their hardware.

    But these benchmarks really show that performance should not be even remotely near the top of the list of reasons to pick one RAM part over another. Brand/warranty/customer service is a real way to differentiate and justify a given price.
  • koaschten - Wednesday, November 28, 2018

    I found this handy graphic on reddit some time ago:
    https://i.imgur.com/lbPIkiW.png

    Looking at the tested offerings, it is obvious why there was so little performance gain, the Latency/clock relations are just off the chart, for the 3066 CL20 literally.
  • koaschten - Wednesday, November 28, 2018

    source: https://www.reddit.com/r/intel/comments/9mlwbn/ram...
    yes, this is DIMM not SO-DIMM, but shows the differences nicely.
  • willis936 - Wednesday, November 28, 2018

    It is somewhat frustrating to see all of this work done on a case that doesn't make sense to examine first.

    If the original question is "When does memory performance matter to CPUs?" then the place to start is at the extreme, not somewhere in the middle. If it was found that an 8 core 4 GHz x86 processor with whatever cache architecture and two channels of memory was memory bandwidth or latency starved THEN it would make sense to start moving down the stack and identify when it is no longer a concern. The conclusion to draw from this is much less meaningful to most any reader. There are like five people on the planet choosing between more expensive and cheaper memory kits for SFF systems.
  • GreenReaper - Thursday, November 29, 2018

    Might make more sense with AMD APUs. You'd probably get a much better return from overclocking the memory than the CPU, given how bandwidth-starved they can be.
  • The_Assimilator - Wednesday, November 28, 2018

    Whatever happened to ranking memory by its performance rating, to determine how objectively good it is? For anyone who doesn't know/remember, performance rating = (memory frequency / CAS latency), and higher = better.

    It's sad that in this day and age, my 2x8GB DDR3-1600 CL8 (with no RGB LEDs or unnecessary heatsinks) has a higher PR than any of these DDR4 kits. It's even sadder that today's reviews of memory that "overclock" it, just concentrate on pushing up the frequency instead of trying to tighten the CAS timings, because the latter is where you'll see the most benefit.
  • nevcairiel - Wednesday, November 28, 2018

    All you are calculating here is the actual latency, since CAS latency is expressed in cycles. What this doesn't account for is the actual memory speed (ie. bandwidth).

    Just using your formula, a 1600/8 and 3200/16 module should be equal, right? But one of those offers twice the raw memory throughput, at about similar absolute latency (ie. performance rating).

    It is a good idea to keep in mind that latency and frequency interact, but not in the way you suggest. Many people look at things like 3200 CL16 and 3600 CL18 and would instinctively say that the second set has a higher latency, while in reality the actual latency is quite similar, and you get more bandwidth.
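The latency and bandwidth arithmetic in the exchange above can be checked with a short sketch. It assumes the usual DDR relationships: two transfers per clock, and a 64-bit (8-byte) bus per channel. It covers only the CAS component of latency and theoretical peak bandwidth, not measured figures. The function names are illustrative.

```python
def cas_latency_ns(data_rate_mts: int, cas_cycles: int) -> float:
    """CAS latency in nanoseconds for a DDR module.

    DDR transfers twice per clock, so the clock in MHz is data_rate / 2
    and one cycle lasts 2000 / data_rate nanoseconds.
    """
    return cas_cycles * 2000.0 / data_rate_mts


def peak_bandwidth_mbs(data_rate_mts: int, bus_bytes: int = 8) -> int:
    """Theoretical peak bandwidth per channel in MB/s (64-bit bus assumed)."""
    return data_rate_mts * bus_bytes


# Module pairings mentioned in the discussion above.
modules = [
    ("DDR3-1600 CL8", 1600, 8),
    ("DDR4-3200 CL16", 3200, 16),
    ("DDR4-3600 CL18", 3600, 18),
]

for name, rate, cl in modules:
    print(f"{name}: {cas_latency_ns(rate, cl):.2f} ns CAS, "
          f"{peak_bandwidth_mbs(rate)} MB/s peak per channel")
```

All three modules come out at the same CAS latency in nanoseconds, while the peak bandwidth scales with the data rate.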
