Some Benchmark Results With 8-Core Optimization
Here I present some benchmark timing results recorded during development of the 8-core-optimized version. The version tested is build 1005, but for marketing purposes we're going to call it SynthEyes 2008.08. The times shown throughout are solve times, in seconds, for a few specific pre-tracked scenes of qualitatively different sizes, so lower numbers are better. These numbers are for illustration only and are not guaranteed; your mileage may vary.
See also the related forum thread; it has additional information about how to feed shots into SynthEyes and serves as a place to discuss this topic.
| Test Shot | Frames | Trackers |
| flyover | 150 | 120 |
| fly300 | 150 | 300 |
| warehouse | 590 | 120 |
| consite | 1891 | 120 |
| CedarHollow | 708 | 120 |
| Bus800 | 220 | 779 |
Part of the fun is that we can run these same benchmark scenes on the same machine with different operating systems to see what we can learn.
IMPORTANT NOTE: The point of this discussion is educational, not to fuel the legions of Mac or Windows fanboys. Give it a rest.
This first set of timings is on an 8-core 3.0 GHz Mac Pro (Xeon 5400) with 12 GB of memory (which doesn't really help at all except in the 64-bit OS X test).
| 8-Core Xeon (seconds) | OS X | OS X | OS X | Vista64 SP1 | Vista64 SP1 | Vista64 SP1 | Vista64 SP1 |
| SynthEyes build | 32-bit | 32-bit | 64-bit | Win32 | Win32 | Win64 | Win64 |
| SynthEyes version | 8 | 8.08 | 8.08 | 8 | 8.08 | 8 | 8.08 |
| flyover | 3 | 0.8 | 0.7 | 1.3 | 0.6 | 1.5 | 0.7 |
| fly300 | 9.5 | 3.8 | 3.6 | 6.6 | 3.6 | 7 | 3.7 |
| warehouse | 12.5 | 2.6 | 2.2 | 6.7 | 2.4 | 7.5 | 2.3 |
| consite | 57.2 | 10.6 | 9.8 | 34.5 | 10.3 | 36.3 | 9.7 |
| CedarHollow | 39.1 | 8.2 | 7.5 | 20.9 | 8.2 | 21.3 | 7.1 |
| Bus800 | 72.1 | 38.6 | 38.9 | 56.7 | 39.3 | 57.4 | 39.3 |
First, for each operating system, compare the SynthEyes "8" columns with the "8.08" columns for any given scene. For example, on Mac OS X it takes 57.2 seconds to solve the consite scene with SynthEyes 2008, but only 10.6 seconds with SynthEyes 2008.08. That's a dramatic speedup of about a factor of 5.4 (and nearly 6 with the 64-bit OS X build, at 9.8 seconds)!
Looking a bit further, you'll see that on Vista64 the same scene takes only 34.5 seconds with the 32-bit SynthEyes 2008, dropping to 10.3 seconds with 2008.08, a speedup of about 3.3. So the dramatic speedup on Mac OS X is partly because OS X performance wasn't very good to start with. You'll see the same pattern in the other test files as well.
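To make the comparison concrete, this short Python sketch recomputes the speedup factor (SynthEyes 2008 time divided by 2008.08 time) for every scene, for the 32-bit OS X build and for the Win32 build on Vista64. The values are copied from the 8-core Xeon table; the dictionary and function names are just for illustration.

```python
# Solve times in seconds on the 8-core Xeon, copied from the table above.
# Each entry: scene -> (SynthEyes 2008, SynthEyes 2008.08)
OSX_32 = {
    "flyover": (3.0, 0.8),
    "fly300": (9.5, 3.8),
    "warehouse": (12.5, 2.6),
    "consite": (57.2, 10.6),
    "CedarHollow": (39.1, 8.2),
    "Bus800": (72.1, 38.6),
}
VISTA64_WIN32 = {
    "flyover": (1.3, 0.6),
    "fly300": (6.6, 3.6),
    "warehouse": (6.7, 2.4),
    "consite": (34.5, 10.3),
    "CedarHollow": (20.9, 8.2),
    "Bus800": (56.7, 39.3),
}

def speedup(old: float, new: float) -> float:
    """How many times faster the new version solves the scene."""
    return old / new

for scene in OSX_32:
    osx = speedup(*OSX_32[scene])
    win = speedup(*VISTA64_WIN32[scene])
    print(f"{scene:12s} OS X {osx:4.1f}x   Vista64/Win32 {win:4.1f}x")
```

On these numbers the OS X speedup is larger than the Vista64/Win32 speedup for every scene, and Bus800 shows the smallest speedup on both.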
Why? Mac OS X's memory allocation, especially for image-sized buffers, is a major bottleneck when many threads are running within a single application. Doubtless Apple will address this in the future.
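Assuming the bottleneck is contention inside the allocator when many threads repeatedly request image-sized buffers, one general way to sidestep it is to allocate a scratch buffer once per worker thread and reuse it. This post does not say that this is how SynthEyes 2008.08 was optimized; the Python sketch below, with hypothetical names (`scratch_buffer`, `process_frame`) and a placeholder 640x480 single-channel buffer, only illustrates the pattern. Because of Python's global interpreter lock, it says nothing about actual speed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch: each worker thread allocates one image-sized
# scratch buffer the first time it needs one and reuses it afterwards,
# so the allocator is hit once per thread instead of once per frame.
_local = threading.local()
_count_lock = threading.Lock()
allocation_count = 0

def scratch_buffer(nbytes: int) -> bytearray:
    """Return this thread's buffer, allocating only if missing or too small."""
    global allocation_count
    buf = getattr(_local, "buf", None)
    if buf is None or len(buf) < nbytes:
        buf = bytearray(nbytes)
        _local.buf = buf
        with _count_lock:
            allocation_count += 1
    return buf

def process_frame(frame_index: int, width: int = 640, height: int = 480) -> int:
    """Placeholder per-frame work that uses the thread's scratch buffer."""
    buf = scratch_buffer(width * height)
    buf[0] = frame_index % 256  # stand-in for real image processing
    return buf[0]

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(process_frame, range(200)))

print(f"{len(results)} frames, {allocation_count} buffer allocations")
```

With 200 frames and at most 8 worker threads, there are at most 8 allocations instead of 200.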
But notice that for SynthEyes 2008.08, after optimization, the OS X and Windows times are very similar and reflect the underlying hardware performance, not the operating system. The speedup ratio varies for the scenes, depending on the kind of processing involved. For example, the speedup ratio is not as high on the Bus800 scene, with many trackers, because the software is already running very efficiently on rather large matrices.
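A quick way to check the claim that the optimized times are similar across operating systems is to compute the percentage difference between the 64-bit 2008.08 columns. This Python sketch uses values copied from the 8-core Xeon table; the helper name is hypothetical.

```python
# SynthEyes 2008.08 solve times (seconds) on the 8-core Xeon, 64-bit builds.
OSX_64 = {"flyover": 0.7, "fly300": 3.6, "warehouse": 2.2,
          "consite": 9.8, "CedarHollow": 7.5, "Bus800": 38.9}
VISTA64_WIN64 = {"flyover": 0.7, "fly300": 3.7, "warehouse": 2.3,
                 "consite": 9.7, "CedarHollow": 7.1, "Bus800": 39.3}

def relative_difference(a: float, b: float) -> float:
    """Percentage by which time a differs from time b (positive: a is slower)."""
    return 100.0 * (a - b) / b

diffs = {s: relative_difference(OSX_64[s], VISTA64_WIN64[s]) for s in OSX_64}
for scene, d in diffs.items():
    print(f"{scene:12s} OS X vs Windows: {d:+5.1f}%")
```

On these numbers the two operating systems are within about 6% of each other on every scene.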
Since not everyone has an 8-core machine yet, it is worth asking how the new version runs on older single- and dual-core hardware. I have an older single-core AMD Athlon 64 3700+ (2.4 GHz) machine. It is set up to multiboot several Windows operating systems, so we can make some comparisons there as well.
| Athlon (seconds) | Vista32 | XP32 | XP32 | XP64 | XP64 | XP64 |
| SynthEyes build | SE32 | SE32 | SE32 | SE32 | SE64 | SE64 |
| SynthEyes version | 8 | 8 | 8.08 | 8 | 8 | 8.08 |
| flyover | 5.5 | 5.6 | 4.9 | 5.6 | 5.9 | 5.4 |
| fly300 | 37.3 | 34.3 | 34.3 | 34.5 | 32.1 | 31 |
| warehouse | 27.2 | 29.5 | 24.7 | 30 | 27.2 | 23.1 |
| consite | 147.7 | 149.6 | 141.6 | 146.4 | 131.7 | 122.5 |
| CedarHollow | 99.2 | 99.6 | 96.6 | 99 | 85.4 | 81.2 |
| Bus800 | 397.5 | 395.5 | 388.2 | 399.3 | 361 | 360 |
Be sure to check the column headings carefully. Comparing 2008 with 2008.08 under SE32 on XP32 (3rd and 4th columns) and under SE64 on XP64 (6th and 7th columns), you can see that the 2008.08 version produces some small improvements even on a single-processor machine, because it eliminates some redundant work. From the 2nd and 3rd columns (version 8 on each OS), Vista32 vs XP32 is pretty much a wash.
If you look at the 3rd vs 6th columns, SE32 on XP32 vs SE64 on XP64, you'll see that SE64 on XP64 is roughly 6 to 14% faster on every scene except the small flyover scene; this is part of the case for buying SynthEyes 64. That's a bit different from the Vista64 results above, where the 64-bit build was not consistently faster than the 32-bit one, but we haven't tried all the necessary combinations to decide what the full story is.
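The percentages behind both comparisons can be recomputed from the Athlon table with a few lines of Python. The lists are copied from the table; the names are just for illustration.

```python
# Solve times in seconds on the single-core Athlon 64 3700+, from the table above.
SCENES = ["flyover", "fly300", "warehouse", "consite", "CedarHollow", "Bus800"]
XP32_SE32_V8 = [5.6, 34.3, 29.5, 149.6, 99.6, 395.5]
XP32_SE32_V808 = [4.9, 34.3, 24.7, 141.6, 96.6, 388.2]
XP64_SE64_V8 = [5.9, 32.1, 27.2, 131.7, 85.4, 361.0]

def percent_faster(baseline: float, candidate: float) -> float:
    """Time saved by the candidate as a percentage of the baseline.
    Negative values mean the candidate is slower."""
    return 100.0 * (baseline - candidate) / baseline

# 2008 -> 2008.08, both SE32 on XP32 (3rd vs 4th columns).
version_gain = {s: percent_faster(a, b)
                for s, a, b in zip(SCENES, XP32_SE32_V8, XP32_SE32_V808)}
# SE32 on XP32 -> SE64 on XP64, both version 8 (3rd vs 6th columns).
bitness_gain = {s: percent_faster(a, b)
                for s, a, b in zip(SCENES, XP32_SE32_V8, XP64_SE64_V8)}

for s in SCENES:
    print(f"{s:12s} 2008.08 vs 2008: {version_gain[s]:5.1f}%   "
          f"SE64/XP64 vs SE32/XP32: {bitness_gain[s]:5.1f}%")
```

The version comparison never gets slower (fly300 is unchanged), while the 64-bit comparison is negative only for flyover.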
Here are results for a Core 2 Duo E6600 @ 2.4 GHz running Vista-32.
| Core 2 Duo (seconds) | SE32 | SE32 |
| SynthEyes version | 8 | 8.08 |
| flyover | 2.9 | 1.9 |
| fly300 | 11.6 | 11.2 |
| warehouse | 10.4 | 7.8 |
| consite | 52.2 | 41.7 |
| CedarHollow | 34.3 | 28.2 |
| Bus800 | 125.8 | 132.4 |
As with the single-core machine, the new 2008.08 version provides benefits on more modest machines. Life is a bit more complicated, though: the Bus800 test runs a bit slower (132.4 vs 125.8 seconds), probably because the revised algorithm does not fit in the smaller machine's cache as well.
Another set of numbers, for a 2 GHz Core Duo Intel Mac (the initial Intel iMacs):
| Intel Mac (seconds) | 8 | 8.08 |
| flyover | 4.1 | 3.7 |
| fly300 | 20.9 | 20.2 |
| warehouse | 17.6 | 15.3 |
| consite | 87.3 | 80.1 |
| CedarHollow | 63.7 | 54.3 |
| Bus800 | 230.5 | 223.8 |
Hopefully the information here will give you some ideas about performance on various combinations of hardware and software. My personal view is that the small performance differences between operating systems on the same hardware should not affect your choice of operating system. If you handle long HD or film-resolution shots, though, a 64-bit version will let you cache shots in RAM that would not fit otherwise.
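To see why 64-bit matters for long shots, here is some rough arithmetic in Python. The frame formats, the uncompressed 8-bit RGB storage, and the 2 GB budget are assumptions for illustration; SynthEyes' actual cache format and overhead may differ.

```python
# Back-of-the-envelope RAM needed to cache a shot.
# Assumptions (illustrative only): uncompressed frames, 3 channels (RGB),
# 1 byte per channel, and no per-frame overhead.
GIB = 1024 ** 3

def frame_bytes(width: int, height: int, channels: int = 3,
                bytes_per_channel: int = 1) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * channels * bytes_per_channel

def shot_gib(width: int, height: int, frames: int) -> float:
    """Total size of a cached shot in GiB."""
    return frame_bytes(width, height) * frames / GIB

def frames_that_fit(width: int, height: int, budget_gib: float) -> int:
    """How many whole frames fit in a given RAM budget."""
    return int(budget_gib * GIB) // frame_bytes(width, height)

# 1000-frame HD and 2K full-aperture shots.
print(f"HD 1920x1080, 1000 frames: {shot_gib(1920, 1080, 1000):.1f} GiB")
print(f"2K 2048x1556, 1000 frames: {shot_gib(2048, 1556, 1000):.1f} GiB")
# On 32-bit Windows a process gets 2 GB of user address space by default,
# and the application itself needs part of that.
print(f"HD frames in 2 GiB: {frames_that_fit(1920, 1080, 2)}")
```

Under these assumptions a 1000-frame HD shot needs nearly 6 GiB, while only about 345 HD frames fit in a 2 GiB address space even before the program's own memory is counted.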