Overview
What was true for JDK 1.0 is not always true for JDK 6. However, older documentation does not reflect this.
It's worth noting that most simple benchmarks do not test the performance of code in a multi-threaded environment. The results in the examples I have provided are for running four threads.
Notes on performance.
"Optimised out" means that the JVM, realising that the operation does nothing, optimised the code to nothing and reported over 1 trillion operations/second.
Results over 10,000,000 K/second (over 10 billion operations/second) are probably partially optimised out and may be a test of how quickly the JVM optimised out the test rather than the actual speed of the operation.
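To illustrate how a test can be optimised out, here is a minimal sketch. The class name, the `sink` field and the XOR operation are assumptions chosen for illustration and are not the code used for the results on this page. The first loop discards its result, so the JIT is free to remove it. The second returns an accumulated value, which makes the work observable.

```java
final class OptimisedOutDemo {
    // Hypothetical sink: storing the result in a volatile field keeps it observable.
    static volatile long sink;

    /** The result is never used, so the JIT may remove the whole loop. */
    static void discarded(int n) {
        for (int i = 0; i < n; i++) {
            int unused = i ^ 0x55;
        }
    }

    /** The result is accumulated and returned, so the loop cannot simply be deleted. */
    static long consumed(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i ^ 0x55;
        }
        return sum;
    }

    /** Converts an operation count and elapsed nanoseconds into K operations/second. */
    static long kOpsPerSecond(long ops, long nanos) {
        return ops * 1000000L / Math.max(1, nanos);
    }

    public static void main(String[] args) {
        int n = 100 * 1000 * 1000;
        long t0 = System.nanoTime();
        discarded(n);
        long t1 = System.nanoTime();
        sink = consumed(n);
        long t2 = System.nanoTime();
        System.out.println("discarded: " + kOpsPerSecond(n, t1 - t0) + " K/second");
        System.out.println("consumed:  " + kOpsPerSecond(n, t2 - t1) + " K/second");
    }
}
```

If the first figure is implausibly high compared with the second, the loop was probably removed. The exact numbers depend on the JVM, its flags and how much warm-up the code had.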
Myth: Looping down is faster than looping up.
In Java 5 and Java 6 there is no difference between looping down or up. Even looping up or down an empty loop is much the same.
| Empty looping | Speed |
|---|---|
| Looping down using multiple threads | 2,894,000 K/second. |
| Looping up using multiple threads | 2,095,000 K/second. |
The difference here is less than 0.5 of a clock cycle. For any loop which does real work this difference is trivial.
However, for a trivial non-empty loop, the performance is the same.
int[] num = { 0 }; for (int i = 1000 * 1000; i > 0; i--) num[0]++;
int[] num = { 0 }; for (int i = 0; i < 1000 * 1000; i++) num[0]++;
| Trivial loop | Linux JDK 6u5 | Linux JDK 5u11 | PC JDK 6u5 |
|---|---|---|---|
| Looping down using multiple threads | | | 1,438,000 K/second. |
| Looping up using multiple threads | | | 1,453,000 K/second. |
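A comparison like this could be run along the following lines. This is a sketch only: the class, the `timeInThreads` helper, the `sink` field and the repeat count are assumptions, not the harness that produced the figures above. It uses lambdas, so it needs Java 8 or later.

```java
final class LoopDirectionDemo {
    // Hypothetical sink so that the loop results are not discarded.
    static volatile int sink;

    static int loopDown(int n) {
        int[] num = { 0 };
        for (int i = n; i > 0; i--) num[0]++;
        return num[0];
    }

    static int loopUp(int n) {
        int[] num = { 0 };
        for (int i = 0; i < n; i++) num[0]++;
        return num[0];
    }

    /** Runs the task once in each of the given number of threads; returns elapsed nanoseconds. */
    static long timeInThreads(int threads, Runnable task) throws InterruptedException {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(task);
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        final int n = 1000 * 1000;
        final int repeats = 1000;
        Runnable down = () -> { for (int r = 0; r < repeats; r++) sink = loopDown(n); };
        Runnable up = () -> { for (int r = 0; r < repeats; r++) sink = loopUp(n); };
        // Warm-up runs give the JIT a chance to compile both loops before timing.
        timeInThreads(4, down);
        timeInThreads(4, up);
        System.out.println("down: " + timeInThreads(4, down) + " ns");
        System.out.println("up:   " + timeInThreads(4, up) + " ns");
    }
}
```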
Legend: Calling Math.max(a,b) is 7 times slower than (a > b) ? a : b. This is the cost of a method call.
The Linux server is a blade with two dual core AMD 64 2.4 GHz processors. Performed in 2008.
The PC is a Windows XP workstation with two Xeon 3 GHz processors. Performed in 2008.
Small methods can be inlined. This reduces or removes the impact of a method call.
| Getting the maximum value | Linux JDK 6u5 | Linux JDK 5u11 | PC JDK 6u5 |
|---|---|---|---|
| Calling Math.max(1, i) | | | 1,609,900 K/second. |
| Using (1 >= i) ? 1 : i | | | 1,610,400 K/second. |
Note: because JDK 5 and 6 on Linux optimised out Math.max(), it appeared that Math.max() was around 90x faster than the "? :" ternary operator.
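The two forms compared above can be put side by side as follows. This is a sketch under assumptions: the class and method names are hypothetical, and summing the results is just one way to stop the JVM discarding the calls. It is not the original test code.

```java
final class MaxDemo {
    /** Sums Math.max(1, i) so that every call contributes to the returned value. */
    static long sumWithMathMax(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += Math.max(1, i);
        return sum;
    }

    /** The same calculation using the ternary operator directly. */
    static long sumWithTernary(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += (1 >= i) ? 1 : i;
        return sum;
    }

    public static void main(String[] args) {
        int n = 100 * 1000 * 1000;
        // Warm up both methods before timing them.
        sumWithMathMax(n);
        sumWithTernary(n);
        long t0 = System.nanoTime();
        long a = sumWithMathMax(n);
        long t1 = System.nanoTime();
        long b = sumWithTernary(n);
        long t2 = System.nanoTime();
        System.out.println("Math.max: " + (t1 - t0) + " ns, sum " + a);
        System.out.println("ternary:  " + (t2 - t1) + " ns, sum " + b);
    }
}
```

Printing the sums also confirms that both forms calculate the same value.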
Legend: Other slow operations.
The Sparc 20 was running Solaris with JDK 1.1.4. These tests were performed in 1998.
Obviously the hardware is significantly faster but the JVM is also smarter.
Timings are in K operations/second.
| Linux JDK 6u5 | Linux JDK 5u11 | PC JDK 6u5 | Sparc 20 JDK 1.1.4 | code | operation |
|---|---|---|---|---|---|
| | | 1,276,609 | 147,058 | b = (i & 0x100) != 0 | get element of int bits |
| | 1,474,653 | 112,965 | 314 | b = bitSet.get(3); | get element of Bitset |
| | | 2,127,880 | 20,000 | obj = objs[1]; | get element of Array |
| | | 956,285 | 5,263 | str.charAt(5); | get element of String |
| | 169,723 | 334,761 | 361 | buf.charAt(5); | get element of StringBuffer |
| | | 979,260 | n/a | buf.charAt(5); | get element of StringBuilder |
| 527,364 | 197,577 | 282,910 | 337 | objs2.get(1); | get element of Vector |
| | | 291,080 | n/a | objs2.get(1); | get element of ArrayList |
| 80,075 | 82,585 | 56,850 | 241 | hash.get("a"); | get element of Hashtable |
| 217,602 | 214,757 | 85,371 | n/a | hash.get("a"); | get element of LinkedHashMap |
| 1,093,388 | 1,510,255 | 71,339 | 336 | bitset.set(3); | set element of Bitset |
| | | 356,209 | 5,555 | objs[1] = obj; | set element of Array |
| | 181,385 | 326,831 | 355 | buf.setCharAt(5,' '); | set element of StringBuffer |
| | | 982,204 | n/a | buf.setCharAt(5,' '); | set element of StringBuilder |
| 826,486 | 189,073 | 178,017 | 308 | objs2.set(1, "hi"); | set element of Vector |
| | | 107,019 | n/a | objs2.set(1, "hi"); | set element of ArrayList |
| 98,863 | 78,711 | 38,065 | 237 | hash.put("a", obj); | put element of Hashtable |
| 298,666 | 160,212 | 40,210 | n/a | hash.put("a", obj); | put element of LinkedHashMap |
When profiling or performance tuning it is best to test a real application with real usage on your target system. An application can perform very differently on different systems, even for the same version of Java.
Legend: Speed of creating objects and arrays is very slow.
Timings are in K operations/second.
| Linux JDK 6u5 | Linux JDK 5u11 | PC JDK 6u5 | code | operation |
|---|---|---|---|---|
| 218,942 | 216,935 | 106,851 | new Object(); | Create a simple object |
| 55,989 | 53,555 | 14,472 | new int[10]; | Create an array |
| 1,931 | 1,193 | 863 | new Exception(); | Create an Exception |
| 3,126 | 3,296 | 631 | new LinkedHashMap(map); | Create a map with 10 String keys |
| 63,439 | 59,795 | 17,875 | new TenFields( ... ); | Create an object with ten fields using a constructor |
| 63,529 | 60,606 | 18,205 | new TenFields(); setX() x 10 | Create an object with ten fields and ten setters |
Creating Exceptions is still relatively slow. However, if you are creating around 1 million exceptions per second, perhaps you could make them more exceptional.
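One way to make exceptions more exceptional is to check for the common failure cheaply before calling code that would throw. The sketch below is an illustration only. The `ParseDemo` class, its method names and the digits-only rule are assumptions and do not come from the tests above.

```java
final class ParseDemo {
    /** Cheap pre-check: non-empty and ASCII digits only (a hypothetical input rule, no sign allowed). */
    static boolean looksLikeNumber(String s) {
        if (s == null || s.isEmpty()) return false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') return false;
        }
        return true;
    }

    /** Returns the parsed value, or the fallback when the input is not a valid int. */
    static int parseOrDefault(String s, int fallback) {
        // Common bad input is rejected here without creating an exception.
        if (!looksLikeNumber(s)) return fallback;
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {
            // Rare case: digits only, but outside the int range.
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("123", -1));
        System.out.println(parseOrDefault("abc", -1));
    }
}
```

With this shape an exception is only created for input that passes the pre-check and still fails, which should be rare.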
Legend: Use StringBuffer instead of + String concatenation.
Even in early versions of the JDK, the two were the same thing, because the compiler translated + into StringBuffer calls. However, + was clearer and therefore better.
However, from Java 5, string concatenation with + uses StringBuilder, which is more efficient than StringBuffer.
In fact the compiler will inline and simplify constants so that string concatenation can be removed at compile time.
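The sketch below shows the two points above: a + expression next to roughly the StringBuilder chain that a Java 5 or 6 compiler generates for it, and a concatenation of constants that is folded at compile time. The class and method names are hypothetical. Newer compilers may translate + differently, but the result is the same.

```java
final class ConcatDemo {
    /** Concatenation written with +. */
    static String withPlus(String name, int n) {
        return "Hello " + name + " #" + n;
    }

    /** Roughly the hand-written equivalent of what the compiler produces for the method above. */
    static String withBuilder(String name, int n) {
        return new StringBuilder().append("Hello ").append(name).append(" #").append(n).toString();
    }

    /**
     * Two constants joined with + form a compile-time constant, so the result is
     * the same interned String object as the literal and == is true here.
     */
    static boolean constantsFolded() {
        return ("Hello " + "world") == "Hello world";
    }

    public static void main(String[] args) {
        System.out.println(withPlus("Ann", 2));
        System.out.println(withBuilder("Ann", 2));
        System.out.println(constantsFolded());
    }
}
```

The == comparison is used here only to show that no concatenation happens at run time. Ordinary code should compare strings with equals().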
Solution.
From Java 5, replace StringBuffer with StringBuilder, unless it is a field. BTW: StringBuffer is unlikely to be a good choice for a field shared between threads.
Legend: Synchronized methods are 50 times slower than non-synchronized methods.
With Java 6, when a lock is typically gained by the same thread, a synchronized method is about 1.06 to 3 times slower.
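The case described, where the same thread gains the lock every time, could be measured with something like the following. This is a sketch only. The `CounterDemo` class and its methods are assumptions, and the ratio it prints will vary with the JVM version and hardware.

```java
final class CounterDemo {
    private long plain;
    private long locked;

    void incrementPlain() { plain++; }

    synchronized void incrementLocked() { locked++; }

    long plain() { return plain; }

    synchronized long locked() { return locked; }

    public static void main(String[] args) {
        CounterDemo c = new CounterDemo();
        int n = 100 * 1000 * 1000;
        // Warm up both methods; only this one thread ever takes the lock.
        for (int i = 0; i < n; i++) { c.incrementPlain(); c.incrementLocked(); }
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) c.incrementPlain();
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) c.incrementLocked();
        long t2 = System.nanoTime();
        System.out.println("plain:        " + (t1 - t0) + " ns, count " + c.plain());
        System.out.println("synchronized: " + (t2 - t1) + " ns, count " + c.locked());
    }
}
```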
Legend: Reusing objects improves performance.
In Java 5 and 6, Object pools can confuse the GC. For this reason, it can be faster and simpler to remove the object pool. Note: the GC tries to recycle objects for you in any case.
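As an illustration of what removing an object pool looks like, the sketch below does the same calculation with a minimal single-threaded pool and with plain allocation. `Point`, `PointPool` and both methods are hypothetical names invented for this example.

```java
import java.util.ArrayDeque;

final class PoolDemo {
    /** Hypothetical short-lived value object. */
    static final class Point { int x, y; }

    /** A minimal, single-threaded pool of the kind that could be removed. */
    static final class PointPool {
        private final ArrayDeque<Point> free = new ArrayDeque<Point>();

        Point acquire() {
            Point p = free.poll();
            return p != null ? p : new Point();
        }

        void release(Point p) { free.push(p); }
    }

    /** Pooled version: every object must be acquired, reset and released by hand. */
    static long sumPooled(PointPool pool, int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            Point p = pool.acquire();
            p.x = i;
            p.y = 2 * i;
            sum += p.x + p.y;
            pool.release(p);
        }
        return sum;
    }

    /** Simpler version: allocate a fresh object and leave the clean-up to the GC. */
    static long sumFresh(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point();
            p.x = i;
            p.y = 2 * i;
            sum += p.x + p.y;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumPooled(new PointPool(), 1000));
        System.out.println(sumFresh(1000));
    }
}
```

Both methods return the same value. Whether the version without the pool is also faster is something to measure in the real application.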