Overview
Charles Nutter in Shared Data Considered Harmful raises an interesting point that contention between threads can result in so much overhead that a single threaded application would perform better. In the article I discuss the problem and outline a simple solution .
Job allocation.
The problem here is similar to the performance issue you get when multiple db clients are allocating ids from a database. And in both cases, the solution is to allocate blocks of ids for each client/thread to processes.
In the following code I look at the effect of block size of scalability. The last of three runs by the following code looks like:
Note: I have two cores, so the 4 cpu example is not faster, but it would be if I had that many.
With the job size of 1, the 2 cpu example is about 4x slower. This might be surprising give you are using more CPUs. The problem is that instead of placing the shared data in a 1 level cache, the date has to be placed in main memory which is significantly slower.
With a job size of 10, the benchmark is faster all round and has more benifit than we hoped to get from mutli-threading the code. Even the 1 cpu example is 5x faster. Note: the over head is less and the 2 cpu example is now only ~ 2x slower.
With a job size of 100, the 1 cpu example is faster again, and now you can see the 2 cpu test is about 40% faster (40/28 - 1) than the 1 cpu test. This is better but still not great.
With a job size of 1000, the 1 cpu example is now more than 10x faster as the job numbers are incremented in the core of the cpu rather than in the cache and the two cpu test is about 70% faster.
With a job size of 10000, the 2 cpu example is about 72% faster.
You will not get to the point where two CPU provide double with this benchmark, but you can get close.
Conclusion
Don't assume that because you have spent time optimising the code, potentially making the code more complex and less flexible that you have actually made the code faster!.
You need to performance test optimisations because you can find you have made the code significantly slower, not faster. ![]()
Results.
| Java 5u10 | Java 6u5 |
|---|---|
| Thread count: 1, job size: 1 time: 389,431,618 ns. , fired: 39063 Thread count: 2, job size: 1 time: 1,157,105,163 ns. , fired: 39063 Thread count: 4, job size: 1 time: 1,255,407,322 ns. , fired: 39063 Thread count: 1, job size: 10 time: 68,090,498 ns. , fired: 39063 Thread count: 2, job size: 10 time: 140,359,739 ns. , fired: 39063 Thread count: 4, job size: 10 time: 140,082,609 ns. , fired: 39063 Thread count: 1, job size: 100 time: 40,225,783 ns. , fired: 39063 Thread count: 2, job size: 100 time: 24,813,489 ns. , fired: 39063 Thread count: 4, job size: 100 time: 28,759,522 ns. , fired: 39063 Thread count: 1, job size: 1000 time: 33,141,922 ns. , fired: 39063 Thread count: 2, job size: 1000 time: 19,543,825 ns. , fired: 39063 Thread count: 4, job size: 1000 time: 20,030,199 ns. , fired: 39063 Thread count: 1, job size: 10000 time: 32,286,227 ns. , fired: 39063 Thread count: 2, job size: 10000 time: 18,828,370 ns. , fired: 39063 Thread count: 4, job size: 10000 time: 19,157,184 ns. , fired: 39063 |
Thread count: 1, job size: 1 time: 306,640,928 ns. , fired: 39063 Thread count: 2, job size: 1 time: 1,076,376,746 ns. , fired: 39063 Thread count: 4, job size: 1 time: 1,193,009,447 ns. , fired: 39063 Thread count: 1, job size: 10 time: 50,923,790 ns. , fired: 39063 Thread count: 2, job size: 10 time: 120,843,850 ns. , fired: 39063 Thread count: 4, job size: 10 time: 114,703,964 ns. , fired: 39063 Thread count: 1, job size: 100 time: 23,312,460 ns. , fired: 39063 Thread count: 2, job size: 100 time: 18,617,171 ns. , fired: 39063 Thread count: 4, job size: 100 time: 19,075,330 ns. , fired: 39063 Thread count: 1, job size: 1000 time: 18,001,171 ns. , fired: 39063 Thread count: 2, job size: 1000 time: 12,098,744 ns. , fired: 39063 Thread count: 4, job size: 1000 time: 11,610,693 ns. , fired: 39063 Thread count: 1, job size: 10000 time: 17,060,828 ns. , fired: 39063 Thread count: 2, job size: 10000 time: 11,503,976 ns. , fired: 39063 Thread count: 4, job size: 10000 time: 12,311,620 ns. , fired: 39063 |
The code
A few changes to the original code
- Use final static fields rather than mutable static field. Ideally don't use static fields if a local field will do.
- Use the concurrent safe AtomicInteger or AtomicLong for shared counters. Without the later, there is a race condition where the same number would be seen by more than one thread.
- Use the built in libraries to manage thread pools.
- Use System.nanoTime() for benchmarks, it is about 16,000x more accurate on Windows.
import java.text.DecimalFormat; import java.util.concurrent.ExecutionException; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; import java.util.concurrent.Future; import java.util.concurrent.atomic.AtomicInteger; public class Trouble { public static final AtomicInteger JOB_NUMBER = new AtomicInteger(1); public static final AtomicInteger FIRED_COUNT = new AtomicInteger(); public static final int totalSize = 10000000; public static final int maxLoops = 3; public static void main(String[] args) throws ExecutionException, InterruptedException { for (int loops = 0; loops < maxLoops; loops++) { for (int size : new int[]{1, 10, 100, 1000, 10000}) { doTest(1, size); doTest(2, size); doTest(4, size); } } } private static void doTest(final int threadCount, final int jobSize) throws InterruptedException, ExecutionException { System.out.println("Thread count: " + threadCount + ", job size: " + jobSize); ExecutorService service = Executors.newFixedThreadPool(threadCount); JOB_NUMBER.set(0); FIRED_COUNT.set(0); long start = System.nanoTime(); Future[] futures = new Future[threadCount]; for (int j = 0; j < threadCount; j++) futures[j] = service.submit(new Runnable() { public void run() { while (true) { int jobStart = JOB_NUMBER.getAndAdd(jobSize); if (jobStart >= totalSize) break; for (int i = 0; i < jobSize; i++) { int jobNumber = jobStart + i; if ((jobNumber & 0xff) == 0) doNothing(); } } } }); for (Future future : futures) future.get(); service.shutdown(); String time = new DecimalFormat("#,##0").format(System.nanoTime() - start); System.out.println("time: " + time + " ns. , fired: " + FIRED_COUNT); } public static void doNothing() { FIRED_COUNT.getAndIncrement(); } }
Add Comment