I don't understand why a handmade linked list/ring with manual reference counting is used here. Don't we have std::list and std::shared_ptr for this kind of job? Can someone explain the rationale behind that?
I changed the linked list to std::list and the reference counting logic to std::shared_ptr, and it still passes the test cases. Because of the std::list overhead, the size reported by the benchmark is now 24 bytes compared to 8 bytes; I suppose the relative difference will shrink as more slots are connected. On my Mac, the peak memory (maximum resident set size) of the master implementation and my version is almost the same: 892928 vs 901120 bytes.
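To make the idea concrete, here is a minimal sketch of what a std::list plus std::shared_ptr signal could look like. It is not the actual patch: the class name `ListSignal`, its members, and the counter-based connection ids are all hypothetical, and the snapshot-based `emit` is just one simple way to stay safe when handlers connect or disconnect during emission.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <memory>
#include <utility>

// Hypothetical sketch (not the real Simple::Signal code): slots are kept in a
// std::list, and std::shared_ptr replaces the manual reference counting.
template <typename... Args>
class ListSignal {
public:
    using Callback = std::function<void(Args...)>;

    // Store the callback and return an id that disconnect() accepts.
    std::size_t connect(Callback cb) {
        assert(cb && "callback must be callable");
        const std::size_t id = next_id_++;
        slots_.emplace_back(id, std::make_shared<Callback>(std::move(cb)));
        return id;
    }

    // Remove the slot with the given id; returns false if it is unknown.
    bool disconnect(std::size_t id) {
        for (auto it = slots_.begin(); it != slots_.end(); ++it) {
            if (it->first == id) {
                slots_.erase(it);
                return true;
            }
        }
        return false;
    }

    // Call every connected slot in connection order.
    void emit(Args... args) const {
        // Copying the list bumps each shared_ptr's use count, so a callback
        // stays alive even if a handler disconnects it mid-emission.
        // Assumption: simplicity over speed; this copy allocates on every
        // emission, which a tuned implementation would want to avoid.
        const auto snapshot = slots_;
        for (const auto& slot : snapshot) {
            (*slot.second)(args...);
        }
    }

private:
    std::list<std::pair<std::size_t, std::shared_ptr<Callback>>> slots_;
    std::size_t next_id_ = 1;
};
```

With this sketch, a caller would `connect` a few lambdas, call `emit`, and later pass a returned id to `disconnect`. Note that with the snapshot approach a slot disconnected during an emission can still be called once more in that same emission.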
But surprisingly, the latency of my version is much lower. On my 2013 Mac Pro with a 3.5 GHz 6-Core Intel Xeon E5, the master implementation took 22.417022ns per emission, while my version took only 4.695005ns.
> time -l ./test.master
Signal/Basic Tests: OK
Signal/CollectorVector: OK
Signal/CollectorUntil0: OK
Signal/CollectorWhile0: OK
Signal/Benchmark: Simple::Signal: OK
Benchmark: Simple::Signal: 22.417022ns per emission (size=8): OK
Signal/Benchmark: callback loop: OK
Benchmark: callback loop: 0.014000ns per round: OK
0.02 real 0.02 user 0.00 sys
892928 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
221 page reclaims
9 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
1 voluntary context switches
8 involuntary context switches
> time -l ./test.list
Signal/Basic Tests: OK
Signal/CollectorVector: OK
Signal/CollectorUntil0: OK
Signal/CollectorWhile0: OK
Signal/Benchmark: Simple::Signal: OK
Benchmark: Simple::Signal: 4.695005ns per emission (size=24): OK
Signal/Benchmark: callback loop: OK
Benchmark: callback loop: 0.001000ns per round: OK
0.01 real 0.00 user 0.00 sys
901120 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
221 page reclaims
11 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
1 voluntary context switches
5 involuntary context switches