Profiling the Mercury Parallel Runtime.
There have been some performance issues with parallel Mercury programs. We understand the worst of these, but there are some that we don't. It's also sometimes useful to measure a behavior and be sure that you understand it rather than guessing.
Modern computers execute so quickly that using a system call such as gettimeofday(2) is not satisfactory, for two reasons. Firstly, gettimeofday(2) reports time with only microsecond resolution, so it's hard to profile events that are very quick. Secondly, because it is a system call, it involves a transition into kernel space and back, which can often be too expensive compared with the event you're trying to measure, and so it can distort the profiling data.
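To get a feel for both problems, here is a minimal C sketch that times a batch of back-to-back gettimeofday(2) calls. The helper names timeval_to_usec and time_gettimeofday_calls are made up for illustration and are not part of the Mercury runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Convert a struct timeval to microseconds, the finest unit
 * that gettimeofday(2) can report. */
static long long timeval_to_usec(const struct timeval *tv)
{
    return (long long)tv->tv_sec * 1000000LL + (long long)tv->tv_usec;
}

/* Hypothetical helper: make `calls` back-to-back gettimeofday(2) calls
 * and return the total elapsed wall-clock time in microseconds. */
static long long time_gettimeofday_calls(long calls)
{
    struct timeval start, end, scratch;

    assert(calls > 0);
    gettimeofday(&start, NULL);
    for (long i = 0; i < calls; i++) {
        gettimeofday(&scratch, NULL);
    }
    gettimeofday(&end, NULL);
    return timeval_to_usec(&end) - timeval_to_usec(&start);
}
```

Dividing the result by the number of calls gives a rough per-call cost. An event that is cheaper than that cost, or shorter than one microsecond, cannot be measured meaningfully this way.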
The solution is to use the CPU's Time Stamp Counter (TSC) (http://en.wikipedia.org/wiki/Time_Stamp_Counter). This value is read with a special CPU instruction (RDTSC) that writes the time stamp counter into a pair of general-purpose registers. The instruction has been supported for a while, roughly since the Pentium Classic. However, this is still problematic: if the thread is migrated to a different CPU in the same system between taking two timestamps, the two TSC values cannot safely be compared to compute a duration, since the TSCs in different CPUs may not be synchronized, and on some systems they may not advance at the same rate.
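As a rough illustration of reading the counter, the sketch below uses GCC/Clang inline assembly on x86 or x86-64. RDTSC leaves the low 32 bits in EAX and the high 32 bits in EDX, and the code recombines them into one 64-bit value. The names read_tsc and tsc_elapsed are hypothetical, and this is not necessarily how the Mercury runtime does it.

```c
#include <stdint.h>

/* Read the Time Stamp Counter with RDTSC (x86/x86-64, GCC or Clang).
 * The instruction writes the low half to EAX and the high half to EDX. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;

    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Cycles between two samples. This is only meaningful when both samples
 * were taken on the same CPU, which plain RDTSC cannot tell us. */
static inline uint64_t tsc_elapsed(uint64_t start, uint64_t end)
{
    return end - start;
}
```

A caller would take one sample before the event and one after it, then pass both to tsc_elapsed. Nothing here guards against the thread migrating between the two samples.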
Enter the RDTSCP instruction, which appears to be available since the Intel i7. This instruction reads the TSC and the processor ID, allowing the caller to compare the processor IDs from two RDTSCP instructions to determine whether it's valid to compare the TSC values. Unfortunately we don't currently have access to any Intel i7s or sufficiently recent AMD CPUs. We support the RDTSCP instruction, but work around this issue by pinning threads to particular processors (see sched_setaffinity(2)), using sysconf(_SC_NPROCESSORS_ONLN) to detect the number of CPUs/cores/hardware threads available.
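Both approaches can be sketched in C for Linux on x86 or x86-64 with GCC or Clang. RDTSCP returns the TSC in EDX:EAX and a per-processor value in ECX, which the operating system normally sets up to identify the CPU. All of the type and function names below are hypothetical, and the code assumes the CPU actually supports RDTSCP (otherwise the instruction faults).

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>
#include <unistd.h>

/* One RDTSCP reading: the counter plus the processor ID value from ECX. */
typedef struct {
    uint64_t tsc;
    uint32_t aux;
} tsc_sample;

/* Take a sample with RDTSCP. Assumes the CPU supports the instruction. */
static inline tsc_sample sample_tscp(void)
{
    uint32_t lo, hi, aux;
    tsc_sample s;

    __asm__ __volatile__("rdtscp" : "=a"(lo), "=d"(hi), "=c"(aux));
    s.tsc = ((uint64_t)hi << 32) | lo;
    s.aux = aux;
    return s;
}

/* Store the elapsed cycles in *cycles and return 1 if both samples came
 * from the same processor; return 0 if they are not comparable. */
static int tsc_elapsed_checked(const tsc_sample *start,
                               const tsc_sample *end, uint64_t *cycles)
{
    if (start->aux != end->aux) {
        return 0;
    }
    *cycles = end->tsc - start->tsc;
    return 1;
}

/* Number of processors currently online, or -1 on error. */
static long online_cpu_count(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Pin the calling thread to one CPU so that its TSC readings all come
 * from the same counter. Returns 0 on success, -1 on failure. */
static int pin_current_thread_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE) {
        return -1;
    }
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set); /* 0 = calling thread */
}
```

With pinning, a runtime could give each worker thread its own CPU number below online_cpu_count(), so plain TSC deltas within a thread stay valid. With RDTSCP, a sample pair that fails the check in tsc_elapsed_checked is simply discarded.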
Stay tuned... I will discuss how to detect support for these features at runtime.