Performance Inspector uses callstack sampling in two modes: one driven by platform-independent JVM events, such as memory allocations or monitor events, and one driven by events generated by a platform-specific device driver.
The platform-independent mode is useful for identifying the context in which events occur. Although the overhead of getting the callstack information varies with the depth of the callstack, the impact on the execution of the application is usually minimal, because the events do not occur often and occur on only one thread at a time.
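As a rough illustration of event-driven sampling, the following Java sketch records the calling context each time an event of interest occurs and counts how often each context is seen. It is not Performance Inspector code: the class `EventCallstackSampler`, its `onEvent` hook, and the `maxDepth` parameter are hypothetical names chosen for this example, and it assumes Java 9 or later for `StackWalker`.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.Collectors;

/** Hypothetical sketch: counts events by the callstack context in which they occur. */
final class EventCallstackSampler {
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();
    private final int maxDepth;

    EventCallstackSampler(int maxDepth) {
        if (maxDepth < 1) {
            throw new IllegalArgumentException("maxDepth must be >= 1");
        }
        this.maxDepth = maxDepth;
    }

    /** Call at the point where the event (for example, an allocation) occurs. */
    void onEvent() {
        // Walk only the calling thread's stack; the cost grows with maxDepth.
        String context = StackWalker.getInstance().walk(frames -> frames
                .skip(1)                       // drop the onEvent frame itself
                .limit(maxDepth)               // bound the depth that is recorded
                .map(f -> f.getClassName() + "." + f.getMethodName())
                .collect(Collectors.joining(" <- ")));
        counts.computeIfAbsent(context, k -> new LongAdder()).increment();
    }

    /** Returns a sorted copy of the per-context event counts. */
    Map<String, Long> snapshot() {
        Map<String, Long> out = new TreeMap<>();
        counts.forEach((k, v) -> out.put(k, v.sum()));
        return out;
    }
}
```

Because each call walks only the stack of the thread that raised the event, no other thread is disturbed, which is consistent with the low overhead described above. A smaller `maxDepth` trades context for cheaper samples.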
However, when callstack sampling is driven by device driver events and used as a replacement for TPROF, complications arise on multiprocessor systems. TPROF only needs the current execution address on each processor. Callstack sampling, by contrast, requires a series of spin loops to prevent one processor from making forward progress while the JVM gets the callstack information on another processor. So far, the code to handle this has been implemented only on Windows, but Linux support should be available soon.
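The idea of holding other processors in spin loops can be sketched in user space. The following Java example is an analogy only, not the device driver implementation: threads stand in for processors, and every name in it (`SpinRendezvous`, `checkpoint`, `sample`, `demo`) is hypothetical. Workers spin at a checkpoint while a sample is in progress, and the sampler captures the target's callstack only after every worker has stopped making progress. It assumes Java 9 or later for `Thread.onSpinWait`.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** User-space analogy only: threads stand in for processors; all names are hypothetical. */
final class SpinRendezvous {
    private final AtomicBoolean hold = new AtomicBoolean(false); // raised while sampling
    private final AtomicInteger parked = new AtomicInteger(0);   // workers now spinning
    private final AtomicLong progress = new AtomicLong(0);       // work units completed

    /** Worker loop: complete one unit of work, then spin if a sample is in progress. */
    void work(AtomicBoolean stop) {
        while (!stop.get()) {
            progress.incrementAndGet();
            checkpoint();
        }
    }

    private void checkpoint() {
        if (!hold.get()) {
            return;
        }
        parked.incrementAndGet();
        while (hold.get()) {
            Thread.onSpinWait();          // no forward progress while held
        }
        parked.decrementAndGet();
    }

    /** Parks all workers, captures the target's callstack, then releases them. */
    StackTraceElement[] sample(Thread target, int workers) {
        hold.set(true);
        while (parked.get() < workers) {
            Thread.onSpinWait();          // wait until every worker is spinning
        }
        long before = progress.get();
        StackTraceElement[] stack = target.getStackTrace();
        boolean frozen = progress.get() == before;
        hold.set(false);
        while (parked.get() > 0) {
            Thread.onSpinWait();          // let workers leave before the next sample
        }
        if (!frozen) {
            throw new IllegalStateException("a worker advanced during the sample");
        }
        return stack;
    }

    /** Runs two workers, takes three samples, and returns the last callstack. */
    static StackTraceElement[] demo() {
        SpinRendezvous r = new SpinRendezvous();
        AtomicBoolean stop = new AtomicBoolean(false);
        Thread a = new Thread(() -> r.work(stop), "worker-a");
        Thread b = new Thread(() -> r.work(stop), "worker-b");
        a.start();
        b.start();
        StackTraceElement[] last = null;
        for (int i = 0; i < 3; i++) {
            last = r.sample(a, 2);
        }
        stop.set(true);
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return last;
    }
}
```

The sketch assumes every worker keeps reaching its checkpoint; a worker that exits early would leave `sample` spinning forever, which hints at why the real multiprocessor case is harder than reading one execution address per processor.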