Next: References
Up: No Title
Previous: Experimental Results
In this project, performance of dynamic branch prediction techniques based on pattern history table (PHT) were compared by using some benchmarks. Also some new techniques were proposed to incerase the prediction accuracy. Four basic prediction schemes, branch address, global history, concatenation and xoring which are used for indexing the pattern history tables, were implemented by using ATOM hardware simulator in order to understand their prediction performance and their behaviors for different kind of conditional branches. Experiments indicated that xoring scheme had the best prediction accuracy for all benchmark and for different size of PHTs among the others techniques since it used more information about branch behavior than the others. Branch address scheme could not use the counters in PHT properly, 75% of counters were not used in prediction. There is an important problem, aliasing, in prediction because of limitation on the number of counters in PHT, and it has very bad effect on the prediction accuracy of these schemes. When a counter is used by more than one different pattern, aliasing problem occurs. Experiments shows that branch address schemes has less aliasing problem than the others and theoraticaly it could be ture that the prediction performance could be increased by large amounts by removing the aliasing problem from predictions. However, it is not easy to remove aliasing from prediction. One way which can be used to decrease the affect of aliasing is to combine the individual prediction schemes. Three different combined prediction schemes were proposed and experimentally tested in this work.
In combined prediction schemes, the advantage of having less aliasing of branch address and the advantage of having more prediction accuracy of xoring were used and third counter table was used to select the more correct one of two counter for prediction instance. Experiments showed that combined schemes have more higher prediction accuracy than each individual schemes. Here, there is a trade off using combined schemes or individual schemes, because combined schemes uses 3 times more counters than individual one's and these schemes are implemented in hardware. Therefore, we have to use a cost function for selection which schemes which would be implemented in processors. I we use a linear cost function between number of counters and misprediction rate, we could draw a cost graph as in Fig.13.
Figure 13: Prediction performance for different size of PHT for xlisp benchmark
In figure 13, you can see that if number of table is bigger than 3000, using combined scheme is more efficient than xoring scheme. However, if the number of counters is smaller than 3000, than using xoring scheme is good selection for performance.
Next: References
Up: No Title
Previous: Experimental Results