Cluster Analysis - Evaluating Highest Average Results
While a single WFA may provide a preliminary indication of whether a strategy is robust, the Cluster Analysis feature of WFO is generally a better method for proving or disproving the validity of a trading strategy and optimization procedure. As a bonus, the Cluster Analysis matrix helps determine how frequently the strategy should be re-optimized for optimal performance on unseen data.
Using cluster analysis to determine the optimal re-optimization trading window
During the cluster analysis process, the total equity for each completed walk-forward analysis is written to a grid on the Cluster Analysis tab.
Warning: Since WFO fully utilizes multiple cores of your PC, the cluster analysis can take up substantial computing resources. Ideally, it should be run overnight when you are not busy trading in real-time.
To abort the optimization, press the ‘Abort’ button on the WFO toolbar.
When the optimization is finished, the Cluster Analysis tab also displays the coordinates, i.e. the Out-Of-Sample% (OOS%) and Number of Runs, that produce the reports accessible from the Display drop-down list:
- Walk-Forward Overall Result - Displays the walk-forward optimization results for the cluster.
- Re-Optimized P&L - Compares the maximum continuously re-optimized P&L (annualized).
- Walk-forward Efficiency% - Shows the maximum walk-forward efficiency (robustness) index%.
- Consistency of Profits (% runs profitable) - Displays the consistency of profits across runs.
- Maximum Drawdown % (per equity graph) - Shows the lowest maximum drawdown %.
- Re-optimization interval (out-of-sample days) - Displays the optimal re-optimization interval for out-of-sample days.
- Re-optimization interval (out-of-sample bars) - Displays the optimal re-optimization interval for out-of-sample bars.
- In-sample days - The number of in-sample days for each test.
- In-sample bars - The number of in-sample bars for each test.
Displaying cluster analysis results
Cluster analysis results can be displayed in a tabular format or as a 3D graph.
The Display drop-down list lets you select the cluster analysis report that you want to view in the report table.
The next section describes the data layout for two different sets of cluster analysis tables.
The Type drop-down list lets you select from one of four result sets based on cluster analysis performed with different combinations of checked/unchecked settings for Prescribe # of Walk-Forward Runs and Anchored on the Setup Optimization Settings dialog. Once you've performed a cluster analysis with a particular combination of the settings, the results for that combination can be displayed in a cluster analysis report.
In the cluster analysis tables, the selected WFA is highlighted in blue and matches the selected WFA data set shown at the top of the WFO window. Click any cluster cell to select another WFA. In addition, the first four cluster reports display a contiguous group of nine colored cells that represent the best performing WFAs in the cluster. A green cell in the group indicates a WFA that passed the report criteria (the middle passing cell may be dark green). A red cell in the group indicates a WFA that did not pass the report criteria.
The 3D View check box allows you to change from the default tabular results view to a 3D graph for the selected Display and Type.
Interpreting the cluster analysis tables
EXAMPLE 1
Walk-forward Overall result (Pass/Fail)
OOS% \ Runs | 5 | 10 | 15 | 20 | 25 | 30 |
10 | FAILED | PASS | PASS | PASS | PASS | PASS |
15 | PASS | PASS | PASS | PASS | PASS | PASS |
20 | PASS | PASS | PASS | PASS | PASS | PASS |
25 | PASS | PASS | PASS | PASS | PASS | PASS |
30 | PASS | PASS | PASS | PASS | PASS | PASS |
Re-optimized P&L (annualized)
OOS% \ Runs | 5 | 10 | 15 | 20 | 25 | 30 |
10 | 17782.91 | 16740.29 | 12553.13 | 9700.53 | 9396.77 | 9281.76 |
15 | 16565.13 | 9624.54 | 10230.31 | 7777.74 | 7351.62 | 6485.74 |
20 | 11310.33 | 8036.32 | 9282.90 | 6389.17 | 6581.01 | 7444.95 |
25 | 10537.80 | 8624.54 | 3847.79 | 6818.86 | 6939.32 | 8815.25 |
30 | 7367.65 | 7496.13 | 6643.93 | 3428.18 | 6402.49 | 2579.89 |
Walk-forward Efficiency%
OOS% \ Runs | 5 | 10 | 15 | 20 | 25 | 30 |
10 | 52.9% | 58.4% | 70.0% | 70.3% | 62.3% | 66.3% |
15 | 53.7% | 81.7% | 70.1% | 69.8% | 64.3% | 66.0% |
20 | 66.7% | 70.0% | 65.3% | 64.8% | 63.0% | 61.0% |
25 | 72.3% | 67.0% | 66.5% | 59.0% | 66.4% | 66.6% |
30 | 75.7% | 66.0% | 61.0% | 65.8% | 57.7% | 50.8% |
Re-optimization interval (days)
OOS% \ Runs | 5 | 10 | 15 | 20 | 25 | 30 |
10 | 39 | 29 | 23 | 19 | 16 | 14 |
15 | 51 | 35 | 26 | 21 | 18 | 15 |
20 | 61 | 39 | 29 | 23 | 19 | 16 |
25 | 68 | 42 | 30 | 24 | 20 | 16 |
30 | 75 | 44 | 32 | 25 | 20 | 17 |
The four tables represent (in the order listed above):
- best Walk-forward Overall result (Pass/Fail)
- maximum continuously Re-optimized P&L (annualized)
- maximum Walk-forward Efficiency%
- optimal Re-optimization interval (days)
for the various walk-forward run and out-of-sample tests performed.
First of all, we will look at the 1st, 2nd, and 3rd tables to determine the cluster that produced the best results. A cluster is defined as a specific x,y coordinate (e.g. Walk-forward Runs=10, Out-Of-Sample%=OOS%=20) and the 8 neighbors immediately surrounding that coordinate.
You will see that WFO automatically calculates and displays the optimal coordinate for you on the Cluster Analysis tab.
For this example, one can easily see that the max. re-optimized P&L is produced in the cluster with center x=10 runs, y=OOS%=15.
However, when looking at table #1 we see that one of the tests in this cluster failed the walk-forward analysis. We will therefore look instead for the best cluster (the one with the highest average) in terms of re-optimized P&L where all the neighbors also pass the WFA. The cluster with center x=15 runs and y=OOS%=15 appears to be the best.
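The cluster search described above can be sketched in a few lines of Python. This is only an illustration, not WFO's actual implementation: the names `PNL`, `FAILED` and `best_cluster` are hypothetical, the data is copied from the Example 1 tables, and it assumes that only coordinates with a full set of 8 neighbors can serve as a cluster center.

```python
# Example 1 data: rows are OOS% values, columns are walk-forward runs.
OOS = [10, 15, 20, 25, 30]
RUNS = [5, 10, 15, 20, 25, 30]
PNL = [
    [17782.91, 16740.29, 12553.13, 9700.53, 9396.77, 9281.76],
    [16565.13, 9624.54, 10230.31, 7777.74, 7351.62, 6485.74],
    [11310.33, 8036.32, 9282.90, 6389.17, 6581.01, 7444.95],
    [10537.80, 8624.54, 3847.79, 6818.86, 6939.32, 8815.25],
    [7367.65, 7496.13, 6643.93, 3428.18, 6402.49, 2579.89],
]
# (OOS%, runs) cells marked FAILED in table #1.
FAILED = {(10, 5)}


def best_cluster(require_all_pass):
    """Return (average P&L, runs, OOS%) of the best 3x3 cluster.

    Assumption: only interior cells (with all 8 neighbors) are centers.
    """
    best = None
    for r in range(1, len(OOS) - 1):
        for c in range(1, len(RUNS) - 1):
            cells = [(i, j) for i in (r - 1, r, r + 1)
                     for j in (c - 1, c, c + 1)]
            if require_all_pass and any(
                    (OOS[i], RUNS[j]) in FAILED for i, j in cells):
                continue  # skip clusters that contain a failed WFA
            avg = sum(PNL[i][j] for i, j in cells) / len(cells)
            if best is None or avg > best[0]:
                best = (avg, RUNS[c], OOS[r])
    return best


if __name__ == "__main__":
    print(best_cluster(require_all_pass=False))
    print(best_cluster(require_all_pass=True))
```

With the Example 1 data, this sketch selects the center Runs=10, OOS%=15 when failures are ignored, and Runs=15, OOS%=15 when every cell in the cluster must pass.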
This coordinate is indeed also surrounded by very healthy walk-forward efficiency percentages (table #3). Remember that the walk-forward efficiency is simply the annualized rate of return of the out-of-sample results divided by that of the in-sample results.
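As a minimal sketch of that ratio, the hypothetical helper below expresses the walk-forward efficiency as a percentage. The function name and the sample figures are illustrative assumptions and are not taken from WFO.

```python
def walk_forward_efficiency(oos_annualized_return, is_annualized_return):
    """Out-of-sample annualized return as a percentage of the in-sample one."""
    if is_annualized_return == 0:
        raise ValueError("in-sample annualized return must be non-zero")
    return 100.0 * oos_annualized_return / is_annualized_return


if __name__ == "__main__":
    # Hypothetical figures: 7,000 out-of-sample vs. 10,000 in-sample per year.
    print(walk_forward_efficiency(7000.0, 10000.0))
```

A value near 100% means the strategy earned about as much on unseen data as it did on the data it was optimized on.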
Table #4 then simply gives the re-optimization interval in days for a given coordinate. In our case, we have finally selected x=10, y=20, thus the optimal re-optimization period is 39 days. Note that across the whole cluster, any re-optimization period between 26 and 68 days (with the corresponding number of walk-forward runs) would still have produced good results, so the result is indeed robust.
If you will be trading the strategy in real time with an approach of constant re-optimization, you will simply re-optimize your strategy every 39 days. Even if you do not use constant re-optimization, the tables are still very useful because they provide additional proof that your strategy is robust: it generates profits over a wide range of walk-forward run and out-of-sample percentage combinations.
EXAMPLE 2
Walk-forward Overall result (Pass/Fail)
OOS% \ Runs | 5 | 10 | 15 | 20 | 25 | 30 |
10 | PASS | PASS | PASS | PASS | PASS | PASS |
15 | PASS** | PASS | PASS | PASS | PASS | PASS |
20 | PASS | PASS | PASS | PASS | PASS | PASS |
25 | PASS | PASS | PASS | PASS | PASS | PASS |
30 | PASS | PASS | PASS | PASS | PASS | PASS |
Re-optimized P&L (annualized)
OOS% \ Runs | 5 | 10 | 15 | 20 | 25 | 30 |
10 | 19291.11 | 18709.31 | 13809.98 | 17066.38 | 15482.08 | 14174.17 |
15 | 16341.30 | 14297.19 | 13152.52 | 12187.08 | 14114.28 | 13606.57 |
20 | 14208.05 | 15463.51 | 11390.87 | 11544.66 | 14597.98 | 13575.22 |
25 | 11135.77 | 12054.39 | 15473.71 | 13392.82 | 11878.76 | 15208.33 |
30 | 11482.46 | 11563.75 | 13434.59 | 10564.72 | 11821.49 | 12805.42 |
Walk-forward Efficiency%
OOS% \ Runs | 5 | 10 | 15 | 20 | 25 | 30 |
10 | 96.3% | 99.2% | 94.4% | 82.1% | 73.4% | 91.9% |
15 | 101.8% | 90.3% | 81.1% | 94.1% | 83.2% | 74.2% |
20 | 92.5% | 91.2% | 87.8% | 88.4% | 78.9% | 77.2% |
25 | 97.3% | 95.1% | 88.1% | 86.0% | 93.0% | 83.2% |
30 | 86.0% | 80.1% | 89.8% | 82.3% | 74.6% | 74.1% |
Re-optimization interval (days)
OOS% \ Runs | 5 | 10 | 15 | 20 | 25 | 30 |
10 | 32 | 24 | 19 | 15 | 13 | 12 |
15 | 42 | 28 | 22 | 17 | 15 | 13 |
20 | 50 | 32 | 24 | 19 | 15 | 13 |
25 | 56 | 34 | 25 | 20 | 16 | 14 |
30 | 61 | 36 | 26 | 20 | 16 | 14 |
For this example, the max. re-optimized P&L (table #2) is produced in the cluster with center x=10 walk-forward runs, y=OOS%=15.
When looking at table #1 we see that one of the immediate neighbors even passed with distinction (PASS**), and all the surrounding neighbors have very healthy walk-forward efficiency percentages (table #3).
Thus we will look no further and stick to this coordinate.
VERY IMPORTANT: A strategy passes the cluster analysis when the Highest Cluster Average on the Walk-Forward Overall Results display is PASS, which indicates that the majority of the top nine cells have each passed the individual walk-forward analysis test criteria you specified.
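The majority rule can be illustrated with a short Python sketch. It assumes that "majority" means more than half of the nine cells (at least five), which may differ from WFO's internal rule. The function name is hypothetical.

```python
def highest_cluster_passes(results):
    """Apply a simple majority rule to the nine cells of the top cluster.

    `results` holds one string per cell, e.g. "PASS", "PASS**" or "FAILED".
    Assumption: more than half of the cells must pass.
    """
    if len(results) != 9:
        raise ValueError("a cluster is a center cell plus its 8 neighbors")
    passed = sum(1 for r in results if r.strip().upper().startswith("PASS"))
    return passed > len(results) // 2


if __name__ == "__main__":
    # Top cluster of Example 1: eight passes and one failure.
    print(highest_cluster_passes(["FAILED"] + ["PASS"] * 8))
```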
Table #4 tells us the corresponding re-optimization period for coordinate (10,15) is 28 days. However, using any re-optimization period between 19-50 days (using the corresponding number of walk-forward runs) would also have resulted in good results, thus we have selected a result that is robust.
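The interval and its robust range can be read from table #4 programmatically. The sketch below uses the Example 2 interval table. `interval_summary` is a hypothetical helper that returns the center cell's interval together with the smallest and largest interval found in its 3x3 cluster.

```python
# Example 2, table #4: rows are OOS% values, columns are walk-forward runs.
OOS = [10, 15, 20, 25, 30]
RUNS = [5, 10, 15, 20, 25, 30]
INTERVAL_DAYS = [
    [32, 24, 19, 15, 13, 12],
    [42, 28, 22, 17, 15, 13],
    [50, 32, 24, 19, 15, 13],
    [56, 34, 25, 20, 16, 14],
    [61, 36, 26, 20, 16, 14],
]


def interval_summary(runs, oos_pct):
    """Return (center interval, min, max) in days for the 3x3 cluster."""
    r = OOS.index(oos_pct)
    c = RUNS.index(runs)
    if not (0 < r < len(OOS) - 1 and 0 < c < len(RUNS) - 1):
        raise ValueError("center must have 8 surrounding neighbors")
    cluster = [INTERVAL_DAYS[i][j]
               for i in (r - 1, r, r + 1)
               for j in (c - 1, c, c + 1)]
    return INTERVAL_DAYS[r][c], min(cluster), max(cluster)


if __name__ == "__main__":
    print(interval_summary(runs=10, oos_pct=15))
```

For the coordinate (10,15) this gives 28 days at the center and a range of 19 to 50 days across the cluster, matching the figures quoted above.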
If you will be using the strategy in real-time using an approach of constant re-optimization, you would be re-optimizing this strategy every 28 days.
Reloading a previous cluster analysis
As soon as a Cluster Analysis has been completed, WFO will automatically save all the underlying reports and re-optimized equity curves for each cell in the CA matrix.
You can close down WFO and later re-load a previously performed Cluster Analysis as follows:
- Use the File > Load Cluster Analysis menu sequence.
- Select a Walk-Forward test from the list-box on the dialog that contains the cluster analysis that you want to re-load.
- Click Load to get the data, or Cancel to exit.
Now you may either use the drop-down box at the top of the WFO window (above the tabs) to select a report or you may simply click on any individual cell of the Cluster Analysis matrix and WFO will automatically re-load all the underlying reports for that specific Walk-Forward Analysis.
Cluster Analysis results performed for each checked/unchecked setting of Prescribe # of walk-forward runs and Anchored on the Optimization Settings dialog are saved as separate data sets that can be accessed using the Type drop-down on the Cluster Analysis tab of the WFO.
How much data should I use when doing the re-optimization?
The amount of data that you will be using when doing the re-optimization will depend on whether you have selected Anchored or rolling window (not anchored) walk-forward optimization.
If you have not selected anchored (thus rolling) walk-forward, then you will, for instance, re-optimize your strategy every 28 days (the optimal period for Example 2 above) with TradeStation genetic optimization, using only the last x days of data, where x is the number of days in the immediately preceding in-sample period. (Note that you must load sufficient data for TS to satisfy its max. bars look back setting. Thus, if the max. bars TS may reference is 50, you need to load x+50 days of history.)
If you have selected anchored walk-forward then you will re-optimize your strategy every 28 days with genetic optimization, using ALL the data starting at exactly the same point in time as what you have used for doing the cluster analysis.
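The two cases can be summarized in a small date calculation. This is a sketch under stated assumptions, not part of WFO or TradeStation: `reoptimization_window` is a hypothetical helper, it counts calendar days, it treats one bar as one day (as in the x+50 example above), and it applies the look-back allowance only to the rolling case because the text above states it only for that case.

```python
from datetime import date, timedelta


def reoptimization_window(today, anchored, in_sample_days=0,
                          anchor_start=None, max_bars_back=0):
    """Return (load_from, optimize_from, optimize_to) as dates."""
    if anchored:
        # Anchored: always start at the same date used for the cluster analysis.
        if anchor_start is None:
            raise ValueError("anchored mode needs the original start date")
        return anchor_start, anchor_start, today
    # Rolling: use only the last x days, where x = in-sample days.
    if in_sample_days <= 0:
        raise ValueError("rolling mode needs a positive in-sample length")
    optimize_from = today - timedelta(days=in_sample_days)
    # Load extra history so the max. bars look back setting is satisfied.
    load_from = optimize_from - timedelta(days=max_bars_back)
    return load_from, optimize_from, today


if __name__ == "__main__":
    # Hypothetical figures: x = 100 in-sample days, max. bars back = 50.
    print(reoptimization_window(date(2024, 6, 1), anchored=False,
                                in_sample_days=100, max_bars_back=50))
```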
Evaluating individual walk-forward test results
By clicking on individual cells in the Cluster Analysis matrix, you can automatically load all the underlying walk-forward reports.
For example, if you click on the cell that represents the coordinate Runs=15, OOS%=20, then WFO will automatically load all the corresponding in-sample and out-of-sample reports. These reports can be viewed on the Optimization Summary (in-sample) and Walk-Forward (out-of-sample) tabs, while the Test Results tab displays the overall result for this specific in-sample/out-of-sample permutation. The Graphs tab shows the re-optimized equity curve. This is the equity curve that would have resulted if the strategy had been continuously re-optimized and then traded forward on unseen data (walk-forward runs) using the parameters suggested by WFO at the time. The equity curve graph gives you a realistic picture of the kind of performance you might expect from your strategy when used with unseen data.