-
Notifications
You must be signed in to change notification settings - Fork 835
Add a new DAE pass with a fixed point analysis #8217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
This replaces #8085 because so much has changed in the meantime. |
DAE can be slow because it performs several rounds of interleaved analysis and optimization. On top of this, the analysis it performs is not as precise as it could be because it never removes parameters from referenced functions and it cannot optimize unused parameters or results that are forwarded through recursive cycles. Start improving both the performance and the power of DAE by creating a new pass, called DAE2 for now. DAE2 performs a single parallel walk of the module to collect information with which it performs a fixed point analysis to find unused parameters, then does a single parallel walk of the module to optimize based on this analysis.
|
There is a lot of test code here - is it copied from the existing tests? If so, can we use another lit run line for it? If that isn't practical yet because of differences, I still wonder if we can reuse the existing tests somehow, even if the two passes have differing output in some cases (can use a different check prefix for those tests, maybe move them out to another file, etc.). On another topic, what is our overall plan here? Do you want to land this even before we know if we will be replacing the old pass? |
|
The tests here are all new. We could additionally run the existing DAE tests, but we wouldn't be able to remove many of the new tests that way. |
|
As far as the overall plan goes, I'd like to have this landed so we can consider making it more powerful iteratively. My hope is that if it does more work DAE currently does, then we'll start seeing better speedups. We could also use it immediately to replace the unprofitable bailout from DAE, but that's probably not worth it. |
| // forwarded on to other function calls. These forwarded parameters can still be | ||
| // optimized out as long as they are unused in the callees they are forwarded | ||
| // to. Since we perform a fixed point analysis, cycles of forwarded parameters | ||
| // can still be removed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe worth noting that "forwarded" is not just "literally the value is forwarded" but also transformations of the value.
| using Params = std::vector<Used::Element>; | ||
|
|
||
| // Function index and parameter index. | ||
| using ParamLoc = std::pair<Index, Index>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about FuncParamLoc, to match TypeParamLoc below?
| #define TIME(...) | ||
| #endif // TIME_DAE | ||
|
|
||
| // Find the root of the subtyping hierarchy for a given HeapType. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Find the root of the subtyping hierarchy for a given HeapType. | |
| // Find the non-basic root of the subtyping hierarchy for a given HeapType. |
| // Unreferenced functions can be optimized separately from referenced | ||
| // functions with the same type. For unreferenced functions in that situation, | ||
| // this is the new type that should be applied before global type rewriting to | ||
| // prevent the function from getting the wrong optimizations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow this. If a function is unreferenced, we track that in referenced, and in that case, we can just apply all the stuff we find, without caring about syncing it up with functions with a similar type. Why do we need to set a particular "replacement type" here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use a GlobalTypeRewriter to rewrite optimized function types for indirect calls globally. GlobalTypeRewriter does not provide any way to skip updating the types of non-referenced functions, so we have to arrange for the GlobalTypeRewriter to rewrite the types of non-referenced functions to the desired signatures as well. When different functions with the same original type should be optimized to have different types, we need to ensure those functions have different types before global type rewriting, otherwise it would not be possible for them to have different types after global type rewriting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we have to arrange for the GlobalTypeRewriter to rewrite the types of non-referenced functions to the desired signatures as well
Given a set of functions of a type T whose address is not taken, can't we do this?
- Create a new type T' with the same signature as T. Set all the functions to have that new type.
- Not enter anything for T' in the GlobalTypeRewriter mapping, so it leaves it alone.
So, yes, we need to give the address-taken and non-address-taken functions a different type, but can't we just do that in a few lines of code, without storing it on FunctionInfo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially yes, this is exactly what we do, with the added complexity that we need to make sure the new type T' is actually different from any type in the GlobalTypeRewriter mapping. As an extra optimization we also try to reuse other existing types that will already be mapped to the correct signature to reduce the number of new types.
In general we need to install these new types after optimizing function bodies and before doing the type mapping, so we do it at the end of the parallel sub-pass that updates function bodies. Until then we stash the intended replacement types on the FunctionInfos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks, I'll see if I can follow those details in the code...
kripken
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more incremental comments so far.
| // and vice versa. | ||
| std::vector<Index> referencedFuncs; | ||
|
|
||
| // For each parameter in this function type, the list of parameters of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // For each parameter in this function type, the list of parameters of | |
| // For each parameter in these function types, the list of parameters of |
| // Further, without GC we cannot differentiate the types of unreferenced and | ||
| // referenced functions before global type rewriting, so we cannot optimize | ||
| // them separately. Do not constrain the optimization of unreferenced |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Further, without GC we cannot differentiate the types of unreferenced and
// referenced functions before global type rewriting, so we cannot optimize
// them separately.When GC is disabled or not, can we optimize/rewrite the types of unreferenced functions "incorrectly" - just let them change - and then fix them up afterwards?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately doing that causes assertion failures and crashes when the GlobalTypeRewriter tries to update the types of LocalGets after updating the function type to have the wrong signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see...
| optimizeReferencedFuncs(optimizeReferencedFuncs) {} | ||
|
|
||
| bool isFunctionParallel() override { return true; } | ||
| bool modifiesBinaryenIR() override { return false; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By convention we always put these at the top of the class
DAE can be slow because it performs several rounds of interleaved
analysis and optimization. On top of this, the analysis it performs is
not as precise as it could be because it never removes parameters from
referenced functions and it cannot optimize unused parameters or results
that are forwarded through recursive cycles.
Start improving both the performance and the power of DAE by creating a
new pass, called DAE2 for now. DAE2 performs a single parallel walk of
the module to collect information with which it performs a fixed point
analysis to find unused parameters, then does a single parallel walk of
the module to optimize based on this analysis.