Skip to content

Conversation

@tlively
Copy link
Member

@tlively tlively commented Jan 17, 2026

DAE can be slow because it performs several rounds of interleaved
analysis and optimization. On top of this, the analysis it performs is
not as precise as it could be because it never removes parameters from
referenced functions and it cannot optimize unused parameters or results
that are forwarded through recursive cycles.

Start improving both the performance and the power of DAE by creating a
new pass, called DAE2 for now. DAE2 performs a single parallel walk of
the module to collect information with which it performs a fixed point
analysis to find unused parameters, then does a single parallel walk of
the module to optimize based on this analysis.

@tlively
Copy link
Member Author

tlively commented Jan 17, 2026

This replaces #8085 because so much has changed in the meantime.

DAE can be slow because it performs several rounds of interleaved
analysis and optimization. On top of this, the analysis it performs is
not as precise as it could be because it never removes parameters from
referenced functions and it cannot optimize unused parameters or results
that are forwarded through recursive cycles.

Start improving both the performance and the power of DAE by creating a
new pass, called DAE2 for now. DAE2 performs a single parallel walk of
the module to collect information with which it performs a fixed point
analysis to find unused parameters, then does a single parallel walk of
the module to optimize based on this analysis.
@kripken
Copy link
Member

kripken commented Jan 20, 2026

There is a lot of test code here - is it copied from the existing tests? If so, can we use another lit run line for it? If that isn't practical yet because of differences, I still wonder if we can reuse the existing tests somehow, even if the two passes have differing output in some cases (can use a different check prefix for those tests, maybe move them out to another file, etc.).

On another topic, what is our overall plan here? Do you want to land this even before we know if we will be replacing the old pass?

@tlively
Copy link
Member Author

tlively commented Jan 20, 2026

The tests here are all new. We could additionally run the existing DAE tests, but we wouldn't be able to remove many of the new tests that way.

@tlively
Copy link
Member Author

tlively commented Jan 20, 2026

As far as the overall plan goes, I'd like to have this landed so we can consider making it more powerful iteratively. My hope is that if it does more work DAE currently does, then we'll start seeing better speedups. We could also use it immediately to replace the unprofitable bailout from DAE, but that's probably not worth it.

// forwarded on to other function calls. These forwarded parameters can still be
// optimized out as long as they are unused in the callees they are forwarded
// to. Since we perform a fixed point analysis, cycles of forwarded parameters
// can still be removed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth noting that "forwarded" is not just "literally the value is forwarded" but also transformations of the value.

using Params = std::vector<Used::Element>;

// Function index and parameter index.
using ParamLoc = std::pair<Index, Index>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about FuncParamLoc, to match TypeParamLoc below?

#define TIME(...)
#endif // TIME_DAE

// Find the root of the subtyping hierarchy for a given HeapType.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Find the root of the subtyping hierarchy for a given HeapType.
// Find the non-basic root of the subtyping hierarchy for a given HeapType.

// Unreferenced functions can be optimized separately from referenced
// functions with the same type. For unreferenced functions in that situation,
// this is the new type that should be applied before global type rewriting to
// prevent the function from getting the wrong optimizations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow this. If a function is unreferenced, we track that in referenced, and in that case, we can just apply all the stuff we find, without caring about syncing it up with functions with a similar type. Why do we need to set a particular "replacement type" here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use a GlobalTypeRewriter to rewrite optimized function types for indirect calls globally. GlobalTypeRewriter does not provide any way to skip updating the types of non-referenced functions, so we have to arrange for the GlobalTypeRewriter to rewrite the types of non-referenced functions to the desired signatures as well. When different functions with the same original type should be optimized to have different types, we need to ensure those functions have different types before global type rewriting, otherwise it would not be possible for them to have different types after global type rewriting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we have to arrange for the GlobalTypeRewriter to rewrite the types of non-referenced functions to the desired signatures as well

Given a set of functions of a type T whose address is not taken, can't we do this?

  1. Create a new type T' with the same signature as T. Set all the functions to have that new type.
  2. Not enter anything for T' in the GlobalTypeRewriter mapping, so it leaves it alone.

So, yes, we need to give the address-taken and non-address-taken functions a different type, but can't we just do that in a few lines of code, without storing it on FunctionInfo?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Essentially yes, this is exactly what we do, with the added complexity that we need to make sure the new type T' is actually different from any type in the GlobalTypeRewriter mapping. As an extra optimization we also try to reuse other existing types that will already be mapped to the correct signature to reduce the number of new types.

In general we need to install these new types after optimizing function bodies and before doing the type mapping, so we do it at the end of the parallel sub-pass that updates function bodies. Until then we stash the intended replacement types on the FunctionInfos.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks, I'll see if I can follow those details in the code...

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more incremental comments so far.

// and vice versa.
std::vector<Index> referencedFuncs;

// For each parameter in this function type, the list of parameters of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// For each parameter in this function type, the list of parameters of
// For each parameter in these function types, the list of parameters of

Comment on lines +208 to +210
// Further, without GC we cannot differentiate the types of unreferenced and
// referenced functions before global type rewriting, so we cannot optimize
// them separately. Do not constrain the optimization of unreferenced
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  // Further, without GC we cannot differentiate the types of unreferenced and
  // referenced functions before global type rewriting, so we cannot optimize
  // them separately.

When GC is disabled or not, can we optimize/rewrite the types of unreferenced functions "incorrectly" - just let them change - and then fix them up afterwards?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately doing that causes assertion failures and crashes when the GlobalTypeRewriter tries to update the types of LocalGets after updating the function type to have the wrong signature.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see...

optimizeReferencedFuncs(optimizeReferencedFuncs) {}

bool isFunctionParallel() override { return true; }
bool modifiesBinaryenIR() override { return false; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By convention we always put these at the top of the class

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants