uv detector improvements - multi group dev dependencies, git source detection, transitive dev propagation#1631
Conversation
|
@theztefan please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.
Contributor License AgreementContribution License AgreementThis Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
|
There was a problem hiding this comment.
Pull request overview
This PR enhances the experimental UvLockComponentDetector to better reflect uv.lock semantics by improving dev-dependency parsing/classification and adding support for git-sourced packages.
Changes:
- Parse all
[package.metadata.requires-dev]dependency groups (not justdev). - Classify dev-only dependencies using transitive reachability (dev closure minus prod closure).
- Detect
source = { git = "..." }entries and register them asGitComponent(plus new unit tests and updated docs).
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
src/Microsoft.ComponentDetection.Detectors/uv/UvLockComponentDetector.cs |
Adds git URL parsing, dev transitive closure calculation, and registers GitComponent when source.git is present. |
src/Microsoft.ComponentDetection.Detectors/uv/UvLock.cs |
Parses source.git and aggregates dev deps across all requires-dev groups. |
src/Microsoft.ComponentDetection.Detectors/uv/UvSource.cs |
Extends the source model with a Git field. |
test/Microsoft.ComponentDetection.Detectors.Tests/UvLockDetectorTests.cs |
Adds tests for multi-group dev deps, transitive dev propagation, and git source detection. |
docs/detectors/uv.md |
Documents dev reachability behavior and git-sourced component registration. |
src/Microsoft.ComponentDetection.Detectors/uv/UvLockComponentDetector.cs
Outdated
Show resolved
Hide resolved
| var uri = new Uri(gitUrl); | ||
| var repoUrl = new Uri(uri.GetLeftPart(UriPartial.Path)); | ||
| var commitHash = uri.Fragment.TrimStart('#'); |
There was a problem hiding this comment.
ParseGitUrl assumes the git URL is always a valid absolute URI and always contains a non-empty fragment. If the URL is malformed or doesn’t include a commit hash fragment, new Uri(...) or GitComponent construction will throw, causing the detector to bail out for the entire uv.lock. Consider using Uri.TryCreate plus an explicit check for a non-empty commit hash (and falling back to logging + treating it as a non-git package or skipping just that package) to avoid a single bad entry breaking detection.
| var uri = new Uri(gitUrl); | |
| var repoUrl = new Uri(uri.GetLeftPart(UriPartial.Path)); | |
| var commitHash = uri.Fragment.TrimStart('#'); | |
| if (!Uri.TryCreate(gitUrl, UriKind.Absolute, out var uri)) | |
| { | |
| // Malformed git URL; signal failure to the caller. | |
| return (null, string.Empty); | |
| } | |
| var repoPart = uri.GetLeftPart(UriPartial.Path); | |
| if (!Uri.TryCreate(repoPart, UriKind.Absolute, out var repoUrl)) | |
| { | |
| // Unable to construct a valid repository URL; signal failure. | |
| return (null, string.Empty); | |
| } | |
| var commitHash = uri.Fragment.TrimStart('#'); | |
| // If there is no fragment or it is empty/whitespace, treat as "no commit hash". | |
| if (string.IsNullOrWhiteSpace(commitHash)) | |
| { | |
| return (repoUrl, string.Empty); | |
| } |
There was a problem hiding this comment.
I believe this is false and not a concern we should take care of at this level. The uv.lock files are machine generated and follow the strict specs which for git sources afaik always includes the commit hash. This means that this could potentially take effect in cases where we have malformed uv.lock but then it should fail because it's malformed 😄
…etector.cs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
| foreach (var dep in pkg.Dependencies) | ||
| { | ||
| var depPkg = uvLock.Packages.FirstOrDefault(p => p.Name.Equals(dep.Name, StringComparison.OrdinalIgnoreCase)); | ||
| if (depPkg != null) | ||
| { | ||
| var depComponentWithVersion = new PipComponent(depPkg.Name, depPkg.Version); | ||
| singleFileComponentRecorder.RegisterUsage(new DetectedComponent(depComponentWithVersion), parentComponentId: pipComponent.Id); | ||
| TypedComponent depComponent; | ||
| if (depPkg.Source?.Git != null) | ||
| { | ||
| var (depRepoUrl, depCommitHash) = ParseGitUrl(depPkg.Source.Git); | ||
| depComponent = new GitComponent(depRepoUrl, depCommitHash); | ||
| } | ||
| else | ||
| { | ||
| depComponent = new PipComponent(depPkg.Name, depPkg.Version); | ||
| } | ||
| singleFileComponentRecorder.RegisterUsage(new DetectedComponent(depComponent), parentComponentId: component.Id, isDevelopmentDependency: isDev); |
There was a problem hiding this comment.
Dependency edge registration does a linear search for every dependency (uvLock.Packages.FirstOrDefault(...)). For larger uv.lock files this becomes O(packages * deps) and you already build a name→package lookup for the transitive-closure calculation. Consider building a single case-insensitive lookup dictionary once (e.g., at the start of OnFileFoundAsync) and reusing it both for transitive reachability and for resolving depPkg when adding edges.
| @@ -49,8 +83,8 @@ protected override Task OnFileFoundAsync(ProcessRequest processRequest, IDiction | |||
| var uvLock = UvLock.Parse(file.Stream); | |||
|
|
|||
| var rootPackage = uvLock.Packages.FirstOrDefault(IsRootPackage); | |||
| var explicitPackages = new HashSet<string>(); | |||
| var devPackages = new HashSet<string>(); | |||
| var explicitPackages = new HashSet<string>(StringComparer.OrdinalIgnoreCase); | |||
| var devRootNames = new HashSet<string>(StringComparer.OrdinalIgnoreCase); | |||
|
|
|||
| if (rootPackage != null) | |||
| { | |||
| @@ -61,30 +95,57 @@ protected override Task OnFileFoundAsync(ProcessRequest processRequest, IDiction | |||
|
|
|||
| foreach (var devDep in rootPackage.MetadataRequiresDev) | |||
| { | |||
| devPackages.Add(devDep.Name); | |||
| devRootNames.Add(devDep.Name); | |||
| } | |||
| } | |||
|
|
|||
| // Compute dev-only packages via transitive reachability analysis. | |||
| // A package is dev-only if it is reachable from dev roots but NOT from production roots. | |||
| var prodRoots = rootPackage?.Dependencies.Select(d => d.Name) ?? []; | |||
| var prodTransitive = GetTransitivePackages(prodRoots, uvLock.Packages); | |||
| var devTransitive = GetTransitivePackages(devRootNames, uvLock.Packages); | |||
| var devOnlyPackages = new HashSet<string>(devTransitive.Except(prodTransitive), StringComparer.OrdinalIgnoreCase); | |||
|
|
|||
| foreach (var pkg in uvLock.Packages) | |||
| { | |||
| if (IsRootPackage(pkg)) | |||
| { | |||
| continue; | |||
| } | |||
|
|
|||
| var pipComponent = new PipComponent(pkg.Name, pkg.Version); | |||
| var isExplicit = explicitPackages.Contains(pkg.Name); | |||
| var isDev = devPackages.Contains(pkg.Name); | |||
| var detectedComponent = new DetectedComponent(pipComponent); | |||
| var isDev = devOnlyPackages.Contains(pkg.Name); | |||
|
|
|||
| TypedComponent component; | |||
| if (pkg.Source?.Git != null) | |||
| { | |||
| var (repoUrl, commitHash) = ParseGitUrl(pkg.Source.Git); | |||
| component = new GitComponent(repoUrl, commitHash); | |||
| } | |||
There was a problem hiding this comment.
ParseGitUrl/GitComponent creation can throw (e.g., invalid/relative URL, or missing #<commit> fragment). Because this happens inside the single try/catch around the whole file, one bad git URL will abort processing of the entire uv.lock and result in no components being recorded. Consider using Uri.TryCreate + validating a non-empty (and ideally hex/length-checked) commit hash, and if parsing fails log a warning and either skip just that package or fall back to a PipComponent instead of failing the whole detector run.
There was a problem hiding this comment.
My comment from above is valid here as well. Don't think we should be handling this here.
|
|
||
| // Compute dev-only packages via transitive reachability analysis. | ||
| // A package is dev-only if it is reachable from dev roots but NOT from production roots. | ||
| var prodRoots = rootPackage?.Dependencies.Select(d => d.Name) ?? []; |
There was a problem hiding this comment.
prodRoots is currently derived from rootPackage.Dependencies. However, in this codebase/tests the root package can omit the dependencies array while still declaring production dependencies in [package.metadata].requires-dist (which you already parse into explicitPackages). In that case prodRoots becomes empty, prodTransitive stays empty, and shared transitive dependencies can be incorrectly classified as dev-only. Consider using explicitPackages (or rootPackage.MetadataRequiresDist.Select(d => d.Name)) as the production root set for the reachability analysis instead of rootPackage.Dependencies.
| var prodRoots = rootPackage?.Dependencies.Select(d => d.Name) ?? []; | |
| var prodRoots = explicitPackages; |
This pull request enhances the
uvdetector to address the following limitations identified:Only the
devgroup under[package.metadata.requires-dev]was parsed.uvsupports arbitrary dependency groups and packages in non-devgroups. These were were silently dropped.Transitive dependencies of dev-only packages were incorrectly classified as production.
Git-sourced packages were ignore ignored, causing a silent failure to detect them.
Parse all dev dependency groups (Bug fix)
The TOML parser previously hardcoded
TryGetValue("dev", ...)to read only therequires-devtable. Now we iterate thru all keys in therequires-devtable collecting dependencies from every group.Transitive dev dependency propagation (feature)
BFS that computes the full transitive closure from a set of root package names. Dev classification now uses reachability analysis:
prodTransitive= all packages reachable from production rootsdevTransitive= all packages reachable from dev rootsdevOnly = devTransitive.Except(prodTransitive)Git source detection (feature)
Packages with
source = { git = "..." }are now registered asGitComponentinstead ofPipComponent. AddedParseGitUrl()to extract the repository URL and commit hash from the URI fragment and registered it.SupportedComponentTypesupdated to include the newComponentType.Git.And generated tests to cover the new additions.