`ADR suggestion` allow users to query objects by `name` which is not globally unique #194

henrikjacobsenfys · 2026-02-06T10:02:08Z

henrikjacobsenfys
Feb 6, 2026
Maintainer

Just to have a place for this discussion. I can think of many examples, but here's just one where indexing by unique_names is annoying, and where I don't see an easy workaround.

User defines a model and tries to fit it, in somewhat pseudocode:

# Loading data etc happens in a different cell
...
model=ComponentCollection(unique_name='my_model')
gaussian=Gaussian(unique_name='my_gaussian')
lorentzian=Lorentzian(unique_name='my_lorentzian')
model.append_component(gaussian)
model.append_component(lorentzian)

model['my_gaussian'].area.fixed=True
model['my_lorentzian'].center=0.2
analysis=Analysis(data=data,model=model)
analysis.fit()
analysis.plot_data_and_fit()

The user then sees the plot and realises that it actually looks like it would fit better with a second Gaussian instead of a Lorentzian. So they edit their cell:

...
model=ComponentCollection(unique_name='my_model')
gaussian=Gaussian(unique_name='my_gaussian')
gaussian2=Gaussian(unique_name='my_gaussian2')
model.append_component(gaussian)
model.append_component(gaussian2)

model['my_gaussian'].area.fixed=True
model['my_gaussian2'].center=0.2
analysis=Analysis(data=data,model=model)
analysis.fit()
analysis.plot_data_and_fit()

And the user gets an error. This type of behavior is extremely common in analysis workflows, as in, I expect more than 90% of my users to do something like this.

They will either have to restart the notebook or change the unique_name and all references to it, which is not Easy^TM.

They could also create a new cell with something like this, and then run the analysis afterwards

model.remove_component('my_lorentzian')
model.append_component(gaussian2)

This is terrible for the next day when they do in fact restart their notebook and now add and remove components at random. Their code would look like this:

...
model=ComponentCollection(unique_name='my_model')
gaussian=Gaussian(unique_name='my_gaussian')
lorentzian=Lorentzian(unique_name='my_lorentzian')
model.append_component(gaussian)
model.append_component(lorentzian)

model.remove_component('my_lorentzian')
gaussian2=Gaussian(unique_name='my_gaussian2')
model.append_component(gaussian2)

model['my_gaussian'].area.fixed=True
model['my_gaussian2'].center=0.2
analysis=Analysis(data=data,model=model)
analysis.fit()
analysis.plot_data_and_fit()

This encourages really messy notebooks, which I do not want to do.

Note that the above step, where they realise that their model isn't quite right, is likely to be taken many times. It's not a simple one-off to define the right model that fits your data, unless your data is particularly simple.

In general, it's common to have "setting up my model" in a single cell. If you're never allowed to rerun a cell that defines your model, you need to separate different parts of the "setting up my model" artifically into smaller steps, and remember to re-run the exact ones that you change. This may be good coding practice, but it is not how neutron scatterers work.
I think it's extremely ambitious and non-Easy^TM to set out to re-educate our userbase on good coding practice. For half of the spectroscopy users, switching from Matlab to Python is already a Big Deal, and if they get random errors when running their code, they will not make the switch. Or they will do it grudgingly like with certain other well known neutron scattering software packages, and complain a lot.

damskii9992 · 2026-02-06T11:39:39Z

damskii9992
Feb 6, 2026
Maintainer

Please remember that the notebooks/library are only meant for advanced users, i.e. users which should know not to re-run certain cells. Beginner users are meant to use the app.

Another important point is that they will encounter this exact issue when reducing their data too, as not all cells can be rerun in the reduction workflows either.

At the imaging beamline VENUS at SNS, their instrument data scientist encountered exactly the problems you mentioned when he made jupyter notebooks for his users. The notebooks would be messed up by dumb users running notebooks in arbitrary order. He has had great success using these "can only be run once" cells, and providing a clear ruleset and guidelines for using Jupyter notebooks at the start of his every notebook, including color-coding his cells.

The thing is, users will mess up your notebooks, even if we don't use unique_names, users will just mess them up by running cells out of order in a different fashion. Like running a cell that removes a ModelComponent after just adding their new component and then wondering why their fit doesn't work. Having a more general "don't run these cells again" guideline is much more transferable to other such issues.

1 reply

henrikjacobsenfys Feb 6, 2026
Maintainer Author

First, data reduction is an entirely different matter. You don't mess with reduction. Essentially no user will be an expert, and essentially no user will want to fiddle around with details. They just want to click the big reduce button, wait for the magic to happen, and move on. I suspect most users will want to restart the reduction if changes are made somewhere anyway, to be certain everything is run correctly.

There are two types of advanced analysis users. The ones that know what they're doing in terms of neutron scattering, and have complicated data that they cannot fit in the app (or they want to script their fits), and the ones who are advanced programmers. I don't worry about the latter group. But as a (former?) member of the first group, I know exactly how they/we think. Furthermore, every spectroscopy instrument scientist at ESS, and, I believe, most of the users, belong to the same group.

Rerunning your analysis many times, tweaking parameters and fit functions, is part of a normal workflow for these users, at least in spectroscopy. Having that same cell give errors when you rerun it, will turn people away from our software. Requiring users to either restart all the time, or develop good coding practice on the spot, will turn people away from using our software. Worse, most of them will not even think to complain to us about it. They'll complain among themselves (how many times have you heard people complain about Mantid? How many times have you heard people say that they made an issue on the Mantid Github, hoping to resolve the particular problem they had?).

I do not see a way to color-code the example given above to give the user a nice experience (if you do, please enlighten me :) ). I also do not see a reason (from the user perspective) why they should not be allowed the change a function from a Lorentzian to a Gaussian and re-run their fit cell without reloading their data.

That said, I do think color-coding sounds like a very useful tool to help guide users.

AndrewSazonov · 2026-02-06T20:38:29Z

AndrewSazonov
Feb 6, 2026
Maintainer

I understand your concerns, Henrik, about how difficult these kinds of examples can be for users who are not software developers. And I agree that in many such cases users will run into frustrating problems during their normal workflow. I also agree with Christian’s point about real user experience at SNS: users will find ways to mess up notebooks anyway.

The best UX for many users would be to use the GUI. But I’m afraid we will never be able to make the GUI as flexible as the library in terms of functionality. So for experienced users, the library will still be needed. And by experienced users, I mean experienced in science, not in programming. These users can have complex scientific tasks that can only realistically be done through the library. Because of this, I believe we should try to make notebook workflows as simple and forgiving as possible.

I don’t know if the approach used at VENUS at SNS works well for their users in practice. But if it does, then I think we should take a closer look at it and try to adapt it where possible.

Now, regarding your original example.

1. About unique_name

Maybe this is not the best example for demonstrating the need for unique_name, because I don’t really see why it is needed here at all. You create an object model with unique_name='my_model', but you never access it by name - you access it directly via the variable model.

Why not do the same for the components gaussian and gaussian2? Why do they need unique names? What is the advantage?

Right now, this makes the code more complex:

You choose a variable name (gaussian2)
You also choose a different unique name ('my_gaussian2')

These two names are not even the same, which doubles complexity. One could make them identical, but then what is the point of duplicating them at all?

If we remove explicit unique names, the code becomes simpler and more natural:

model = ComponentCollection()
gaussian = Gaussian()
gaussian2 = Gaussian()
model.append_component(gaussian)
model.append_component(gaussian2)

gaussian.area.fixed = True
gaussian2.center = 0.2
analysis = Analysis(data=data, model=model)
analysis.fit()
analysis.plot_data_and_fit()

In this case, unique names can be auto-generated internally, and there are no conflicts.

2. Names should be scoped, not globally unique

If users really need to access components by name, then we should have a local name attribute, not a globally unique one. This is exactly the approach I currently use in the new EasyDiffraction library.

I do not plan to allow users to assign global unique names, because this easily leads to the kinds of problems you described - and others as well. A name should be unique only within a certain scope, not everywhere.

For example, we should be able to do this:

first_model['my_gaussian'].area.fixed = True
second_model['my_gaussian'].area.fixed = False

Here, both 'my_gaussian' objects are completely independent, living in different models.

In diffraction, this is especially important. For example, when working with multiple sample models:

first_model.atom_sites['Si'].fract_x = 0.5
second_model.atom_sites['Si'].fract_x = 0.0

I am not going to ask users to do something like:

first_model.atom_sites['first_unique_Si'].fract_x = 0.5
second_model.atom_sites['second_unique_Si'].fract_x = 0.0

That would be very unnatural from a scientific point of view.

3. Moving toward a higher-level API

The more I work on EasyDiffraction and collect user feedback - including what Christian mentioned about SNS notebooks and what Henrik expects users will struggle with - the more I believe in an API where most things can be done via top-level objects, without manually creating and managing many small objects.

This is quite different from traditional Python library design and standard developer practices, but if our users cannot use the library effectively, then all this work loses much of its value. Expecting to train all users to adopt good programming practices also feels unrealistic.

Applying my current EasyDiffraction approach to your example, the workflow could look like this:

model = ComponentCollection()
model.add_component(name='my_gaussian', type='gaussian')
model.add_component(name='my_gaussian2', type='gaussian')

model['my_gaussian'].area.fixed = True
model['my_gaussian2'].center = 0.2

Here, many details are hidden from the user:

Object creation
Tracking component names
Internal consistency checks

All name management happens inside the collection, not via external objects. This avoids conflicts, keeps the API clean, and prevents users from accidentally breaking internal state when rerunning cells.

In my view, this is much closer to how scientists expect such tools to behave.

Ideally, we would need to have two API styles: one following a more traditional library design for users with strong programming experience, and another one designed mainly for scientists. This is what I started to implement in EasyDiffraction. However, supporting two parallel APIs requires much more work and long-term maintenance. Because of that, I am seriously considering focusing only on the more straightforward, scientist-friendly API. I believe users with minimal programming skills will make up the majority of users at ESS and similar facilities, and for them simplicity and robustness are more important than flexibility or “pure” library design.

That said, this is only my view for the final, technique-specific product. For the core library, I think a more traditional library design makes more sense. The corelib will be hidden from end users by our top-level APIs. If someone decides to work with the corelib directly, they are expected to understand what they are doing and to have good enough programming skills.

3 replies

henrikjacobsenfys Feb 7, 2026
Maintainer Author

You made the case for name over unique_name for indexing much more eloquently and convincingly than I managed. Your questions about my example can all be answered by this: I tried to come up with an illustrative and simple example that shows why unique_name can be problematic, and clearly didn't do as good a job as I initially thought.

Your point 3 is very interesting. For now, I want to move forward with the current API, since I really want a first release asap, but I'm making a note to return to it later.

I will say that users, at least in spectroscopy, value flexibility quite highly. It took Christian Beck less than a week of working with the EasyDynamics prototype before he asked about adding extra parameters to make a more complex model than was natively supported. I have a quite good idea about what users want, but I obviously cannot predict every use case. Therefore, I want to make it easy for people to push EasyDynamics to its limits and beyond in ways I have not predicted. That'll be an interesting discussion when I make my analysis PR :)

AndrewSazonov Feb 7, 2026
Maintainer

Yeah, this really depends on the user community, and different communities can have very different expectations.

For example, in macromolecular crystallography, as Aaron mentioned several times, most users prefer to work only with GUIs and do not want to use notebooks or scripts at all. This is something we need to keep in mind.

I believe, the biggest gain in flexibility already happens when we move from a GUI to a library. After that, different library APIs are usually not so different in terms of what is possible. A user with good programming skills will anyway be able to adapt to almost any API.

However, users without strong programming skills are much more likely to get stuck when the API is more low level oriented.

damskii9992 Feb 9, 2026
Maintainer

I don't really see the issue with unique_names here. As you mentioned in point 1, the issue is altogether avoided by simply not using a custom unique_name here . . .
I am also not in favour of an API like this:

model = ComponentCollection()
model.add_component(name='my_gaussian', type='gaussian')
model.add_component(name='my_gaussian2', type='gaussian')

model['my_gaussian'].area.fixed = True
model['my_gaussian2'].center = 0.2

I don't believe this is any easier for non-programmer users to use than a more regular python API and if users have even a minimal iota of experience, or if they look up things online, this API will just be more confusing and annoying for them. At least that is my opinion.

rozyczko · 2026-02-09T09:40:57Z

rozyczko
Feb 9, 2026
Maintainer

The current unique_name is overloaded: a lookup key, a global identifier, a user-visib le label... That's too much.

I tend to agree with @AndrewSazonov's take on the local name but I'd go even further in terms of separation of concerns.

Users should not be forced to think about global namespaces when building models - usage of our unique_name should be internal only.

Human readable labels should be local to the scope of their creation.

Components should always have an internal, stable identity (implementation concern) - our unique_name.
User-facing name should be container-scoped, not global (scientist concern) - a modified version of the original name attribute.

model = Model()
model.add(Gaussian(name="peak"))
model.add(Lorentzian(name="background"))

# lookup is local to the model
model["peak"].sigma = 0.2

This is essentially what Andrew is also proposing.

Maybe we could think of having truly global references, but only when explicitly requested? You all seem to think this is requiring too much, but if we want to think about linking components between various models (like in creating the dependencies/constraints) or doing serialization (where users would get their predefined names on project read).

If we had a way of doing, say

g = Gaussian(name="peak")
g.assign_global_id("sample1.elastic_peak")

model1.add(g)

# Later, elsewhere:
registry["sample1.elastic_peak"].sigma = 0.15

with sample1.elastic_peak being assigned to the parameter, regardless of its unique_name, meaning the name would carry over on copy and serialization.

Adding the registry component would be pretty easy AND completely optional. This would be a power option for advanced users.

This means we have

unique_name - only for internal object count (mandatory attribute on Descriptor)
local_name - for scoped descriptor indexing (optional attribute on Descriptor)
global_id - for complex coordination across scopes (independent mechanism/dict tying global id and Descriptor instance)

0 replies

damskii9992 · 2026-02-09T10:00:51Z

damskii9992
Feb 9, 2026
Maintainer

User defines a model and tries to fit it, in somewhat pseudocode:
# Loading data etc happens in a different cell
...
model=ComponentCollection(unique_name='my_model')
gaussian=Gaussian(unique_name='my_gaussian')
lorentzian=Lorentzian(unique_name='my_lorentzian')
model.append_component(gaussian)
model.append_component(lorentzian)

model['my_gaussian'].area.fixed=True
model['my_lorentzian'].center=0.2
analysis=Analysis(data=data,model=model)
analysis.fit()
analysis.plot_data_and_fit()
The user then sees the plot and realises that it actually looks like it would fit better with a second Gaussian instead of a Lorentzian. So they edit their cell:
...
model=ComponentCollection(unique_name='my_model')
gaussian=Gaussian(unique_name='my_gaussian')
gaussian2=Gaussian(unique_name='my_gaussian2')
model.append_component(gaussian)
model.append_component(gaussian2)

model['my_gaussian'].area.fixed=True
model['my_gaussian2'].center=0.2
analysis=Analysis(data=data,model=model)
analysis.fit()
analysis.plot_data_and_fit()
And the user gets an error. This type of behavior is extremely common in analysis workflows, as in, I expect more than 90% of my users to do something like this.

I've looked at this again, and can I just ask, why even have a custom unique_name in this case?
The user already have the components in the gaussian, lorentzian and gaussian2 variables. A much better notebook, which would avoid this issue altogether, would look like this:

> model=ComponentCollection()
> gaussian=Gaussian()
> lorentzian=Lorentzian()
> model.append_component(gaussian)
> model.append_component(lorentzian)
> 
> gaussian.area.fixed=True
> lorentzian.center=0.2
> analysis=Analysis(data=data,model=model)
> analysis.fit()
> analysis.plot_data_and_fit()

And when the user then edits this cell, there will be no unique_name clashes:

model=ComponentCollection()
gaussian=Gaussian()
gaussian2=Gaussian()
model.append_component(gaussian)
model.append_component(gaussian2)

gaussian.area.fixed=True
gaussian2.center=0.2
analysis=Analysis(data=data,model=model)
analysis.fit()
analysis.plot_data_and_fit()

If you are making the model in the same notebook, the components are already in the namespace, so there is no reason to access them through the component_collection . . .

Setting custom unique_names is not meant to be standard procedure, its something you do when you know what you do and why you do it, like to make a dependent_parameter.

5 replies

henrikjacobsenfys Feb 9, 2026
Maintainer Author

It's because I'm bad at coming up with good examples on the fly. Andrew's example is a much better illustration.

henrikjacobsenfys Feb 9, 2026
Maintainer Author

But just to add another use case: I'm making extensive use of copy, and I'll want users to be able to refer to copies of components by their name, not only their index.

damskii9992 Feb 9, 2026
Maintainer

That is probably a much more valid scenario. I can see why unique_names wouldn't work there, especially since we don't copy unique_names.

You can still work with unique_names, if you use my earlier suggestion by making the name look-up a regex search and have your copy methods add a number to custom unique_names.

But otherwise you might have a fair enough reason to have another "name" attribute, but that doesn't necessarily mean it should go into corelib. Your "component_name" could be specific to EasyDynamics.

rozyczko Feb 9, 2026
Maintainer

Or, an optional attribute, as I mentioned, which is defined in corelib in order to facilitate its use in serialization and dependency tracking

damskii9992 Feb 9, 2026
Maintainer

Or, an optional attribute, as I mentioned, which is defined in corelib in order to facilitate its use in serialization and dependency tracking

It doesn't have to be in corelib to be in serialization. Any regular attribute automatically gets picked up and serialized/deserialized. So as long as he doesn't do anything crazy with the attribute, it would be serialized without issue :)
And we don't really want to use any other names than unique_names for dependencies . . . Unless you don't mean dependent_parameters when you say "dependency tracking"?

rozyczko · 2026-02-09T12:39:12Z

rozyczko
Feb 9, 2026
Maintainer

After discussing this with @damskii9992 we seem to agree that this should NOT be a base class implementation.

Some techniques (spectroscopy, diffraction) need the extra name-like attribute on Descriptor (name, pretty_name, display_name etc), and some don't (refl, imaging).

It therefore makes sense to allow derived Parameter classes in tech-specific libraries to define their own name-like attributes.
As Christian says: serialization should be pretty straightforward, since those attributes are strings.

This also leads to allowing the global_id() to be done on the library level.

If, at any time, it becomes obvious we all need this flexibility, we should reconsider placing those extra attributes (and associated logic) in the base class.

1 reply

henrikjacobsenfys Feb 9, 2026
Maintainer Author

Ok.

Sorry to mix discussions, but regarding EasyList: would it be a good idea to make it easy to inherit from it and extend it with a query to name? Otherwise I'll have to copy/paste it and duplicate the logic.

ADR suggestion allow users to query objects by name which is not globally unique #194

Uh oh!

Uh oh!

henrikjacobsenfys Feb 6, 2026 Maintainer

Replies: 5 comments · 10 replies

Uh oh!

damskii9992 Feb 6, 2026 Maintainer

Uh oh!

henrikjacobsenfys Feb 6, 2026 Maintainer Author

Uh oh!

AndrewSazonov Feb 6, 2026 Maintainer

1. About unique_name

2. Names should be scoped, not globally unique

3. Moving toward a higher-level API

Uh oh!

henrikjacobsenfys Feb 7, 2026 Maintainer Author

Uh oh!

AndrewSazonov Feb 7, 2026 Maintainer

Uh oh!

damskii9992 Feb 9, 2026 Maintainer

Uh oh!

rozyczko Feb 9, 2026 Maintainer

Uh oh!

Uh oh!

damskii9992 Feb 9, 2026 Maintainer

Uh oh!

henrikjacobsenfys Feb 9, 2026 Maintainer Author

Uh oh!

henrikjacobsenfys Feb 9, 2026 Maintainer Author

Uh oh!

damskii9992 Feb 9, 2026 Maintainer

Uh oh!

rozyczko Feb 9, 2026 Maintainer

Uh oh!

damskii9992 Feb 9, 2026 Maintainer

Uh oh!

rozyczko Feb 9, 2026 Maintainer

Uh oh!

Uh oh!

henrikjacobsenfys Feb 9, 2026 Maintainer Author

`ADR suggestion` allow users to query objects by `name` which is not globally unique #194

henrikjacobsenfys
Feb 6, 2026
Maintainer

Replies: 5 comments 10 replies

damskii9992
Feb 6, 2026
Maintainer

henrikjacobsenfys Feb 6, 2026
Maintainer Author

AndrewSazonov
Feb 6, 2026
Maintainer

henrikjacobsenfys Feb 7, 2026
Maintainer Author

AndrewSazonov Feb 7, 2026
Maintainer

damskii9992 Feb 9, 2026
Maintainer

rozyczko
Feb 9, 2026
Maintainer

damskii9992
Feb 9, 2026
Maintainer

henrikjacobsenfys Feb 9, 2026
Maintainer Author

henrikjacobsenfys Feb 9, 2026
Maintainer Author

damskii9992 Feb 9, 2026
Maintainer

rozyczko Feb 9, 2026
Maintainer

damskii9992 Feb 9, 2026
Maintainer

rozyczko
Feb 9, 2026
Maintainer

henrikjacobsenfys Feb 9, 2026
Maintainer Author