ADR suggestion allow users to query objects by name which is not globally unique
#194
Replies: 5 comments 10 replies
-
|
Please remember that the notebooks/library are only meant for advanced users, i.e. users which should know not to re-run certain cells. Beginner users are meant to use the app. Another important point is that they will encounter this exact issue when reducing their data too, as not all cells can be rerun in the reduction workflows either. At the imaging beamline VENUS at SNS, their instrument data scientist encountered exactly the problems you mentioned when he made jupyter notebooks for his users. The notebooks would be messed up by dumb users running notebooks in arbitrary order. He has had great success using these "can only be run once" cells, and providing a clear ruleset and guidelines for using Jupyter notebooks at the start of his every notebook, including color-coding his cells. The thing is, users will mess up your notebooks, even if we don't use |
Beta Was this translation helpful? Give feedback.
-
|
I understand your concerns, Henrik, about how difficult these kinds of examples can be for users who are not software developers. And I agree that in many such cases users will run into frustrating problems during their normal workflow. I also agree with Christian’s point about real user experience at SNS: users will find ways to mess up notebooks anyway. The best UX for many users would be to use the GUI. But I’m afraid we will never be able to make the GUI as flexible as the library in terms of functionality. So for experienced users, the library will still be needed. And by experienced users, I mean experienced in science, not in programming. These users can have complex scientific tasks that can only realistically be done through the library. Because of this, I believe we should try to make notebook workflows as simple and forgiving as possible. I don’t know if the approach used at VENUS at SNS works well for their users in practice. But if it does, then I think we should take a closer look at it and try to adapt it where possible. Now, regarding your original example. 1. About unique_nameMaybe this is not the best example for demonstrating the need for Why not do the same for the components Right now, this makes the code more complex:
These two names are not even the same, which doubles complexity. One could make them identical, but then what is the point of duplicating them at all? If we remove explicit unique names, the code becomes simpler and more natural: model = ComponentCollection()
gaussian = Gaussian()
gaussian2 = Gaussian()
model.append_component(gaussian)
model.append_component(gaussian2)
gaussian.area.fixed = True
gaussian2.center = 0.2
analysis = Analysis(data=data, model=model)
analysis.fit()
analysis.plot_data_and_fit()In this case, unique names can be auto-generated internally, and there are no conflicts. 2. Names should be scoped, not globally uniqueIf users really need to access components by name, then we should have a local name attribute, not a globally unique one. This is exactly the approach I currently use in the new EasyDiffraction library. I do not plan to allow users to assign global unique names, because this easily leads to the kinds of problems you described - and others as well. A name should be unique only within a certain scope, not everywhere. For example, we should be able to do this: first_model['my_gaussian'].area.fixed = True
second_model['my_gaussian'].area.fixed = FalseHere, both In diffraction, this is especially important. For example, when working with multiple sample models: first_model.atom_sites['Si'].fract_x = 0.5
second_model.atom_sites['Si'].fract_x = 0.0I am not going to ask users to do something like: first_model.atom_sites['first_unique_Si'].fract_x = 0.5
second_model.atom_sites['second_unique_Si'].fract_x = 0.0That would be very unnatural from a scientific point of view. 3. Moving toward a higher-level APIThe more I work on EasyDiffraction and collect user feedback - including what Christian mentioned about SNS notebooks and what Henrik expects users will struggle with - the more I believe in an API where most things can be done via top-level objects, without manually creating and managing many small objects. This is quite different from traditional Python library design and standard developer practices, but if our users cannot use the library effectively, then all this work loses much of its value. Expecting to train all users to adopt good programming practices also feels unrealistic. Applying my current EasyDiffraction approach to your example, the workflow could look like this: model = ComponentCollection()
model.add_component(name='my_gaussian', type='gaussian')
model.add_component(name='my_gaussian2', type='gaussian')
model['my_gaussian'].area.fixed = True
model['my_gaussian2'].center = 0.2Here, many details are hidden from the user:
All name management happens inside the collection, not via external objects. This avoids conflicts, keeps the API clean, and prevents users from accidentally breaking internal state when rerunning cells. In my view, this is much closer to how scientists expect such tools to behave. Ideally, we would need to have two API styles: one following a more traditional library design for users with strong programming experience, and another one designed mainly for scientists. This is what I started to implement in EasyDiffraction. However, supporting two parallel APIs requires much more work and long-term maintenance. Because of that, I am seriously considering focusing only on the more straightforward, scientist-friendly API. I believe users with minimal programming skills will make up the majority of users at ESS and similar facilities, and for them simplicity and robustness are more important than flexibility or “pure” library design. That said, this is only my view for the final, technique-specific product. For the core library, I think a more traditional library design makes more sense. The corelib will be hidden from end users by our top-level APIs. If someone decides to work with the corelib directly, they are expected to understand what they are doing and to have good enough programming skills. |
Beta Was this translation helpful? Give feedback.
-
|
The current I tend to agree with @AndrewSazonov's take on the local Users should not be forced to think about global namespaces when building models - usage of our Human readable labels should be local to the scope of their creation. Components should always have an internal, stable identity (implementation concern) - our model = Model()
model.add(Gaussian(name="peak"))
model.add(Lorentzian(name="background"))
# lookup is local to the model
model["peak"].sigma = 0.2This is essentially what Andrew is also proposing. Maybe we could think of having truly global references, but only when explicitly requested? You all seem to think this is requiring too much, but if we want to think about linking components between various models (like in creating the dependencies/constraints) or doing serialization (where users would get their predefined names on project read). If we had a way of doing, say g = Gaussian(name="peak")
g.assign_global_id("sample1.elastic_peak")
model1.add(g)
# Later, elsewhere:
registry["sample1.elastic_peak"].sigma = 0.15with Adding the This means we have
|
Beta Was this translation helpful? Give feedback.
-
I've looked at this again, and can I just ask, why even have a custom > model=ComponentCollection()
> gaussian=Gaussian()
> lorentzian=Lorentzian()
> model.append_component(gaussian)
> model.append_component(lorentzian)
>
> gaussian.area.fixed=True
> lorentzian.center=0.2
> analysis=Analysis(data=data,model=model)
> analysis.fit()
> analysis.plot_data_and_fit()And when the user then edits this cell, there will be no
If you are making the model in the same notebook, the components are already in the namespace, so there is no reason to access them through the Setting custom |
Beta Was this translation helpful? Give feedback.
-
|
After discussing this with @damskii9992 we seem to agree that this should NOT be a base class implementation. Some techniques (spectroscopy, diffraction) need the extra name-like attribute on Descriptor ( It therefore makes sense to allow derived Parameter classes in tech-specific libraries to define their own name-like attributes. This also leads to allowing the If, at any time, it becomes obvious we all need this flexibility, we should reconsider placing those extra attributes (and associated logic) in the base class. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
Just to have a place for this discussion. I can think of many examples, but here's just one where indexing by
unique_namesis annoying, and where I don't see an easy workaround.User defines a model and tries to fit it, in somewhat pseudocode:
The user then sees the plot and realises that it actually looks like it would fit better with a second Gaussian instead of a Lorentzian. So they edit their cell:
And the user gets an error. This type of behavior is extremely common in analysis workflows, as in, I expect more than 90% of my users to do something like this.
They will either have to restart the notebook or change the unique_name and all references to it, which is not Easy^TM.
They could also create a new cell with something like this, and then run the analysis afterwards
This is terrible for the next day when they do in fact restart their notebook and now add and remove components at random. Their code would look like this:
This encourages really messy notebooks, which I do not want to do.
Note that the above step, where they realise that their model isn't quite right, is likely to be taken many times. It's not a simple one-off to define the right model that fits your data, unless your data is particularly simple.
In general, it's common to have "setting up my model" in a single cell. If you're never allowed to rerun a cell that defines your model, you need to separate different parts of the "setting up my model" artifically into smaller steps, and remember to re-run the exact ones that you change. This may be good coding practice, but it is not how neutron scatterers work.
I think it's extremely ambitious and non-Easy^TM to set out to re-educate our userbase on good coding practice. For half of the spectroscopy users, switching from Matlab to Python is already a Big Deal, and if they get random errors when running their code, they will not make the switch. Or they will do it grudgingly like with certain other well known neutron scattering software packages, and complain a lot.
Beta Was this translation helpful? Give feedback.
All reactions