@connie connie commented Jun 10, 2014

Some reformulating and updating of Nathan's database tests (also moving the commits to the official/universaldatabase branch so that everybody can contribute).

@connie connie mentioned this pull request Jun 10, 2014

connie commented Jun 12, 2014

The last commit is failing because the universalDatabase branch has not been merged on RMG-database.

nyee and others added 20 commits June 13, 2014 10:41
Add a bit of code to identify products better. I check the names of groups I
think are products against the products listed in forwardTemplate.products.

One error we check for is a misnamed label in rules.py, i.e. a label where
any part is not a group in groups.py. There are a decent number of these
errors due to the recent Java/Py database merge. To make my database
debugging life easier, I added a new feature which tells me if there is a
group which matches the misnamed label.

For example, let's say we have a rule label, A;B. Somebody changed the
name of A to C in groups.py, but left it unchanged in rules.py. The new
feature would suggest C as the correct name for A, by comparing the adjLists
of the rule with those in groups.py.

Unfortunately, there is a bug I can't figure out. If somebody changed A to
C AND B to D, it suggests C for A and C for B. The manual workaround
is to visually verify which one adjList C matches, make the fix, and run
checkWellFormed again. It will then suggest D for B. I feel like this
isn't worth fixing because it will be unable to suggest groups after the
upgrade to the universal database (once all adjLists are removed from
rules.py).
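
A rough sketch of the suggestion idea described above, for anyone following along. This is not the code from the commit: `normalize`, `suggest_names`, and the plain dicts standing in for rules.py and groups.py are all hypothetical, and the adjLists are compared as whitespace-normalized text rather than as graphs. Each part of the label is looked up by its own adjList, so renaming two groups at once yields two separate suggestions.

```python
def normalize(adj_list):
    """Collapse whitespace so equivalent adjacency lists compare equal as text."""
    return "\n".join(" ".join(line.split()) for line in adj_list.strip().splitlines())


def suggest_names(rule_label, rule_adj_lists, groups):
    """Return {misnamed_part: suggested_group_name} for one rule label.

    rule_label     -- e.g. "A;B"
    rule_adj_lists -- adjacency-list text for each label part, in the same order
    groups         -- {group_name: adjacency-list text}, a stand-in for groups.py
    """
    # Index the known groups by structure so each part is looked up independently.
    by_structure = {normalize(adj): name for name, adj in groups.items()}
    suggestions = {}
    for part, adj in zip(rule_label.split(";"), rule_adj_lists):
        if part in groups:
            continue  # this part of the label already names a known group
        match = by_structure.get(normalize(adj))
        if match is not None:
            suggestions[part] = match
    return suggestions


# Hypothetical data: A was renamed to C and B to D in groups.py only.
groups = {"C": "1 *1 C 0", "D": "1 *2 O 0"}
print(suggest_names("A;B", ["1  *1 C 0", "1 *2 O 0"], groups))
```

Text comparison is only a simplification for the sketch; two adjLists that describe the same structure with atoms in a different order would not match here.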
The way node names and labels are identified from the actual nodes
is different.
First, we check whether the label is right, i.e. whether the groups named in
the label 'group1;group2;group3' are indeed the same groups in
the groups.py file.

Then, if not, we try to figure out what it should be called.
For each group in the actual node, we look through the groups.py
file to see if we can find a matching one, and if so suggest that name.

I think this logic flow works better than what was there before,
but I did change/break the way it reports errors. Previously it would
collect them in a list of tuples, then return a tuple of lists, and
whatever function had called it (e.g. in EvaluateKinetics.py in the RMG-database
project) would have to deduce what this meant and report the errors.
That means keeping the two files, in different projects, in sync
as the tuple/list syntax changes. Rather than figure this out, I
just report the errors as they are detected. If you prefer the other
approach, mute my logging lines and compile a tuple as before.

Hope this helps.
1) Less code duplication by grouping the if statements 'if not (a and b):'
2) Report what the group definition should be if the label were correct
   (this could be helpful if the label was right but the groups out of date)
Previously, I had to switch back and forth between RMG-Py and
RMG-database every time I wanted to make a change to checkWellFormed. Now
I have moved it so it is entirely in RMG-Py.

The format in which I report errors has also changed. At the moment, some errors
are returned as tuples, parsed and recorded into a file called
DatabaseWellFormedSummary.txt. Some of the other errors are reported using the
logging library. I plan on unifying how this is done in subsequent
commits.
Now all the errors that are searched for are written to database.log instead
of the hardcoded output file I had previously.
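
As an illustration of routing every reported error to one log file, here is a minimal sketch using the standard logging module. The function name `make_database_logger`, the logger name, and the message text are made up for the example; only the database.log file name comes from the commit message.

```python
import logging


def make_database_logger(path="database.log"):
    """Return a logger that writes every reported error to `path`."""
    logger = logging.getLogger("database_check")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep these messages out of the root logger
    # Drop handlers from earlier calls so lines are not written twice.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    handler = logging.FileHandler(path, mode="w")
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```

A checker would then call something like `log.error("rule label 'A;B' names unknown group 'A'")` at the point where the problem is detected, rather than returning tuples for the caller to decode.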
When the database is formed, it also creates groups for the products,
which would not be found in groups.py. These were usually labeled as
'product' + an integer. I ran into comparison issues with names containing
parentheses. I have revised the regex to use re.escape, so that I
don't get these false positives anymore.
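
To show why parentheses in a group name trip up a regex, here is a small sketch. The group name `C_rad(Cd)`, the label, and the helper `name_pattern` are invented for the example and are not taken from the database or from the commit.

```python
import re


def name_pattern(group_name):
    """Compile a pattern matching group_name literally as one ';'-separated part."""
    return re.compile(r"(?:^|;)" + re.escape(group_name) + r"(?:;|$)")


label = "C_rad(Cd);product1"

# Unescaped, the parentheses form a capture group, so this pattern really
# means the text "C_radCd" and does not find the name in the label.
naive = re.compile("C_rad(Cd)")
print(bool(naive.search(label)))

# Escaped, the parentheses are matched as literal characters.
print(bool(name_pattern("C_rad(Cd)").search(label)))
```

Anchoring on ';' or the string boundaries also keeps 'product' from matching inside 'product1'.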
Product groups not found in the reaction family are automatically generated
by RMG for finding reverse reactions.  These groups don't have tracking
of parents and children like the rest of the groups.

Introduce new functionality to track the children and parents
of these generated groups, e.g.

LogicOR Y
  - Group Y1
  - Group Y2
  - Group Y3

This way database testing can be done methodically.
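
A minimal sketch of what tracking parents and children for a generated LogicOR could look like. The `Entry` class and `add_child` helper are hypothetical stand-ins, not RMG-Py's actual classes; the point is only that both directions of the link are recorded when the group is created.

```python
class Entry:
    """Minimal stand-in for a node in the group tree (hypothetical)."""

    def __init__(self, label, kind):
        self.label = label
        self.kind = kind      # "LogicOr" or "Group"
        self.parent = None
        self.children = []


def add_child(parent, child):
    """Link child under parent, keeping both directions in sync."""
    child.parent = parent
    parent.children.append(child)


# Build the example tree: LogicOR Y with Groups Y1, Y2, Y3 beneath it.
y = Entry("Y", "LogicOr")
for name in ("Y1", "Y2", "Y3"):
    add_child(y, Entry(name, "Group"))

print([c.label for c in y.children])
print(y.children[0].parent.label)
```

With the links in place, a test can walk from any generated group up to its parent or down to its children in the same way as for groups defined in groups.py.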
Currently this can only check Groups against Groups and LogicOrs against LogicOrs,
and assumes that every other comparison is false (which might not be the case, but
that is much harder to check).
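
The restriction described above could be sketched as follows. The `Group` and `LogicOr` classes here are simplified placeholders, and the equality criteria (whitespace-insensitive adjList text, same set of component labels) are assumptions made for the example, not the checks the commit performs.

```python
class Group:
    def __init__(self, adj_list):
        self.adj_list = adj_list


class LogicOr:
    def __init__(self, components):
        self.components = components  # labels of the alternatives


def same_item(a, b):
    """Compare like with like; any mixed pairing is treated as not equal."""
    if isinstance(a, Group) and isinstance(b, Group):
        return a.adj_list.split() == b.adj_list.split()
    if isinstance(a, LogicOr) and isinstance(b, LogicOr):
        return set(a.components) == set(b.components)
    # A Group could in principle be equivalent to a one-component LogicOr,
    # but that case is not examined here and is simply reported as False.
    return False
```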
This should be reverted on merging to master.
rwest added a commit that referenced this pull request Jun 13, 2014
Universal database tests all seem to be working. Still need the databaseTester.py script to be removed, once all its features have been turned into unit tests.
@rwest rwest merged commit a312060 into master Jun 13, 2014
@connie connie deleted the universalDatabase branch December 2, 2014 19:10
3 participants