everyevalever launch #13

Open
janbatzner wants to merge 2 commits into evaleval:master from janbatzner:master

@janbatzner

No description provided.

@evijit
Contributor

evijit commented Feb 15, 2026

@copilot Can you do a grammar pass on the blog?

Copilot AI left a comment

Pull request overview

This PR introduces a new blog post announcing the launch of "Every Eval Ever," a standardized schema and repository for AI evaluation results, along with a template modification to support excluding posts from the homepage collection display.

Changes:

  • New blog post detailing the Every Eval Ever initiative, its schema, validation tools, and converters
  • Update to blogs.html template to filter posts with exclude_from_collection: true from homepage listings

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| _posts/2026-02-15-everyevalever-launch.md | New comprehensive blog post announcing Every Eval Ever launch with technical details, examples, and call for participation |
| _includes/blogs.html | Adds Liquid template filtering to exclude posts marked with exclude_from_collection: true from homepage |

It is time for a change. We have seen this before in other parts of the ML pipeline. The community stopped retraining models from scratch or rewriting training code for each project long ago. Evaluations are next.

## Why Us, Why Now
We just know the pain. The EvalEval Coalition is a community of researchers working to fix how AI evaluations are built, run, documented, shared, and compared. We worked on a myriad of projects where collecting evaluations restricts what can be done or takes most of the project’s efforts. Need examples? See [1](https://arxiv.org/abs/2602.03344), [2](https://arxiv.org/abs/2503.01622), [3](https://proceedings.neurips.cc/paper_files/paper/2024/hash/28236482f64a72eec43706b6f3a6c511-Abstract-Conference.html), [4](https://arxiv.org/abs/2412.06540), [5](https://arxiv.org/abs/2410.11840), [6](https://aclanthology.org/2024.acl-long.456/), [7](https://arxiv.org/abs/2407.13696), [8](https://par.nsf.gov/servlets/purl/10547932), [9](https://aclanthology.org/2024.naacl-long.139/), [10](https://aclanthology.org/2025.acl-long.34.com) among others.

Copilot AI Feb 15, 2026

The URL has an incorrect ".com" suffix appended. ACL Anthology URLs should not have ".com" at the end. It should be "https://aclanthology.org/2025.acl-long.34/" instead of "https://aclanthology.org/2025.acl-long.34.com".

Suggested change
We just know the pain. The EvalEval Coalition is a community of researchers working to fix how AI evaluations are built, run, documented, shared, and compared. We worked on a myriad of projects where collecting evaluations restricts what can be done or takes most of the project’s efforts. Need examples? See [1](https://arxiv.org/abs/2602.03344), [2](https://arxiv.org/abs/2503.01622), [3](https://proceedings.neurips.cc/paper_files/paper/2024/hash/28236482f64a72eec43706b6f3a6c511-Abstract-Conference.html), [4](https://arxiv.org/abs/2412.06540), [5](https://arxiv.org/abs/2410.11840), [6](https://aclanthology.org/2024.acl-long.456/), [7](https://arxiv.org/abs/2407.13696), [8](https://par.nsf.gov/servlets/purl/10547932), [9](https://aclanthology.org/2024.naacl-long.139/), [10](https://aclanthology.org/2025.acl-long.34.com) among others.
We just know the pain. The EvalEval Coalition is a community of researchers working to fix how AI evaluations are built, run, documented, shared, and compared. We worked on a myriad of projects where collecting evaluations restricts what can be done or takes most of the project’s efforts. Need examples? See [1](https://arxiv.org/abs/2602.03344), [2](https://arxiv.org/abs/2503.01622), [3](https://proceedings.neurips.cc/paper_files/paper/2024/hash/28236482f64a72eec43706b6f3a6c511-Abstract-Conference.html), [4](https://arxiv.org/abs/2412.06540), [5](https://arxiv.org/abs/2410.11840), [6](https://aclanthology.org/2024.acl-long.456/), [7](https://arxiv.org/abs/2407.13696), [8](https://par.nsf.gov/servlets/purl/10547932), [9](https://aclanthology.org/2024.naacl-long.139/), [10](https://aclanthology.org/2025.acl-long.34/) among others.

}
```

### Feedback and Advise

Copilot AI Feb 15, 2026

The section heading uses "Advise", which is a verb; the noun form is "Advice". The heading should read "Feedback and Advice".

Suggested change
### Feedback and Advise
### Feedback and Advice

author = {Jan Batzner and Leshem Coshen and Avijit Ghosh and Sree Harsha Nelaturu and Anastassia Kornilova and Damian Stachura and Anka Reuel and Yifan Mai and Asaf Yehudai and Irene Solaiman and Stella Biderman},
year = {2026},
month = {February},
url = {https://evaleval.github.io/2026/02/16/everyevalever-launch/},

Copilot AI Feb 15, 2026

The BibTeX citation URL shows the date as "2026/02/16" but the frontmatter date is "2026-02-15". This creates an inconsistency between the post date and its canonical URL in the citation.

published: true
exclude_from_collection: true
category: Infrastructure
image: "/assets/img/long-site-banner.webp"

Copilot AI Feb 15, 2026

The image path uses a leading slash "/assets/img/long-site-banner.webp" while other blog posts in the codebase use "assets/img/..." without the leading slash. This inconsistency may cause issues depending on how Jekyll processes the paths. Consider using "assets/img/long-site-banner.webp" to match the convention seen in other posts like "2025-07-13-eval-science-kickoff.md" and "2025-08-10-blog-chart-crisis.md".

Suggested change
image: "/assets/img/long-site-banner.webp"
image: "assets/img/long-site-banner.webp"

Comment on lines 160 to 167
We need your help. We're launching a [Shared Task](evalevalai.com/events/) for practitioners alongside this post — two tracks for contributing public and proprietary eval data to the repository, with co-authorship for qualifying contributors and a [workshop at ACL 2026 in San Diego](https://evalevalai.com/events/2026-acl-workshop/).

*Submissions open now, deadline May 1, 2026.*

## Get involved
- Try the schema 📋 : [Hugging Face Space](https://huggingface.co/spaces/evaleval/every_eval_ever_space) and [GitHub](github.com/evaleval/every_eval_ever)

- Join the Shared Task 🏁 : [Call for Participation](evalevalai.com/events/)

Copilot AI Feb 15, 2026

The URLs in this line are missing the https:// protocol prefix. The links to "evalevalai.com/events/" should be "https://evalevalai.com/events/" to function properly.

Suggested change
We need your help. We're launching a [Shared Task](evalevalai.com/events/) for practitioners alongside this post — two tracks for contributing public and proprietary eval data to the repository, with co-authorship for qualifying contributors and a [workshop at ACL 2026 in San Diego](https://evalevalai.com/events/2026-acl-workshop/).
*Submissions open now, deadline May 1, 2026.*
## Get involved
- Try the schema 📋 : [Hugging Face Space](https://huggingface.co/spaces/evaleval/every_eval_ever_space) and [GitHub](github.com/evaleval/every_eval_ever)
- Join the Shared Task 🏁 : [Call for Participation](evalevalai.com/events/)
We need your help. We're launching a [Shared Task](https://evalevalai.com/events/) for practitioners alongside this post — two tracks for contributing public and proprietary eval data to the repository, with co-authorship for qualifying contributors and a [workshop at ACL 2026 in San Diego](https://evalevalai.com/events/2026-acl-workshop/).
*Submissions open now, deadline May 1, 2026.*
## Get involved
- Try the schema 📋 : [Hugging Face Space](https://huggingface.co/spaces/evaleval/every_eval_ever_space) and [GitHub](github.com/evaleval/every_eval_ever)
- Join the Shared Task 🏁 : [Call for Participation](https://evalevalai.com/events/)
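
Links like the ones flagged here render as site-relative paths because the target has no scheme. A check along the following lines could catch them before review. This is only a minimal sketch and not part of this PR: the function name `find_schemeless_links` and the "first path segment contains a dot" heuristic are assumptions made for illustration, and the heuristic will also flag relative file links such as `image.png`.

```python
import re

# Matches inline Markdown links of the form [text](target).
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")
# A URL scheme such as "https:" or "mailto:" at the start of the target.
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def find_schemeless_links(markdown: str) -> list[str]:
    """Return link targets that look like bare domains but lack a scheme."""
    suspicious = []
    for _text, target in LINK_RE.findall(markdown):
        if SCHEME_RE.match(target):
            continue  # already absolute, e.g. https://...
        if target.startswith(("/", "#", "./", "../")):
            continue  # explicit site-relative path or in-page anchor
        first_segment = target.split("/", 1)[0]
        if "." in first_segment:
            # Heuristic (assumption): a dotted first segment is probably a host name.
            suspicious.append(target)
    return suspicious


if __name__ == "__main__":
    line = (
        "[Shared Task](evalevalai.com/events/) and "
        "[GitHub](github.com/evaleval/every_eval_ever)"
    )
    for target in find_schemeless_links(line):
        print(f"missing scheme: {target}")
```

Run against the post's Markdown source, this would list each target that needs an `https://` prefix; paths such as `assets/img/...` are left alone because their first segment has no dot.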

*Submissions open now, deadline May 1, 2026.*

## Get involved
- Try the schema 📋 : [Hugging Face Space](https://huggingface.co/spaces/evaleval/every_eval_ever_space) and [GitHub](github.com/evaleval/every_eval_ever)

Copilot AI Feb 15, 2026

The GitHub URL is missing the https:// protocol prefix. It should be "https://github.com/evaleval/every_eval_ever" to work as a clickable link.

Suggested change
- Try the schema 📋 : [Hugging Face Space](https://huggingface.co/spaces/evaleval/every_eval_ever_space) and [GitHub](github.com/evaleval/every_eval_ever)
- Try the schema 📋 : [Hugging Face Space](https://huggingface.co/spaces/evaleval/every_eval_ever_space) and [GitHub](https://github.com/evaleval/every_eval_ever)

Comment on lines 160 to 167
We need your help. We're launching a [Shared Task](evalevalai.com/events/) for practitioners alongside this post — two tracks for contributing public and proprietary eval data to the repository, with co-authorship for qualifying contributors and a [workshop at ACL 2026 in San Diego](https://evalevalai.com/events/2026-acl-workshop/).

*Submissions open now, deadline May 1, 2026.*

## Get involved
- Try the schema 📋 : [Hugging Face Space](https://huggingface.co/spaces/evaleval/every_eval_ever_space) and [GitHub](github.com/evaleval/every_eval_ever)

- Join the Shared Task 🏁 : [Call for Participation](evalevalai.com/events/)

Copilot AI Feb 15, 2026

The URL is missing the https:// protocol prefix. It should be "https://evalevalai.com/events/" to work as a clickable link.

Suggested change
We need your help. We're launching a [Shared Task](evalevalai.com/events/) for practitioners alongside this post — two tracks for contributing public and proprietary eval data to the repository, with co-authorship for qualifying contributors and a [workshop at ACL 2026 in San Diego](https://evalevalai.com/events/2026-acl-workshop/).
*Submissions open now, deadline May 1, 2026.*
## Get involved
- Try the schema 📋 : [Hugging Face Space](https://huggingface.co/spaces/evaleval/every_eval_ever_space) and [GitHub](github.com/evaleval/every_eval_ever)
- Join the Shared Task 🏁 : [Call for Participation](evalevalai.com/events/)
We need your help. We're launching a [Shared Task](https://evalevalai.com/events/) for practitioners alongside this post — two tracks for contributing public and proprietary eval data to the repository, with co-authorship for qualifying contributors and a [workshop at ACL 2026 in San Diego](https://evalevalai.com/events/2026-acl-workshop/).
*Submissions open now, deadline May 1, 2026.*
## Get involved
- Try the schema 📋 : [Hugging Face Space](https://huggingface.co/spaces/evaleval/every_eval_ever_space) and [GitHub](github.com/evaleval/every_eval_ever)
- Join the Shared Task 🏁 : [Call for Participation](https://evalevalai.com/events/)

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.


image: "/assets/img/long-site-banner.webp"
authors:
- name: "Jan Batzner*"
- name: "Leshem Coshen*"

Copilot AI Feb 15, 2026

The author name "Leshem Coshen" appears to be spelled inconsistently with other posts in the repository. In the post "_posts/2025-08-10-blog-chart-crisis.md", the same author is listed as "Leshem Choshen" (with an 'h'). Please verify the correct spelling and ensure consistency across all posts.

Suggested change
- name: "Leshem Coshen*"
- name: "Leshem Choshen*"

```bibtex
@misc{evaleval2026everyevalever,
title = {Every Eval Ever: Toward a Common Language for AI Eval Reporting},
author = {Jan Batzner and Leshem Coshen and Avijit Ghosh and Sree Harsha Nelaturu and Anastassia Kornilova and Damian Stachura and Anka Reuel and Yifan Mai and Asaf Yehudai and Irene Solaiman and Stella Biderman},

Copilot AI Feb 15, 2026

The author name "Leshem Coshen" in the BibTeX citation appears to be spelled inconsistently with other posts in the repository. In the post "_posts/2025-08-10-blog-chart-crisis.md", the same author is listed as "Leshem Choshen" (with an 'h'). Please verify the correct spelling and ensure consistency across all posts and citations.

## Why Us, Why Now

We understand the pain firsthand. The EvalEval Coalition is a community of researchers working to fix how AI evaluations are built, run, documented, shared, and compared. We have worked on numerous projects where collecting evaluations either restricts what can be done or consumes most of the project's effort. Need examples? See [1](https://arxiv.org/abs/2602.03344), [2](https://arxiv.org/abs/2503.01622), [3](https://proceedings.neurips.cc/paper_files/paper/2024/hash/28236482f64a72eec43706b6f3a6c511-Abstract-Conference.html), [4](https://arxiv.org/abs/2412.06540), [5](https://arxiv.org/abs/2410.11840), [6](https://aclanthology.org/2024.acl-long.456/), [7](https://arxiv.org/abs/2407.13696), [8](https://par.nsf.gov/servlets/purl/10547932), [9](https://aclanthology.org/2024.naacl-long.139/), [10](https://aclanthology.org/2025.acl-long.34.com), among others.

Copilot AI Feb 15, 2026

The ACL Anthology URL for reference [10] has an incorrect format. ACL Anthology URLs should end with a forward slash, not ".com". The URL "https://aclanthology.org/2025.acl-long.34.com" should be "https://aclanthology.org/2025.acl-long.34/" to be a valid link.

Suggested change
We understand the pain firsthand. The EvalEval Coalition is a community of researchers working to fix how AI evaluations are built, run, documented, shared, and compared. We have worked on numerous projects where collecting evaluations either restricts what can be done or consumes most of the project's effort. Need examples? See [1](https://arxiv.org/abs/2602.03344), [2](https://arxiv.org/abs/2503.01622), [3](https://proceedings.neurips.cc/paper_files/paper/2024/hash/28236482f64a72eec43706b6f3a6c511-Abstract-Conference.html), [4](https://arxiv.org/abs/2412.06540), [5](https://arxiv.org/abs/2410.11840), [6](https://aclanthology.org/2024.acl-long.456/), [7](https://arxiv.org/abs/2407.13696), [8](https://par.nsf.gov/servlets/purl/10547932), [9](https://aclanthology.org/2024.naacl-long.139/), [10](https://aclanthology.org/2025.acl-long.34.com), among others.
We understand the pain firsthand. The EvalEval Coalition is a community of researchers working to fix how AI evaluations are built, run, documented, shared, and compared. We have worked on numerous projects where collecting evaluations either restricts what can be done or consumes most of the project's effort. Need examples? See [1](https://arxiv.org/abs/2602.03344), [2](https://arxiv.org/abs/2503.01622), [3](https://proceedings.neurips.cc/paper_files/paper/2024/hash/28236482f64a72eec43706b6f3a6c511-Abstract-Conference.html), [4](https://arxiv.org/abs/2412.06540), [5](https://arxiv.org/abs/2410.11840), [6](https://aclanthology.org/2024.acl-long.456/), [7](https://arxiv.org/abs/2407.13696), [8](https://par.nsf.gov/servlets/purl/10547932), [9](https://aclanthology.org/2024.naacl-long.139/), [10](https://aclanthology.org/2025.acl-long.34/), among others.

author = {Jan Batzner and Leshem Coshen and Avijit Ghosh and Sree Harsha Nelaturu and Anastassia Kornilova and Damian Stachura and Anka Reuel and Yifan Mai and Asaf Yehudai and Irene Solaiman and Stella Biderman},
year = {2026},
month = {February},
url = {https://evaleval.github.io/2026/02/16/everyevalever-launch/},

Copilot AI Feb 15, 2026

The URL in the BibTeX citation has a date mismatch. The file is dated 2026-02-15 (February 15) in both the filename and the frontmatter date field, but the URL uses "2026/02/16" (February 16). This should be corrected to "2026/02/15" to match the post date.

Suggested change
url = {https://evaleval.github.io/2026/02/16/everyevalever-launch/},
url = {https://evaleval.github.io/2026/02/15/everyevalever-launch/},
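
The date comparison behind this comment can be sketched as a small script. It is illustrative only: the helper names are hypothetical, and it assumes the site's permalinks contain a `/YYYY/MM/DD/` segment, as the cited URL suggests. A `date` field in the front matter or the site's timezone setting can shift the published date, so a mismatch is a reason to double-check the built URL rather than proof of an error.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlparse

DateParts = Tuple[str, str, str]


def date_from_post_filename(path: str) -> Optional[DateParts]:
    """Extract (year, month, day) from a Jekyll-style name: YYYY-MM-DD-title.md."""
    name = path.rsplit("/", 1)[-1]
    match = re.match(r"(\d{4})-(\d{2})-(\d{2})-", name)
    return match.groups() if match else None


def date_from_url(url: str) -> Optional[DateParts]:
    """Extract (year, month, day) from a /YYYY/MM/DD/ segment in the URL path."""
    match = re.search(r"/(\d{4})/(\d{2})/(\d{2})/", urlparse(url).path)
    return match.groups() if match else None


def url_matches_post_date(path: str, url: str) -> bool:
    """True only when both dates are found and are identical."""
    file_date = date_from_post_filename(path)
    return file_date is not None and file_date == date_from_url(url)


if __name__ == "__main__":
    post = "_posts/2026-02-15-everyevalever-launch.md"
    cited = "https://evaleval.github.io/2026/02/16/everyevalever-launch/"
    print(url_matches_post_date(post, cited))
```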
