Are you using evaluation benchmarks during training? #7

Hi — this is very interesting work.

I have a quick question regarding the training data. Appendix A and Table 10 state that TVBench, STI-Bench, and MMR-VBench are used during training. However, these benchmarks were released **strictly for validation and benchmarking** purposes.

Could you please clarify how they are being used in training?

<img width="1295" height="746" alt="Image" src="https://github.com/user-attachments/assets/cbc26133-a236-46af-85d8-d759f9036ae4" />
