This repository contains the core code for the paper "Stacked from One: Multi-Scale Self-Injection for Context Window Extension", accepted at ICLR 2026.
To train the model on downsampled RedPajama and Activation Beacon data, please refer to the following resources to prepare the data (a minimal loading sketch for the SFT sets follows the list):

- Pretraining, downsampled RedPajama: https://github.com/princeton-nlp/CEPE
- SFT: LongAlpaca-12K (https://huggingface.co/datasets/Yukang/LongAlpaca-12k) and BookSum (https://huggingface.co/datasets/kmfoda/booksum)
- Synthetic data (highly encouraged): see the details in Llama-3-8B-Instruct-262k and Synthetic Data for Multi-Doc QA
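If you only want to inspect the SFT data, it can be pulled directly from the Hugging Face Hub. The snippet below is a minimal sketch assuming the default `train` splits and standard `datasets` usage; check each dataset card for the exact schema before writing a collator.

```python
from datasets import load_dataset

# Minimal sketch: split names and fields follow the dataset cards and may
# change, so inspect the schema before building a preprocessing pipeline.
long_alpaca = load_dataset("Yukang/LongAlpaca-12k", split="train")
booksum = load_dataset("kmfoda/booksum", split="train")

print(long_alpaca.column_names)
print(booksum.column_names)
```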
To train SharedLLM on RedPajama, use the following command to start training (by default we use 8x NVIDIA A800 GPUs with DeepSpeed):

```bash
CUDA_VISIBLE_DEVICES=$CVD deepspeed train.py --model_name_or_path <llama_path> \
    --encoder_name_or_path <llama_path> \
    --config <path_to_config> \
    --model_class sharedllm \
    --output_dir output/sharedllm_7b \
    --deepspeed <path_to_deepspeed_config>
```

For mixed-dataset training, simply replace `train.py` with `train_beacon.py` and use the corresponding configuration files.
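In either case, the `--deepspeed` flag expects a standard DeepSpeed JSON config. The snippet below is only an illustrative sketch of a plausible ZeRO-2 config, written out from Python; the batch size and precision settings are assumptions rather than the paper's exact values, so adapt them to your hardware.

```python
import json

# Illustrative defaults only -- NOT the paper's exact training configuration.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,   # assumed per-GPU batch size
    "gradient_accumulation_steps": 8,      # assumed accumulation steps
    "bf16": {"enabled": True},             # A800 GPUs support bfloat16
    "zero_optimization": {
        "stage": 2,                        # ZeRO-2: shard optimizer state and grads
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "gradient_clipping": 1.0,
}

with open("configs/ds_zero2.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

Pass the resulting file to training via `--deepspeed configs/ds_zero2.json`.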
For evaluation on language modeling, here is an example that tests the model on 8K-token inputs from the arXiv domain. We use a single A800 (80GB) GPU for this experiment:
```bash
python eval_lm.py --config configs/test/test_ab_4x1024_4096 \
    --model_name_or_path <path_to_model_ckpt> \
    --model_class sharedllm \
    --validation_domains arxiv \
    --output_dir output/<experiment_name>
```

For evaluation on LongBench and InfiniteBench, please refer to their respective repositories and insert the model-loading code into the original evaluation scripts (a sketch follows below). Note that the input loader must also be modified, since the original implementations only support decoder-only (GPT-style) architectures, which differ from ours.
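As a starting point, here is a hedged sketch of what that model-loading hook might look like. The `SharedLLM` class, its import path, and the `context_ids` keyword on `generate` are assumptions about this repo's interface (our architecture likely consumes the long context separately from the query, unlike the decoder-only baselines); everything else is standard `transformers` usage.

```python
import torch
from transformers import AutoTokenizer

from sharedllm import SharedLLM  # hypothetical import path; use the repo's actual class

MODEL_PATH = "<path_to_model_ckpt>"
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = SharedLLM.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16).cuda().eval()

def generate(context: str, prompt: str, max_new_tokens: int = 512) -> str:
    # Assumed interface: the long context and the question are encoded
    # separately, rather than concatenated as in decoder-only (GPT) models.
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids.cuda()
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
    with torch.no_grad():
        out = model.generate(input_ids=prompt_ids, context_ids=ctx_ids,
                             max_new_tokens=max_new_tokens)
    # Strip the prompt tokens from the returned sequence before decoding.
    return tokenizer.decode(out[0, prompt_ids.shape[1]:], skip_special_tokens=True)
```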
To cite this work:

```bibtex
@misc{anonymous2025stacked,
  title={Stacked from One: Multi-Scale Self-Injection for Context Window Extension},
  author={Han, Wei and Zhou, Pan and Yan, Shuicheng},
  year={2025},
  url={https://openreview.net/forum?id=w1Qpbkb7C6}
}
```

If you have any further questions about this work, feel free to contact me at henryhan88888@gmail.com.