Merged
21 changes: 11 additions & 10 deletions README.md
@@ -15,8 +15,8 @@
 ## Highlights

 - **Pure PHP** — No FFI, no external binaries, no compiled extensions. Works everywhere PHP runs.
-- **Zero Hard Dependencies** — Core tokenization has no required dependencies. Optional HTTP client needed only for Hub downloads.
-- **Hub Compatible** — Load tokenizers directly from Hugging Face Hub or from local files.
+- **Hub Integration** — Load tokenizers from Hugging Face Hub with smart caching and manifest-based file checks.
+- **Flexible Loading** — Load from local files, config arrays, or build custom tokenizers with the builder API.
 - **Fully Tested** — Validated against BERT, GPT-2, Llama, Gemma, Qwen, RoBERTa, ALBERT, and more.
 - **Modern PHP** — Built for PHP 8.2+ with strict types, readonly classes, and clean interfaces.

@@ -28,16 +28,14 @@ Install via Composer:
 composer require codewithkyrian/tokenizers
 ```

-### HTTP Client (Optional)
+### HTTP Client (for Hub loading)

-If you plan to load tokenizers from the Hugging Face Hub, you'll need an HTTP client implementing PSR-18. We recommend Guzzle:
+Loading tokenizers from the Hugging Face Hub requires an HTTP client. We recommend Guzzle:

 ```bash
 composer require guzzlehttp/guzzle
 ```

-> **Note:** The library uses [PHP-HTTP Discovery](https://github.com/php-http/discovery) to automatically find and use any PSR-18 compatible HTTP client installed in your project. If you're only loading tokenizers from local files, no HTTP client is needed.
-
 ## Quick Start

 ```php
@@ -96,10 +94,13 @@ $tokenizer = Tokenizer::fromHub(

 When `cacheDir` is not specified, the library automatically resolves the cache location:

-1. **Environment Variable** — `TOKENIZERS_CACHE` if set
-2. **macOS** — `~/Library/Caches/huggingface/tokenizers`
-3. **Linux** — `$XDG_CACHE_HOME/huggingface/tokenizers` or `~/.cache/huggingface/tokenizers`
-4. **Windows** — `%LOCALAPPDATA%\huggingface\tokenizers`
+1. **HF_HUB_CACHE** — if set, used directly
+2. **HF_HOME** — if set, `$HF_HOME/hub`
+3. **macOS** — `~/Library/Caches/huggingface/hub`
+4. **Linux** — `$XDG_CACHE_HOME/huggingface/hub` or `~/.cache/huggingface/hub`
+5. **Windows** — `%LOCALAPPDATA%\huggingface\hub`
+
+Pass `cacheDir` to use a custom directory.

 ### From Local Files

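For reviewers, the new cache-resolution order in the README hunk above can be sketched as a small pure function. This is an illustration only, not the library's code: `resolveHubCacheDir` and its parameters are hypothetical names, and two behaviours are assumptions the README does not state (the Windows fallback when `LOCALAPPDATA` is unset, and treating any other OS like Linux).

```php
<?php

/**
 * Hypothetical helper mirroring the resolution order listed in the README.
 * Not part of the library; the name and signature are illustrative only.
 *
 * @param array<string, string> $env      environment variables, e.g. from getenv()
 * @param string                $osFamily a PHP_OS_FAMILY value ('Darwin', 'Linux', 'Windows', ...)
 * @param string                $home     the user's home directory
 */
function resolveHubCacheDir(array $env, string $osFamily, string $home): string
{
    // 1. HF_HUB_CACHE, if set, is used directly.
    if (($env['HF_HUB_CACHE'] ?? '') !== '') {
        return $env['HF_HUB_CACHE'];
    }

    // 2. HF_HOME, if set, resolves to $HF_HOME/hub.
    if (($env['HF_HOME'] ?? '') !== '') {
        return rtrim($env['HF_HOME'], '/\\') . '/hub';
    }

    // 3-5. Platform defaults.
    switch ($osFamily) {
        case 'Darwin':
            return $home . '/Library/Caches/huggingface/hub';

        case 'Windows':
            // Assumption: fall back to <home>\AppData\Local when LOCALAPPDATA is unset.
            $base = ($env['LOCALAPPDATA'] ?? '') !== ''
                ? $env['LOCALAPPDATA']
                : $home . '\\AppData\\Local';

            return $base . '\\huggingface\\hub';

        default:
            // Linux, plus (assumption) any other Unix-like OS family.
            $base = ($env['XDG_CACHE_HOME'] ?? '') !== ''
                ? $env['XDG_CACHE_HOME']
                : $home . '/.cache';

            return $base . '/huggingface/hub';
    }
}
```

Passing the environment and OS family as arguments keeps the sketch easy to test; a real call site might look like `resolveHubCacheDir(getenv(), PHP_OS_FAMILY, $home)`, with an explicit `cacheDir` bypassing the lookup entirely.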
6 changes: 2 additions & 4 deletions composer.json
@@ -32,9 +32,7 @@
     },
     "require": {
         "php": "^8.2",
-        "psr/http-client": "^1.0",
-        "psr/http-factory": "^1.0",
-        "php-http/discovery": "^1.19"
+        "codewithkyrian/huggingface": "^1.0"
     },
     "require-dev": {
         "friendsofphp/php-cs-fixer": "^3.91",
@@ -56,4 +54,4 @@
         "cs:check": "vendor/bin/php-cs-fixer fix --dry-run --diff",
         "analyse": "vendor/bin/phpstan analyse -c phpstan.dist.neon"
     }
-}
+}
229 changes: 0 additions & 229 deletions examples/context_window_fit_analysis.php

This file was deleted.
