High-performance synthetic data generation library for testing and development.

superstore is a Rust-powered Python library for generating realistic synthetic datasets. It provides the following data generators and statistical tools:

| Generator | Description | Use Cases |
|---|---|---|
| Retail | Sales transactions, employees | BI dashboards, forecasting |
| Time Series | Financial-style series with regimes, jumps | Quant research, backtesting |
| Weather | Sensor data with seasonal/diurnal patterns | IoT analytics, anomaly detection |
| Logs | Web server & application logs | Observability, alerting |
| Finance | Stock prices, OHLCV, options chains | Trading systems, risk analysis |
| Telemetry | Machine metrics, anomalies, failures | DevOps dashboards, ML training |

| Tool | Description | Use Cases |
|---|---|---|
| Distributions | Sample from statistical distributions | Simulation, Monte Carlo |
| Copulas | Correlated multivariate data | Risk modeling, portfolio analysis |
| Temporal Models | AR, Markov chains, random walks | Time series simulation |
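
To make the Copulas row concrete, here is a minimal standard-library sketch of a Gaussian copula producing correlated uniform pairs. It is not superstore's API: the function name `gaussian_copula_pairs` and its parameters are hypothetical and chosen only for illustration.

```python
import random
from statistics import NormalDist


def gaussian_copula_pairs(n, rho, seed=None):
    """Return n pairs (u, v) of uniforms on [0, 1] linked by a Gaussian copula.

    Hypothetical helper for illustration; rho is the correlation of the
    underlying standard normal variables, with -1 < rho < 1.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    scale = (1.0 - rho * rho) ** 0.5
    pairs = []
    for _ in range(n):
        # Build two standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + scale * rng.gauss(0.0, 1.0)
        # Push each through the normal CDF to get uniform marginals.
        pairs.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return pairs
```

Each uniform can then be passed through the inverse CDF of any target distribution, which keeps the dependence structure while changing the marginals.
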
- Rust-powered: High-performance generation, 10-100x faster than pure Python
- Flexible output: pandas DataFrame, polars DataFrame, or Python dicts
- Configurable: Pydantic config classes for validated, structured configuration
- Reproducible: Seed support for deterministic generation
- Scalable: Streaming and parallel generation for large datasets
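
As a rough illustration of how the seed and streaming features fit together, the sketch below yields records in chunks from a single seeded random number generator. It uses only the Python standard library and does not show superstore's own streaming API; `stream_records` and the `order_id` and `sales` fields are hypothetical.

```python
import random
from typing import Dict, Iterator, List

Record = Dict[str, float]


def stream_records(total: int, chunk_size: int, seed: int) -> Iterator[List[Record]]:
    """Yield chunks of hypothetical records, one chunk in memory at a time."""
    # One seeded RNG shared by every chunk keeps the whole stream deterministic.
    rng = random.Random(seed)
    produced = 0
    while produced < total:
        size = min(chunk_size, total - produced)
        yield [
            {"order_id": produced + i, "sales": round(rng.uniform(1.0, 500.0), 2)}
            for i in range(size)
        ]
        produced += size


# Usage: process 10 records in chunks of at most 4.
for chunk in stream_records(total=10, chunk_size=4, seed=7):
    print(len(chunk))
```

Because values are drawn sequentially from one generator, the flattened stream in this sketch is the same for a given seed regardless of the chunk size.
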
```shell
pip install superstore
```

For development with polars support:

```shell
pip install superstore[develop]
```

```python
from superstore import superstore, employees, timeseries, weather

# Generate 1000 retail records as a pandas DataFrame
df = superstore(count=1000)

# Generate as polars DataFrame
df_polars = superstore(count=1000, output="polars")

# Generate as list of dicts
records = superstore(count=1000, output="dict")
```

All data generators support an optional seed parameter for reproducible random data generation:
```python
from superstore import superstore, employees, timeseries, weather, machines

# Same seed produces identical data
df1 = superstore(count=100, seed=42)
df2 = superstore(count=100, seed=42)
assert df1.equals(df2)  # True

# Works with all generators
employees_df = employees(count=50, seed=123)
timeseries_df = timeseries(nper=30, seed=456)
weather_df = weather(count=100, seed=789)
machine_list = machines(count=10, seed=321)

# No seed means random data each time
df3 = superstore(count=100)  # Different each call
```
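
The seed contract described above can be checked with a small helper. The sketch below is self-contained and runs against a stand-in generator: `toy_generator` and `is_reproducible` are hypothetical names, not part of superstore.

```python
import random
from typing import Callable, Dict, List, Optional

Record = Dict[str, float]


def toy_generator(count: int, seed: Optional[int] = None) -> List[Record]:
    """Stand-in generator (hypothetical) that returns a list of dict records."""
    rng = random.Random(seed)  # seed=None falls back to OS-provided entropy
    return [{"id": i, "value": rng.random()} for i in range(count)]


def is_reproducible(generator: Callable[..., List[Record]], seed: int, **kwargs) -> bool:
    """Call the generator twice with the same seed and compare the results."""
    first = generator(seed=seed, **kwargs)
    second = generator(seed=seed, **kwargs)
    return first == second


print(is_reproducible(toy_generator, seed=42, count=100))
```

Assuming a generator is called with `output="dict"`, the same helper could be pointed at it, for example `is_reproducible(superstore, seed=42, count=100, output="dict")`. For DataFrame output, `==` compares element by element, so use `.equals` as in the example above.
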
```shell
# Clone the repository
git clone https://github.com/1kbgz/superstore.git
cd superstore

# Install development dependencies
make develop

# Build Python wheel
make build

# Run all tests
make test

# Run linters
make lint

# Fix formatting
make fix
```

superstore uses a hybrid Rust/Python architecture:
- rust/: Core Rust library with all data generation logic
- src/: PyO3 bindings exposing Rust functions to Python
- superstore/: Python package with native module

The core data generation is implemented in Rust for performance, and PyO3 exposes it to Python. Output format conversion (pandas/polars/dict) happens in the Rust bindings layer.
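
To illustrate what the output format conversion step does conceptually, here is a pure-Python sketch of dispatching on an `output` argument. In superstore this conversion happens in the Rust bindings, so the function below is only an analogy: `convert_output` is a hypothetical name, and the `"pandas"` default is inferred from the examples above.

```python
from typing import Any, Dict, List


def convert_output(records: List[Dict[str, Any]], output: str = "pandas") -> Any:
    """Convert row-oriented records to the requested container (illustrative only)."""
    if output == "dict":
        return records  # already a list of dicts
    if output == "pandas":
        import pandas as pd  # imported lazily so the dependency stays optional

        return pd.DataFrame(records)
    if output == "polars":
        import polars as pl  # imported lazily so the dependency stays optional

        return pl.DataFrame(records)
    raise ValueError(f"unsupported output format: {output!r}")


rows = [{"region": "West", "sales": 120.5}, {"region": "East", "sales": 88.0}]
print(convert_output(rows, output="dict"))
```

Deferring the pandas and polars imports to their branches means that asking for `"dict"` output needs neither library installed.
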

This library is released under the Apache 2.0 license.

Note: This library was generated using copier from the Base Python Project Template repository.