Benchmarks

Wickra is a streaming-first library: the state machine inside every indicator takes a single new tick and returns its updated output in constant time. The charts below show what that costs against the full Python TA ecosystem and the other Rust crates — wins and losses, the same figures the project README carries. Each bar is normalised to the slowest in its group, so the shortest bar is the fastest library; the value to the right is the measured number.
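The normalisation described above is simple to reproduce. The following is a minimal sketch, assuming each bar length is the measured value divided by the largest value in its group; the helper name `normalise_group` is hypothetical, and the sample figures are the Rust streaming SMA(20) numbers from further down this page.

```python
def normalise_group(values):
    """Scale each measurement to the slowest (largest) value in its group."""
    slowest = max(values.values())
    return {name: value / slowest for name, value in values.items()}


# Rust streaming SMA(20) figures from this page, in ns per tick.
sma_ns = {"Wickra": 50, "ta-rs": 47, "kand": 38, "yata": 38}
bars = normalise_group(sma_ns)
for name, length in bars.items():
    print(f"{name:8s} {length:.2f}")
```

The slowest entry always gets length 1.0, so a shorter bar means a faster library within that group only; bar lengths are not comparable across groups.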

Choosing a language? Jump to per-binding throughput

All 10 bindings call the same verified Rust core, but the cost of crossing each language's FFI boundary differs by orders of magnitude on streaming workloads. See Per-binding throughput to pick the binding that keeps up with your hot loop (Rust / C / C++ / C# are near-core; R is the outlier).

Reproduce these on your own hardware

```bash
# Python: vs talipp / TA-Lib / tulipy / pandas-ta / finta
pip install -e bindings/python[bench]
python -m benchmarks.compare_libraries

# Rust core: vs kand / ta-rs / yata
cargo bench -p wickra-bench
```

The Python script auto-detects every peer library installed in your venv. The nightly cross-library-bench workflow runs both suites on a Linux runner and uploads the raw reports as artefacts.

1. Streaming — the structural win

Live trading feeds one tick at a time. Wickra updates every indicator in O(1); batch-only libraries (TA-Lib, tulipy, finta, pandas-ta) have no incremental API and must recompute the whole history on every tick. Only talipp (Python) and ta-rs / yata (Rust) carry real per-tick state. This is the gap the library was built to expose.
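To make the structural difference concrete, here is a minimal pure-Python sketch of the two approaches. It is not Wickra's code or any peer library's API: `StreamingSMA` and `batch_sma` are hypothetical names, and the running-sum design is just one common way to get constant work per tick.

```python
from collections import deque


class StreamingSMA:
    """Simple moving average with O(1) work per tick (illustrative only)."""

    def __init__(self, period):
        if period <= 0:
            raise ValueError("period must be positive")
        self.period = period
        self.window = deque()
        self.total = 0.0

    def update(self, price):
        # Add the new tick and evict the oldest one once the window is full.
        self.window.append(price)
        self.total += price
        if len(self.window) > self.period:
            self.total -= self.window.popleft()
        if len(self.window) < self.period:
            return None  # still warming up
        return self.total / self.period


def batch_sma(series, period):
    """Batch SMA over the whole series; None until the window is full."""
    out = []
    for i in range(len(series)):
        if i + 1 < period:
            out.append(None)
        else:
            out.append(sum(series[i + 1 - period : i + 1]) / period)
    return out


prices = [float(p) for p in range(1, 11)]
sma = StreamingSMA(3)
history = []
streamed_out = []
for p in prices:
    history.append(p)
    streamed_out.append(sma.update(p))         # constant work per tick
    recomputed = batch_sma(history, 3)[-1]      # work grows with the history
    assert streamed_out[-1] == recomputed
```

Both paths agree on every tick, but the batch-only path walks the entire history each time a tick arrives, so its per-tick cost grows with the length of the feed.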

Python — per-tick latency (seed 5 000 bars, then feed ticks one at a time):

| Indicator | talipp (µs / tick) | Wickra (µs / tick) |
|---|---|---|
| SMA(20) | 0.959 | 0.089 |
| EMA(20) | 1.19 | 0.111 |
| RSI(14) | 0.949 | 0.061 |
| MACD(12,26,9) | 3.30 | 0.079 |
| Bollinger(20,2.0) | 4.97 | 0.089 |

Against the only other incremental Python peer Wickra is 11–56× faster; against the recompute-on-every-tick libraries it is 2 800–19 000× faster (finta RSI hits 19 000×). tulipy / pandas-ta land in the same recompute band as TA-Lib — too far off-scale to chart next to a sub-microsecond bar.
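The 11–56× range against talipp follows directly from the per-tick figures listed above. A short sketch of that arithmetic (the dictionary layout is just for illustration):

```python
# Per-tick latencies from the figures above, in microseconds: (talipp, Wickra).
latency_us = {
    "SMA(20)": (0.959, 0.089),
    "EMA(20)": (1.19, 0.111),
    "RSI(14)": (0.949, 0.061),
    "MACD(12,26,9)": (3.30, 0.079),
    "Bollinger(20,2.0)": (4.97, 0.089),
}

speedup = {name: talipp / wickra for name, (talipp, wickra) in latency_us.items()}
for name, ratio in speedup.items():
    print(f"{name:18s} {ratio:5.1f}x")

# prints "range: 11x to 56x"
print(f"range: {min(speedup.values()):.0f}x to {max(speedup.values()):.0f}x")
```

The low end comes from EMA(20) and SMA(20), where both libraries do little work per tick; the high end comes from Bollinger(20,2.0).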

Rust — per-tick latency (whole 50 000-bar series, lower = faster):

| Indicator | Wickra (ns) | kand (ns) | ta-rs (ns) | yata (ns) | Chart label |
|---|---|---|---|---|---|
| SMA(20) | 50 | 38 | 47 | 38 | up to 1.1× faster |
| EMA(20) | 154 | 69 | 56 | 69 | up to 2.2× faster |
| RSI(14) | 164 | 216 | 74 | omitted | |
| MACD(12,26,9) | 275 | 143 | 66 | omitted | up to 1.9× faster |
| Bollinger(20,2.0) | 128 | 248 | 168 | omitted | |
| ATR(14) | 152 | 166 | 61 | omitted | |

ta-rs hands back a bare f64 from the first tick with no warmup and no validation; it leads several rows by giving those guarantees up. Against kand, Wickra wins streaming RSI, Bollinger and ATR. yata exposes only SMA/EMA as raw-value methods, so its other rows are omitted rather than faked.
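As a generic illustration of what warmup and validation buy, the sketch below shows a running-sum kernel with and without input checks. It does not reproduce Wickra's or ta-rs's actual code: `NaiveSMA` and `GuardedSMA` are hypothetical classes, and the reject-non-finite policy is an assumption chosen for the example.

```python
import math
from collections import deque


class NaiveSMA:
    """Running-sum SMA with no input validation and no warmup."""

    def __init__(self, period):
        self.period = period
        self.window = deque()
        self.total = 0.0

    def update(self, price):
        self.window.append(price)
        self.total += price
        if len(self.window) > self.period:
            self.total -= self.window.popleft()
        return self.total / len(self.window)  # a value from the very first tick


class GuardedSMA(NaiveSMA):
    """Same kernel, but rejects non-finite input and returns None during warmup."""

    def update(self, price):
        if not math.isfinite(price):
            raise ValueError(f"non-finite input: {price!r}")
        value = super().update(price)
        return value if len(self.window) == self.period else None


naive = NaiveSMA(2)
outputs = [naive.update(x) for x in [1.0, float("nan"), 3.0, 4.0, 5.0]]
# The NaN has left the window, yet the running sum never recovers from it.
print(outputs)
```

The extra branch in `GuardedSMA.update` is the kind of per-tick cost a checked kernel pays; in exchange, one bad tick cannot silently poison every later output.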

2. Batch — competitive, not the headline

Whole series in one call. This is not the headline: hand-tuned C (tulipy, TA-Lib) and the leanest crate (kand) win the simple recurrences, and we show the full field rather than cherry-pick. Wickra trades a few µs per pass for three guarantees none of them keep: None during warmup, NaN-safety, and bit-exact batch == streaming output. It still beats finta on every row, pandas-ta on every row where pandas-ta reports a value, and TA-Lib on RSI and ATR.
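One way to obtain a bit-exact batch == streaming guarantee is to have the batch path run the same per-element arithmetic in the same order as the streaming path. The sketch below illustrates that idea only; how Wickra implements it is not stated on this page, and `StreamingEMA`, `batch_ema` and the SMA-seeded warmup convention are assumptions.

```python
class StreamingEMA:
    """EMA seeded with the SMA of the first `period` values (one common convention)."""

    def __init__(self, period):
        self.period = period
        self.alpha = 2.0 / (period + 1)
        self.count = 0
        self.seed_sum = 0.0
        self.value = None

    def update(self, price):
        self.count += 1
        if self.count < self.period:
            self.seed_sum += price
            return None  # warmup: no value yet
        if self.count == self.period:
            self.seed_sum += price
            self.value = self.seed_sum / self.period
        else:
            self.value += self.alpha * (price - self.value)
        return self.value


def batch_ema(series, period):
    """Batch pass that drives the streaming kernel, so both paths share every operation."""
    ema = StreamingEMA(period)
    return [ema.update(x) for x in series]


series = [100.0, 101.5, 99.75, 102.25, 103.0, 101.0, 104.5]
live = StreamingEMA(3)
streamed = [live.update(x) for x in series]
batched = batch_ema(series, 3)
assert streamed == batched  # exact float equality, not approximate
```

A batch routine that reorders the floating-point operations (for example to vectorise them) can be faster, but it may round differently and lose this exact equality.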

Python (20 000-bar pass, µs/op, lower = faster):

| Indicator | finta (µs) | pandas-ta (µs) | TA-Lib (µs) | tulipy (µs) | Wickra (µs) |
|---|---|---|---|---|---|
| SMA(20) | 290.1 | 32.7 | 15.6 | 15.9 | 22.2 |
| EMA(20) | 198.5 | 46.7 | 30.4 | 30.9 | 30.5 |
| RSI(14) | 812.3 | 88.8 | 72.0 | 34.2 | 52.3 |
| MACD(12,26,9) | 716.7 | 286.8 | 111.1 | 38.4 | 129.8 |
| Bollinger(20,2.0) | 1255.5 | 474.3 | 74.6 | 37.9 | 87.2 |
| ATR(14) | 3496.4 | (no value reported) | 87.3 | 35.5 | 74.7 |

All five libraries are measured in the same Python 3.12 run as Wickra (no CI-vs-desktop mix). tulipy's SIMD C and TA-Lib lead the simple recurrences; pandas-ta and finta trail across the board. talipp is excluded from the batch chart on purpose — it is streaming-first, so re-instantiating it for a full batch pass is not a like-for-like comparison.

Rust (50 000-bar pass, µs, lower = faster). Only Wickra and kand expose a batch API; ta-rs and yata are streaming-only:

| Indicator | Wickra (µs) | kand (µs) | Chart label |
|---|---|---|---|
| SMA(20) | 53 | 41 | up to 1.3× faster |
| EMA(20) | 111 | 71 | up to 1.6× faster |
| RSI(14) | 221 | 259 | |
| MACD(12,26,9) | 533 | 327 | up to 1.6× faster |
| Bollinger(20,2.0) | 404 | 460 | |
| ATR(14) | 122 | 169 | |

Wickra wins RSI, Bollinger and ATR outright and trades a few µs on the simple recurrences for the warmup/NaN guarantees. Its real edge is breadth (514 indicators) and O(1) streaming across ten languages, not winning every micro-benchmark — the project README carries the same tables.

3. Per-binding throughput

The sections above compare Wickra against other libraries — which only exist for Python and Rust. Every binding calls the same Rust core, so this last table is not a speed claim: it measures the raw cost of crossing each language's FFI boundary, in million updates per second (Mupd/s), for SMA(20) over 200 000 bars (median of 3, same machine as above).
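For readers unfamiliar with the unit, the sketch below shows one way a Mupd/s figure with a median of 3 runs could be measured. It is not the project's benchmark script: `sma_stream` is a pure-Python stand-in for a binding's per-tick loop, and `mupd_per_sec` is a hypothetical helper.

```python
import statistics
import time
from collections import deque


def sma_stream(prices, period=20):
    """Pure-Python stand-in for a binding's per-tick update loop."""
    window, total, last = deque(), 0.0, None
    for p in prices:
        window.append(p)
        total += p
        if len(window) > period:
            total -= window.popleft()
        if len(window) == period:
            last = total / period
    return last


def mupd_per_sec(fn, prices, runs=3):
    """Median throughput over `runs` timed passes, in million updates per second."""
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(prices)
        elapsed = time.perf_counter() - start
        rates.append(len(prices) / elapsed / 1e6)
    return statistics.median(rates)


prices = [100.0 + (i % 50) * 0.25 for i in range(200_000)]
print(f"pure-Python SMA(20): {mupd_per_sec(sma_stream, prices):.2f} Mupd/s")
```

Taking the median rather than the mean keeps a single run disturbed by the OS scheduler from skewing the reported figure.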

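| Target | streaming (Mupd/s) | batch (Mupd/s) |
|---|---|---|
| Rust core (no FFI) | 380 | 498 |
| C / C++ | 365 | 358 |
| C# | 348 | 259 |
| Python | 31 | 46 |
| Java | 38 | 173 |
| Go | 23 | 394 |
| WASM | 21 | 169 |
| Node.js | 16 | 9 |
| R | 0.1 | 279 |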

Streaming spans three orders of magnitude — the raw C ABI (365) sits just under the FFI-free Rust ceiling (380), while R's per-call interpreter overhead makes streaming ~2800× slower than its own batch. The single batch crossing stays high for the bindings that return a contiguous buffer; the low outliers are Node (its napi batch boxes every element into a JS Array) and Python (a stdlib array.array copy, now that NumPy is optional). Reproduce with the per-binding throughput scripts — see BENCHMARKS.md §3.
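Throughput in Mupd/s converts to time per update as 1000 / Mupd/s nanoseconds, which makes the spread easier to picture. A short sketch of that conversion, using only the Rust core, C / C++ and R rows from the table above (the helper name `ns_per_update` is hypothetical):

```python
# (streaming, batch) throughput in Mupd/s, taken from the table above.
throughput = {
    "Rust core": (380, 498),
    "C / C++": (365, 358),
    "R": (0.1, 279),
}


def ns_per_update(mupd_per_sec):
    """Convert million updates per second into nanoseconds per update."""
    return 1_000.0 / mupd_per_sec


for name, (stream, batch) in throughput.items():
    print(
        f"{name:10s} streaming {ns_per_update(stream):9.1f} ns"
        f"   batch {ns_per_update(batch):5.1f} ns"
    )

r_stream, r_batch = throughput["R"]
# prints "R batch / streaming: 2790x"
print(f"R batch / streaming: {r_batch / r_stream:.0f}x")
```

At 0.1 Mupd/s, each streaming update from R costs about 10 µs, against roughly 2.6 ns in the Rust core; the 2790× result is the figure rounded to ~2800× in the paragraph above.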

What the numbers do not say

  • Absolute µs values depend on CPU, memory clock, OS scheduler, and the Python / Node.js / Rust versions — read them as relative speedups between libraries on identical input, not as a universal performance contract.
  • Reproduced on: Windows 11 Pro 26200, AMD Ryzen 9 9950X, 64 GB DDR5, Rust 1.92 (release profile, lto = "fat", codegen-units = 1), Python 3.12.
  • The Python Wickra figures are the Python binding runtime, not the bare Rust kernel — a small PyO3 boundary cost is included on each measurement.

See also

  • benchmarks/compare_libraries.py — the canonical Python script.
  • crates/wickra-bench — the Rust cross-library benchmark harness.
  • Bench workflow — nightly run on the GitHub-hosted Linux runner, archived as build artefacts.
  • BENCHMARKS.md §3: per-binding throughput benchmarks, reporting raw updates/sec for each language binding (C, C++, C#, Go, Java, Node.js, Python, R, WASM, plus the Rust core baseline). These measure each binding's FFI overhead, not the cross-library comparison shown above.
  • Streaming-vs-Batch (docs) — what the equivalence guarantee actually means.
