Benchmarks

Wickra is a streaming-first library: the state machine inside every indicator takes a single new tick and returns its updated output in constant time. The charts below show what that costs against the full Python TA ecosystem and the other Rust crates — wins and losses, the same figures the project README carries. Each bar is normalised to the slowest in its group, so the shortest bar is the fastest library; the value to the right is the measured number.

Choosing a language? Jump to per-binding throughput

All 10 bindings call the same verified Rust core, but the cost of crossing each language's FFI boundary differs by orders of magnitude on streaming workloads. See Per-binding throughput to pick the binding that keeps up with your hot loop (Rust / C / C++ / C# are near-core; R is the outlier).

Reproduce these on your own hardware

```bash
# Python — vs talipp / TA-Lib / tulipy / pandas-ta / finta
pip install -e bindings/python[bench]
python -m benchmarks.compare_libraries

# Rust core — vs kand / ta-rs / yata
cargo bench -p wickra-bench
```

The Python script auto-detects every peer library installed in your venv. The nightly cross-library-bench workflow runs both suites on a Linux runner and uploads the raw reports as artefacts.

1. Streaming — the structural win

Live trading feeds one tick at a time. Wickra updates every indicator in O(1); batch-only libraries (TA-Lib, tulipy, finta, pandas-ta) have no incremental API and must recompute the whole history on every tick. Only talipp (Python) and ta-rs / yata (Rust) carry real per-tick state. This is the gap the library was built to expose.
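To make the O(1)-versus-recompute distinction concrete, here is a minimal pure-Python sketch. `StreamingSMA` and `recompute_sma` are hypothetical names used only for illustration; they are not Wickra's API, and the real state machine lives in the Rust core.

```python
from collections import deque


class StreamingSMA:
    """Hypothetical O(1) rolling mean; illustrative only, not Wickra's API."""

    def __init__(self, period: int) -> None:
        self.period = period
        self.window = deque()
        self.total = 0.0

    def update(self, price: float):
        # Constant work per tick: add the new value, drop the oldest one.
        self.window.append(price)
        self.total += price
        if len(self.window) > self.period:
            self.total -= self.window.popleft()
        if len(self.window) < self.period:
            return None  # still warming up
        return self.total / self.period


def recompute_sma(history, period):
    """What a batch-only API forces on every tick: a pass over the whole history."""
    if len(history) < period:
        return None
    out = None
    for end in range(period, len(history) + 1):
        out = sum(history[end - period:end]) / period
    return out  # only the last value is wanted, yet every window was recomputed


prices = [float(p) for p in range(1, 31)]
sma = StreamingSMA(20)
history = []
for p in prices:
    history.append(p)
    streamed = sma.update(p)                 # O(1) per tick
    recomputed = recompute_sma(history, 20)  # O(len(history)) per tick
    assert (streamed is None) == (recomputed is None)
    if streamed is not None:
        assert abs(streamed - recomputed) < 1e-9
```

Both paths produce the same values; the difference is that the recompute path's cost grows with the length of the stored history, while the streaming path's cost does not.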

Python — per-tick latency (seed 5 000 bars, then feed ticks one at a time):

Indicator	talipp (µs / tick)	★Wickra (µs / tick)
SMA(20)	0.959	0.089
EMA(20)	1.19	0.111
RSI(14)	0.949	0.061
MACD(12,26,9)	3.30	0.079
Bollinger(20,2.0)	4.97	0.089

Against talipp, the only other incremental Python peer, Wickra is 11–56× faster; against the recompute-on-every-tick libraries it is 2 800–19 000× faster (finta RSI hits 19 000×). tulipy and pandas-ta land in the same recompute band as TA-Lib, too far off-scale to chart next to a sub-microsecond bar.
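The 11–56× range can be checked directly from the per-tick figures above. The snippet below is only arithmetic on the published numbers; the variable names are illustrative.

```python
# Per-tick latencies in µs (talipp, Wickra), copied from the figures above.
latency_us = {
    "SMA(20)": (0.959, 0.089),
    "EMA(20)": (1.19, 0.111),
    "RSI(14)": (0.949, 0.061),
    "MACD(12,26,9)": (3.30, 0.079),
    "Bollinger(20,2.0)": (4.97, 0.089),
}

# Speedup = peer latency divided by Wickra latency on the same indicator.
speedups = {
    name: talipp / wickra for name, (talipp, wickra) in latency_us.items()
}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x")

lowest = min(speedups.values())
highest = max(speedups.values())
print(f"range: {lowest:.0f}x to {highest:.0f}x")
```

The low end comes from SMA and EMA, the high end from Bollinger.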

Rust — per-tick latency (whole 50 000-bar series, lower = faster):

Indicator	★Wickra (ns)	kand (ns)	ta-rs (ns)	yata (ns)	Chart badge
SMA(20)	50	38	47	38	up to 1.1× faster
EMA(20)	154	69	56	69	up to 2.2× faster
RSI(14)	164	216	74	n/a	
MACD(12,26,9)	275	143	66	n/a	up to 1.9× faster
Bollinger(20,2.0)	128	248	168	n/a	
ATR(14)	152	166	61	n/a	

ta-rs hands back a bare f64 from the first tick with no warmup and no validation; it leads several rows by giving those guarantees up. Against kand, Wickra wins streaming RSI, Bollinger and ATR. yata exposes only SMA/EMA as raw-value methods, so its other rows are omitted rather than faked.

2. Batch — competitive, not the headline

Whole series in one call. This is not the headline: hand-tuned C (tulipy, TA-Lib) and the leanest crate (kand) win the simple recurrences, and we show the full field rather than cherry-pick. Wickra trades a few µs per pass for the None-warmup, NaN-safety and bit-exact batch == streaming guarantees none of them keep — yet it still beats pandas-ta and finta on every row, and TA-Lib on RSI and ATR.
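One way to obtain a bit-exact batch == streaming guarantee is to route the batch call through the same per-tick state machine, so both paths perform identical arithmetic in identical order. The sketch below illustrates that idea under loudly stated assumptions: `StreamingEMA` and `batch_ema` are hypothetical, the SMA seeding and the skip-NaN policy are guesses made for illustration, and none of this is taken from Wickra's source.

```python
import math


class StreamingEMA:
    """Hypothetical EMA state machine; not Wickra's API.

    Assumptions: the first value is the SMA of the first `period` inputs,
    and NaN inputs are skipped so they cannot poison the state.
    """

    def __init__(self, period: int) -> None:
        self.period = period
        self.alpha = 2.0 / (period + 1)
        self.seed = []
        self.value = None

    def update(self, price: float):
        if math.isnan(price):
            return self.value  # assumption: ignore the tick, keep prior output
        if self.value is None:
            self.seed.append(price)
            if len(self.seed) < self.period:
                return None  # warmup: no output yet
            self.value = sum(self.seed) / self.period
            return self.value
        self.value += self.alpha * (price - self.value)
        return self.value


def batch_ema(prices, period):
    """Batch pass built on the streaming update, so results match exactly."""
    ema = StreamingEMA(period)
    return [ema.update(p) for p in prices]


prices = [1.0, 2.0, 3.0, 4.0, float("nan"), 5.0]

batch = batch_ema(prices, 3)

stream = StreamingEMA(3)
streamed = [stream.update(p) for p in prices]

assert batch == streamed  # exact equality, not approximate
```

The warmup slots come back as `None` rather than a misleading number, and the NaN tick leaves the later outputs finite.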

Python (20 000-bar pass, µs/op, lower = faster):

Indicator	★Wickra (µs)	TA-Lib (µs)	tulipy (µs)	pandas-ta (µs)	finta (µs)
SMA(20)	22.2	15.6	15.9	32.7	290.1
EMA(20)	30.5	30.4	30.9	46.7	198.5
RSI(14)	52.3	72.0	34.2	88.8	812.3
MACD(12,26,9)	129.8	111.1	38.4	286.8	716.7
Bollinger(20,2.0)	87.2	74.6	37.9	474.3	1255.5
ATR(14)	74.7	87.3	35.5	n/a	3496.4

All five libraries are measured in the same Python 3.12 run as Wickra (no CI-vs-desktop mix). tulipy's SIMD C and TA-Lib lead the simple recurrences; pandas-ta and finta trail across the board. talipp is excluded from the batch chart on purpose — it is streaming-first, so re-instantiating it for a full batch pass is not a like-for-like comparison.

Rust (50 000-bar pass, µs, lower = faster). Only Wickra and kand expose a batch API; ta-rs and yata are streaming-only:

Indicator	★Wickra (µs)	kand (µs)	Chart badge
SMA(20)	53	41	up to 1.3× faster
EMA(20)	111	71	up to 1.6× faster
RSI(14)	221	259	
MACD(12,26,9)	533	327	up to 1.6× faster
Bollinger(20,2.0)	404	460	
ATR(14)	122	169	

Wickra wins RSI, Bollinger and ATR outright and trades a few µs on the simple recurrences for the warmup/NaN guarantees. Its real edge is breadth (514 indicators) and O(1) streaming across ten languages, not winning every micro-benchmark — the project README carries the same tables.

3. Per-binding throughput

The sections above compare Wickra against other libraries — which only exist for Python and Rust. Every binding calls the same Rust core, so this last table is not a speed claim: it measures the raw cost of crossing each language's FFI boundary, in million updates per second (Mupd/s), for SMA(20) over 200 000 bars (median of 3, same machine as above).

Target	streaming (Mupd/s)	batch (Mupd/s)
Rust core (no FFI)	380	498
C / C++	365	358
C#	348	259
Python	31	46
Java	38	173
Go	23	394
WASM	21	169
Node.js	16	9
R	0.1	279

Streaming spans three orders of magnitude — the raw C ABI (365) sits just under the FFI-free Rust ceiling (380), while R's per-call interpreter overhead makes streaming ~2800× slower than its own batch. The single batch crossing stays high for the bindings that return a contiguous buffer; the low outliers are Node (its napi batch boxes every element into a JS Array) and Python (a stdlib array.array copy, now that NumPy is optional). Reproduce with the per-binding throughput scripts — see BENCHMARKS.md §3.
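For readers who want to see what a Mupd/s figure means in practice, the following is a minimal sketch of a median-of-3 throughput measurement in Python. It is not the project's benchmark script: `measure_mupd_per_s` and `stand_in_update` are hypothetical, and the stand-in workload simply accumulates a sum where a real run would call a binding's per-tick update.

```python
import statistics
import time


def measure_mupd_per_s(update, n_updates: int, repeats: int = 3) -> float:
    """Median throughput in million updates per second over `repeats` runs."""
    rates = []
    for _ in range(repeats):
        start = time.perf_counter()
        for i in range(n_updates):
            update(float(i))  # one call per tick, as in a streaming loop
        elapsed = time.perf_counter() - start
        rates.append(n_updates / elapsed / 1e6)
    return statistics.median(rates)


# Stand-in for an indicator update; a real measurement would call the binding.
state = {"total": 0.0}


def stand_in_update(x: float) -> None:
    state["total"] += x


rate = measure_mupd_per_s(stand_in_update, 200_000)
print(f"stand-in streaming throughput: {rate:.1f} Mupd/s")

# The R figure quoted above: batch throughput divided by streaming throughput.
r_ratio = 279 / 0.1
print(f"R batch vs streaming: ~{r_ratio:.0f}x")
```

Because the timed loop makes one call per tick, the host language's call overhead is part of the streaming number, which is why the streaming column spreads so widely while the single batch crossing does not.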

What the numbers do not say

Absolute µs values depend on CPU, memory clock, OS scheduler, and the Python / Node.js / Rust versions — read them as relative speedups between libraries on identical input, not as a universal performance contract.
Reproduced on: Windows 11 Pro 26200, AMD Ryzen 9 9950X, 64 GB DDR5, Rust 1.92 (release profile, lto = "fat", codegen-units = 1), Python 3.12.
The Python Wickra figures are the Python binding runtime, not the bare Rust kernel — a small PyO3 boundary cost is included on each measurement.
