Building a Zero-Dependency secp256k1 CUDA Engine from Scratch (2.5B ops/sec)

(github.com)

1 point | by shrecshrec 6 hours ago

3 comments

2 hours ago
[deleted]
shrecshrec 2 hours ago
I would be glad to hear any suggestions about future improvements and ideas.
shrecshrec 6 hours ago
I implemented a full secp256k1 engine from scratch in C++ and CUDA with zero external dependencies (no GMP, no OpenSSL).
The goal was to explore the performance limits of:
Jacobian mixed-add
Batch inversion using Montgomery’s trick
Large-scale scalar stepping
GPU memory coalescing strategies
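For readers unfamiliar with Montgomery's trick, here is a minimal sketch of batch inversion in plain C++. It is not taken from the repository: it assumes a toy 31-bit prime (kP) so that products fit in 64 bits, whereas the secp256k1 field prime is 256 bits wide and needs multi-limb arithmetic. The names mul_mod, inv_mod, and batch_invert are hypothetical. The idea is that n inversions are replaced by one inversion plus roughly 3(n-1) multiplications.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// ASSUMPTION: toy prime modulus 2^31 - 1, chosen so a*b fits in 64 bits.
// This is NOT the secp256k1 field prime.
constexpr std::uint64_t kP = 2147483647ULL;

// Modular multiplication; both inputs must already be reduced below kP.
std::uint64_t mul_mod(std::uint64_t a, std::uint64_t b) {
    return (a * b) % kP;
}

// Single inversion via Fermat's little theorem: a^(kP-2) mod kP.
std::uint64_t inv_mod(std::uint64_t a) {
    std::uint64_t result = 1;
    std::uint64_t base = a % kP;
    std::uint64_t exp = kP - 2;
    while (exp > 0) {
        if (exp & 1) result = mul_mod(result, base);
        base = mul_mod(base, base);
        exp >>= 1;
    }
    return result;
}

// Inverts every element of v in place with a single call to inv_mod.
// Precondition: every element is nonzero and reduced below kP.
void batch_invert(std::vector<std::uint64_t>& v) {
    const std::size_t n = v.size();
    if (n == 0) return;

    // Forward pass: prefix[i] holds the product v[0] * ... * v[i-1].
    std::vector<std::uint64_t> prefix(n);
    std::uint64_t acc = 1;
    for (std::size_t i = 0; i < n; ++i) {
        prefix[i] = acc;
        acc = mul_mod(acc, v[i]);
    }

    // One inversion of the full product.
    std::uint64_t inv = inv_mod(acc);

    // Backward pass: peel off one factor at a time.
    for (std::size_t i = n; i-- > 0;) {
        const std::uint64_t vi = v[i];
        v[i] = mul_mod(inv, prefix[i]);  // inverse of the original v[i]
        inv = mul_mod(inv, vi);          // inverse of v[0] * ... * v[i-1]
    }
}
```

Calling batch_invert on a vector replaces each element with its modular inverse. This sketch allocates a scratch vector for the prefix products; a hot-path version would presumably use a preallocated buffer instead.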
On an RTX 5060, I'm getting ~2.5B mixed-add operations/sec.
Key design decisions:
Little-endian limb layout for hardware efficiency
Big-endian only for visualization
Deterministic memory layout
No dynamic allocation in hot paths
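To illustrate the limb-layout decision, here is a small C++ sketch, not the project's actual code. It assumes a 256-bit value stored as four 64-bit limbs with limb[0] least significant; the names U256, add_u256, and to_hex_be are hypothetical. Carries then propagate in index order, and big-endian output is produced only when formatting for display.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>
#include <string>

// ASSUMPTION: hypothetical 256-bit type, four 64-bit limbs,
// limb[0] is the least significant (little-endian limb order).
struct U256 {
    std::array<std::uint64_t, 4> limb{};
};

// Adds b into a and returns the carry out of the top limb (0 or 1).
// The carry chain walks limbs from index 0 upward.
std::uint64_t add_u256(U256& a, const U256& b) {
    std::uint64_t carry = 0;
    for (int i = 0; i < 4; ++i) {
        const std::uint64_t s = a.limb[i] + b.limb[i];
        const std::uint64_t c1 = (s < a.limb[i]) ? 1u : 0u;  // overflow from a + b
        const std::uint64_t t = s + carry;
        const std::uint64_t c2 = (t < s) ? 1u : 0u;          // overflow from adding carry
        a.limb[i] = t;
        carry = c1 | c2;  // at most one of the two can be set
    }
    return carry;
}

// Big-endian hex string for display only: most significant limb first.
// Returns std::string, so it allocates and is not meant for hot paths.
std::string to_hex_be(const U256& x) {
    char buf[65];
    std::snprintf(buf, sizeof buf, "%016llx%016llx%016llx%016llx",
                  static_cast<unsigned long long>(x.limb[3]),
                  static_cast<unsigned long long>(x.limb[2]),
                  static_cast<unsigned long long>(x.limb[1]),
                  static_cast<unsigned long long>(x.limb[0]));
    return std::string(buf);
}
```

The fixed-size std::array keeps the layout deterministic and avoids heap allocation in the arithmetic itself; only the display helper allocates.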
Would love feedback from people working on ECC or GPU math.