Credibility and Reproducibility of Scientific Simulations on the Blockchain

deRSE25 Conference

Feb 26, 2025 | Ashwin Kumar Karnad
Supported by JuRSE Travel Grant.

The problem of reproducibility in science

Have you failed to reproduce an experiment?

[Baker, M. 1,500 scientists lift the lid on reproducibility. Nature 533, 452-454 (2016)]

The problems continued...

Analysed and published data not in sync
Missing link between the analyser and the publisher

Current State of the Art

Version-controlled code to track changes.
Dependency tracking to ensure consistency.
Specified hardware configurations.
Use of Docker for one-click simulation setups.

Why is there still non-determinism?

Floating-Point Arithmetic and hardware differences
Concurrency, Parallelism, and Race conditions
External Input and Environmental Factors
Hardware and Low-Level Execution
...

Floating-Point Arithmetic and Numerical Instability
- Non-Associativity: (a + b) + c can differ from a + (b + c), so the order of a summation or reduction changes the result.
- Compiler/Hardware Differences: Variations in instruction sets (SSE vs. AVX), or math libraries (e.g., libm vs. Intel MKL).
Concurrency and Parallelism
- Thread Scheduling: OS-dependent thread/process scheduling causing race conditions or inconsistent memory states in multithreaded code.
- Non-Atomic Operations: Torn reads/writes on shared variables without proper synchronization.
- JIT Compilation: Runtime optimizations in Java/.NET/JavaScript causing timing differences.
External Input and Environmental Factors
- Randomness: Unseeded or time-seeded random number generators (RNGs).
- I/O interactions: Input from sensors, networks, files, or user interactions varying between runs.
- Clock Dependency: Logic relying on system time or timers (e.g., sleep() durations).
Hardware and Low-Level Execution
- Speculative Execution: Side effects from mispredicted branches (e.g., cache timing leaks) in CPUs.
- Memory Address Randomization (ASLR): Pointer values differing across runs, affecting hashing or debugging.
- Thermal Throttling: CPU performance fluctuations altering thread timing.
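
Two of the sources above are easy to demonstrate. The following is a minimal sketch, not taken from the talk: it shows that regrouping or reordering a floating-point sum changes the result, and that an explicitly seeded generator gives a repeatable stream. The helper names `sum_in_order` and `sample` and the chosen values are illustrative assumptions.

```python
import random

# Floating-point addition is not associative: grouping changes the rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
grouping_matters = left != right


def sum_in_order(values):
    """Add values strictly left to right, as a sequential loop would."""
    total = 0.0
    for value in values:
        total += value
    return total


# A parallel reduction may visit the same values in a different order.
values = [1e16, 1.0, -1e16, 1.0]
forward = sum_in_order(values)
reordered = sum_in_order(sorted(values))


def sample(seed, n=3):
    """Draw n pseudo-random numbers from a generator with an explicit seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# A fixed seed makes the "random" stream repeatable across runs.
same_stream = sample(42) == sample(42)

print(grouping_matters, forward, reordered, same_stream)
```

Seeding removes only one source of variation; the summation-order effect remains whenever the reduction order is left to the scheduler.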

Blockchain as a decentralised global computer

Transactions (computations) are state transitions
State is stored in a distributed ledger
Consensus is achieved through validators

Need for determinism in blockchain

Consensus requires all nodes to agree on the state
Non-deterministic computations can lead to forks

How determinism is achieved

No floating point instructions
Single threaded
Controlling External Inputs
Controlling Random numbers
Bytecode Standardization

No Floating-Point Arithmetic

The EVM operates only on 256-bit integers; fractional values are emulated with fixed-point arithmetic.
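
Fixed-point arithmetic on integers can be sketched as follows. This is an illustration only: the scaling factor `SCALE` (18 decimal places) and the helper names are assumptions, values are assumed non-negative, and Python integers are unbounded, so the 256-bit overflow behaviour of the EVM is not modelled.

```python
SCALE = 10**18  # assumed scaling factor: 18 decimal places


def to_fixed(numerator, denominator=1):
    """Represent numerator/denominator as a scaled integer (truncating)."""
    return numerator * SCALE // denominator


def fixed_mul(a, b):
    """Multiply two scaled integers and rescale the product."""
    return a * b // SCALE


def fixed_div(a, b):
    """Divide two scaled integers, keeping the result scaled."""
    return a * SCALE // b


tenth = to_fixed(1, 10)
fifth = to_fixed(2, 10)
three_tenths = to_fixed(3, 10)

# Unlike binary floating point, 0.1 + 0.2 == 0.3 holds exactly here.
exact_sum = tenth + fifth == three_tenths

# 3 * 0.5 = 1.5, stored as 1.5 * SCALE.
half_of_three = fixed_mul(to_fixed(3), to_fixed(1, 2))

print(exact_sum, half_of_three)
```

Every operation is plain integer arithmetic, so any machine that implements integers correctly produces the same bits.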

Eliminating Concurrency and Parallelism

EVM processes transactions sequentially within a block.
All nodes process transactions in the same order.

Controlling External Inputs

Smart contracts cannot access off-chain data unless via oracles.
Contracts can only access data from the current or previous blocks.

Managing Randomness

On-chain "randomness" is deterministic and known to all nodes.
Contracts use Commit-Reveal Schemes or Oracle-Based RNG.
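
A commit-reveal scheme can be sketched in a few lines. This is a minimal off-chain illustration under stated assumptions: SHA-256 stands in for the Keccak-256 hash used on Ethereum, salts are assumed to be a fixed 32 bytes, and the function names `commit`, `verify`, and `combine` are hypothetical.

```python
import hashlib
import secrets


def commit(value: bytes, salt: bytes) -> bytes:
    """Phase 1: publish only the hash of (salt + value)."""
    return hashlib.sha256(salt + value).digest()


def verify(commitment: bytes, value: bytes, salt: bytes) -> bool:
    """Phase 2: check a revealed value against the earlier commitment."""
    return commit(value, salt) == commitment


def combine(values) -> int:
    """Derive one shared number from all revealed values."""
    digest = hashlib.sha256(b"".join(values)).digest()
    return int.from_bytes(digest, "big")


# Two participants each commit to a secret contribution.
alice_value, alice_salt = b"alice-secret", secrets.token_bytes(32)
bob_value, bob_salt = b"bob-secret", secrets.token_bytes(32)
alice_commitment = commit(alice_value, alice_salt)
bob_commitment = commit(bob_value, bob_salt)

# After both commitments are recorded, the values are revealed and checked.
both_valid = (verify(alice_commitment, alice_value, alice_salt)
              and verify(bob_commitment, bob_value, bob_salt))
shared_number = combine([alice_value, bob_value])
print(both_valid, shared_number % 100)
```

Neither participant can pick a value after seeing the other's, yet every node that replays the reveals computes the same shared number.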

Enforcing Deterministic Execution Environments

EVM is a sandboxed virtual machine, abstracting hardware differences.
Every opcode has a deterministic gas cost defined by the protocol.
No speculative execution.

Consensus-Driven State Transition

All nodes start from the same genesis state and apply transactions in the same order.
Transactions are replayed identically on every node.

Bytecode Standardization

High-level code is compiled to standardized EVM bytecode.
No JIT or runtime compilation.

Running simulations "on chain"

Completely Reproducible
Provenance and authorship are provable

Running simulations "on chain" - workflow

Example simulations

Source code on GitHub

Limitations of "on chain" simulations

Lack of math functions
(exponents, matrices, floating point, randomness)
Costs (Computation and storage)
Speed

Running simulations "off chain"

Use blockchain for provenance and authorship
Use traditional techniques for compute

E.g.:

RISC0 (RISC Zero): compile to RISC-V and form arithmetic circuits
Use Merkle trees for hashing the call stack

Running simulations "off chain" - workflow

Running simulations "off chain" - RISC0 approach

Using Merkle trees

Merkle tree calculation. [Source: Wikipedia]

Using Merkle trees

Merkle tree with call stack as data nodes.
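
A Merkle root over recorded calls can be sketched as below. This is an illustration under assumptions, not the talk's implementation: the call records are made-up byte strings, SHA-256 is the assumed hash, and an odd node at any level is paired with itself, which is one common convention among several.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Hash the leaves, then hash pairs upward until one root remains."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # pair an odd last node with itself
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical call records acting as the data nodes of the tree.
calls = [b"init(n=100)", b"step(dt=1)", b"step(dt=1)", b"write_output()"]
root = merkle_root(calls)

# Changing any single call changes the root.
tampered_calls = [b"init(n=100)", b"step(dt=2)", b"step(dt=1)", b"write_output()"]
tampered_root = merkle_root(tampered_calls)

print(root.hex())
print(root != tampered_root)
```

Only the 32-byte root needs to be stored on chain; the full list of calls can stay off chain and be checked against it later.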

Using Merkle trees

Key aspect: atomic state transitions

Each step in the simulation must be an atomic state transition
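
One way to picture atomic state transitions is a pure step function whose every output state is folded into a running hash. The sketch below is an assumption-laden illustration: the integer-only oscillator dynamics, the step size of 1/16, and the names `step` and `run` are hypothetical and not from the talk.

```python
import hashlib


def step(state):
    """One atomic transition: integer-only update of (position, velocity)."""
    x, v = state
    # Hypothetical dynamics: harmonic oscillator with step size 1/16.
    v_new = v - x // 16
    x_new = x + v_new // 16
    return (x_new, v_new)


def run(initial, n_steps):
    """Apply step n_steps times, chaining a hash over every state."""
    state = initial
    digest = hashlib.sha256(repr(state).encode()).digest()
    trace = [digest]
    for _ in range(n_steps):
        state = step(state)
        # Each digest depends on the previous digest and the new state.
        digest = hashlib.sha256(digest + repr(state).encode()).digest()
        trace.append(digest)
    return state, trace


final_a, trace_a = run((1_000_000, 0), 100)
final_b, trace_b = run((1_000_000, 0), 100)
final_c, trace_c = run((1_000_001, 0), 100)

print(final_a == final_b, trace_a[-1] == trace_b[-1])
print(trace_a[-1] != trace_c[-1])
```

Because each step depends only on the previous state, two independent runs can be compared digest by digest, and the first mismatching step pinpoints where they diverged.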

Example simulations

Source code on GitHub

Conclusion & Future Outlook

On-chain vs. off-chain
Future Research Directions:

Specialized (E)VMs for scientific computing
Integration with existing workflows
Explore computational efficiency and costs
Explore Hybrid approaches

Slides: go.fzj.de/ak-derse25-slides

Examples: go.fzj.de/ak-derse25-examples