I'm a bit skeptical about taking the rates of reported soundness bugs between different systems and drawing conclusions about the underlying approaches. There's typically a class of bugs that users can stumble into by doing something unusual, and then another class of bugs that can only really be found by exploiting holes in the implementation. The first class depends on which features get exercised the most, the second class depends on how many eyes are on the source (and how accessible it is), and both classes heavily depend on the overall size of the userbase.
E.g., Metamath is designed to be as theoretically simple as possible, to the point that it's widely considered a toy in comparison to 'serious' proof systems: a verifier is mainly just responsible for pushing around symbols and strings. In spite of this simplicity, I was able to find soundness bugs in a couple major verifiers, simply because few people use the project to begin with, and even fewer take the time to pore over the implementations.
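To give a sense of how minimal that is: the core checking step really is just simultaneous substitution against a stack. Here's a toy sketch in Python (illustrative only, not a real verifier; it omits things like disjoint-variable checks and typecode handling, which are exactly the kind of details implementations get wrong):

    # Toy sketch of a Metamath-style checking step (not a real verifier).
    # Applying a rule = substitute into its hypotheses, match them against
    # the top of the proof stack, then push the substituted conclusion.

    def substitute(stmt, subst):
        # Replace each variable token by its assigned token list.
        out = []
        for tok in stmt:
            out.extend(subst.get(tok, [tok]))
        return tuple(out)

    def apply_rule(stack, hypotheses, conclusion, subst):
        n = len(hypotheses)
        if n > len(stack):
            raise ValueError("stack underflow")
        for hyp, actual in zip(hypotheses, stack[len(stack) - n:]):
            if substitute(hyp, subst) != actual:
                raise ValueError("hypothesis mismatch")
        del stack[len(stack) - n:]
        stack.append(substitute(conclusion, subst))

    # Modus ponens: from |- ph and |- ( ph -> ps ), conclude |- ps.
    stack = [("|-", "P"), ("|-", "(", "P", "->", "Q", ")")]
    apply_rule(stack,
               hypotheses=[("|-", "ph"), ("|-", "(", "ph", "->", "ps", ")")],
               conclusion=("|-", "ps"),
               subst={"ph": ["P"], "ps": ["Q"]})
    print(stack)  # [('|-', 'Q')]

The bugs I found were not in this easy part, but in the surrounding bookkeeping that a sketch like this leaves out.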
So I'd be hesitant to start saying that one approach is inherently more or less bug-prone than another, except to be slightly warier in general of larger or less accessible kernels.
The multithreading bug in Isabelle is fascinating from a systems design perspective: threads racing ahead assuming a stuck proof will succeed, creating false positives. That's exactly the kind of subtle concurrency issue that makes verification tools need... verification tools.
I don't think it's that subtle: it would be enough to have a continuous integration tool that tries to compile the proofs that are checked into version control and raises an error if it fails.
The linked blog post says "Unfortunately, a student once submitted work containing this error; it was almost entirely incorrect and he had no idea." I guess the student probably was aware that not every proof had gone through, but saw something like "99 out of 100 proofs succeeded" and assumed that meant he had mostly completed the task, not realizing that a false theorem would be used to give incorrect proofs for the rest of the file.
Indeed this can simply be checked by a command-line invocation.
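For Isabelle that's just the batch build tool; if I'm not mistaken, something like

    isabelle build -D .

exits with a nonzero status when a session fails to check, so any CI setup can gate on it.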
But I don't think the student was aware: They would only have seen a purple coloring of the "stuck" part, as shown in the linked example in the blog post. And if one assumes that the system only accepts correct proof steps, it's very easy to miss a small error in a theorem statement...
In the described case, this was a simple user error. But you are right nonetheless: to enable this concurrency, the system uses a parallel inference kernel (https://www21.in.tum.de/~wenzelm/papers/parallel-isabelle.pd...). I have heard the anecdote that in the first version, this also allowed one to prove False, by fulfilling a proof promise with a proof that depended recursively on the same promise.
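To illustrate the hazard (this is a toy sketch of the promise idea only, not Isabelle's actual kernel): a theorem can be issued against a promise whose proof arrives later, and if fulfilling the promise doesn't check for cycles, the "proof" can depend on the promised theorem itself, so nothing was ever actually proved.

    class Promise:
        def __init__(self, statement):
            self.statement = statement
            self.proof = None  # supplied later, e.g. by a worker thread

    class Theorem:
        def __init__(self, statement, depends_on=()):
            self.statement = statement
            self.depends_on = list(depends_on)  # promises this relies on

    def depends_on_promise(thm, promise):
        # Walk the dependency graph looking for the promise itself.
        seen, stack = set(), list(thm.depends_on)
        while stack:
            p = stack.pop()
            if p is promise:
                return True
            if id(p) in seen or p.proof is None:
                continue
            seen.add(id(p))
            stack.extend(p.proof.depends_on)
        return False

    def fulfill(promise, proof_thm, check_cycles=True):
        if proof_thm.statement != promise.statement:
            raise ValueError("statement mismatch")
        if check_cycles and depends_on_promise(proof_thm, promise):
            raise ValueError("cyclic promise: the proof assumes itself")
        promise.proof = proof_thm

    # Without the cycle check, "False" becomes a theorem with no real proof:
    p = Promise("False")
    bogus = Theorem("False", depends_on=[p])  # uses the promised theorem
    fulfill(p, bogus, check_cycles=False)     # accepted; soundness is gone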
And there are verification tools for verification tools! Isabelle has a verified proof checker that can check whether a given proof is sound: https://www21.in.tum.de/~rosskops/papers/metalogic_pre.pdf
The only caveat is that the proof checker itself is verified in Isabelle...