EMSQMS on Tuesday, July 20th

09:00‑10:00 Session 1
Location: AT 2.14
09:00 Joe Hurd
Evaluation Opportunities in Mechanized Theories

10:30‑12:30 Session 2
Chair: Geoff Sutcliffe
Location: AT 2.14
10:30 Robert Brummayer, Aaron Stump and Duckki Oe
Exploring Predictability of SAT/SMT Solvers

This paper seeks to explore the predictability of SAT and SMT solvers in response to different kinds of changes to benchmarks. We consider both semantics-preserving and possibly semantics-modifying transformations, and provide preliminary data about solver predictability. We also propose carrying learned theory lemmas over from an original run to runs on similar benchmarks, and show the benefits of this idea as a heuristic for improving predictability of SMT solvers.
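
To make the notion of a semantics-preserving transformation concrete, here is a minimal sketch (ours, not taken from the paper) that shuffles the clause order of a DIMACS CNF instance. Reordering clauses leaves satisfiability unchanged, so any variation in solver behaviour on the result is precisely the kind of unpredictability the paper studies. The sketch assumes one 0-terminated clause per line, a common DIMACS layout.

```python
import random

def shuffle_clauses(dimacs_text: str, seed: int = 0) -> str:
    """Return the same CNF with its clauses in a random order.

    Reordering clauses preserves satisfiability, so a perfectly
    predictable solver would behave the same on the result.
    Assumes one 0-terminated clause per line.
    """
    rng = random.Random(seed)
    header, clauses = [], []
    for line in dimacs_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("c", "p")):
            header.append(line)   # comments and the "p cnf <vars> <clauses>" line
        else:
            clauses.append(line)  # a clause, e.g. "1 -2 0"
    rng.shuffle(clauses)
    return "\n".join(header + clauses)

# Example: an equisatisfiable reordering of a 2-clause formula.
cnf = "p cnf 3 2\n1 -2 0\n2 3 0"
print(shuffle_clauses(cnf, seed=42))
```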

11:00 Claudia Peschiera, Luca Pulina and Armando Tacchella
Designing a solver competition: the QBFEVAL'10 case study

In this paper we report on QBFEVAL'10, the seventh in a series of events established with the aim of assessing the advancements in reasoning about quantified Boolean formulas (QBFs). The paper discusses the results obtained and the evaluation setup, from the criteria used to select QBF instances down to the hardware infrastructure. We also discuss the current state of the art in light of past challenges, and envision future research directions motivated by the results of QBFEVAL'10.

11:30 Mladen Nikolic
Statistical Methodology for Comparison of SAT Solvers

Evaluating improvements to modern SAT solvers and comparing two arbitrary solvers are challenging and important tasks. The relative performance of two solvers is usually assessed by running them on a set of SAT instances and comparing the number of solved instances and their running times in a straightforward manner. In this paper we point out shortcomings of this approach and advocate more reliable, statistically founded methodologies that can discriminate better between good and bad ideas. We present one such methodology and illustrate its application.
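
As a generic illustration of a statistically founded comparison (not necessarily the methodology this paper presents), the sketch below applies a Wilcoxon signed-rank test to paired per-instance runtimes of two solvers on the same benchmark set; the runtimes shown are hypothetical.

```python
from scipy.stats import wilcoxon

# Hypothetical runtimes in seconds of two solvers on the same five instances.
solver_a = [12.3, 45.1, 9.8, 130.0, 77.2]
solver_b = [10.9, 44.0, 11.2, 101.5, 70.3]

# Paired test: does the per-instance runtime difference differ from zero
# more than chance would suggest?
stat, p_value = wilcoxon(solver_a, solver_b)
print(f"Wilcoxon statistic = {stat:.1f}, p = {p_value:.3f}")
# A small p-value suggests the difference is unlikely to be noise; simply
# counting solved instances would ignore this uncertainty entirely.
```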

12:00 Christoph Benzmueller and Marvin Schiller
Adaptive Proofs supported by Metadeduction

14:00‑15:00 Session 3
Chair: Cesare Tinelli
Location: AT 2.14
14:00 Morgan Deters
The SMT Execution Service: Features, Fairness, and the Future

The SMT-Exec service is a benchmark repository, execution service, and competition infrastructure for the SMT community. Besides running the yearly competition and providing real-time (and also archival) results and analysis, SMT-Exec permits "private experiments" to be run year-round by researchers all over the world. These experiments are just like the yearly competition and run on the same computing cluster, but may be parameterized by users to run on benchmark and solver subsets of interest with a configurable timeout. Private solvers may be uploaded and tested against each other or against archival versions of competition solvers, and solvers and experiment results may be "published" so that they are publicly viewable. This talk describes SMT-Exec design highlights, challenges, growing pains, and current development plans for future versions of SMT-Exec.
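
For readers unfamiliar with such infrastructure, the following toy sketch mimics, locally and in much simplified form, the core mechanic of a private experiment: running a solver over a chosen benchmark subset with a configurable timeout and recording each instance's result and wall-clock time. This is our illustration, not SMT-Exec's actual interface (SMT-Exec itself is a hosted web service).

```python
import subprocess
import time

def run_experiment(solver_cmd, benchmark_paths, timeout_s=60.0):
    """Run solver_cmd on each benchmark; return {path: (status, seconds)}."""
    results = {}
    for path in benchmark_paths:
        start = time.monotonic()
        try:
            proc = subprocess.run([*solver_cmd, path],
                                  capture_output=True, text=True,
                                  timeout=timeout_s)
            out = proc.stdout.strip()
            # SMT solvers conventionally print sat/unsat/unknown on stdout.
            status = out.splitlines()[-1] if out else "unknown"
        except subprocess.TimeoutExpired:
            status = "timeout"
        results[path] = (status, time.monotonic() - start)
    return results

# Example (hypothetical solver binary and benchmark files):
# run_experiment(["z3", "-smt2"], ["bench1.smt2", "bench2.smt2"], timeout_s=30)
```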

14:30 Geoff Sutcliffe, Cynthia Chang, Li Ding, Deborah McGuinness and Paulo Pinheiro da Silva
Different Proofs are Good Proofs

In order to compare the quality of proofs, it is necessary to measure artifacts of the proofs, and evaluate the measurements to determine differences between the proofs. This paper discounts the approach of ranking measurements of proof artifacts, and takes the position that "different proofs are good proofs". The position is based on proofs in the TSTP solution library, which are generated by Automated Theorem Proving (ATP) systems applied to first-order logic problems in the TPTP problem library.
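
As one concrete illustration of measuring a proof artifact (our sketch, not the authors' measurement suite), the snippet below counts the annotated formulas in a TSTP-style derivation as a crude proof-size measure; the paper's position, of course, concerns far richer measurements than size alone.

```python
import re

# Matches the start of a TPTP annotated formula, e.g. "fof(name, role, ...".
FORMULA_RE = re.compile(r"^\s*(?:fof|cnf|tff|thf)\(", re.MULTILINE)

def proof_size(tstp_text: str) -> int:
    """Count annotated formulas in a TSTP derivation as a rough size measure."""
    return len(FORMULA_RE.findall(tstp_text))

example = """
fof(ax1, axiom, p).
fof(c, conjecture, p).
cnf(refutation, plain, $false, inference(resolution, [], [ax1, c])).
"""
print(proof_size(example))  # 3
```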

15:30‑17:00 Session 4
Chair: Aaron Stump
Location: AT 2.14
15:30 EMS panel on solver competitions and community infrastructure (shared with SVARM)
16:15 QMS facilitated discussion on quality metrics for solutions