I want to find the most interesting papers from ~200 recent AI publications. “Interesting” sounds subjective, but I gave the model specific criteria: prioritize counter-intuitive findings and emergent behaviors, downweight incremental benchmarks, and so on. The question: can I trust a single response, or do I need to ask 30 times and vote?
Below, I compare single-shot (one call, take what you get) against consensus (30 runs, keep the papers that appear in more than 50% of them). The answer depends on how hard the model thinks.
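The consensus rule above can be sketched in a few lines. This is a minimal illustration, not the exact pipeline: `runs` stands in for the model's per-call paper selections, and the helper name `consensus_select` is mine.

```python
from collections import Counter

def consensus_select(runs: list[list[str]], threshold: float = 0.5) -> list[str]:
    """Keep papers selected in more than `threshold` of the runs."""
    # set(run) guards against a paper being listed twice in one run
    counts = Counter(paper for run in runs for paper in set(run))
    n = len(runs)
    return sorted(p for p, c in counts.items() if c / n > threshold)

# Toy example: 4 runs over hypothetical paper IDs
runs = [["A", "B"], ["A", "C"], ["A", "B"], ["B", "D"]]
print(consensus_select(runs))  # A and B each appear in 3/4 runs -> ['A', 'B']
```

With 30 runs and a 0.5 threshold, a paper needs at least 16 appearances to survive, which is what filters out papers the model picks only occasionally.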
[Interactive results widget: single-shot (papers from one sampled run) vs. consensus (top 5 papers by selection frequency across the 30 runs; entries below the 50% threshold shown muted), followed by a full selection table marking which papers appear in the top 5 and in the single-shot run.]