CNRS, LaBRI, Bordeaux,
and The Alan Turing Institute, London
Feynman: What I cannot create, I do not understand.
Scenario: two-step solutions
- Step 1: Learn from the input distribution
- Step 2: Guide the algorithm using the model
Problem: the input distribution is heavily biased
Perils of data generation
- The input distribution is usually not known!
- Bad data generation leads to bad performance
Even worse: suspiciously good performance!
Confucius: When a wise man points at the moon, the imbecile examines the finger.
The model should learn the input distribution, not the data generation.
- Unbiased data generation algorithms
- Heuristic-based algorithms with provable guarantees
Programming by example
FlashFill (Gulwani, Microsoft Research)
Included in Microsoft Excel 2013!
Correct with one I/O in 60% of the cases!
INPUT: a few* input / output pairs
OUTPUT: synthesise a program
*a few = 5
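The programming-by-example loop above can be sketched as a brute-force search: enumerate short programs in a DSL and keep those consistent with every input/output pair. The mini-DSL below (four string primitives, pipelines of depth at most 2) is a toy illustration, not the actual FlashFill DSL.

```python
# Toy programming-by-example: enumerate pipelines over a tiny string DSL
# and return those consistent with all given input/output pairs.
from itertools import product

# Hypothetical mini-DSL: primitive string transformations (illustrative).
PRIMS = {
    "upper": str.upper,
    "lower": str.lower,
    "first_word": lambda s: s.split()[0],
    "last_word": lambda s: s.split()[-1],
}

def synthesize(examples, depth=2):
    """Return all primitive pipelines (up to `depth`) matching every example."""
    names = list(PRIMS)
    found = []
    for d in range(1, depth + 1):
        for combo in product(names, repeat=d):
            def run(s, combo=combo):
                for n in combo:
                    s = PRIMS[n](s)
                return s
            if all(run(i) == o for i, o in examples):
                found.append(combo)
    return found

examples = [("Ada Lovelace", "ADA"), ("Alan Turing", "ALAN")]
print(synthesize(examples))  # finds e.g. ('first_word', 'upper')
```

Note that several syntactically different pipelines can satisfy the examples; ranking among them is where systems like FlashFill do most of their work.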
The domain specific language (DSL)
DeepCoder: Learning to Write Programs (ICLR 2017)
High-level functions operating on integer lists with values in [-255, 255]
Objective: train a model using (synthetic) programs and I/O
The model takes I/O and predicts which functions appear in the target program
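A minimal sketch of what such a list DSL looks like, assuming a few higher-order functions and lambdas in the style of DeepCoder (the function names here are illustrative, not the exact DeepCoder set):

```python
# Sketch of a DeepCoder-style list DSL over integers in [-255, 255].
# A program is a composition of DSL functions; the model's job is to
# predict which of these functions appear, given I/O examples.

LAMBDAS = {
    "(*2)": lambda x: x * 2,
    "(>0)": lambda x: x > 0,
}

def MAP(f, xs):
    return [f(x) for x in xs]

def FILTER(p, xs):
    return [x for x in xs if p(x)]

def MAXIMUM(xs):
    return max(xs)

# Example program: keep the positive elements, double them, take the max.
def program(xs):
    return MAXIMUM(MAP(LAMBDAS["(*2)"], FILTER(LAMBDAS["(>0)"], xs)))

print(program([-3, 5, -1, 7, 2]))  # 14
```

The predicted function set then guides the search, e.g. by enumerating programs built from the most probable functions first.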
- Remove syntactically invalid programs
- Remove (probably) equivalent programs
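The second pruning step can be sketched with observational equivalence: two programs that agree on a batch of random test inputs are treated as duplicates. This is why the equivalence is only "probable": distinct programs that rarely differ may be merged. The test-input shape and batch size below are assumptions for illustration.

```python
# Prune (probably) equivalent programs by observational equivalence:
# programs with identical outputs on all test inputs collapse to one.
import random

random.seed(0)
TEST_INPUTS = [[random.randint(-255, 255) for _ in range(5)] for _ in range(20)]

def prune_equivalent(programs):
    seen = {}
    for name, f in programs:
        # The output vector on the test inputs is the program's signature.
        signature = tuple(tuple(f(xs)) for xs in TEST_INPUTS)
        seen.setdefault(signature, (name, f))
    return list(seen.values())

programs = [
    ("sort", sorted),
    ("sort_twice", lambda xs: sorted(sorted(xs))),  # behaves like "sort"
    ("reverse", lambda xs: xs[::-1]),
]
kept = prune_equivalent(programs)
print([name for name, _ in kept])  # ['sort', 'reverse']
```

More test inputs lower the probability of merging genuinely different programs, at the cost of more evaluation time.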
Data generation: given a program, find interesting I/O
- Domain restriction (original DeepCoder approach)
- Non-uniform sampling
- Constraint-based (leveraging z3)
- Semantic variation
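As one instance of the strategies above, non-uniform sampling can be sketched as a mixture: instead of uniform integers in [-255, 255], mix small magnitudes, domain boundaries, and uniform draws so the generated inputs exercise more program behaviours. The mixture weights here are illustrative assumptions, not values from the talk.

```python
# Non-uniform input sampling for I/O generation: a mixture of small
# values (common edge cases), boundary values, and a uniform fallback.
import random

def sample_value(rng):
    r = rng.random()
    if r < 0.4:                       # small magnitudes: common edge cases
        return rng.randint(-5, 5)
    if r < 0.5:                       # domain boundaries
        return rng.choice([-255, 255])
    return rng.randint(-255, 255)     # uniform fallback

def sample_input(rng, length=5):
    return [sample_value(rng) for _ in range(length)]

rng = random.Random(42)
inputs = [sample_input(rng) for _ in range(3)]
print(inputs)
```

Constraint-based generation would instead ask a solver such as z3 for inputs satisfying a property (e.g. "the output is non-empty"), and semantic variation would keep only inputs whose perturbation changes the output.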
Home court / away court