
From Machine Learning to
Inductive Logic Programming:
ILP made easy
Hendrik Blockeel
Katholieke Universiteit Leuven
Belgium
Contents of this course
• Introduction
– What is Inductive Logic Programming?
– Relationship with other fields
• Foundations of ILP
• Algorithms
• Applications
Contents and slides in co-operation with Luc De Raedt
of the University of Freiburg, Germany
1. Introduction
What is inductive logic
programming?
Introduction: What is ILP?
• Paradigm for inductive reasoning
(reasoning from specific to general)
• Related to
– machine learning and data mining
– logic programming
Inductive reasoning
• Reasoning from specific to general
– from (specific) observations
– to a (general) hypothesis
• Studied in
– philosophy of science
– statistics
– ...
This tomato is red   (observation)
All tomatoes are red   (induced hypothesis)
This tomato is also red   (prediction)
• Distinguish:
– weak induction: all observed tomatoes are red
– strong induction: all tomatoes are red
• Weak induction: conclusion is entailed by
(follows deductively from) observations
• cannot be wrong
• Strong induction: conclusion does not
follow deductively from observations
• could be wrong!
• logic does not provide justification
• probability theory may
A predicate logic approach
• Different kinds of reasoning in first order
predicate logic
• Standard example: Socrates
Deduction:
  Mortal(x) ← Human(x),  Human(Socrates)
  ⊢  Mortal(Socrates)
Induction (generalise from observed facts):
  Human(Socrates),  Mortal(Socrates)
  ⇒  Mortal(x) ← Human(x)

Abduction (suggest cause):
  Mortal(x) ← Human(x),  Mortal(Socrates)
  ⇒  Human(Socrates)
• Logic programming focuses on deduction
• Other types of LP:
– abductive logic programming (ALP)
– inductive logic programming (ILP)
• 2 questions to be solved:
– How to perform induction?
– How to integrate it in logic programming?
Some examples
• Learning a definition of “member” from
examples
Examples:
member(a, [a,b,c]).
member(b, [a,b,c]).
member(3, [5,4,3,2,1]).
:- member(b, [1,2,3]).
:- member(3, [a,b,c]).

Hypothesis:
member(X, [X|Y]).
member(X, [Y|Z]) :- member(X,Z).
Some examples
• Use of background knowledge
• E.g., learning quicksort
Examples:
qsort([b,c,a], [a,b,c]).
qsort([], []).
qsort([5,3], [3,5]).
:- qsort([5,3], [5,3]).
:- qsort([1,3], [3]).

Background knowledge:
split(L, A, B) :- ...
append(A, B, C) :- ...

Hypothesis:
qsort([], []).
qsort([X], [X]).
qsort(X, Y) :- split(X, A, B),
    qsort(A, AS),
    qsort(B, BS),
    append(AS, BS, Y).
Some examples
• Not only predicate definitions can be
learned; e.g.: learning constraints
Given facts:
parent(jack,mary).
parent(mary,bob).
father(jack,mary).
mother(mary,bob).
male(jack).
male(bob).
female(mary).

Learned:
:- male(X), female(X).
male(X) :- father(X,Y).
father(X,Y); mother(X,Y) :- parent(X,Y).
…
Practical applications
• Program synthesis
– very hard
– subtasks: debugging, validation, …
• Machine learning
– e.g., learning to play games
• Data mining
– mining in large amounts of structured data
Example Application:
Mutagenicity Prediction
• Given a set of molecules
• Some cause mutation in DNA (these are
mutagenic), others don’t
• Try to distinguish them on basis of
molecular structure
• Srinivasan et al., 1994: found “structural
alert”
Example Application:
Pharmacophore Discovery
• Application by Muggleton et al., 1996
• Find "pharmacophore" in molecules
– = identify substructure that causes it to "dock"
on certain other molecules
• Molecules described by listing for each
atom in it: element, 3-D coordinates, ...
• Background defines euclidean distance, ...
• Some example molecules:
(Muggleton et al. 1996)
Description of molecules:
atm(m1,a1,o,2,3.430400,-3.116000,0.048900).
atm(m1,a2,c,2,6.033400,-1.776000,0.679500).
atm(m1,a3,o,2,7.026500,-2.042500,0.023200).
...
bond(m1,a2,a3,2).
bond(m1,a5,a6,1).
bond(m1,a2,a4,1).
bond(m1,a6,a7,du).
...

Background knowledge:
hacc(M,A) :- atm(M,A,o,2,_,_,_).
hacc(M,A) :- atm(M,A,o,3,_,_,_).
hacc(M,A) :- atm(M,A,s,2,_,_,_).
hacc(M,A) :- atm(M,A,n,ar,_,_,_).
zincsite(M,A) :- atm(M,A,du,_,_,_,_).
hdonor(M,A) :- atm(M,A,h,_,_,_,_), not(carbon_bond(M,A)), !.
...

-> Hypothesis:
active(A) :- zincsite(A,B), hacc(A,C), hacc(A,D), hacc(A,E),
    dist(A,C,B,4.891,0.750), dist(A,C,D,3.753,0.750), dist(A,C,E,3.114,0.750),
    dist(A,D,B,8.475,0.750), dist(A,D,E,2.133,0.750), dist(A,E,B,7.899,0.750).
Learning to play strategic games
Advantages of ILP
• Advantages of using first order predicate
logic for induction:
– powerful representation formalism for data and
hypotheses (high expressiveness)
– ability to express background domain
knowledge
– ability to use powerful reasoning mechanisms
• many kinds of reasoning have been studied in a first
order logic framework
Foundations of
Inductive Logic Programming
Overview
• Concept learning: the Versionspaces
approach
– from machine learning
– how to search for a concept definition consistent with
examples
– based on notion of generality
• Notions of generality in ILP
– the theta-subsumption ordering
– other generality orderings
– basic techniques and algorithms
• Representation of data
– two paradigms: learning from implications,
learning from interpretations
Concept learning
• Given:
– an instance space
– some unknown concept = subset of instance
space
• Task: learn concept definition from
examples (= labelled instances)
– Could be defined extensionally or intensionally
– Usually interested in intensional definition
• otherwise no generalisation possible
• Hypothesis h = concept definition
– can be represented intensionally : h
– or extensionally (as set of examples) : ext(h)
• Hypothesis h covers example e iff e ∈ ext(h)
• Given a set of (positive and negative) examples E = <E+, E−>, h is consistent with E if E+ ⊆ ext(h) and ext(h) ∩ E− = ∅
Versionspaces
• Given a set of instances E and a hypothesis space H, the versionspace is the set of all h ∈ H consistent with E
– contains all hypotheses in H that might be the
correct target concept
• Some inductive algorithms exist that, given
H and E, compute the versionspace
VS(H,E)
Properties
• If the target concept c ∈ H, and E contains no noise, then c ∈ VS(H,E)
– If VS(H,E) is a singleton: one solution
– Usually multiple solutions
• If H = 2^I with I the instance space:
– i.e., all possible concepts are in H
– then: no generalisation possible
– H is called the inductive bias
• Usually illustrated with conjunctive concept
definitions
• Example : from T. Mitchell, 1996: Machine
Learning
Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
sunny  warm     normal    strong  warm   same      yes
…      …        …         …       …      …         …
Lattice for Conjunctive Concepts
<?,?,?,?,?,?>
<Sunny,?,?,?,?,?>   <?,Warm,?,?,?,?>   ...   <?,?,?,?,?,Same>
  ...                 ...                      ...
<Sunny,Warm,Normal,Strong,Warm,Same>   ...
<∅,∅,∅,∅,∅,∅>
• Concept represented as if-then-rule:
– <Sunny,Warm,?,?,?,?>
– IF Sky=sunny AND AirTemp=warm
THEN EnjoySports=yes
Generality
• Central to versionspace algorithms is notion
of generality
– h is more general than h’ (h ⪰ h’) iff ext(h’) ⊆ ext(h)
• Properties of VS(H,E) w.r.t. generality:
– if s ∈ VS(H,E), g ∈ VS(H,E) and g ⪰ h ⪰ s, then h ∈ VS(H,E)
– => VS can be represented by its borders
Candidate Elimination Algorithm
• Start with general border G = {all} and
specific border S = {none}
• When encountering positive example e:
– generalise hypotheses in S that do not cover e
– throw away hypotheses in G that do not cover e
• When encountering negative example e:
– specialise hypotheses in G that cover e
– throw away hypotheses in S that cover e
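Below is a small Prolog sketch (my own illustration, not part of the course material) of the two tests this algorithm relies on, for the attribute-value language of the following slides; a hypothesis such as <sunny,warm,?> is written as a list, and the atom '?' is the wildcard:

% covers(+Hypothesis, +Example): match attribute by attribute; '?' matches anything.
covers([], []).
covers(['?'|H], [_|E]) :- !, covers(H, E).
covers([V|H], [V|E])   :- covers(H, E).

% more_general(+H1, +H2): in this finite language, H1 is at least as general as H2
% iff, position by position, H1 has '?' wherever it differs from H2.
more_general([], []).
more_general(['?'|H1], [_|H2]) :- !, more_general(H1, H2).
more_general([V|H1],   [V|H2]) :- more_general(H1, H2).

% ?- covers([sunny,'?','?'], [sunny,warm,normal]).      succeeds
% ?- more_general(['?',warm,'?'], [sunny,warm,'?']).    succeeds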
G: <?,?,?>
   <s,?,?>  <c,?,?>  <r,?,?>  <?,w,?>  <?,c,?>  <?,?,n>  <?,?,d>
   sw? sc? s?n s?d   cw? cc? c?n c?d   rw? rc? r?n r?d   ?wn ?wd ?cn ?cd
   swn swd scn scd   cwn cwd ccn ccd   rwn rwd rcn rcd
S: <∅,∅,∅>
<c,w,n>: +
[Same lattice as above; after this positive example the S border moves up to <c,w,n>, while G remains <?,?,?>]
<c,w,n>: +
<c,c,d>: -
[Same lattice as above; after the negative example the hypotheses in G that cover <c,c,d> are specialised, while S stays at <c,w,n>]
• Keeping G and S may not be feasible
– exponential size
• In practice, most inductive concept learners
do not identify VS but just try to find one
hypothesis in VS
Importance of generality for
induction
• Even when not VS itself, but only one
element of it is computed, generality can be
used for search
– properties allow to prune search space
• if h covers negatives, then any g ⪰ h also covers negatives
• if h does not cover some positives, then any s ⪯ h does not cover those positives either
• For concept learning in ILP, we will need a
generality ordering between hypotheses
• ILP is not only useful for learning concepts,
but in general for learning theories (e.g.,
constraints)
– then we need generality ordering for theories
Concept Learning in
First Order Logic
• Need a notion of generality (cf.
versionspaces)
– θ-subsumption, entailment, …
• How to specialise / generalise concept
definitions?
– operators for specialisation / generalisation
– inverse resolution, least general generalisation under θ-subsumption, …
Generality of theories
• A theory G is more general than a theory S if
and only if G |= S
– G |= S: in every interpretation (set of facts) for
which G is true, S is also true
– "G logically implies S"
– e.g., "all fruit tastes good" |= "all apples taste
good" (assuming apples are fruit)
• Note: talking about theories, not just
concepts (<-> versionspaces)
– generality of concepts is special case of this
• This will allow us to also learn e.g.
constraints, instead of only predicate
definitions (= concept definitions)
Deduction, induction and
generality
• Deduction = reasoning from general to
specific
– is "always correct", = truth-preserving
• Induction = reasoning from specific to
general = inverse of deduction
– not truth-preserving (“falsity-preserving”)
– there may be statistical evidence
• Deductive operators "|-" exist that implement
(or approximate) |=
• E.g., resolution (from logic programming)
• Inverting these operators yields inductive
operators
– basic technique in many inductive logic
programming systems
Various frameworks for generality
• Depending on form of G and S
– 1 clause / set of clauses / any first order theory
• Depending on choice of |- to invert
– theta-subsumption
– resolution
– implication
• Some frameworks much easier than others
1) θ-subsumption (Plotkin)
• Most often used in ILP
• S and G are single clauses
• c1 θ-subsumes c2 (denoted c1 ⪰ c2) if and only if there exists a variable substitution θ such that c1θ ⊆ c2
– to check this, first write clauses as disjunctions
  • a,b,c ← d,e,f   ≡   a ∨ b ∨ c ∨ ¬d ∨ ¬e ∨ ¬f
– then try to replace variables with constants or other variables
• Example:
– c1 = father(X,Y) :- parent(X,Y)
– c2 = father(X,Y) :- parent(X,Y), male(X)
  • for θ = {} : c1θ ⊆ c2  =>  c1 θ-subsumes c2
– c3 = father(luc,Y) :- parent(luc,Y)
  • for θ = {X/luc} : c1θ = c3  =>  c1 θ-subsumes c3
– c2 and c3 do not θ-subsume one another
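A minimal Prolog sketch of a θ-subsumption test (my own illustration; it assumes a clause is represented simply as a list of its literals, written the same way in both clauses). A copy of c2 is frozen with numbervars/3, and every literal of c1 must then unify, under one consistent substitution, with some literal of that copy:

% theta_subsumes(+C1, +C2): succeeds if some substitution theta makes C1*theta a subset of C2.
theta_subsumes(C1, C2) :-
    \+ \+ ( copy_term(C2, C2Frozen),
            numbervars(C2Frozen, 0, _),   % freeze the variables of c2
            subset_unify(C1, C2Frozen) ). % double negation: no bindings leak out

subset_unify([], _).
subset_unify([L|Ls], C2) :-
    member(L, C2),                        % unification binds the variables of c1
    subset_unify(Ls, C2).

% ?- theta_subsumes([father(X,Y), parent(X,Y)],
%                   [father(A,B), parent(A,B), male(A)]).    % c1 vs c2: succeeds
% ?- theta_subsumes([father(A,B), parent(A,B), male(A)],
%                   [father(luc,Y), parent(luc,Y)]).          % c2 vs c3: fails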
• Given facts for parent, male, female, …
– so-called background knowledge B
• Clause produces a set of father facts
– answer substitutions for X,Y when body
considered as query
– or: facts occurring in the minimal model of B ∪ {clause}
– set = extensional definition of concept “father”
• Property :
– If
  • c1 and c2 are definite Horn clauses
  • c1 ⪰ c2 (c1 θ-subsumes c2)
– Then
  • facts produced by c2 ⊆ facts produced by c1
• (Easy to see from the definition of θ-subsumption)
• Similarity with propositional refinement
– IF Sky = sunny THEN EnjoySports=yes
– To specialise: add 1 condition
• IF Sky=sunny AND Humidity=low THEN
EnjoySports=yes
• ...
• In first order logic:
– c1: father(X,Y) :- parent(X,Y)
– To specialize: find clauses θ-subsumed by c1
• father(X,Y) :- parent(X,Y), male(X)
• father(luc,X) :- parent(luc,X)
•…
– = add literals or instantiate variables
• Another (slightly more complicated)
example:
– c1 = p(X,Y) :- q(X,Y)
– c2 = p(X,Y) :- q(X,Y), q(Y,X)
– c3 = p(Z,Z) :- q(Z,Z)
– c4 = p(a,a) :- q(a,a)
• Which clauses are θ-subsumed by which?
• Properties of θ-subsumption:
– Sound:
  • if c1 θ-subsumes c2 then c1 |= c2
– Incomplete: possibly c1 |= c2 without c1 θ-subsuming c2 (but only for recursive clauses)
  • c1 : p(f(X)) :- p(X)
  • c2 : p(f(f(X))) :- p(X)
– Hence: θ-subsumption approximates entailment but is not the same
– Checking whether c1 θ-subsumes c2 is decidable but NP-complete
– Transitive and reflexive, not anti-symmetric
  • "semi-order" relation
  • e.g.:
    – f(X,Y) :- g(X,Y), g(X,Z)
    – f(X,Y) :- g(X,Y)
    – both θ-subsume one another
• Semi-order generates equivalence classes +
partial order on those equivalence classes
– equivalence class: c1 ~ c2 iff c1 ⪰ c2 and c2 ⪰ c1
• c1 and c2 are then called syntactic variants
• c1 is reduced clause of c2 iff c1 contains minimal
subset of literals of c2 that is still equivalent with c2
• each equivalence class represented by its reduced
clause
• If c1 and c2 are in different equivalence classes, either c1 ≻ c2 or c2 ≻ c1 or neither => anti-symmetry => partial order
• Thus, reduced clauses are partially ordered
– they form a lattice
– properties of this lattice?
lgg:   p(X,Y) :- m(X,Y)
       (syntactic variants: p(X,Y) :- m(X,Y), m(X,Z);  p(X,Y) :- m(X,Y), m(X,Z), m(X,U); ...)

       p(X,Y) :- m(X,Y), r(X)                          p(X,Y) :- m(X,Y), s(X)
       (variants: p(X,Y) :- m(X,Y), m(X,Z), r(X); ...)  (variants: p(X,Y) :- m(X,Y), m(X,Z), s(X); ...)

glb:   p(X,Y) :- m(X,Y), s(X), r(X)
       (variants: p(X,Y) :- m(X,Y), m(X,Z), s(X), r(X); ...)

(the reduced clause of each equivalence class is listed first)
• Least upper bound / greatest lower bound of
two clauses always exists and is unique
• Infinite chains c1 ≻ c2 ≻ c3 ≻ ... ≻ c exist
– h(X) :- p(X,Y)
– h(X) :- p(X,X2), p(X2,Y)
– h(X) :- p(X,X2), p(X2,X3), p(X3,Y)
– ...
– h(X) :- p(X,X)
• Looking for good hypothesis = traversing
this lattice
– can be done top-down, using specialization
operator
– or bottom-up, using generalization operator
[Diagram: hypothesis space from top (most general) to bottom (most specific); heuristics-based searches (greedy, beam, exhaustive, …) traverse it, looking for the VS]
Specialisation operators
• Shapiro: general-to-specific traversal using refinement operator ρ:
– ρ(c) yields a set of refinements of c
– theory: ρ(c) = {c' | c' is a maximally general specialisation of c}
– practice: ρ(c) ⊆ {c ∪ {l} | l is a literal} ∪ {cθ | θ is a substitution}
daughter(X,Y)

daughter(X,X)     daughter(X,Y) :- parent(X,Z)     daughter(X,Y) :- female(X)     daughter(X,Y) :- parent(Y,X)     ...

daughter(X,Y) :- female(X), female(Y)     daughter(X,Y) :- female(X), parent(Y,X)     ...
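A small, hypothetical Prolog sketch of such a refinement operator for the daughter/2 example (not the operator of any particular system): a clause is represented as Head-Body with Body a list of literals, the candidate literals are fixed by hand, and a substitution such as {Y/X} is represented by adding the literal X = Y:

% refine(+Clause, -Refinement): add one body literal built from the clause's variables.
refine(daughter(X,Y)-Body, daughter(X,Y)-[L|Body]) :-
    member(L, [female(X), female(Y), parent(X,Y), parent(Y,X), X = Y]),
    \+ (member(L0, Body), L0 == L).      % never add the same literal twice

% ?- refine(daughter(X,Y)-[], C).
% enumerates daughter(X,Y) :- female(X);  daughter(X,Y) :- parent(Y,X);  ...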
• How to traverse hypothesis space so that
– no hypotheses are generated more than once?
– no hypotheses are skipped?
• -> Many properties of refinement operators
studied in detail
• Some properties:
– globally complete: each point in lattice is
reachable from top
– locally complete: each point directly below c is in ρ(c) (useful for greedy systems)
– optimal: no point in lattice is reached twice
(useful for exhaustive systems)
– minimal, proper, …
A generalisation operator
• For bottom-up search
• We discuss one generalisation operator:
Plotkin’s lgg
– Starts from 2 clauses and computes their least general generalisation (lgg)
– i.e., given 2 clauses, return most specific single
clause that is more general than both of them
• Definition of lgg of terms:
– (let si, tj denote any term, V a variable)
– lgg(f(s1,...,sn), f(t1,...,tn)) = f(lgg(s1,t1),...,lgg(sn,tn))
– lgg(f(s1,...,sn), g(t1,...,tn)) = V   (for f ≠ g)
• e.g.: lgg(a,b) = X;  lgg(f(X),g(Y)) = Z;  lgg(f(a,b,a),f(c,c,c)) = f(X,Y,X); …
• lgg of literals:
– lgg(p(s1,...,sn), p(t1,...,tn)) = p(lgg(s1,t1),...,lgg(sn,tn))
– lgg(¬p(...), ¬p(...)) = ¬lgg(p(...), p(...))
– lgg(p(s1,...,sn), q(t1,...,tn)) is undefined
– lgg(¬p(...), p(...)) and lgg(p(...), ¬p(...)) are undefined
• lgg of clauses:
– lgg(c1,c2) = {lgg(l1, l2) | l1 ∈ c1, l2 ∈ c2 and lgg(l1,l2) is defined}
• Example:
– f(t,a) :- p(t,a), m(t), f(a)
– f(j,p) :- p(j,p), m(j), m(p)
– lgg = f(X,Y) :- p(X,Y), m(X), m(Z)
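A compact Prolog sketch (my own, not from the course) of Plotkin's lgg for terms and atoms. The accumulator M remembers which pair of differing subterms was replaced by which variable, so identical pairs reuse the same variable and lgg(f(a,b,a), f(c,c,c)) indeed becomes f(X,Y,X):

% lgg(+T1, +T2, -G): least general generalisation of two terms/atoms.
lgg(T1, T2, G) :- lgg(T1, T2, G, [], _).

lgg(T1, T2, T1, M, M) :- T1 == T2, !.                    % identical: keep as is
lgg(T1, T2, G, M0, M) :-
    nonvar(T1), nonvar(T2),
    T1 =.. [F|As1], T2 =.. [F|As2],
    length(As1, N), length(As2, N), !,                   % same functor and arity
    lgg_args(As1, As2, Gs, M0, M),
    G =.. [F|Gs].
lgg(T1, T2, V, M, M) :-                                  % differing pair seen before:
    member((S1-S2)-V, M), S1 == T1, S2 == T2, !.         % reuse its variable
lgg(T1, T2, V, M, [(T1-T2)-V|M]).                        % otherwise introduce a fresh one

lgg_args([], [], [], M, M).
lgg_args([A|As], [B|Bs], [G|Gs], M0, M) :-
    lgg(A, B, G, M0, M1),
    lgg_args(As, Bs, Gs, M1, M).

% ?- lgg(f(a,b,a), f(c,c,c), G).     gives G = f(_X,_Y,_X)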
• Relative lgg (rlgg) (Plotkin 1971)
– relative to "background theory" B (assume B is a
set of facts)
– rlgg(e1,e2) = lgg(e1 :- B, e2 :- B)
– method to compute:
• change facts into clauses with body B
• compute lgg of clauses
• remove B, reduce
Example: “Bongard problems”
• Bongard: Russian scientist studying pattern
recognition
• Given some pictures, find patterns in them
• Simplified version of Bongard problems
used as benchmarks in ILP
Examples labelled “neg”
Examples labelled “pos”
• Example: 2 simple Bongard problems, find
least general clause that would predict both
to be positive
Example 1:               Example 2:
pos(1).                  pos(2).
contains(1,o1).          contains(2,o3).
contains(1,o2).          triangle(o3).
triangle(o1).            points(o3,down).
points(o1,down).
circle(o2).
• Method 1: represent example by clause;
compute lgg of examples
pos(1) :- contains(1,o1),
contains(1,o2), triangle(o1),
points(o1,down), circle(o2).
pos(2) :- contains(2,o3), triangle(o3),
points(o3,down).
lgg( (pos(1) :- contains(1,o1), contains(1,o2), triangle(o1), points(o1,down), circle(o2)),
     (pos(2) :- contains(2,o3), triangle(o3), points(o3,down)) )
  = pos(X) :- contains(X,Y), triangle(Y), points(Y,down)
• Method 2: represent class of example by
fact, other properties in background;
compute rlgg
Examples:
pos(1).
pos(2).
Background:
contains(1,o1).
contains(1,o2).
triangle(o1).
points(o1,down).
circle(o2).
contains(2,o3).
triangle(o3).
points(o3,down).
rlgg(pos(1), pos(2)) = ? (exercise)
• θ-subsumption ordering used by many ILP systems
– top down: using refinement operators (many
systems)
– bottom up: using rlgg (e.g., Golem system,
Muggleton & Feng)
• Note: inverting implication
– Given the incompleteness of θ-subsumption, could we invert implication?
– Some problems:
• lgg under implication not unique; e.g., lgg of
p(f(f(f(X)))):-p(X) and p(f(f(X))):-p(X) can be
p(f(X)):-p(X) or p(f(f(X))):-p(Y)
• computationally expensive
2) Inverting resolution
• Resolution rule for deduction:

Propositional:
   p ∨ ¬q              p ∨ ¬q
   q ∨ ¬r              q ∨ ¬s
   ---------           ---------
   p ∨ ¬r              p ∨ ¬s

First order:
   p(X) ∨ ¬q(X)        p(a) ∨ ¬q(b)
   q(X) ∨ ¬r(X,Y)      q(X) ∨ ¬r(X,Y)      θ = {X/b}
   ----------------    ----------------
   p(X) ∨ ¬r(X,Y)      p(a) ∨ ¬r(b,Y)
Inverting resolution
• General resolution rule:
  two opposite literals (up to a substitution): liθ1 = ¬kjθ2

    l1 ∨ ... ∨ li ∨ ... ∨ ln          k1 ∨ ... ∨ kj ∨ ... ∨ km
    ---------------------------------------------------------------------------
    (l1 ∨ ... ∨ li-1 ∨ li+1 ∨ ... ∨ ln ∨ k1 ∨ ... ∨ kj-1 ∨ kj+1 ∨ ... ∨ km) θ1θ2

e.g., p(X) :- q(X) and q(X) :- r(X,Y) yield p(X) :- r(X,Y)
      p(X) :- q(X) and q(a) yield p(a).
• Resolution implements |- for sets of clauses
– cf. θ-subsumption: for single clauses
• Inverting it allows us to generalize a clausal theory
• Inverse resolution is much more difficult
than resolution itself
– different operators defined
– no unique results
Inverse resolution operators
• Some operators related to inverse resolution
  (A and B are conjunctions of literals)
– absorption:
  • from q :- A and p :- A,B
  • infer p :- q,B
– identification:
  • from p :- q,B and p :- A,B
  • infer q :- A
– Intra-construction:
  • from p :- A,B and p :- A,C
  • infer q :- B and p :- A,q and q :- C
– Inter-construction:
  • from p :- A,B and q :- A,C
  • infer p :- r,B and r :- A and q :- r,C
• With intra- and inter-construction, new
predicates are “invented”
• E.g., apply intra-construction on
– grandparent(X,Y) :- father(X,Z), father(Z,Y)
– grandparent(X,Y) :- father(X,Z), mother(Z,Y)
• What predicate is invented?
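A possible worked answer (my own illustration; newp is an arbitrary name for the invented predicate, which here plays the role of "parent"):

% Intra-construction with A = father(X,Z), B = father(Z,Y), C = mother(Z,Y):
%   from   grandparent(X,Y) :- father(X,Z), father(Z,Y).
%          grandparent(X,Y) :- father(X,Z), mother(Z,Y).
%   infer  newp(Z,Y) :- father(Z,Y).
%          grandparent(X,Y) :- father(X,Z), newp(Z,Y).
%          newp(Z,Y) :- mother(Z,Y).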
Example inverse resolution
m(j)   +   f(X,Y) :- p(X,Y), m(X)        =>   f(j,Y) :- p(j,Y)
p(j,m) +   f(j,Y) :- p(j,Y)              =>   f(j,m)

grandparent(X,Y) :- father(X,Z), parent(Z,Y)   +   father(X,Y) :- male(X), parent(X,Y)
   =>  grandparent(X,Y) :- male(X), parent(X,Z), parent(Z,Y)
   + male(jef)         =>  grandparent(jef,Y) :- parent(jef,Z), parent(Z,Y)
   + parent(jef,an)    =>  grandparent(jef,Y) :- parent(an,Y)
   + parent(an,paul)   =>  grandparent(jef,paul)

(inverse resolution reads these derivations bottom-up)
• Properties of inverse resolution:
– + in principle very powerful
– - gives rise to huge search space
– - result of inverse resolution not unique
• e.g., father(j,p) :- male(j) and parent(j,p) yields father(j,p) :- male(j),parent(j,p) or father(X,Y) :- male(X),parent(X,Y) or …
• CIGOL approach (Muggleton & Buntine)
• We now have some basic operators:
– θ-subsumption-based: at the single clause level
  • specialization operator: ρ
  • generalization operator: lgg of 2 clauses
– inverse resolution: generalize a set of clauses
• These can be used to build ILP systems
– top-down: using specialization operators
– bottom-up: using generalization operators
Representations
• 2 main paradigms for learning in ILP:
– learning from interpretations
– learning from entailment
• Related to representation of examples
• Cf. Bongard examples we saw before
Learning from entailment
• 1 example = a fact e (or a clause e :- B)
• Goal:
– Given examples <E+, E−>,
– Find theory H such that
  • ∀e+ ∈ E+: B ∧ H |- e+
  • ∀e− ∈ E−: B ∧ H ⊬ e−
Examples:
pos(1).
pos(2).
:- pos(3).

Background:
contains(1,o1).   contains(1,o2).   contains(2,o3).
triangle(o1).     triangle(o3).
points(o1,down).  points(o3,down).
circle(o2).
contains(3,o4).   circle(o4).

Hypothesis:
pos(X) :- contains(X,Y), triangle(Y), points(Y,down).
Learning from interpretations
• Example = interpretation (set of facts) e
– contains a full description of the example
– all information that intuitively belongs to the
example, is represented in the example, not in
background knowledge
• Background = domain knowledge
– general information concerning the domain, not
concerning specific examples
Examples:
pos(1) :- contains(1,o1), contains(1,o2), triangle(o1), points(o1,down), circle(o2).
pos(2) :- contains(2,o3), triangle(o3), points(o3,down).
:- pos(3), contains(3,o4), circle(o4).

Background:
polygon(X) :- triangle(X).
polygon(X) :- square(X).

Hypothesis:
pos(X) :- contains(X,Y), triangle(Y), points(Y,down).
Closed World Assumption
made inside interpretations
Examples:
pos: {contains(o1), contains(o2), triangle(o1), points(o1,down), circle(o2)}
pos: {contains(o3), triangle(o3), points(o3,down)}
neg: {contains(o4), circle(o4)}

Background:
polygon(X) :- triangle(X).
polygon(X) :- square(X).

Constraint on pos:
∃Y: contains(Y), triangle(Y), points(Y,down).
• Note: when learning from interpretations
– can dispose of “example identifier”
• but can also use standard format
– CWA made for example description
• i.e., example description is assumed to be complete
– class of example related to information inside
example + background information, NOT to
information in other examples
• Because of 3rd property, more limited than
learning from entailment
– cannot learn relations between different
examples, nor recursive clauses
• … but also more efficient
– because of 2nd and 3rd property
– positive PAC-learnability results (De Raedt and Džeroski, 1994, AIJ), vs. negative results for learning from entailment
Algorithms
Rule induction
• Most inductive logic programming systems
induce concept definition in form of set of
definite Horn clauses (Prolog program)
• Many algorithms similar to propositional
algorithms for learning rule sets
– FOIL -> CN2
– Progol -> AQ
FOIL (Quinlan)
• Learns single concept, e.g., p(X,Y) :- ...
• To learn one clause: (hill-climbing search)
– start with general clause p(X,Y) :- true
– repeat
• add “best” literal to clause (i.e., literal that most
improves quality of clause)
• new literal can also be unification: X=c or X=Y
• = applying a refinement operator under θ-subsumption
– until no further improvement
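A small Prolog sketch (my own illustration of the evaluation step, not FOIL's actual code) for counting how many positive and negative examples a candidate clause covers, assuming the background facts (parent/2, male/1, ...) are loaded in the Prolog database:

:- use_module(library(apply)).            % for include/3

% clause_covers(+Clause, +Example): Clause = (Head :- Body); the clause covers the
% example if Body succeeds against the loaded facts once Head is unified with it.
clause_covers((Head :- Body), Example) :-
    \+ \+ (Example = Head, call(Body)).   % double negation: leave no bindings behind

% score(+Clause, +Pos, +Neg, -P, -N): numbers of covered positives and negatives.
score(Clause, Pos, Neg, P, N) :-
    include(clause_covers(Clause), Pos, CoveredPos), length(CoveredPos, P),
    include(clause_covers(Clause), Neg, CoveredNeg), length(CoveredNeg, N).

% With the family facts of the following slides loaded:
% ?- score((father(X,Y) :- parent(X,Y), male(X)),
%          [father(homer,bart), father(bill,chelsea)],
%          [father(marge,bart), father(hillary,chelsea), father(bart,chelsea)], P, N).
% P = 2, N = 0.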
Example
father(homer,bart).
father(bill,chelsea).
:- father(marge,bart).
:- father(hillary,chelsea).
:- father(bart,chelsea).
parent(homer,bart).
parent(marge,bart).
parent(bill,chelsea).
parent(hillary,chelsea).
male(homer).
male(bart).
male(bill).
female(chelsea).
female(marge).
father(homer,bart).
father(bill,chelsea).
:- father(marge,bart).
:- father(hillary,chelsea).
:- father(bart,chelsea).
parent(homer,bart).
parent(marge,bart).
parent(bill,chelsea).
parent(hillary,chelsea).
male(homer).
male(bart).
male(bill).
female(chelsea).
female(marge).
father(X,Y) :- parent(X,Y).
father(X,Y) :- parent(Y,X).
father(X,Y) :- male(X).
father(X,Y) :- male(Y).
father(X,Y) :- female(X).
father(X,Y) :- female(Y).
2+,2-
father(homer,bart).
father(bill,chelsea).
:- father(marge,bart).
:- father(hillary,chelsea).
:- father(bart,chelsea).
parent(homer,bart).
parent(marge,bart).
parent(bill,chelsea).
parent(hillary,chelsea).
male(homer).
male(bart).
male(bill).
female(chelsea).
female(marge).
father(X,Y) :- parent(X,Y).
father(X,Y) :- parent(Y,X).
father(X,Y) :- male(X).
father(X,Y) :- male(Y).
father(X,Y) :- female(X).
father(X,Y) :- female(Y).
2+,1-
father(homer,bart).
father(bill,chelsea).
:- father(marge,bart).
:- father(hillary,chelsea).
:- father(bart,chelsea).
parent(homer,bart).
parent(marge,bart).
parent(bill,chelsea).
parent(hillary, chelsea).
male(homer).
male(bart).
male(bill).
female(chelsea).
female(marge).
[father(X,Y) :- male(X).]
father(X,Y) :- male(X), parent(X,Y).
father(X,Y) :- male(X), parent(Y,X).
father(X,Y) :- male(X), male(Y).
father(X,Y) :- male(X), female(X).
father(X,Y) :- male(X), female(Y).
2+,0-
Learning multiple clauses: the
“Covering” approach
• To learn multiple clauses:
– repeat
• learn a single clause c (see previous algorithm)
• add c to h
• mark positive examples covered by c as “covered”
– until
• all positive examples marked “covered”
• or no more good clauses found
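A compact Prolog sketch of this covering loop (an illustration under the assumption that best_clause/3 is some single-clause learner such as the FOIL-style search above, and clause_covers/2 is the coverage test from the earlier sketch; exclude/3 comes from library(apply)):

% learn_clauses(+Pos, +Neg, -Hypothesis): learn clauses one by one and remove
% the positive examples each learned clause covers.
learn_clauses([], _, []) :- !.                        % all positives covered
learn_clauses(Pos, Neg, [Clause|Clauses]) :-
    best_clause(Pos, Neg, Clause),                    % assumed single-clause learner
    exclude(clause_covers(Clause), Pos, Uncovered),   % mark covered positives
    Uncovered \== Pos,                                % insist on some progress
    !,
    learn_clauses(Uncovered, Neg, Clauses).
learn_clauses(_, _, []).                              % no more good clauses found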
Examples:
likes(garfield, lasagne).
likes(garfield, birds).
likes(garfield, meat).
likes(garfield, jon).
likes(garfield, odie).
…

Learned clause:
likes(garfield, X) :- edible(X).     3+,0-
(italics: previously covered)
Examples:
likes(garfield, lasagne).     (covered)
likes(garfield, birds).       (covered)
likes(garfield, meat).        (covered)
likes(garfield, jon).
likes(garfield, odie).
…

Learned clauses:
likes(garfield, X) :- edible(X).
likes(garfield, X) :- subject_to_cruelty(X).     2+,0-
Some pitfalls
• Avoiding infinite recursion:
– when recursive clauses allowed, e.g.,
ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y)
– avoid learning parent(X,Y) :- parent(X,Y)
• won't be useful, even though it's 100% correct
• Bonus for introduction of new variables:
– literal may not yield any direct gain, but may
introduce variables that may be useful later
p(X) :- q(X)                       (p positives, n negatives covered)
   refine by adding age:
p(X) :- q(X), age(X,Y)             (p positives, n negatives covered -> no gain)
Golem (Muggleton & Feng)
• Based on rlgg-operator
• To build one clause:
– Look at 2 positive examples, find rlgg, generalize
using yet another example, … until no
improvement in quality of clause
– = bottom-up search
• Result very dependent on choice of examples
– e.g. what if the true theory is {p(X) :- q(X), p(X) :- r(X)}?
• Try this for different couples, pick best
clause found
– this reduces dependency on choice of couple (if
1 of them noisy : no good clause found)
• Remove covered positive examples, restart
process
• Repeat until no more good clauses found
• 1 limitation of Golem: extensional coverage
tests
– only extensional background knowledge
– may go wrong when learning recursive clauses
Examples:            Background:
p(0).                s(0,1).
p(1).                s(1,2).
p(2).                s(2,3).
:- p(4).             s(3,4).

induces   p(Y) :- s(X,Y), p(X).

Coverage of H :- B is checked by running the query B against the given facts
= extensional coverage test
Progol (Muggleton)
• Top-down approach, but with “seed”
• To find one clause:
– Start with 1 positive example e
– Generate hypothesis space He that contains only
hypotheses that cover at least this one example
• first generate most specific clause c that covers e
• He contains every clause more general than c
– Perform exhaustive top-down search in He,
looking for clause that maximizes compaction
– Compaction = size(covered examples) − size(clause)
• Repeat process of finding one clause until
no more good (= causing compaction)
clauses found
• Compaction heuristic in principle allows no
coverage of negatives
– can be relaxed (accommodating noise)
Generation of bottom clause
• Language bias = set of all acceptable
clauses (chosen by user)
– = specification of H (on level of single clauses)
• Bottom clause ⊥ for example e = most specific clause in the language bias covering e
• Constructed using “inverse entailment”
• Construction of ⊥:
– if B ∧ H |= e, then B ∧ ¬e |= ¬H
– if H is a clause, ¬H is a conjunction of ground (skolemized) literals
– compute ¬⊥ : all ground literals entailed by B ∧ ¬e
– ¬H must be a subset of these
– so B ∧ ¬e |= ¬⊥ |= ¬H
– hence H |= ⊥
• Some examples (cf. Muggleton, NGC 1995)

B:  anim(X) :- pet(X).          e:  nice(X) :- dog(X).     ⊥:  nice(X) :- dog(X), pet(X), anim(X).
    pet(X) :- dog(X).

B:  hasbeak(X) :- bird(X).      e:  hasbeak(tweety).       ⊥:  hasbeak(tweety); bird(tweety); vulture(tweety).
    bird(X) :- vulture(X).
• Example of (part of) Progol run
– learn to classify animals as mammals, reptiles, ...
|- generalise(class/2)?
[Generalising class(dog,mammal).]
[Most specific clause is]
class(A,mammal) :- has_milk(A), has_covering(A,hair), has_legs(A,4), homeothermic(A), habitat(A,land).
[C:-28,4,10,0 class(A,mammal).]
[C:8,4,0,0 class(A,mammal) :- has_milk(A).]
[C:5,3,0,0 class(A,mammal) :- has_covering(A,hair).]
[C:-4,4,3,0 class(A,mammal) :- homeothermic(A).]
[4 explored search nodes]
f=8,p=4,n=0,h=0
[Result of search is]
class(A,mammal) :- has_milk(A).
• Exhaustive search : important to constrain
size of hypothesis space
• Strong language bias
– specify which predicates to be used in head or
body of clause
– specify types and modes of predicates
• e.g., allow: age(X,Y), Y<18
• but not: habitat(X,Y), Y<18
• E.g., for the "animals" example:

:- modeh(1,class(+animal,#class))?           (put this in head)
:- modeb(1,has_milk(+animal))?               (put this in body)
:- modeb(1,has_gills(+animal))?
:- modeb(1,has_covering(+animal,#covering))?
:- modeb(1,has_legs(+animal,#nat))?
:- modeb(1,homeothermic(+animal))?
:- modeb(1,has_eggs(+animal))?
:- modeb(*,habitat(+animal,#habitat))?

+animal : variable of type "animal";  #covering : constant of type "covering"
1 : only one literal of this kind needed;  * : there can be any number of habitats
Other approaches
• Algorithms we have seen up till now are rule
based algorithms
– induce theory in the form of a set of rules
(definite Horn clauses)
– induce rules one by one
• Quite normal, given that logic programs are
essentially sets of rules…
• Still: induction of rule sets is only one type
of machine learning
• Difference between ILP and propositional
approaches is mainly in representation
• Possible to define other learning techniques
and tasks in ILP: induction of constraints,
induction of decision trees, Bayesian
learning, ...
Claudien (De Raedt & Bruynooghe)
• "Clausal Discovery Engine"
• Discovers patterns that hold in set of data
– any patterns represented as clauses (not
necessarily Horn clauses)
• I.e., finds patterns of a more general kind than
predictive rules
• also called descriptive induction
• Given a hypothesis space:
– performs an exhaustive top-down search
through the space
– returns all clauses that
• hold in the data set
• are not implied by other clauses found
• Strong language bias : precise syntactical
description of acceptable clauses
• Example language bias:
{parent(X,Y), father(X,Y), mother(X,Y)} :-
    {parent(X,Y), father(X,Y), mother(X,Y), male(X), male(Y), female(X), female(Y)}

May result in the following clauses being discovered:
parent(X,Y) :- father(X,Y).
parent(X,Y) :- mother(X,Y).
:- father(X,Y), mother(X,Y).
:- male(X), female(X).
mother(X,Y) :- parent(X,Y), female(X).
...
Claudien algorithm
• S := ∅
• Q := {□}   (the empty clause)
• while Q not empty
– pick first clause c from Q
– for all (h ← b) in ρ(c):
  • if the query (b ∧ ¬h) fails (i.e., the clause is true in the data)
  • then
    – if (h ← b) is not entailed by clauses in S then add (h ← b) to S
  • else add (h ← b) to Q
ICL (De Raedt and Van Laer)
• “Inductive Constraint Logic”
• First system to learn from interpretations
• Search for constraints on interpretations
distinguishing examples of different classes
– Roughly: run Claudien on the set of positive examples E+
  • each constraint found will be true for all e+, but probably false for some e−
  • all constraints together hopefully rule out all e−
• Search for one constraint:
– c := □  (the empty clause);
– repeat until c is true for all positives:
  • find d in ρ(c) so that d holds for as many positives and as few negatives as possible
  • c := d
– add c to h
• Search for set of constraints on a class:
– h := {};
– while there are negatives left to be eliminated:
• find a constraint c
• add c to h
• Uses same language bias (“DLAB”) as
recent versions of Claudien
• DLAB is advanced form of original Claudien bias
• Example of DLAB bias specification:
– min-max: [...] means at least min and at most
max literals from the list are to be put here
– can be nested
– allows some nice tricks, e.g.:
• 1-1:[male(X),female(X)]
  0-2:[parent(X,Y), father(X,Y), mother(X,Y)]
  <-
  0-len:[parent(X,Y), father(X,Y), mother(X,Y), male(X), male(Y), female(X), female(Y)]
Warmr (Dehaspe)
• Induces “first order association rules”
• Algorithm similar to APRIORI
– Finds frequent patterns
• cf. "frequent item sets" in APRIORI context
• Pattern = conjunction of literals
• Uses the θ-subsumption lattice over the hypothesis space
– Constructs association rules from patterns
• IF this pattern occurs, THEN that pattern occurs too
The APRIORI algorithm
• APRIORI (Agrawal et al.): efficient
discovery of frequent itemsets and
association rules
• Typical example: market basket analysis
– which things are often bought together?
• Association rule:
– IF a1, …, an THEN an+1, … an+m
• Association rules should have at least some minimal
– support : #{t | a1 ∧ … ∧ an+m} / #{t | true}
  • how many people buy all these things together?
– confidence : #{t | a1 ∧ … ∧ an+m} / #{t | a1 ∧ … ∧ an}
  • how many people of those buying IF-things also buy THEN-things?
– Minimal support and confidence may be low
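For instance (an illustration, not from the slides): with 100 transactions in total, of which 40 contain bread and 30 contain both bread and butter, the rule IF bread THEN butter has support 30/100 = 0.3 and confidence 30/40 = 0.75.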
• APRIORI tailored towards using large data
sets
– efficiency very important
– minimize data access
• Works in 2 steps:
– find frequent itemsets
– compute association rules from them
• Observation:
– if a1 ∧ … ∧ an is infrequent (below min. support)
– then a1 ∧ … ∧ an+1 is also infrequent
  • adding a condition can only strengthen the conjunction
• Hence:
– {a1,…,an} can only be frequent if each subset of it is frequent
• Leads to levelwise algorithm:
– first compute frequent singletons
– then frequent pairs, triples, …
– a lot of pruning possible due to previous
observation
• itemset of cardinality n is candidate if each subset of
it of cardinality n-1 was frequent in previous level
• need to count only candidates
Example
Items:   bread, butter, wine, cheese, ham, jam
Pairs:   bread & butter, bread & cheese, bread & jam, butter & cheese, butter & jam, cheese & jam
Triples: bread & butter & cheese,  bread & butter & jam
(one triple in the original figure is marked "Not a candidate")
Apriori algorithm
min_freq := min_support * freq(∅);          /* ∅ is satisfied by every transaction */
d := 0;
Q0 := {∅};                                   /* candidates for level 0 */
F := ∅;                                      /* frequent sets */
while Qd ≠ ∅ do
    for all S in Qd do find freq(S);
    Fd := {S in Qd | freq(S) ≥ min_freq};
    F := F ∪ Fd;
    compute Qd+1;
    d := d+1;
return F;
Computing candidates
Compute Qd+1 from Fd :
Qd+1 := ∅;
for each S in Fd do
    for each item x not in S do
        S’ := S ∪ {x};
        if ∀i in S’: S’\{i} ∈ Fd
        then add S’ to Qd+1
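The same candidate-generation step rendered as a small Prolog sketch (my own; itemsets are kept as sorted lists, and duplicate candidates can be removed with setof/3 at the call site):

% candidate(+Fd, +Items, -Cand): extend a frequent set S from the previous level
% with one new item; keep the result only if every subset obtained by dropping
% one element was itself frequent.
candidate(Fd, Items, Cand) :-
    member(S, Fd),
    member(X, Items),
    \+ member(X, S),
    sort([X|S], Cand),
    forall(select(_, Cand, Rest), member(Rest, Fd)).

% ?- setof(C, candidate([[bread],[butter],[jam]], [bread,butter,jam,wine], C), Cs).
% Cs = [[bread,butter], [bread,jam], [butter,jam]].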
• Step 2: deriving association rules from frequent sets
– if S ∪ {a} ∈ F and #(S ∪ {a}) / #S > min_confidence
– then S -> S ∪ {a} is a valid association rule
  • = has sufficient support and confidence
Warmr
• Warmr is first-order version of Apriori
• Patterns (“itemsets”) are now conjunctive
queries
• “Frequent” patterns: what to count?
– examples, of course...
• Was easy in propositional case
– 1 example = 1 tuple -> count tuples
• In first-order case:
– also easy when learning from interpretations
– not so clear when learning from implications
• which implications are examples?
• indicate this by specifying a key
– key = unique identification of example
– each pattern contains a set of variables that forms
the key
• Example:
– assume 100 people in database
• person(X): X is the key
• count answer substitutions of X, not Y or Z!
– [person(X),] mother(X,Y): 40 examples
– mother(X,Y), has_pet(Y,Z) : 30 examples
– “mother(X,Y) ---> has_pet(Y,Z)” : support 0.3,
confidence 0.75
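A minimal Prolog sketch of this counting scheme (my own illustration, assuming person/1, mother/2 and has_pet/2 facts are loaded): the frequency of a pattern is the number of distinct bindings of the key variable for which the query succeeds.

% pattern_frequency(+KeyVar, +Query, -N): N = number of distinct values of the
% key variable (here the person X) for which Query has at least one answer.
pattern_frequency(KeyVar, Query, N) :-
    findall(KeyVar, Query, Keys),
    sort(Keys, DistinctKeys),          % sort/2 also removes duplicates
    length(DistinctKeys, N).

% ?- pattern_frequency(X, (person(X), mother(X,Y), has_pet(Y,Z)), N).
% counts the persons X matching the pattern, no matter how many Y or Z each has.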
• Remark: association rule is NOT a clause
– mother(X,Y) ---> has_pet(Y,Z)
– = ∀X: (∃Y: mother(X,Y)) -> (∃Y ∃Z: mother(X,Y), has_pet(Y,Z))
– ≠ the clause mother(X,Y) -> has_pet(Y,Z)
• main difference is the occurrence of existentially quantified variables in the conclusion
• Illustrated on Bongard drawings:
– 1 example = 1 drawing
• contains(D,Obj): D is the key
– Pattern: e.g.,
• contains(D,X), circle(X), in(X,Y), circle(Y)
– Association rule: e.g.,
• contains(D,X), circle(X),in(X,Y),circle(Y) -->
contains(D,Z), square(Z)
• "drawings that contain a circle inside another circle
usually also contain
a square"
• Warmr also useful for feature construction
– Generally applicable method for improving
representation of examples
– Given description of example
• derive new (propositional) features that describe the
example
• add those features to a propositional description of the
example
• run a propositional learner
• For Bongard example:
– construct features "contains a circle", "contains a
circle inside a triangle", ...
– given the correct features, a propositional
representation of examples is possible
• Feature construction with ILP = general
method for applying propositional machine
learning techniques to structural examples
Decision tree induction in ILP
• S-CART (Kramer 1996): upgrade of CART
• Tilde (Blockeel & De Raedt ’98) upgrades C4.5
• Both induce "first order" or "structural"
decision trees (FOLDTs)
– test in node = first order literal
  • may result in true or false -> binary trees
– different nodes may share variables
  • "real" test in a node = conjunction of all literals on the path from root to node
Top-down Induction of
Decision Trees: Algorithm
function TDIDT(E: set of examples):
    T := set of possible tests;
    t := BEST_SPLIT(T, E);
    P := partition induced on E by t;
    if STOP_CRIT(E, P) then return leaf(INFO(E))
    else
        for all Ei in P : ti := TDIDT(Ei);
        return inode(t, {(i, ti)})
• Set of possible tests:
– generated using refinement operator
  • c = conjunction on the path from root to node
  • ρ(c) − c = literal(s) to be put in the node
• Other auxiliary functions as in propositional TDIDT
– best split: using e.g. information gain
– stop_crit: e.g. significance test
– info: e.g. most frequent class
• Known from propositional learning:
– induction of decision trees is fast
– usually yields good results
• These properties are inherited by Tilde / S-CART
• New results (not inherited from prop.
learning) on expressiveness
Example FOLDT
worn(X)
  yes -> irreplaceable(X)
           yes -> sendback
           no  -> fix
  no  -> ok

(∀x: ¬worn(x))                                      => ok
(∃x: worn(x) ∧ irreplaceable(x))                    => sendback
(∃x ∀y: worn(x) ∧ (¬worn(y) ∨ ¬irreplaceable(y)))   => fix
Expressiveness
FOL formula equivalent with the tree:
(∀x: ¬worn(x))                                      => ok
(∃x: worn(x) ∧ irreplaceable(x))                    => sendback
(∃x ∀y: worn(x) ∧ (¬worn(y) ∨ ¬irreplaceable(y)))   => fix

Logic program equivalent with the tree:
a ← worn(X)
b ← worn(X), irreplaceable(X)
ok ← ¬a
sendback ← b
fix ← a ∧ ¬b
• Prolog program equivalent with tree, using
cuts (“first order decision list”):
sendback :- worn(X), irreplaceable(X), !.
fix :- worn(X), !.
ok.
• FOLDT can be converted to
– layered logic program
• containing invented predicates
– “flat” Prolog program (using cuts)
• Can not be converted to flat logic program
Expressiveness
F ⊂ T = L
F = Flat logic programs
T = decision Trees
L = decision Lists
• Difference is specific for first-order case
• Possible remedies for ILP systems:
– invent auxiliary predicates
– use both  and 
– induce “decision lists”
Representation with keys
class(e1,fix).        worn(e1,gear).   worn(e1,chain).
class(e2,sendback).   worn(e2,engine). worn(e2,chain).
class(e3,sendback).   worn(e3,control_unit).
class(e4,fix).        worn(e4,chain).
class(e5,keep).

Tree:
worn(E,X)?
  yes -> not_replaceable(X)?
           yes -> class(E,sendback)
           no  -> class(E,fix)
  no  -> class(E,keep)

Background:
replaceable(gear).
replaceable(chain).
not_replaceable(engine).
not_replaceable(control_unit).

Conversion to Prolog:
class(E,sendback) :- worn(E,X), not_replaceable(X), !.
class(E,fix) :- worn(E,X), !.
class(E,keep).
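A small usage illustration (my own, assuming the facts and the converted program above are loaded together):

% ?- class(e2, C).    gives C = sendback   (the worn engine is not replaceable)
% ?- class(e4, C).    gives C = fix        (only the replaceable chain is worn)
% ?- class(e5, C).    gives C = keep       (nothing is worn)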
speed(x,s) ∧ s > 120 ∧ ¬job(x, politician) ∧ ¬(∃y: knows(x,y) ∧ job(y,politician))
   => fine(x)

speed(X,S), S>120
  yes -> job(X, politician)
           yes -> N
           no  -> knows(X, Y)
                    yes -> job(Y, politician)
                             yes -> N
                             no  -> Y
                    no  -> Y
  no  -> N
Other advantages of FOLDTs
• Both classification and regression possible
– classification : predict class (= learn concept)
– regression: predict numbers
• important: not given much attention in ILP
• Also clustering to some extent
– clustering: group similar examples together
Many other approaches and
applications of ILP possible...
• Combination of ILP and Q-learning
– RRL ("relational reinforcement learning"):
reinforcement learning in structural domains
• First-order equivalent of Bayesian networks
• First-order clustering
– needs first order distance measures
• ...
Conclusions
• Many different approaches exist in Machine
Learning
• ILP is in a sense diverging
– from concept learning…
– … to other approaches and tasks
• Still many new approaches to be tried!
Applications of ILP
Applications: Overview
• User modelling
• Games
• Ecology
• Drug design
• Natural language
• Inductive Database Design
• …
User Modelling
• Behavioural cloning
– build model of user’s behaviour
– simulate user’s behaviour by means of model
– e.g. :
• learning to fly / drive / …
• learning to play music
• learning to play games (adventure, strategic, …)
• Automatic adaptation of system to user
– detect patterns in user’s actions
– use patterns to try to predict user’s next action
– based on predictions, make life easier for user
– e.g.
  • mail system (auto-priority, …)
  • adaptive web pages
  • intelligent search engines
  • …
Example Applications
• Some applications the Leuven group has
looked at:
– behavioural cloning:
• learning to play music
• learning to play games
– automatic adaptation of system to user
• adaptive webpages
• a learning command shell
• intelligent e-mail interface
Learning to Play Music
• Van Baelen & De Raedt, ILP-96
• Playing music is difficult:
– not just playing the notes
– but: play with “feeling”
• adapt volume, speed, …
• Midi files provided to learning system
• System detects patterns w.r.t. pitch, volume,
speed, …
• … and tries to play the music itself
• Why an ILP approach?
– mainly because of time sequences
• Results?
– Compare computer generated MIDI file with
human generated MIDI file
– “Computer makes similar mistakes as
beginning player”
• See ILP-96 proc. for details (LNAI 1314)
Adaptive Webpages
• “Adaplix” project (Jacobs et al., 1997-)
• Webpage observes actions of user…
– e.g., which links are followed frequently, time
that is spent on one page,
• … and adapts itself
– within limitations given by page author
– change layout of page
– move links to different places
– add or remove links
• example site:
http://adaplix.linux.student.kuleuven.ac.be
• identify yourself
– name, gender, occupation (personnel/student)
• based on this info: provides customized web
page
• student project (in Dutch)
Intelligent Mailer
• “Visual Elm” (Jacobs, 1996)
• Intelligent mail interface:
– tries to detect which kind of mails are
• immediately deleted
• immediately read
• not deleted, read later
• forwarded
• …
– based on this, assigns priorities to new mails
• Predictions:
– priority assigned to new mails
– expected actions: delete, forward, …
• Explanation facility
• Several options offered to user
– e.g.: set priority threshold, only show mails
above threshold
– sort mails according to priority
–…
Learning Shell
• Jacobs, Dehaspe et al. (1999)
• Context: Unix command shell, e.g., csh
• Each user has “profile” file
– defines configuration for user that makes it
easier to use the shell
– usually default profile, unless user changes it
manually
• Possible to learn profile file?
– Observe user
• which commands are often used?
• which parameters are used with the commands?
– Automatically construct better profile from
observations
• Example of input to ILP system :
/* background */
command(Id, Command) :-
    isa(OrigCommand, Command),
    command(Id, OrigCommand).
isa(emacs, editor).
isa(vi, editor).

/* observations */
command(1, 'cd').
attribute(1, 1, 'tex').
command(2, 'emacs').
switch(2, 1, '-nw').
switch(2, 2, '-q').
attribute(2, 1, 'aaai.tex').
• Detect relationships (“association rules”)
with ILP system Warmr
• Examples of rules output by Warmr :
IF command(Id, ‘ls’)
THEN switch(Id, ‘-l’).
IF recentcommand(Id, ‘cd’) AND command(ID, ‘ls’)
THEN nextcommand(Id, ‘editor’).
• Some (preliminary) experimental results
• Evaluation criterion: predict next action of
user
• Actions logged for 10 users
– each log about 500 commands
• 2 experiments:
– learning from all log files together
– learning from individual log files
• Learning from mixed data:
– predictive accuracy 35% (= fmax, relative
frequency of most popular command)
• Learning from individual data:
– predictive accuracy 50% (> fmax)
• Conclusion:
– proposed approach to user modelling in this
context shows promise
Learning to Play Games
• Strategic games, adventure games, …:
– learning a strategy to play
• Examples:
– Rogue, …
– Slay
– Chess, Go, … : detecting patterns
Strategic & Adventure Games
• E.g. adventure games: Rogue, Wumpus, …
– 2-D world
– background knowledge
• Strategic game: “Slay”
– www.spto.demon.co.uk
• “Risk”-like game
– conquer territory of enemy
– larger territories are stronger
• Very complex to model
– 1 game situation = description of territories,
game pieces, …
– description of user’s actions = set of moves
• recruiting new soldiers / building new watch towers
• move pieces around within or outside territory
– even during 1 ply, situation changes all the time
• order of moves is important (some moves only
become possible after other moves)
• Advantages of ILP in this context
– full description of all territories
– background knowledge easily incorporated
• e.g. rules of the game, definition of neighbouring
areas, …
– logic representation allows for interesting
reasoning mechanisms (e.g. event calculus)
• Unfinished work…
Board Games
• Chess, Go, …: learning to recognise
important patterns
• E.g., Go: Nakade forms, life/death problems
[Go diagrams: Nakade forms — one group alive, one dead]
• Some recent work in Go:
– high accuracy predicting “vital point” in
Nakade forms
– relatively high accuracy predicting good moves
to attack/defend groups of stones
• reduction of branching factor in search
Ecology
• Environmental applications:
– relatively new field, gaining importance
– much to be learned
– much interest in data mining
• Some applications:
– Biodegradability
– Water quality
Biodegradability
• Given some compound, predict whether it
will degrade quickly in water
– = building predictive models
• regression approach : predict half life time
• classification approach : predict resistant/degradable
– ILP makes predictions based on molecular
structure
Water quality
• Quality of river water is monitored
– samples taken regularly at different sites
– study organisms, chemicals, … in water
– see how polluted the water is
• Time-related information
– yesterday’s chemicals influence today’s
organisms
– ILP techniques used for predictive modelling
• Some approaches (with Tilde):
– predict water quality from chemical
measurements taken during some interval
– predict chemical properties from biological
measurements
• one at a time (different model for each chemical)
• all at once (regression with 16 target variables)
Drug design & related
applications
• See examples in introduction:
– Mutagenesis
– Pharmacophore discovery
• Other examples:
– Carcinogenesis (PTE challenge, IJCAI-97)
• Carcinogenesis application:
– many chemicals to be tested, to determine
whether they are carcinogenic
– tests are expensive and take a long time
– aim of PTE challenge:
• predict which compounds are (very likely to be)
carcinogenic / safe
• may speed up testing process
Natural Language Applications
• Statistical approaches: typically use limited
context
• ILP : potential to use unbounded context
• Applications: Part-of-speech tagging, NP
chunking, grammar / morphology learning,
…
• See Muggleton & Cussens, ESSLLI-2000
Inductive Database Design
• Given a deductive database
– find patterns in the database
• dependencies, constraints, …
– use these to restructure the database
• avoiding redundancy, increasing robustness
– hopefully arriving at a good design (possibly
better than human-designed?)
Finding intensional definitions
• For each predicate p:
– learn a set of clauses that forms a sound and
complete definition of p
• algorithm based on Claudien
• learning one clause at a time
• recursive clauses possible
• intensional coverage test (slow)
• aim at maximising compactness (i.e. replacing large extensional definitions with small intensional ones)
• For instance: define path/2 from predicates
arc/2 and path/2
– first run:
• path(X,Y) :- arc(X,Z), path(Z,Y) found; valid, but
does not yield compaction
• path(X,Y) :- arc(X,Y) also valid, greatest
compaction -> added to Hp
– second run:
• path(X,Y) :- arc(X,Z), path(Z,Y) found; valid, and completes definition of Hp
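To make the compaction concrete, a minimal sketch on a toy graph (the arc/2 facts are invented for illustration):

% Hypothetical extensional database: arcs of a small graph
arc(a,b).   arc(b,c).   arc(c,d).

% Extensional definition of path/2 initially stored in the database:
%   path(a,b). path(b,c). path(c,d). path(a,c). path(b,d). path(a,d).

% Intensional definition found over the two runs; it replaces the six
% path/2 facts by two clauses:
path(X,Y) :- arc(X,Y).
path(X,Y) :- arc(X,Z), path(Z,Y).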
Combining definitions
• If intensional definitions found for some
predicates, replace extensional definition by
intensional one
• Pitfall: definitions may be incompatible
DB:
p(1).  p(2).  p(3).
q(1).  q(2).  q(3).

FID output:
Def for p:  p(X) :- q(X).
Def for q:  q(X) :- p(X).

Both together: circular definition!
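A minimal sketch of why this is harmful, assuming both intensional definitions replace the extensional ones:

% The database is left with only these two clauses and no facts at all:
p(X) :- q(X).
q(X) :- p(X).
% The query ?- p(1). now loops through q/1 and p/1; none of the original
% facts p(1), p(2), p(3), q(1), q(2), q(3) is derivable any more, so the
% combined definitions are incomplete.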
• Case 1: no intensional definition found for p
– easy: p has to be defined extensionally
– call the set of these predicates E
• Case 2: intensional definition found for p…
– depending only on p or predicates in E
• call the set of such p I1
– depending only on p or predicates in I1 ∪ E
• call the set of such p I2
– …
• Case 3: other predicates
– these definitions may cause trouble (circular
definitions) -> study them more closely
• Searching for such predicates:
– using a graph algorithm
– find strongly connected components (SCCs) of at least 2 elements in the dependency graph
• an SCC is a “loop” in the graph (a path exists from each element of the SCC to each other element of the SCC)
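A small Prolog sketch of this SCC test; the depends_on/2 representation and the tabling directive are assumptions made for this illustration, not part of the actual system:

% depends_on(P,Q): the definition found for predicate P mentions Q.
% Tabling (available e.g. in SWI-Prolog and XSB) keeps reaches/2 from
% looping on cyclic dependency graphs.
:- table reaches/2.

reaches(P,Q) :- depends_on(P,Q).
reaches(P,Q) :- depends_on(P,R), reaches(R,Q).

% P and Q lie in an SCC of at least 2 elements iff each reaches the other.
same_scc(P,Q) :- reaches(P,Q), reaches(Q,P), P \= Q.

% The circular p/q example from before:
depends_on(p,q).
depends_on(q,p).
% ?- same_scc(p,q).   succeeds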
Example dependency graph
[Figure: dependency graph over the predicates p, q, r, s, t, u, v]
Which predicates are in E, I1, I2, …?
Which form an SCC?
Removing incompatibilities
• Definitions in an SCC:
– are always sound
– but may be incomplete
– “breaking” the SCC (by defining at least one predicate extensionally) may make the definitions complete
• choose a predicate with a small extensional definition and a large intensional one
IsIdd
• Above techniques (and some others)
implemented in the IsIdd system
(Interactive System for Inductive Database
Design)
• Illustrative example: “family database”
– facts on family relationships: parent,
grandparent, aunt/uncle, nephew/niece, …
grandparent(X,Y) :- parent(X,Z), parent(Z,Y).
sibling(X,Y) :- parent(Z,X), parent(Z,Y), noteq(X,Y).
pil(X,Y) :- parent(X,Z), married(Y,Z).
gil(X,Y) :- grandparent(X,Z), married(Y,Z).
sil(X,Y) :- sibling(X,Z), married(Y,Z).
sil(X,Y) :- sil(Y,X).
sil(X,Y) :- pil(Z,X), pil(Z,Y), noteq(X,Y).
aou(X,Y) :- sibling(X,Z), parent(Z,Y).
noc(X,Y) :- aou(Z,X), parent(Z,Y).
Constraints found
• Constraints found using Claudien
• Constraints increase robustness of database
false :- parent(A,B), parent(B,A).
false :- married_asymm(A,B), married_asymm(B,C).
false :- parent(A,B), parent(A,C), parent(B,C).
false :- parent(A,B), parent(A,C), married_asymm(B,C).
false :- parent(A,B), parent(B,C), parent(C,A).
parent(A,B) :- parent(C,B), married_asymm(C,A).
parent(A,B) :- parent(C,B), married_asymm(A,C).
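A sketch of how such constraints could be used to guard the database; the violation/1 wrapper below is illustrative, not Claudien syntax:

% Each discovered "false :- Body" constraint becomes a violation check;
% any answer to violation/1 signals an inconsistent state or update.
:- dynamic parent/2.    % facts come from the family database

violation(mutual_parents) :- parent(A,B), parent(B,A).
violation(parent_cycle)   :- parent(A,B), parent(B,C), parent(C,A).

% ?- violation(Which).   should fail on a consistent database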
Beyond ILP
ILP for data mining
• Data mining = major application domain for
ILP
• Current ILP systems
– require knowledge of and experience with
Prolog, logic, …
– are not easy to use
– hence can only be used by highly trained people
• Current data mining systems
– require much less background in informatics
– are easier to use
• How to make ILP easier to use?
– Option 1: embed it into a system (cf. IsIdd)
– Option 2: friendlier interface, e.g. a more RDB-oriented interface
Relational data mining
• ILP can be set in a relational database context
– Replace predicates with relations
– Replace hypothesis language (no logic)
– Simplify input for ILP systems
• Difference with other systems in this
context:
– find patterns that extend over multiple tuples /
tables
Example: Registration Database
PARTICIPANT Table

NAME     JOB         COMPANY   PARTY   R_NUMBER
adams    researcher  scuf      no      23
blake    president   jvt       yes     5
king     manager     ucro      no      78
miller   manager     jvt       yes     14
scott    researcher  scuf      yes     94
turner   researcher  ucro      no      81

COMPANY Table

COMPANY   TYPE
jvt       commercial
scuf      university
ucro      university

COURSE Table

COURSE   LENGTH   TYPE
cso      2        introductory
erm      3        introductory
so2      4        introductory
srw      3        advanced

SUBSCRIPTION Table

NAME     COURSE
adams    erm
adams    so2
adams    srw
blake    cso
blake    erm
king     cso
king     erm
king     so2
king     srw
miller   so2
scott    erm
scott    srw
turner   so2
turner   srw
• Relations can easily be transformed into predicates (see the sketch below)
• What about
– representation of hypothesis
– specification of types, modes, …
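For instance, the first few rows of the tables above can be written as Prolog facts along these lines (predicate names chosen for illustration):

% participant(Name, Job, Company, Party, RNumber)
participant(adams, researcher, scuf, no, 23).
participant(blake, president,  jvt,  yes, 5).
% company(Company, Type)
company(jvt,  commercial).
company(scuf, university).
% course(Course, Length, Type)
course(cso, 2, introductory).
course(srw, 3, advanced).
% subscription(Name, Course)
subscription(adams, srw).
subscription(blake, cso).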
Hypothesis languages
• Users may not be familiar with logic
• Most are familiar with relational databases
– SQL: rather unreadable
– relational calculus: comparable with Prolog
• Ideal case: natural language
– translation Prolog -> English is feasible
– possibly expert-assisted
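As an illustration of that translation step (the clause below is only an example of the kind of pattern a system might find in the registration data above, not a reported result):

% Pattern, in Prolog:
subscription(P, srw) :- participant(P, researcher, _, _, _).
% One possible English rendering (perhaps expert-assisted):
%   "every researcher is subscribed to the course srw"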
Simplifying the inputs
• Many settings are optional
– good defaults available
– non-experienced users can just ignore them
• Not so for language specifications!
– complicated part of input specifications
– cannot be avoided (currently)
• Need for simple formalism for language
specification
Use of UML
• UML can be used to describe a database
– foreign key relationships, types, …
• Use UML as a bias specification language
– Advantages:
• well known in a very broad community
• graphical input specification possible
• database may already have a description in UML ->
no extra work needed
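For comparison, the kind of hand-written language bias users must currently supply, shown here in the widely used Progol/Aleph mode-declaration style (purely as an illustration of the burden UML could remove; this is not IsIdd's own syntax):

% Head and body mode declarations plus argument types, written by hand:
:- modeh(1, subscription(+name, +course)).
:- modeb(*, participant(+name, -job, -company, -party, -rnumber)).
:- modeb(*, company(+company, -ctype)).
:- modeb(*, course(+course, -length, -ctype)).
% A UML class diagram already carries the same information implicitly:
% classes give the types, associations (foreign keys) give the joins
% that clauses may use, so no separate declarations would be needed.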
References
• http://www.cs.bris.ac.uk/~ILPnet2/
• http://www.mlnet.org/
• Special issues of journals:
– New Generation Computing 95
– Machine Learning 97
– Journal of Logic Programming 99
– Data Mining and Knowledge Discovery 99
– …
• Books:
– Muggleton, ed., ILP, Academic Press, 92
– Lavrac and Dzeroski 94
– Nienhuys-Cheng and De Wolf 96
– De Raedt, ed., Advances in ILP, IOS Press 96
• Proc. of ILP workshops/conferences: from 1996
onwards available as Lecture Notes in Artificial
Intelligence (Springer)