
VC Dimension: Examples and Tools
Carlos C. Rodríguez
http://omega.albany.edu:8008/
Recall the Definitions
Let A be a class of subsets of R^{d}. The shatter coefficients of A are
denoted by S (n,A ) and defined as the maximum number of subsets of a set of
n elements that appear in the elemts of A . We say that the class A shatters
a given set of n elements if each of its 2^{n} subsets are picked by the
members of A . The VapnikChervonenkis (VC) dimension of A is the size of the
largest set that can be shattered by A . It is denoted by V=V(A ). Thus,
if V < ¥ we must have S (n,A )
=2^{n} for each n £ V and
S (V+1,A ) < 2^{V+1}. In words: If the VC dimension is V the class A
shatters some sets with V elements but it shatters no set with
V+1 (or more) elements. Here are the simplests examples.
Simplest Examples
 Half lines in R

A
= {(¥,x] : x Î R} . Clearly, S (n,A )
= n+1 and so V=1. Notice,
that,
S (n,A )
= n+1 = 
æ è

n
0

ö ø

+ 
æ è

n
V

ö ø



 Intervals in R

A
= {(a,b): a,b Î R} . To compute S (n,A ) just think of the (n+1) spaces
defined by n different real numbers. Any choice of two of these spaces corresponds
to an (a,b) picking up the points inside. Thus,
S (n,A )
= 
æ è

n+1
2

ö ø

+ 1 

where the extra interval is the one picking none of the n points. Therefore,
V=2 and again,
S (n,A )
= 
n(n+1)
2

+1 = 
æ è

n
0

ö ø

+ 
æ è

n
1

ö ø

+ 
æ è

n
V

ö ø



 Convex Polygons in R^{2}

It is easy to see that for this class V=¥. Just think of the n points
lying on the unit circle. Every subset of these n points is picked by the
convex polygon having the elements of the subset as vertices.
To be able to build new examples easily, we show some general properties of
shatter coefficients.
Let A and B be two classes of subsets of R^{d} and let * be a set
operation. We define,
A (*)B
= { A*B : A Î A , B Î B } 

We have,
 S (n,A ^{(c)}) = S (n,A ) .
 S (n,A (Ç)B ) £ S (n,A ) S (n,B ).
 S (n,A (È)B ) £ S (n,A ) S (n,B ).
 S (m+n,A ) £ S (m,A )S (n,A ).
 S (n,A (×)B ) £ S (n,A ) S (n,B ).
Proof: The first one is trivial. Let's show the second. Consider a fix
set {x_{1},¼,x_{n}} of n points. The class A picks the subsets
A_{1},A_{2},¼,A_{k} with k £ S (n,A ) .Now if A_{j} has n_{j}
elements then B picks at most S (n_{j},B ) subsets of A_{j}. Thus,
S (n,A (Ç)B ) £ 
S (n,A ) å
j=1

S (n_{j},B ) 

the result follows by noticing that n_{j} £ n. Three follows from one
and two. To show four procceed as above by choosing a set of (m+n) points
with m of the points marked and defining A_{1},¼,A_{k}
now with k = S (m+n,A ) and split each A_{j} as,
with A_{j1} containing all of the marked points in A_{j}. The total
number of sets A_{j1} is at most S (m,A ) and the total number
of different sets of type A_{j2} is at most S (n,A ) . Thus,
k £ S (m,A )S (n,A ) . The proof for five is similar to the second case.·
More Examples
It is not difficult to generalize the previous examples to R^{d}.
 SouthWest Intervals in R^{d}

A
= {(¥,a_{1}]×(¥,a_{2}]¼×(¥,a_{d}] : a_{j} Î R } . By what we found for the case d=1 and property 5 above,
we can deduce that,
S (n,A ) £ (n+1)^{d} and looking ahead we guess V = d. This can be shown
by noticing that A shatters the d cannonical vectors E={e_{1},¼,e_{d}}
with e_{j}=(0,¼,0,1,0,¼,0) (i.e. all zeroes except a 1 in the
jth entry). Any B Ì E is picked by a member of A , e.g., by the one
with a_{j} = 1+e_{j}/2 where e_{j} = +1 if e_{j} Î B and
e_{j} = 1 otherwise. Furthermore, no set with d+1 points can be
shatter by A since it is imposible to pick only the d points where the
first has the largest first coordinate, the second has the largest second
coordinate, ¼, the dth has the largest dth coordinate, and not the other.
Thus, V=d. ·
 All the rectangles in R^{d}

For this class V = 2d. Again the case d=1 and property 5 above shows that,
S (n,A ) £ 
æ è

n(n+1)
2

+1 
ö ø

d

< (n+1)^{2d} 

and looking ahead we'd guess V = 2d. A proof along the lines of the previous
example is straight forward.
Power Tools
The following simple result is known as Sauer's Lemma. It provides a way to
bound shatter coefficients in terms of the VC dimension.
 Sauer's Lemma

If V(A ) < ¥, for all n
S (n,A ) £ 
V å
i=0


æ è

n
i

ö ø



Proof:
By induction on n. It is true for n £ V since for these n,
S (n,A )
= 2^{n} = 
n å
i=0


æ è

n
i

ö ø

= 
V å
i=0


æ è

n
i

ö ø

. 

Now let us assume the result true for n and deduce it for n+1.
Clearly S (n+1,A ) is at most 2S (n,A ) (see property 4 above) but
this bound can be reduced by using the fact that the vc dimension is
V so we know that A shatters no set with V+1 points. On the other
hand, a set with n+1 points has n choose V more subsets with
V+1 elements than a set with n points has, and each of these must
hide at least one new subset that cannot be picked out by A . Hence,
applying the induction hypothesis, and the formula for Pascal's
triangle satisfied by the binomial coefficients, we obtain,
 

 
 

2 
V å
i=0


æ è

n
i

ö ø

 
æ è

n
V

ö ø


 
 


V å
i=0


æ è

n
i

ö ø

+ 
V å
i=1


æ è

n
i1

ö ø


 
 

 

The promised bounds follow.
 Corollary

If V < ¥, for all n,
and when n ³ V,
S (n,A ) £ 
æ è

ne
V

ö ø

V

. 

Proof:
We have,

V å
i=0


æ è

n
i

ö ø

£ 
V å
i=0


n^{i}
i!

£ 
V å
i=0


n^{i}V!
i!(Vi)!

= (n+1)^{V} 

Also, if V/n £ 1,


æ è

V
n

ö ø

V


V å
i=0


æ è

n
i

ö ø





V å
i=0


æ è

V
n

ö ø

i


æ è

n
i

ö ø


 
 


n å
i=0


æ è

V
n

ö ø

i


æ è

n
i

ö ø


 
 


æ è

1 + 
V
n

ö ø

n

£ e^{V}. · 
 

The other useful tool is,
 Theorem

If G is an mdimensional vector space of realvalued functions defined
on R^{d} then the class of sets,
A
= { {x : g(x) ³ 0}: g Î G } 

has VC dimension V £ m.
Proof:
Take a set of m+1 arbitrary points {x_{1},¼,x_{m+1}} in R^{d}. We show
one subset that can never be picked out by A . To define this special subset
consider the linear transformation L from G to R^{m+1} defined by,
L(g) = (g(x_{1}),¼,g(x_{m+1})). 

Since the image space LG is of dimension at most m, there must
exist a nonzero vector g Î R^{m+1} orthognal to the image
space LG . Thus, there exist scalars g_{1},¼,g_{m+1}
not all zero such that,

m+1 å
i=1

g_{i}g(x_{i}) = 0 for each g Î G . 

Separate the sum above according to whether g_{i}
is negative (i Î {}) or not (i Î {+}), to obtain,

å
{+}

g_{i} g(x_{i}) = 
å
{}

(g_{i}) g(x_{i}). 

By replacing g by g if necessary, we may assume (wlog since
g ¹ 0) that {} is nonempty
and this shows that it is imposible for A to only pick out the x_{i} with
i Î {+}. For if there was a g Î G with {x:g(x) ³ 0} performing the
trick, it would have to make the lhs of the above equation ³ 0
but the rhs < 0 since {} is nonempty and g(x_{i}) < 0 for all
i Î {}.·
File translated from
T_{E}X
by
T_{T}H,
version 3.63. On 5 Oct 2004, 10:13.
