  Other available formats: [.PDF] [.TEX]

# VC Dimension: Examples and Tools

## Recall the Definitions

Let A be a class of subsets of Rd. The shatter coefficients of A are denoted by S (n,A ) and defined as the maximum number of subsets of a set of n elements that appear in the elemts of A . We say that the class A shatters a given set of n elements if each of its 2n subsets are picked by the members of A . The Vapnik-Chervonenkis (VC) dimension of A is the size of the largest set that can be shattered by A . It is denoted by V=V(A ). Thus, if V < ź we must have S (n,A )  =2n for each n Ż V and S (V+1,A ) < 2V+1. In words: If the VC dimension is V the class A  shatters some sets with V elements but it shatters no set with V+1 (or more) elements. Here are the simplests examples.

### Simplest Examples

Half lines in R
A  = {(-ź,x] : x R} . Clearly, S (n,A )  = n+1 and so V=1. Notice, that,
 S (n,A )  = n+1 = µĶ n 0 ÷° + µĶ n V ÷°
Intervals in R
A  = {(a,b): a,b R} . To compute S (n,A ) just think of the (n+1) spaces defined by n different real numbers. Any choice of two of these spaces corresponds to an (a,b) picking up the points inside. Thus,

 S (n,A )  = µĶ n+1 2 ÷° + 1
where the extra interval is the one picking none of the n points. Therefore, V=2 and again,

 S (n,A )  = n(n+1) 2 +1 = µĶ n 0 ÷° + µĶ n 1 ÷° + µĶ n V ÷°
Convex Polygons in R2
It is easy to see that for this class V=ź. Just think of the n points lying on the unit circle. Every subset of these n points is picked by the convex polygon having the elements of the subset as vertices.
To be able to build new examples easily, we show some general properties of shatter coefficients.
Let A and B be two classes of subsets of Rd and let * be a set operation. We define,

 A (*)B  = { A*B : A ╬ A , B ╬ B }
We have,
1. S (n,A (c)) = S (n,A ) .
2. S (n,A (Ū)B ) Ż S (n,A ) S (n,B ).
3. S (n,A ()B ) Ż S (n,A ) S (n,B ).
4. S (m+n,A ) Ż S (m,A )S (n,A ).
5. S (n,A (×)B ) Ż S (n,A ) S (n,B ).
Proof: The first one is trivial. Let's show the second. Consider a fix set {x1,,xn} of n points. The class A picks the subsets A1,A2,,Ak with k Ż S (n,A ) .Now if Aj has nj elements then B picks at most S (nj,B ) subsets of Aj. Thus,

 S (n,A (Ū)B ) Ż S (n,A ) Õ j=1 S (nj,B )
the result follows by noticing that nj Ż n. Three follows from one and two. To show four procceed as above by choosing a set of (m+n) points with m of the points marked and defining A1,,Ak now with k = S (m+n,A ) and split each Aj as,

 Aj = Aj1╚Aj2
with Aj1 containing all of the marked points in Aj. The total number of sets Aj1 is at most S (m,A ) and the total number of different sets of type Aj2 is at most S (n,A ) . Thus, k Ż S (m,A )S (n,A ) . The proof for five is similar to the second case.Ę

### More Examples

It is not difficult to generalize the previous examples to Rd.
South-West Intervals in Rd
A  = {(-ź,a1]×(-ź,a2]×(-ź,ad] : aj R } . By what we found for the case d=1 and property 5 above, we can deduce that, S (n,A )  Ż (n+1)d and looking ahead we guess V = d. This can be shown by noticing that A shatters the d cannonical vectors E={e1,,ed} with ej=(0,,0,1,0,,0) (i.e. all zeroes except a 1 in the jth entry). Any B E is picked by a member of A , e.g., by the one with aj = 1+ej/2 where ej = +1 if ej B and ej = -1 otherwise. Furthermore, no set with d+1 points can be shatter by A since it is imposible to pick only the d points where the first has the largest first coordinate, the second has the largest second coordinate, , the dth has the largest dth coordinate, and not the other. Thus, V=d. Ę
All the rectangles in Rd
For this class V = 2d. Again the case d=1 and property 5 above shows that,

 S (n,A )  Ż µĶ n(n+1) 2 +1 ÷° d < (n+1)2d
and looking ahead we'd guess V = 2d. A proof along the lines of the previous example is straight forward.

## Power Tools

The following simple result is known as Sauer's Lemma. It provides a way to bound shatter coefficients in terms of the VC dimension.
Sauer's Lemma
If V(A ) < ź, for all n
 S (n,A )  Ż VÕ i=0 µĶ n i ÷°
Proof: By induction on n. It is true for n Ż V since for these n,

 S (n,A )  = 2n = nÕ i=0 µĶ n i ÷° = VÕ i=0 µĶ n i ÷° .
Now let us assume the result true for n and deduce it for n+1. Clearly S (n+1,A ) is at most 2S (n,A )  (see property 4 above) but this bound can be reduced by using the fact that the vc dimension is V so we know that A shatters no set with V+1 points. On the other hand, a set with n+1 points has n choose V more subsets with V+1 elements than a set with n points has, and each of these must hide at least one new subset that cannot be picked out by A . Hence, applying the induction hypothesis, and the formula for Pascal's triangle satisfied by the binomial coefficients, we obtain,

 S (n+1,A )
 Ż
 2 S (n,A ) - µĶ n V ÷°
 =
 2 VÕ i=0 µĶ n i ÷° - µĶ n V ÷°
 Ż
 VÕ i=0 µĶ n i ÷° + VÕ i=1 µĶ n i-1 ÷°
 =
 VÕ i=0 µĶ n+1 i ÷° Ę
The promised bounds follow.
Corollary
If V < ź, for all n,

 S (n,A )  Ż (n+1)V,
and when n V,

 S (n,A )  Ż µĶ ne V ÷° V .
Proof: We have,

 VÕ i=0 µĶ n i ÷° Ż VÕ i=0 ni i! Ż VÕ i=0 niV! i!(V-i)! = (n+1)V
Also, if V/n Ż 1,

 µĶ V n ÷° V VÕ i=0 µĶ n i ÷°
 Ż
 VÕ i=0 µĶ V n ÷° i µĶ n i ÷°
 Ż
 nÕ i=0 µĶ V n ÷° i µĶ n i ÷°
 =
 µĶ 1 + V n ÷° n Ż eV. Ę
The other useful tool is,
Theorem
If G is an m-dimensional vector space of real-valued functions defined on Rd then the class of sets,
 A  = { {x : g(x) │ 0}: g ╬ G }
has VC dimension V Ż m.
Proof: Take a set of m+1 arbitrary points {x1,,xm+1} in Rd. We show one subset that can never be picked out by A . To define this special subset consider the linear transformation L from G to Rm+1 defined by,

 L(g) = (g(x1),╝,g(xm+1)).
Since the image space LG  is of dimension at most m, there must exist a non-zero vector g Rm+1 orthognal to the image space LG . Thus, there exist scalars g1,,gm+1 not all zero such that,

 m+1Õ i=1 gig(xi) = 0   for each  g ╬ G .
Separate the sum above according to whether gi is negative (i {-}) or not (i {+}), to obtain,

 Õ {+} gi g(xi) = Õ {-} (-gi) g(xi).
By replacing g by -g if necessary, we may assume (wlog since g 0) that {-} is non-empty and this shows that it is imposible for A to only pick out the xi with i {+}. For if there was a g G  with {x:g(x) 0} performing the trick, it would have to make the lhs of the above equation 0 but the rhs < 0 since {-} is non-empty and g(xi) < 0 for all i {-}.Ę

File translated from TEX by TTH, version 3.63.
On 5 Oct 2004, 10:13.

http://omega.albany.edu:8008/