﻿<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<?xml-stylesheet href="xbl-shape-bindings.css" type="text/css"?>

<html xmlns="http://www.w3.org/1999/xhtml"
	xmlns:mml="http://www.w3.org/1998/Math/MathML"
	xmlns:svg="http://www.w3.org/2000/svg" 
	xmlns:xlink="http://www.w3.org/1999/xlink"
	xmlns:xul="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul"
>

<head>

  <title>Core-sets, Sparse Greedy Approximation, and the Frank-Wolfe Algorithm</title>
<!-- metadata -->
  <meta name="generator" content="S5" />
  <meta name="version" content="S5 1.1" />
  <meta name="presdate" content="20050128" />
  <meta name="author" content="Ken Clarkson &bull;" />
  <meta name="company" content="Bell Labs" />
<!-- configuration parameters -->
  <meta name="defaultView" content="slideshow" />
  <meta name="controlVis" content="hidden" />
<!-- style sheet links -->
  <link rel="stylesheet" href="ui/default/slides.css" type="text/css"
 media="projection" id="slideProj" />
  <link rel="stylesheet" href="ui/default/outline.css" type="text/css"
 media="screen" id="outlineStyle" />
  <link rel="stylesheet" href="ui/default/print.css" type="text/css"
 media="print" id="slidePrint" />
  <link rel="stylesheet" href="ui/default/opera.css" type="text/css"
 media="projection" id="operaFix" />
<!-- embedded styles -->
  <style type="text/css" media="all">
.imgcon {width: 525px; margin: 0 auto; padding: 0; text-align: center;}
#anim {width: 270px; height: 320px; position: relative; margin-top: 0.5em;}
#anim img {position: absolute; top: 42px; left: 24px;}
img#me01 {top: 0; left: 0;}
img#me02 {left: 23px;}
img#me04 {top: 44px;}
img#me05 {top: 43px;left: 36px;}
  </style>
  <style type="text/css" media="all">
     .demo {display: block; padding: 0.5em 0.5em 0.5em; margin: 0 1.5em 0.5em; font-size: 90%;}
    .floatright {float : right;}
  </style>
  
      <style type="text/css" media="all">
      [class~="circle"] 
      {
        stroke: red;
        stroke-width: 2;
        fill: red;
        fill-opacity: 0.1;
      }
      [class~="circ_control"]:hover {stroke:black; stroke-width:2; fill-opacity:0.2;}
      </style>

  <script src="ASCIIMathML.js" type="text/javascript" />
   <script src="impl.js" type="text/javascript" />
 <!-- S5 JS -->
  <script src="ui/default/slides.js" type="text/javascript" />
  <script type="text/javascript">
	AMsymbols = AMsymbols.concat([
	{input:">>", tag:"mo", output:"\u226B", tex:"gg"},
	{input:"sgn",  tag:"mo", output:"sgn", tex:null, ttype:CONST},
	{input:"Prob",  tag:"mo", output:"Prob", tex:null, ttype:CONST},
	{input:"argmax",  tag:"mo", output:"argmax", tex:null, ttype:UNDEROVER}
	]);
  </script>
</head>
<body>
<div class="layout">
   <div id="controls">
    <form action="#" id="controlForm" onmouseover="showHide('s');" onmouseout="showHide('h');">
      <div id="navLinks" class="hideme">
        <a accesskey="t" id="toggle" href="javascript:toggle();">&#216;</a>
        <a accesskey="z" id="prev" href="javascript:go(-1);">&laquo;</a>
        <a accesskey="x" id="next" href="javascript:go(1);">&raquo;</a>
      <div id="navList" ><select id="jumplist" onchange="go('j');"></select></div>
      </div>
    </form>
  </div>
<div id="currentSlide"><!-- DO NOT EDIT --></div>
<div id="footer">
   <h1>Core-sets, SGA, Frank-Wolfe</h1>
   <h2>TOC talk</h2>
</div>

</div>

<ol class="xoxo presentation">

  <li class="slide">
    <h1>Coresets, Sparse Greedy Approximation, and the Frank-Wolfe Algorithm</h1>
	<br/>
    <h3>Ken Clarkson</h3>
  </li>
  
     
   <li class="slide">
    <h1>The problem</h1>
    <ul>
        <li>Find `x_**` maximizing `f(x)`, for `x in S`, where:</li>
            <ul>
                <li>`f(x)` is a concave function <img src="f/concave.png"/></li>
                <li>`S` is the simplex `{x | sum_i x_i = 1, x_i >= 0}` <img src="f/simplex.png"/></li>
                <ul>
                    <li>Vertices of `S` are points `e(i)`, where `e(i)` is the vector with `e(i)_i = 1`, others zero</li>
                    <li>Each `x in S` gives the weights of a convex combination: `x_i >= 0` and `sum_i x_i = 1`</li>
                </ul>
            </ul>
    </ul>
   </li>
   
    <li class="slide">
        <h1>The approximation algorithm</h1>
        Compute a sequence `x_{:(0):}, x_{:(1):}, x_{:(2):},...` as follows
        <ul>
            <li>Let `x_{:(0):} := ` the vertex of `S` with largest `f` value</li>
            <li>For `k=0,1,2,ldots` :</li>
            <ul><div style="float:center"><img style="float:right" src="f/new_x.png"/></div>
                <li>Find index `i'` of largest coordinate of the gradient `grad f{:(x:}_{:(k):}{:):}`</li>
                <li>Let `x_{:(k+1):} := ` point on segment from `x_{:(k):}` to `e(i')` maximizing `f(x)`</li>
                <li>That is, let `alpha'` be the `alpha in [0,1]` maximizing `f(x_{:(k):} + alpha(e(i') - x_{:(k):}))`, and set `x_{:(k+1):} := x_{:(k):} + alpha'(e(i') - x_{:(k):})`</li>
            </ul>
            <br/>
        </ul>
    </li>
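    <li class="slide">
        <h1>Sketch: the iteration in code</h1>
        <p>A minimal Python sketch of the iteration on the slide above, not the talk's own implementation. The test function (negative squared distance to a point <code>target</code> of `S`), the ternary line search, and every name (<code>vertex</code>, <code>line_search</code>, <code>frank_wolfe</code>) are assumptions made for illustration.</p>
<pre>
```python
# Pure-Python sketch; f, grad, target and all function names are illustrative.

def vertex(i, n):
    """The simplex vertex e(i) as a list of n floats."""
    return [1.0 if j == i else 0.0 for j in range(n)]

def line_search(phi, iters=100):
    """Ternary search for the maximizer on [0, 1] of a concave function phi."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if phi(b) > phi(a):
            lo = a      # the maximizer lies in [a, hi]
        else:
            hi = b      # the maximizer lies in [lo, b]
    return (lo + hi) / 2.0

def frank_wolfe(f, grad, n, steps):
    """Return x_(steps), starting from the vertex of S with largest f value."""
    x = max((vertex(i, n) for i in range(n)), key=f)
    for _ in range(steps):
        g = grad(x)
        i_best = max(range(n), key=lambda i: g[i])       # the index i'
        d = [e_j - x_j for e_j, x_j in zip(vertex(i_best, n), x)]

        def move(alpha, x=x, d=d):
            # Point on the segment from x to e(i')
            return [x_j + alpha * d_j for x_j, d_j in zip(x, d)]

        x = move(line_search(lambda alpha: f(move(alpha))))
    return x

# Assumed test problem: f(x) = -||x - target||^2, maximized over S at x = target.
target = [0.2, 0.3, 0.5]
f = lambda x: -sum((xi - ti) ** 2 for xi, ti in zip(x, target))
grad = lambda x: [-2.0 * (xi - ti) for xi, ti in zip(x, target)]

x = frank_wolfe(f, grad, 3, 50)
print(x, f(x))
```
</pre>
        <p>Ternary search is enough here because `f` is concave along the segment; a closed-form step would do as well for quadratic `f`.</p>
    </li>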
    
    
     <li class="slide">
        <h1>What the algorithm is doing</h1>
        <ul>
            <li>At `x := x_{:(k):}`,</li><img src="f/taylor.png" style="float:right"/>
                <blockquote>`f(y) approx f(x) + (y-x)^T grad f(x)`</blockquote>
            <li>What is the maximum of that linear approximation?</li>
                <blockquote>`max_{y in S} y^T grad f(x) = max_i grad f(x)_i = grad f(x)_{i'}`</blockquote>
            <li>...with `y=e(i')`.  So <br/>
                `max_{y in S} [f(x) + (y-x)^T grad f(x)] = f(x) + max_i grad f(x)_i - x^T grad f(x)`</li>
            <li>The algorithm finds the vertex `e(i')` maximizing the linear approximation, and moves toward it.</li>
        </ul>
    </li>
   
   
   
   
    <li class="slide">
    <h1>Example: Minimum Enclosing Ball</h1>
    <ul>
        <li>The problem:</li>
        <ul>
            <li>Given: a set `P={p_1,ldots,p_n}` of points</li>
            <li>Find: their minimum enclosing ball</li>
            <ul><li>Also known as the 1-center, smallest enclosing sphere...</li></ul>
        </ul>
   </ul><br/>
             <center>    <div style="float:center"><img style="float:center" src="f/meb.png"/></div></center>
     </li>
    
    <li class="slide">
        <h1>Algorithm, as applied to MEB</h1>
    <ul>
        <li>Let:</li>
        <ul>
            <li>`b in RR^n` have `b_i := p_i^2 := p_i^Tp_i`;</li>
            <li>`A` be a matrix whose columns are the `p_i`;</li>
        </ul>
        <li>Then:</li>
        <ul>
            <li>For `f(x) := x^Tb - (Ax)^2`, the MEB problem is dual to `max_{x in S} f(x)`</li>
            <li>That is, the max problem gives a lower bound on the squared radius of the MEB, and `c_** := Ax_**` is the center of the MEB</li>
 <!--           <ul><li>Looks for weighting `x` that favors large `p_i^Tp_i` and small weighted sum of `p_i`</li></ul> -->
        </ul>
        <li>For `i' := argmax_i grad f(x)_i` chosen by the algorithm,</li>
        <ul>
            <li>`grad f(x) = b - 2A^TAx`, so</li>
                <li>`grad f(x)_i = p_i^2 - 2p_i^TAx = p_i^2 - 2p_i^Tc = (p_i-c)^2 - c^2`</li>
        </ul>
        <li><b>So:</b> `p_{i'}` is the point of `P` farthest from `c=Ax`</li>
    </ul>
   </li>
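   <li class="slide">
    <h1>Sketch: gradient and farthest point</h1>
    <p>A small Python check of the identity on the slide above: `grad f(x)_i = (p_i-c)^2 - c^2`, so the largest gradient coordinate picks the point farthest from `c=Ax`. The four points and the weight vector are made up for illustration, and the helper names (<code>center</code>, <code>sqdist</code>, <code>grad</code>) are assumptions.</p>
<pre>
```python
# Illustrative 2-D input; any point set would do.
P = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (1.0, 1.0)]

def center(x):
    """c = A x: the x-weighted combination of the points."""
    return tuple(sum(x_i * p[d] for x_i, p in zip(x, P)) for d in range(2))

def sqdist(p, q):
    """Squared Euclidean distance between p and q."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def grad(x):
    """grad f(x)_i = p_i^2 - 2 p_i^T c, for f(x) = x^T b - (A x)^2."""
    c = center(x)
    return [sum(a * a for a in p) - 2.0 * sum(a * b for a, b in zip(p, c))
            for p in P]

x = [0.25, 0.25, 0.25, 0.25]          # some point of the simplex
c = center(x)
g = grad(x)
c_sq = sum(a * a for a in c)

i_grad = max(range(len(P)), key=lambda i: g[i])              # index i'
i_far = max(range(len(P)), key=lambda i: sqdist(P[i], c))    # farthest point
print(c, g, i_grad, i_far)
```
</pre>
   </li>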
 
    <li class="slide"><h1>Algorithm, as applied to MEB, cont.</h1>
    <ul>
        <li>Vector `x` prescribes a weighted combination of the input points</li>
        <li>Largest coordinate of `grad f(x)` `iff` farthest point from the current center `c = Ax`</li>
        <li>Algorithm first proposed for MEB only [BC03]</li>
    </ul>
	<div id="d_coresets1" class="demo" style=" height:8in; width:100%;">
	    <svg:svg viewBox="0 0 700 600" z-order="-1" style=" height:100%; width:100%;">
		    <svg:g width="600" height="42" >
			<svg:circle class="circ_control"  title="go" tooltip="foo" id="canvas_circ_go" r="20" cx="250" cy="51" style="fill:lightgreen;"/>
			<svg:circle class="circ_control" title="single step" id="canvas_circ_step" r="20" cx="300" cy="51" style="fill:yellow;"/>
			<svg:circle class="circ_control" title = "stop" id="canvas_circ_stop" r="20" cx="350" cy="51" style="fill:red;"/>
		    </svg:g>
		<svg:g id="canvas" width="700" height="500" y="100"/>
	    </svg:svg>
	</div>
    </li>

    


    <li class="slide">
        <h1>Why is this algorithm interesting?</h1>
        <ul>
            <li>Iterates are sparse: iterate `x_{:(k):}` has at most `k+1` nonzero coordinates</li>
            <li>Many applications</li>
            <li>Simple</li>
            <li>Iterates are provably good approximations</li>
                <ul><li>Additive error `O(1//k)`</li></ul>
            <li>Relation to coresets `approx` sparsity + approximation</li>
        </ul>
    </li>
    

    <li class="slide">
        <h1>Application areas</h1>
        <ul>
            <li>Given a set of points: find minimum enclosing ball, ellipsoid, axis-aligned ellipsoid;</li>
            <li>Support vector machines (SVM)</li>
            <ul><li>hard margin, `L_2`-SVM, `L_2`-SVR</li></ul>
            <li>"Boosting", for example, AdaBoost</li>
            <ul><li>`approx` finding best convex combinations of classifiers</li></ul>
            <li>Convex approximation</li>
            <ul>
                <li>of a point by other points: `f(x) := -||p-Ax||^2`</li>
                <li>of a function by other functions</li>
                <ul>
                    <li>density mixtures</li>
                </ul>
                <li>w.r.t. different norms</li>
                <ul>
                    <li>`L_p` distance instead of `L_2`</li>
                    <li>KL and other divergences [AS]</li>
                </ul>
            </ul>
        </ul>
    </li>
    
    <li class="slide">
        <h1>Hard margin SVM</h1>
        
        <ul>
            <li>Training problem is: given red and blue points, find the thickest slab that separates them</li>
            <li>Corresponding `f(x)` is a quadratic function</li>
            <li>Points likely to be in very high-dimensional "feature space"</li>
        </ul>
        <br/>
        <center><img src="f/svm.png"/></center>
    </li>
    


    <li class="slide">
        <h1>Simplicity</h1>
        <ul>
            <li>Convergence is slow: additive error `epsilon` after `O(1//epsilon)` steps</li>
            <ul>
                <li>Gradient ascent has "linear convergence": `O(log(1//epsilon))` steps</li>
                <li>Newton-type methods have "quadratic convergence": `O(log log (1//epsilon))` steps</li>
            </ul>
            <li><em>However</em>, for Newton:</li>
            <ul>
                <li>Fewer steps, but much more work and space per step</li>
                <ul><li>Solve linear system each step</li></ul>
            </ul>
            <li>For gradient ascent, constant may be large</li>
            <li>Simple may be better for large-scale problems [TKS][H][Nemi]</li>
        </ul>
    </li>
      
    
     <li class="slide">
        <h1>"Good enough approximation"</h1>
        <ul><div style="valign:center"><img style="float:right" src="f/fit.png"/></div>
            <li>Sometimes answer need only be "good enough", e.g., estimation/learning</li>
            <li>Statistical estimators: fit parametric model to noisy data</li>
            <ul><li>e.g., fit a line to points</li></ul>
            <li>Noise in data `=>` exact fit to data not helpful</li>
            <li>Error = Data error + Opt. error</li>
            <ul>
                <li>No reason to reduce Opt. error to zero</li>
            </ul>
        </ul>
   </li>
   
     <li class="slide">
        <h1>Why is this algorithm interesting?</h1>
        <ul>
            <li>Iterates are sparse: iterate `x_{:(k):}` has at most `k+1` nonzero coordinates</li>
            <li>Many applications</li>
            <li>Simple</li>
            <li style="color:red">Iterates are provably good approximations</li>
                <ul style="color:red"><li>Additive error `O(1//k)`</li></ul>
            <li style="color:lightgray">Relation to coresets `approx` sparsity + approximation</li>
        </ul>
    </li>
   
     <li class="slide">
        <h1>Aside: history</h1>
        <ul>
            <li>A variant is called "sparse greedy approximation" in machine learning [Z03]</li>
            <ul>
                <li>Tries every `i'`, not just the one realizing max `grad f(x)`;</li>
                <li>Boosting, density mixtures, divergences studied there</li>
                <li>Showed convergence results</li>
            </ul>
            <li>Variants for some special cases are called "coreset-based algorithms" in computational geometry [BHI02]</li>
            <ul>
                <li>Common variant works harder: finds opt `x` in given `k`-face</li>
                <li>Applied to MEB, other enclosure problems, SVM, shape approximation, `k`-centers...</li>
            </ul>
            <li>Algorithm is a special case of the Frank-Wolfe algorithm [FW56]</li>
            <ul>
                <li>General convergence results shown</li>
            </ul>
        </ul>
    </li>
  


   <li class="slide">
    <h1>Approximation properties</h1>
    <ul>
      <li>For many of the `f(x)`, there is a "nonlinearity measure" `C_f` so that</li>
      <blockquote>`f{:(x:}_{:(k):}{:):} >= f(x_**) - (4C_f)/(k+3)`</blockquote>
      <li>`C_f` is related to norm of second derivative of `f`</li>
      <ul>
        <li>The flatter the graph of `f` is, the smaller `C_f` is</li>
        <li>When `f(x)` is linear, `C_{:f:} = 0` and optimum is very sparse: it's a vertex of `S`</li>
      </ul>
    </ul>
   </li>
   
  <li class="slide">
    <h1>The Wolfe Dual</h1>
    <ul>
        <li>Minimize, for `x in RR^n` and with `z(x) := max_i grad f(x)_i`,</li>
            <blockquote>
                `w(x) := z(x) + f(x) - x^T grad f(x)`
            </blockquote>
        <ul>
            <li>(Use Lagrangian relaxation, use KKT to get `lambda = ze - grad f(x)`, substitute for `lambda`)</li>
            <li>Recall `w(x)` is optimum within `S` of linear approx. at `x`</li>
        </ul>
        <li>For `x in S`, `w(x) >= w(x_**) = f(x_**) >= f(x)`</li>
        <li>Gap `w(x) - f(x) = z(x) - x^T grad f(x) = (e(i') - x)^T grad f(x)`</li>
        <ul><li>Coordinate `i'` as used in algorithm</li></ul>
    </ul>
   </li>
      
  <li class="slide">
    <h1>Example dual: MEB</h1>
    <ul>
        <li>Used <blockquote>`f(x):= x^Tb - x^TA^TAx = x^Tb - c^2`</blockquote> for MEB, where `c=Ax`</li>
        <li>So</li>
                <table><tr><td>`z(x) + f(x) - x^T grad f(x)`</td><td align="left"> `= z(x) + (x^Tb - c^2) - x^T(b-2A^Tc)`</td></tr><tr><td></td><td  align="left">` = z(x) + c^2`</td></tr></table>
        <li>Recalling `z(x) = max_i grad f(x)_i = (p_{i'}-c)^2 - c^2`</li>
        <li>So `z(x) + c^2` is the max squared distance of `c` to any `p_i`</li>
    </ul>
   </li>
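  <li class="slide">
    <h1>Sketch: primal and dual values for MEB</h1>
    <p>A hedged Python illustration of the sandwich `f(x) le R^2 le w(x)` for MEB, where `w(x) = z(x) + c^2` is the largest squared distance from `c=Ax` to a point. The three points are chosen so that the MEB is the unit disk (`R=1`); the points, the weights, and the helper names are assumptions made for this example.</p>
<pre>
```python
# Illustrative input: the MEB of these points is the unit disk, so R^2 = 1.
P = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0)]
ORIGIN = (0.0, 0.0)

def sqdist(p, q):
    """Squared Euclidean distance between p and q."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def center(x):
    """c = A x: the x-weighted combination of the points."""
    return tuple(sum(x_i * p[d] for x_i, p in zip(x, P)) for d in range(2))

def f(x):
    """Primal objective f(x) = x^T b - c^2, with b_i = p_i^2."""
    c = center(x)
    return sum(x_i * sqdist(p, ORIGIN) for x_i, p in zip(x, P)) - sqdist(c, ORIGIN)

def w(x):
    """Dual value w(x) = z(x) + c^2: largest squared distance from c to a point."""
    c = center(x)
    return max(sqdist(p, c) for p in P)

x = [0.5, 0.3, 0.2]        # an arbitrary point of the simplex
x_opt = [0.5, 0.5, 0.0]    # optimal weights for this input: c = origin

print(f(x), w(x))          # f(x) is at most 1, w(x) is at least 1
print(f(x_opt), w(x_opt))  # both equal R^2 = 1: the gap closes at the optimum
```
</pre>
  </li>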
  
  <li class="slide">
    <h1>One step improvement</h1>
    <ul>
        <li>For `y := x + alpha(e(i') - x)`, Taylor's theorem says:
        <br/>`qquad f(y) approx f(x) + alpha(e(i')-x)^T grad f(x)`<br/> `qquad qquad qquad + alpha^2(e(i')-x)^T grad^2 f(x) (e(i')-x)`</li><img src="f/taylor.png" style="float:right"/>
        <li>By concavity of `f`, first two terms bound `f(x)` from above</li>
        <li>Choose `C_f` to be smallest value<br/> so that `f(y) >= f(x)+ alpha(e(i')-x)^T grad f(x) - alpha^2 C_f`</li    >
        <li>But: `(e(i')-x)^T grad f(x) = w(x) - f(x)`</li>
        <li>So: `f(y) >= f(x) + alpha(w(x) - f(x)) - alpha^2 C_f`</li>
    </ul>
   </li>

  
  <li class="slide">
    <h1>One step improvement, cont.</h1>
    <ul>
        <li>Have: `f(y) >= f(x) + alpha(w(x) - f(x)) - alpha^2 C_f`</li>
        <li>So if `w(x) - f(x) gg 0`, then `f(y) gg f(x)`</li>
        <li> Let `h(x) := (f(x_**)-f(x))//(4C_f)`, `g(x) := (w(x) - f(x))//(4C_f)`</li>
        <ul><li>Since `w(x) >= f(x_**) >= f(x)`, have `g(x) >= h(x)`</li></ul>
        <li>Restating, have: `h(y) le h(x) - alpha g(x) + alpha^2//4`</li>
        <li>Choosing best `alpha` gives `h(y) le h(x) - g(x)^2 le h(x) - h(x)^2`</li>
        <li>Have `h{:( :}x_{:(k+1):}{:):} leq h{:( :}x_{:(k):}{:):} - h{:( :}x_{:(k):}{:):}^2 => h{:( :}x_{:(k):}{:):} le 1//(k+3)`</li>    
    </ul>
   </li>
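  <li class="slide">
    <h1>Sketch: checking the recurrence</h1>
    <p>A quick numeric check, in Python, that the recurrence on the slide above gives the `1//(k+3)` bound. It iterates the worst case `h_{:(k+1):} = h_{:(k):} - h_{:(k):}^2`. The starting value `h_{:(1):} = 1//4` is an assumption of this sketch: it comes from taking `alpha = 1` in the one-step bound, which gives `h(y) le h(x) - g(x) + 1//4 le 1//4` because `g(x) >= h(x)`. The function name is made up.</p>
<pre>
```python
def worst_case_h(steps):
    """Iterate h_(k+1) = h_k - h_k^2 from h_1 = 1/4; return [(k, h_k), ...]."""
    h = 0.25
    out = []
    for k in range(1, steps + 1):
        out.append((k, h))
        h = h - h * h
    return out

table = worst_case_h(1000)

# The claimed bound h_k is at most 1/(k+3), checked for every k in the table.
bound_holds = all(1.0 / (k + 3) >= h for k, h in table)
print(table[:3], bound_holds)
```
</pre>
    <p>The induction behind it: `t - t^2` is increasing on `[0, 1//2]`, and `1//(k+3) - 1//(k+3)^2 = (k+2)//(k+3)^2 le 1//(k+4)`.</p>
  </li>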
  
  <li class="slide">
    <h1>What is `C_f`?</h1>
    <ul>
        <li>`C_{:f:} le` sup of `-{:1/2:}(y-x)^T grad^2 f({:bar x:})(y-x)`, over `{:bar x:}, x,y in S`</li>
        <li>Main case: `f(x)` has the form `{:hat f:}(Ax)`</li>
        <ul>
            <li>That is, optimizing `hat f` within the polytope `AS`</li>
            <li>`grad^2 f(x) = A^T grad^2{:hat f:}(Ax)A`</li>
            <li>`C_{:f:} le` sup of `-{:1/2:}(y-x)^T A^T grad^2 {:hat f:}(A{:bar x:})A(y-x)`</li>
            <li>Quadratic `f` has this form, and quadratic term of `{:hat f:}` is `-c^2`</li>
            <ul><li>`C_{:f:} le` sup of `(Ay - Ax)^2`, or `diam(AS)^2`</li></ul>
        </ul>
    </ul>
  </li>
  
 <li class="slide">
    <h1>Example `C_f`: MEB and SVM</h1>
    <ul>
        <li>For MEB, `C_{:f:} le 4R^2 = 4f(x_**)`, where `R` is the MEB radius</li>
        <li>So additive error `epsilon C_f` becomes relative error `4epsilon`</li>
        <li>Same bound on `C_f` for any quadratic problem</li>
        <ul>
            <li>For SVM, relative error depends on `rho_**^2//R^2`, where `rho_**` is opt. thickness</li>
            <li>The VC dimension of set of regions that are radius-`R` balls with thickness-`rho` central slice removed is  `R^2//rho_**^2`</li>
        </ul>
    </ul>
   </li>
  

 <li class="slide">
    <h1>Variations</h1>
    <ul>
        <li>Have: `h(y) le h(x) - alpha g(x) + alpha^2//4`</li>
        <li>In particular, `h(y) le h(x) - g(x)^2`</li>
        <li>Lazy variant:</li>
        <ul>
            <li>Don't find best `alpha'`, use `alpha_k := 2//(k+3)`</li>
            <li>Get same guarantees</li>
        </ul>
        <li>Primal/dual approximation:</li>
        <ul>
            <li>If `g{:( :}x_{:(k):}{:):}` stayed large, `h` would eventually drop below `0`, which is impossible since `h >= 0`</li>
            <li>Implies, eventually see `g{:( :}x_{:(k):}{:):} lt epsilon` <strong>and</strong> `h{:( :}x_{:(k):}{:):} lt epsilon`</li>
            <ul><li>After `1//epsilon` steps, `h() le epsilon`; after another `1//epsilon`, some `g() le epsilon`</li></ul>
            <li>For MEB: `x` proves that there is a small ball, and that no ball can be much smaller</li>
        </ul>
      </ul>
   </li>
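 <li class="slide">
    <h1>Sketch: the lazy variant on MEB</h1>
    <p>A minimal Python sketch of the lazy variant applied to MEB, tracking the primal/dual gap `w(x) - f(x)`. It uses the fixed step `alpha_k := 2//(k+3)` from the slide above instead of a line search. The four points are made up so that the MEB is the unit disk (`R=1`), and the names (<code>lazy_meb</code>, <code>best_gap</code>) are assumptions of this example.</p>
<pre>
```python
# Illustrative input: the MEB of these points is the unit disk, so R^2 = 1.
P = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.3, -0.4)]
ORIGIN = (0.0, 0.0)

def sqdist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def center(x):
    """c = A x: the x-weighted combination of the points."""
    return tuple(sum(x_i * p[d] for x_i, p in zip(x, P)) for d in range(2))

def f(x):
    """Primal objective f(x) = x^T b - c^2, a lower bound on R^2."""
    c = center(x)
    return sum(x_i * sqdist(p, ORIGIN) for x_i, p in zip(x, P)) - sqdist(c, ORIGIN)

def w(x):
    """Dual value: largest squared distance from c, an upper bound on R^2."""
    c = center(x)
    return max(sqdist(p, c) for p in P)

def lazy_meb(steps):
    """Fixed-step iteration; returns the final x and the smallest gap seen."""
    n = len(P)
    # For MEB every vertex e(i) has f = 0, so any vertex is a best start.
    x = [1.0] + [0.0] * (n - 1)
    best_gap = w(x) - f(x)
    for k in range(steps):
        c = center(x)
        far = max(range(n), key=lambda i: sqdist(P[i], c))   # the index i'
        alpha = 2.0 / (k + 3)
        x = [(1.0 - alpha) * x_i for x_i in x]               # x + alpha (e(i') - x)
        x[far] += alpha
        best_gap = min(best_gap, w(x) - f(x))
    return x, best_gap

x, best_gap = lazy_meb(200)
print(f(x), w(x), best_gap)
```
</pre>
    <p>Here `f(x)` certifies that no enclosing ball can be much smaller, while `w(x)` is the squared radius of an actual enclosing ball centered at `c=Ax`.</p>
 </li>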
 
 <li class="slide">
    <h1>Primal/dual and coresets</h1>
    <ul>
        <li>So: some `hat x` is good in primal, dual, and has `O(1//epsilon)` nonzero entries</li>
        <li>Nonzero coordinates `N subset {1 ldots n}`, and `|N| = O(1//epsilon)`</li>
        <li>`N` is <em>almost</em> a combinatorial specification of a good approx. solution</li>
        <ul>
            <li>For MEB, `N harr` a subset of `P` that specifies a good approximate solution</li>
            <li>But: need values of `hat x` at those coordinates</li>
            <li>Use canonical `x` for `N`: the best `x` with those nonzeros `=:x_N`</li>
        </ul>
        <li>Cannot simply run the algorithm and use `N` for the resulting `x`: `f(x_N) >= f(x)`, but there is no control over `w(x_N)`</li>
    </ul>
 </li>
 
<li class="slide">
    <h1>Coresets</h1>
    <ul>
        <li>Another variation:</li>
        <ul>
            <li>Pick `i'` the same way, but make `x_{:(k+1):}` maximize `f(x)` over all `x` with same nonzeros</li>
            <li>That is: `N := N cup {i'}`, and find `x_{N}`</li>
            <li>The measure `h()` decreases as fast, and as before, eventually `g() lt epsilon`</li>
        </ul>
        <li>So the algorithm is:</li>
        <ul>
            <li>Let `N := {i}`, where `e(i)` is the vertex of `S` with largest `f` value</li>
            <li>While `g(x_N) > epsilon`:</li>
            <ul>
                <li>Find index `i'` of largest coordinate of the gradient `grad f(x_N)`</li>
                <li>Let `N := N cup {i'}`</li>
            </ul>
        </ul>
    </ul>
</li>
 
    <li class="slide">
        <h1>Why are coresets interesting?</h1>
        <ul>
            <li>An approach for <em>densest-ball</em> problem:</li><img src="f/densest.png" style="float:right"/>
            <ul>
                <li>Given points `P`, find `P' subset P` with `|P'| > n//2` and `P'` in smallest ball over all such subsets</li>
                <li>Approximation algorithm:</li>
                <ul><li>Try all subsets of size `O(1//epsilon)`, one of them is coreset for MEB of `P'`</li></ul>
            </ul>
            <li>Implications for sample complexity of hard-margin SVM</li>
            <li>Imply algorithms for:</li>
            <ul>
                <li>`k`-center, `k`-median, cylinder-fitting, `k`-means</li>
            </ul>
            <li>Running times depend on `epsilon`, but often, independent of dimension</li>
            <ul>
                <li>Important for SVM, in particular</li>
            </ul>
       </ul>
   </li>
  
  
  <li class="slide">
    <h1>Coreset variants</h1>
    <ul>
         <li>"Away" steps: work even harder, for a smaller core-set</li>
         <ul>
            <li>After getting `x_{:N:}` with `|N|=K+1` nonzeros, at each iteration, also:</li>
			<ul>
			  <li>Pick the coordinate `i''` at which `x_N` is smallest over all nonzero coords, and </li>
			  <li>Set `N:= N setminus {i''}`</li>
                        <li>Still make progress in `f`, for `K` a small multiple of `1//epsilon`, and gap `g` large</li>
			</ul>
			<li>Allows smaller (near-optimal) constant in the core-set size</li>
	     </ul>
         <li>Be lazier</li>
         <ul>
            <li>Use a fixed schedule of stepsizes `alpha_k`</li>
            <li>No change in provable bounds</li>
          </ul>
	 </ul>
   </li>
  
  <li class="slide">
    <h1>Probabilistic coreset existence</h1>
    <ul>
        <li>Random vector `Y`: let `Y = e(i)` with probability `x_i^**`</li>
        <li>Pick `Y_j` for `j=1...K`, and `sum_j Y_j//K` is a good sparse approx. solution</li>
        <li>Additive error `sum_i x_i^** p_i^2//r`, for quadratic `{:hat f:}(Ax)`, columns of `A` are `p_i`</li>
        <li>This is no big improvement for MEB, but is in another case</li>
        <li>Method of cond. prob.?</li>
    </ul>
  </li>
   
  <li class="slide">
    <h1>Concluding remarks</h1>
    <ul>
        <li>A simple, and old, algorithm has many applications and implications, even today</li>
        <li>New things:</li>
        <ul>
            <li>General analysis for some approximation algorithms;</li>
            <li>General proof of coreset existence, including with "away" steps</li>
            <li>Faster algorithm for many learning applications</li>
            <li>Sharper bounds for SVM coreset constants (and constants matter)</li>
            <li>General clue about where sparsity comes from</li>
        </ul>
        <li>Further implications for sample complexity?</li>
        <li>Good relative error for MEB: `C_{:f:} approx` OPT; good relative bounds for other problems?</li>
        <li>Distant relative of algorithms for finding best linear approximation ([GMS,Tropp]); what implications here?</li>
    </ul>
   </li>




  <li class="slide">
    <h1>Bonus Animation: MEB Approximation</h1>
   <ul>
      <li>Move toward midpoint of two recent farthest points</li>
    </ul>
     <div id="d_coresets2" class="demo" style=" height:8in; width:100%;">
		<svg:svg viewBox="0 0 700 600" z-order="-1" style=" height:100%; width:100%;">
			<svg:g width="600" height="42" >
				<svg:circle class="circ_control"  title="go" tooltip="foo" id="canvas2_circ_go" r="20" cx="250" cy="51" style="fill:lightgreen;"/>
				<svg:circle class="circ_control" title="single step" id="canvas2_circ_step" r="20" cx="300" cy="51" style="fill:yellow;"/>
				<svg:circle class="circ_control" title = "stop" id="canvas2_circ_stop" r="20" cx="350" cy="51" style="fill:red;"/>
			</svg:g>
			<svg:g id="canvas2" width="700" height="500" y="100"></svg:g>
		</svg:svg>
		</div>
   </li>


</ol>
</body>
</html>
