﻿<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<?xml-stylesheet href="xbl-shape-bindings.css" type="text/css"?>

<html xmlns="http://www.w3.org/1999/xhtml"
	xmlns:mml="http://www.w3.org/1998/Math/MathML"
	xmlns:svg="http://www.w3.org/2000/svg" 
	xmlns:xlink="http://www.w3.org/1999/xlink"
	xmlns:xul="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul"
>

<head>
  <title>Numerical Linear Algebra in the Streaming Model: Upper Bounds</title>
<!-- metadata -->
  <meta name="generator" content="S5" />
  <meta name="version" content="S5 1.1" />
  <meta name="presdate" content="20050128" />
  <meta name="author" content="Ken Clarkson &bull;" />
  <meta name="company" content="IBM Almaden" />
<!-- configuration parameters -->
  <meta name="defaultView" content="slideshow" />
  <meta name="controlVis" content="hidden" />
<!-- style sheet links -->
  <link rel="stylesheet" href="ui/default/slides.css" type="text/css"
 media="projection" id="slideProj" />
  <link rel="stylesheet" href="ui/default/outline.css" type="text/css"
 media="screen" id="outlineStyle" />
  <link rel="stylesheet" href="ui/default/print.css" type="text/css"
 media="print" id="slidePrint" />
  <link rel="stylesheet" href="ui/default/opera.css" type="text/css"
 media="projection" id="operaFix" />
<!-- embedded styles -->
  <style type="text/css" media="all">
.imgcon {width: 525px; margin: 0 auto; padding: 0; text-align: center;}
#anim {width: 270px; height: 320px; position: relative; margin-top: 0.5em;}
#anim img {position: absolute; top: 42px; left: 24px;}
img#me01 {top: 0; left: 0;}
img#me02 {left: 23px;}
img#me04 {top: 44px;}
img#me05 {top: 43px;left: 36px;}
  </style>
  <style type="text/css" media="all">
     .demo {display: block; padding: 0.5em 0.5em 0.5em; margin: 0 1.5em 0.5em; font-size: 90%;}
    .floatright {float : right;}
  </style>
  
      <style type="text/css">
      [class~="circle"] 
      {
        stroke: red;
        stroke-width: 2;
        fill: red;
        fill-opacity: 0.1;
      }
      [class~="circ_control"]:hover {stroke:black; stroke-width:2; fill-opacity:0.2;}
      </style>

  <script src="ASCIIMathML.js" type="text/javascript" />
 <!--  <script src="impl.js" type="text/javascript" /> -->
 <!-- S5 JS -->
  <script src="ui/default/slides.js" type="text/javascript" />
  <script type="text/javascript">
	AMsymbols = AMsymbols.concat([
	{input:">>", tag:"mo", output:"\u226B", tex:"gg"},
	{input:"ll", tag:"mo", output:"\u226A", tex:"ll"},
	{input:"sgn",  tag:"mo", output:"sgn", tex:null, ttype:CONST},
	{input:"exp",  tag:"mo", output:"exp", tex:null, ttype:CONST},
	{input:"Prob",  tag:"mo", output:"Prob", tex:null, ttype:CONST},
	{input:"argmax",  tag:"mo", output:"argmax", tex:null, ttype:UNDEROVER}
	]);
  </script>
</head>
<body>
<div class="layout">
   <div id="controls">
    <form action="#" id="controlForm" onmouseover="showHide('s');" onmouseout="showHide('h');">
      <div id="navLinks" class="hideme">
        <a accesskey="t" id="toggle" href="javascript:toggle();">&#216;</a>
        <a accesskey="z" id="prev" href="javascript:go(-1);">&laquo;</a>
        <a accesskey="x" id="next" href="javascript:go(1);">&raquo;</a>
      <div id="navList" ><select id="jumplist" onchange="go('j');"></select></div>
      </div>
    </form>
  </div>
<div id="currentSlide"><!-- DO NOT EDIT --></div>
<div id="footer">
   <h1>Streaming Linear Algebra</h1>
   <h2>Ken Clarkson</h2>
</div>

</div>

<ol class="xoxo presentation">

  <li class="slide">
    <h1>Numerical Linear Algebra in the Streaming Model: Upper Bounds</h1>
	<br/>
    <h3>Ken Clarkson<br/>IBM Almaden<br/><br/><em>joint with David Woodruff</em></h3>
      <div width="400" style="position:absolute; bottom:0.75in; right:2in;">
		<canvas id="title_canvas" width="300" height="300"></canvas>

     </div>
  </li>


  <li class="slide">
    <h1>The Problems</h1>
    Given an `n times d` matrix `A`, an `n times d'` matrix `B`, and an integer `k`, we want estimators for:
    <ul>
      <li>The matrix product `A^TB`</li>
      <li>The matrix `X^**` minimizing `||AX-B||`</li>
      <ul>
	<li>A slightly generalized version of least-squares regression</li>
      </ul>
      <li>The matrix `A_k` of rank `k` minimizing `||A - A_k||`</li>
      <ul>
	<li>Rank `k` implies: matrix can be expressed as `CD^T` where `C` and `D` have `k` columns</li>
      </ul>
      <li>The rank of `A`</li>
     </ul>
  </li>
  
  <li class="slide">
    <h1>General Properties of Our Algorithms</h1>
    <ul>
      <li>The matrix norm here is always Frobenius: the square root of the sum of the squared entries</li>
      <li>Make one pass over the matrix entries, in any order</li>
      <li>Maintain compressed versions of matrices,<br/>
	  with `O(d+d')`, `O(d^2)`, `O(k(n+d))`, or `O(k^2)` entries</li>
      <ul><li>That is, `o(N)`, where `N=nd` or `N=nc`, where `c := d+d'`</li></ul>
      <li>Do `O(1)` work per entry in maintaining the sketches</li>
      <li>Compute output results using the sketches</li>
      <li>Have provable error bounds, with high probability</li>
      <li>For some cases, sketches cannot be smaller</li>
      <ul><li>When `A` and `B` have appropriate-sized integer entries</li></ul>
    </ul>
  </li>
  
  <li class="slide">
    <h1>Matrix Compression Methods</h1>
    
    In a line of similar efforts...
     <ul>
        <li>Elementwise sampling [AM01][AHK06]</li>
        <li>Sketching/Random Projection: maintain a small number of
	      random linear combinations of rows or columns [S06]</li>
        <li>Row/column sampling: pick small random subsets of the
	    rows, columns, or both [DK01][DKM04]</li>
	<ul>
	  <li>Sample probability based on Euclidean norm of row or column</li>
	  <ul>
	    <li>In general, needs two passes</li>
	    <li>Or even: probability based on norm of vector in SVD</li>
	</ul>
	  <li>Whole row or column samples are good "examples", and may preserve sparsity</li>
	</ul>
      </ul><br/>
     
     Here: sketching
  </li>
  
  <li class="slide">
    <h1>Outline</h1>
    
    <ul>
      <li>Matrix Product</li>
      <ul>
	<li>The algorithm</li>
	<li>The bounds, and relation to Johnson-Lindenstrauss</li>
	<li>Previous work</li>
	<li>Outline of analysis</li>
      </ul>
      <li>Regression</li>
      <ul><li>Outline of analysis</li></ul>
      <li>Low-rank approximation</li>
      <ul><li>Outline of analysis, using regression results</li></ul>
      <li>(Rank estimation omitted)</li>
      <ul><li>Uses matrices `C` and `D`, each with `k` rows,
	  so that the rank of `CAD^T` is likely to be `k` if the rank of `A` is at least `k`</li></ul>
    </ul>
  </li>

  
  <li class="slide">
    <h1>Approximate Matrix Product</h1>
    <ul>
      <li>`A` and `B` have `n` rows, we want to estimate `A^TB`</li>
      <li>Let `S` be an `n times m` <em>sign</em> matrix</li>
      <ul>
	<li>A.K.A. <em>Rademacher</em> or <em>Bernoulli</em></li>
	<li>Each entry is `+1` or `-1` with probability `1//2`</li>
	<li>`m = O(1)`, to be specified</li>
	<li>Independent entries, for now</li>
      </ul>
      <li>Our estimate of `A^TB` is `A^TSS^TB//m`</li>
      <li>That is, sketches are `S^TA` and `S^TB`</li>
    </ul>
  </li>     
    
  <li class="slide">
    <h1>Streaming Matrix Updates and Pass Efficiency</h1>
    <ul>
      <li>Needing only `S^TA` and `S^TB` gives an algorithm for the streaming setting</li>
      <li>Suppose the matrix entries are given as a sequence of updates to `A` or `B`</li>
      <li>An update specifies `i`, `j`, `v`, and `A` or `B`, so that `a_{ij} := a_{ij} + v`, or similarly for `B`</li>
      <ul><li>As in the <em>turnstile</em> streaming model</li></ul>
      <li>Even for `A` and `B` fixed in memory, 
            the fewer passes over the data, the better</li>
     </ul>
  </li>



  <li class="slide">
    <h1>Algorithm Bounds</h1>
    <ul>
      <li>As `A` and `B` stream by, maintain `S^TA` and `S^TB`</li>
      <ul><li>For update `i`,`j`, `v` for `A`, add `v [s_{ i : }]^T` to the `j`'th column of current `S^TA`</li></ul>
      <ul>
        <li>Time is `O(m)` per update, since `s_{i: }` has `m` entries</li>
        <li>Space is `O(mc)` for `S^TA` and `S^TB`</li>
        <ul><li>`O(m)` space for `S`, as `S` entries need only be `O(log(1//delta))`-wise
	    independent</li></ul>
      </ul>
      <span class="incremental">
      <li>When desired, compute `[A^TS ][S^TB]//m`</li>
      <ul>
        <li>Time for product of `d times m` with `m times d'` is `O(m dd') = O(m c^2)`, `c:=d+d'`</li>
      </ul>
      </span>
      <span class="incremental">
      <li>As a streaming algorithm:</li>
      <ul>
        <li>Maintaining `N=nc` values using `o(N)` time and `o(N)` space</li>
        <li>But: <em>compute time</em> is `O(mdd')`, not `o(N)`</li>
        <li>In the strictest sense, not streaming, but it takes only one pass</li>
      </ul>
      </span>
    </ul>
  </li>
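  <li class="slide">
    <h1>Maintaining the Sketches: A Code Sketch</h1>

    The update rule can be illustrated by a minimal Python sketch, under simplifying assumptions:
    the class <code>ProductSketch</code> and its method names are hypothetical, and `S` is stored
    explicitly with fully independent entries, which takes `O(nm)` space rather than the
    `O(m)` that limited independence allows.

<pre class="demo">
```python
import random

class ProductSketch:
    """Maintains S^T A (m x d) and S^T B (m x d') under turnstile updates.

    Assumption: S is an explicit n x m matrix of independent +1/-1 entries.
    """

    def __init__(self, n, d, d_prime, m, seed=0):
        rng = random.Random(seed)
        self.m = m
        self.S = [[rng.choice((-1, 1)) for _ in range(m)] for _ in range(n)]
        self.StA = [[0.0] * d for _ in range(m)]
        self.StB = [[0.0] * d_prime for _ in range(m)]

    def update(self, which, i, j, v):
        """Apply a_ij += v (which == "A") or b_ij += v (which == "B"), in O(m) time."""
        sketch = self.StA if which == "A" else self.StB
        row = self.S[i]
        for t in range(self.m):
            sketch[t][j] += v * row[t]      # add v * s_i: to column j of the sketch

    def estimate(self):
        """Return (S^T A)^T (S^T B) / m, the d x d' estimate of A^T B."""
        d, dp = len(self.StA[0]), len(self.StB[0])
        return [[sum(self.StA[t][p] * self.StB[t][q] for t in range(self.m)) / self.m
                 for q in range(dp)] for p in range(d)]

sk = ProductSketch(n=1, d=2, d_prime=1, m=5)
sk.update("A", 0, 0, 2.0)
sk.update("A", 0, 1, -1.0)
sk.update("B", 0, 0, 4.0)
print(sk.estimate())   # [[8.0], [-4.0]]: exact here because n = 1
```
</pre>
    For a single row (`n=1`) the estimate is exact, since each `s_{ij}^2 = 1`; in general it is only
    accurate with high probability.
  </li>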
    
    
  <li class="slide">
    <h1>Why this works: the sign matrix `S`</h1>
    <ul>       
      <li>Suppose `x` and `y` are independent Rademacher random values</li>
      <ul>
        <li>Each takes the values `+1` and `-1` with equal prob.</li>
        <li>Then `x^2 = y^2 = 1`, and `bb E[x] = bb E [x^{2p+1}] = bb E[xy] = 0`</li>
      </ul>
      <span class="incremental">
      <li>Suppose `x` is a sign vector</li>
      <ul>
        <li>Each entry of `x` is an independent Rademacher random value</li>
        <li>Then `bb E[x]=0` and the expected outer product is `bb E[x x^T]=I`</li>
      </ul>
      </span><span class="incremental">
      <li>Suppose `S` is a sign matrix with `m` columns:</li>
      <ul>
        <li>Each entry of `S` is an independent Rademacher value</li>
        <li>`SS^T` is the sum of the  outer products of the `m` column vectors `s_{ : i}` of `S`</li>
        <li>`bb E[SS^T]//m = bb E[sum_i s_{ : i} s_{ : i}^T]//m = (mI)//m = I`</li>
      </ul>
      </span>
    </ul>
  </li>
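  <li class="slide">
    <h1>Checking `bb E[x x^T]=I`: A Code Sketch</h1>

    The identity for a sign vector can be checked exactly for small `n` by averaging over
    all `2^n` sign vectors. This is only an illustration;
    the function name <code>expected_outer_product</code> is hypothetical.

<pre class="demo">
```python
from itertools import product

def expected_outer_product(n):
    """Average x x^T over all 2^n sign vectors x, which is the exact expectation."""
    total = [[0] * n for _ in range(n)]
    count = 0
    for x in product((-1, 1), repeat=n):
        count += 1
        for i in range(n):
            for j in range(n):
                total[i][j] += x[i] * x[j]
    return [[total[i][j] / count for j in range(n)] for i in range(n)]

print(expected_outer_product(3))
# [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```
</pre>
    Summing `m` independent copies, one per column of `S`, gives `bb E[SS^T] = mI`.
  </li>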


  <li class="slide">
    <h1>Expected Error and a Tail Estimate</h1>
    <ul>
        <li>From `bb E[SS^T]//m =I` and linearity of expectation,
        <blockquote>`bb E[A^TSS^TB//m] = A^T bb E[SS^T] B//m = A^TB`</blockquote></li>
        <li>So in expectation, sketch product is a good estimate of the product</li>
        <span class="incremental">
        <li>This is true also with high probability</li>
        <li>That is, for `delta,epsilon>0`, there is `m = O(log(1//delta)epsilon^{-2})` so that
          <blockquote>`Prob { {:||Lambda||  > epsilon||A||  ||B||  :} } le delta`</blockquote></li>
        <ul>
          <li>Here `Lambda` is the error `A^TSS^TB//m - A^TB`</li>
          <li>...and again `||A|| := [ sum_{i,j} a_{ij}^2] ^{1//2}`</li>
        </ul>
      <li>This tail estimate seems to be new</li>
      <li>True also when entries of `S` are `O(log(1//delta))`-wise independent</li>
      </span>
    </ul>
</li>
  
  <li class="slide">
    <h1>Relation to Johnson-Lindenstrauss</h1>
    <ul>
      <li>For `B=A=b`, <br/> the `n`-vector `b` `->` the `m`-vector `S^Tb`</li>
      <li>The tail estimate says that w.h.p.,<br/>
      `|| b^TSS^Tb // m - b^Tb ||
	= | || S^Tb || ^2//m - {:|| b ||:} ^2 | le epsilon || b || ^2`</li>
      <li>That is, the length of `b` is approximately preserved by `hat b := S^Tb // sqrt m`</li>
      <li>This is (pretty much) the celebrated Johnson-Lindenstrauss Lemma</li>
      <ul>
	<li>(Use a sign matrix rather than the original random rotation)</li>
      </ul>
    </ul>
  </li>
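  <li class="slide">
    <h1>Length Preservation: A Code Sketch</h1>

    A small Python experiment, under the assumption of fully independent signs, comparing
    `|| S^Tb ||^2 // m` with `|| b ||^2`. The function name
    <code>sketched_squared_norm</code>, the test vector, and the seed are illustrative choices only.

<pre class="demo">
```python
import random

def sketched_squared_norm(b, m, seed=0):
    """Return ||S^T b||^2 / m for an n x m sign matrix S drawn with the given seed."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(m):                       # one column of S at a time
        dot = sum(rng.choice((-1, 1)) * bi for bi in b)
        total += dot * dot
    return total / m

b = [3.0, -1.0, 2.0, 0.5, -4.0]
exact = sum(bi * bi for bi in b)             # ||b||^2 = 30.25
approx = sketched_squared_norm(b, m=2000, seed=1)
print(exact, approx)                         # approx is random, near exact for large m
```
</pre>
  </li>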
  
  <li class="slide">
    <h1>JL `=>` Matrix Product Estimate</h1>
    
    <ul>
      <li>The JL Lemma itself implies a weaker form of the matrix product result</li>
      <span class="incremental">
      <li>For column vectors `a` and `b`, if<br/>
	`hat a`, `hat b`, and `hat a + hat b` have about the same length as <br/>
	`a`, `b`, and `a+b`,</li>
      <li>Then `hat a cdot hat b approx a cdot b`, with error
	about `epsilon ||a||||b||`</li>
      </span>
      <span class="incremental">
      <li>Apply JL to all `a_{ :i}`, `b_{ :j}`, and `a_{ :i} + b_{ :j}`</li>
      <li>Total failure probability is `O(c^2 delta)`,
	  where again `c:=d+d'`</li>
      <li>For large enough `m = O(log c \ log(1//delta)//epsilon^2)`,
      we have that every dot product `(S^Ta_{ :i}) cdot (S^Tb_{ :j})//m` is a good
	estimate of `a_{ :i} cdot b_{ :j}`</li>
      </span>
    </ul>
  </li>
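  <li class="slide">
    <h1>Dot Products from Lengths: A Code Sketch</h1>

    The step from preserved lengths to preserved dot products rests on the identity
    `||a+b||^2 = ||a||^2 + ||b||^2 + 2 a cdot b`. A minimal Python illustration follows;
    the function names are hypothetical.

<pre class="demo">
```python
import math

def norm(v):
    """Euclidean length of a vector given as a list of numbers."""
    return math.sqrt(sum(x * x for x in v))

def dot_via_norms(a, b):
    """Recover a . b using only the lengths of a, b, and a + b."""
    s = [x + y for x, y in zip(a, b)]
    return (norm(s) ** 2 - norm(a) ** 2 - norm(b) ** 2) / 2.0

print(dot_via_norms([3.0, 4.0], [1.0, 0.0]))   # about 3, up to rounding
```
</pre>
    If each of the three squared lengths is preserved up to relative error `epsilon`, the recovered
    dot product changes by at most `epsilon (||a+b||^2 + ||a||^2 + ||b||^2)//2`.
  </li>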
  
  <li class="slide">
    <h1>JL and Matrix Product</h1>
    
    <ul>
      <li>So: JL implies an error bound for every entry of `A^TSS^TB//m`</li>
      <ul>
	<li>Not just the Frobenius norm</li>
	<li>At the cost of a factor of `log c` in `m`</li>
      </ul>
      
     </ul>
  </li>
  
  <li class="slide">
    <h1>Related Work</h1>
    
  This JL-based algorithm is due to Sarlós [S06], who gave two algorithms for product:    
    <ul>
        <li class="incremental">In one pass, but with an additional `log c` factor, using JL</li>
	<span class="incremental">
        <li>In two passes, using a bound on `bb E[||Lambda ||^2]`</li>
        <ul>
          <li>But needing limited randomness: for each column, four-wise independence</li>
          <li>Here: `O(log(1//delta))`-wise independence among all entries of `S` is adequate</li>
        </ul>
        <li>The two pass algorithm is similar in resource bounds to earlier sampling-based
	    algorithms</li>
	</span>
	<span class="incremental">
	<li>Our proofs are descendants of those in [S06], which build on [DKM*]</li>
	</span>
      </ul>
  </li>
  
  
  <li class="slide">
   <h1>Lower Bound on Space</h1>
   
   <ul>
     <li>Squeezing out the `log c` factor in the sketch size
	  is maybe not so interesting</li>
    <span class="incremental">
     <li>Except: a space lower bound of `Omega(c epsilon^{-2} log(nc))` holds for
     any one-pass algorithm,
      for failure probability `delta le 1//4`, when entries are `O(log(nc))`-bit integers</li>
     <ul><li>For large enough `n` and `c`</li></ul>
     <li>Defer lower bound discussion to "part two"</li>
     </span><span class="incremental">
     <li>Result here has:</li>
    <ul>
      <li>Fewest passes (one)</li>
      <li>Least space for one pass</li>
      <li>High probability bounds</li>
      <ul><li>Simpler than previous for high probability</li></ul>
      <li>Most general streaming model</li>
    </ul>
    </span>
    </ul>
   </li>
  


  
  
  <li class="slide">
    <h1>A Moment Bound Implies the Tail Estimate</h1>
    
    <ul>
      <li>The tail estimate, implying that the sketches are good w.h.p.,
          follows from a bound on the moments of the error</li>
      <li>For a random variable `Y`, let  `bb E_p[Y]` denote `[bb E[Y^p]]^{1//p}`</li>
      <span class="incremental">
      <li>For any `p`,<br/>
        `bb E_p[||Lambda||^2] le C p ||A||^2||B||^2 // m `,<br/>
        for a constant `C`, where (again) `Lambda := A^TSS^TB//m - A^TB`</li>
      <ul><li>Or, `bb E_p[||Lambda|| ] le C sqrt p ||A|| ||B|| // sqrt m `</li></ul>
      </span><span class="incremental">
      <li>The bound `O{: ( :} {: sqrt {: p :}  :}{: ):}` as `p ->infty`
	    implies that `||Lambda||` is subgaussian:<br/>
            the tail of its distribution is bounded by that of a Gaussian</li>
      <li>Or: apply the Markov inequality to `||Lambda||^p`, use `p approx log(1//delta)`</li>
      </span>
    </ul>
  </li>
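  <li class="slide">
    <h1>From Moments to `m`: A Worked Sketch</h1>

    The Markov step can be made concrete. With `t = epsilon ||A|| ||B||`, Markov's inequality
    applied to `||Lambda||^p` gives a tail probability of at most `(bb E_p[||Lambda||]//t)^p`.
    The Python sketch below plugs in the moment bound; the function names are hypothetical and the
    constant `C` is not specified here, so <code>C=1.0</code> is only a placeholder.

<pre class="demo">
```python
import math

def tail_bound(m, eps, p, C=1.0):
    """Markov bound on Prob{ ||Lambda|| > eps ||A|| ||B|| } from the p-th moment.

    Uses E_p[||Lambda||] at most C sqrt(p) ||A|| ||B|| / sqrt(m).
    """
    return (C * math.sqrt(p) / (eps * math.sqrt(m))) ** p

def sketch_size(eps, delta, C=1.0):
    """An m making tail_bound at most delta, taking p = ln(1/delta).

    With m = e^2 C^2 p / eps^2 the base of the power is 1/e,
    so the bound becomes e^(-p) = delta.
    """
    p = math.log(1.0 / delta)
    return math.ceil(math.e ** 2 * C ** 2 * p / eps ** 2)

m = sketch_size(eps=0.1, delta=0.01)
print(m, tail_bound(m, 0.1, math.log(100.0)))
```
</pre>
    The resulting `m` grows as `log(1//delta) epsilon^{-2}`, matching the form of the tail estimate.
  </li>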

  <li class="slide">
    <h1>The Moment Bound, Roughly</h1>
    
    To bound `bb E [||Lambda|| ^{2p}]`:
    <ul>
        <li>Multiply out its definition</li>
        <li>Apply linearity of expectation</li>
        <li>The resulting sum has the form
        <blockquote>`sum ["terms dependent on " A " and " B] quad bb E[s_{i_1 j_1} s_{i_2 j_2} cdots]`</blockquote></li>
        <li>Since `bb E[s^k] =0` for Rademacher `s` and odd `k`,<br/> many summands are zero</li>
        <li>The conditions on the subscripts under which `bb E[s_{i_1 j_1}...]` is nonzero also constrain
            the data-dependent parts, so that the sum can be bounded</li>
      </ul>
  </li>
  
  
  <li class="slide">
    <h1>Regression</h1>
    
    <ul>
      <li>The problem again: `min_X || AX-B||^2`</li>

      <li>`X^**` minimizing this has `X^** = {:A^{: - :} :} B`,<br/>
	  where `A^-` is the <em>pseudo-inverse</em> of `A`</li>
      <span class="incremental">
      <li>The algorithm is:</li>
      <ul>
	<li>Maintain `S^TA` and `S^TB`</li>
	<li>Return `hat X` solving `min_X || S^T(AX-B)||`</li>
      </ul>
      </span><span class="incremental">
      <li>Main claim: if `A` has rank `k`,<br/>
	there is `m=O(k epsilon^{-1} log(1//delta))` so that
        with probability at least `1-delta` <br/>
	`||A hat X - B|| le (1 + epsilon) || AX^** - B||`</li>
      <ul><li>That is, relative error for `hat X` is small</li></ul>
      </span>
    </ul>
  </li>
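  <li class="slide">
    <h1>Sketched Regression: A Code Sketch</h1>

    A minimal Python sketch of the algorithm for a single right-hand side `b`, under stated
    assumptions: `S` has fully independent signs, `S^TA` has full column rank, and the small
    problem is solved through the normal equations instead of a pseudo-inverse. All function
    names are hypothetical.

<pre class="demo">
```python
import random

def solve_least_squares(M, y):
    """Solve min_x ||M x - y|| via the normal equations (M^T M) x = M^T y.

    Assumes M has full column rank; uses Gauss-Jordan elimination with pivoting.
    """
    d = len(M[0])
    # Augmented d x (d+1) system [M^T M | M^T y]
    G = [[sum(r[p] * r[q] for r in M) for q in range(d)]
         + [sum(r[p] * yi for r, yi in zip(M, y))] for p in range(d)]
    for c in range(d):
        piv = max(range(c, d), key=lambda k: abs(G[k][c]))
        G[c], G[piv] = G[piv], G[c]
        for k in range(d):
            if k != c:
                f = G[k][c] / G[c][c]
                G[k] = [gk - f * gc for gk, gc in zip(G[k], G[c])]
    return [G[p][d] / G[p][p] for p in range(d)]

def sketch(rows, S):
    """Return S^T M, for M given as a list of n rows and S an n x m sign matrix."""
    m, width = len(S[0]), len(rows[0])
    return [[sum(S[i][t] * rows[i][j] for i in range(len(rows)))
             for j in range(width)] for t in range(m)]

def sketched_regression(A, b, m, seed=0):
    """Return x_hat minimizing ||S^T (A x - b)|| for a random n x m sign matrix S."""
    rng = random.Random(seed)
    S = [[rng.choice((-1, 1)) for _ in range(m)] for _ in range(len(A))]
    StA = sketch(A, S)
    Stb = [row[0] for row in sketch([[bi] for bi in b], S)]
    return solve_least_squares(StA, Stb)

A = [[1.0, float(i)] for i in range(30)]
b = [2.0 - 0.5 * i for i in range(30)]       # exactly A x for x = (2, -0.5)
print(sketched_regression(A, b, m=10))        # close to [2.0, -0.5]
```
</pre>
    When `b` lies exactly in the columnspace of `A`, the sketched solution recovers the exact one;
    otherwise the main claim bounds the relative error of its residual.
  </li>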
  
  <li class="slide">
    <h1>Regression Analysis</h1>
    
    <ul>
      <li>Why should `hat X` be so good?</li>
      <li class="incremental">`S^T` approximately preserves the norm of `AX-B`, for fixed `X`</li>
      <li class="incremental">If this worked for all `X`, we would be done</li>
      <li class="incremental">`S^T` must preserve norms even for `hat X`, chosen using `S`</li>
      <span class="incremental">
      <li>The main idea: to show that `hat X` is good,
	  reduce to showing that `||A {:( :} X^** - hat X {:):}||` is small</li>
	<ul>
	  <li>Using normal equations of exact problem</li>
	</ul>
      <li>Then, use rank `k le d` of `A`</li>
      </span>
    </ul>
  </li>
    
  <li class="slide">
    <h1>Regression Analysis, cont.</h1>
      
    <ul>
      <li>`A` has rank `k`, so `A = CD^T` for `C` and `D` with `k` columns</li>
      <li>All columns of `A{:( :}X^** - hat X{:):}` are in the
	  columnspace of `C` </li>
	<ul>
	  <li>`equiv` the `k`-dimensional space of linear combinations of the columns of `C`</li>
	  <li>Each such column has the form `Cy` for a vector `y`</li>
	</ul>
      <span class="incremental">
      <li>Fact (Subspace JL): for `m=O(k epsilon^{-1} log(1//delta))`,<br/>
	  `S^T` approximately preserves lengths of all vectors in a `k`-space</li>
      <li>...including columns of `A {:( :} X^** - hat X {:):}`</li>
      <li>So, `||S^T (A hat X - B )||` small `=> ||A hat X - B||` is small</li>
      </span>
    </ul>
  </li>


  <li class="slide">
    <h1>Best Low-Rank Approximation</h1>
    <ul>
      <li>For any matrix `A` and integer `k`,
	  there is a matrix `A_k` of rank `k` that is closest to `A` among all matrices of rank `k`</li>
      <li>Since rank of `A_k` is `k`, it is the product `CD^T` of two `k`-column matrices `C` and `D`</li>
      <ul><li>(`A_k` can be found from the SVD (singular value decomposition), taking `C` and `D` to be
	  `U` and `V Sigma`, where `U` and `V` have `k` orthonormal columns and `Sigma` is diagonal)</li></ul>
      <ul>
	<li>This is a good compression of `A`</li>
	<li>If entries of `A` are noisy measurements,
	  often the noise is "compressed out" in this way</li>
	<li>LSI, PCA, Eigen*, recommender systems, clustering,...</li>
      </ul>
    </ul>
  </li>


  <li class="slide">
    <h1>Best Low-Rank Approximation and `S^TA`</h1>
    <ul>
      <li>The sketch `S^TA` holds a lot of information about `A`</li>
      <li>In particular, there is a rank `k` matrix `hat A_k` in the rowspace of `S^TA` nearly
	as close to `A` as the closest rank `k` matrix `A_k`</li>
	<ul>
	  <li>The rowspace of `S^TA` is the set of linear combinations of its rows</li>
	</ul>
      <li>That is, `||A - hat A_k|| le (1+epsilon)||A-A_k||`</li>
    </ul>
  </li>
  

  
  <li class="slide">
    <h1>Low-Rank Approximation : Using Regression</h1>
    <ul>
      <li>Why is there such an `hat A_k`?
	  Apply the regression results with `A->A_k, B -> A`</li>
      <li>The `hat X` minimizing `||S^T{: ( :}A_kX - A{:):}||`
	has `||A_k hat X - A|| le (1 + epsilon) || A_k X^** - A||`</li>
      <li>But here `X^** = I`, and `hat X = (S^TA_k)^{: - :}S^TA`</li>
      <li>So, the matrix `A_k hat X = A_k(S^TA_k)^{: - :}S^TA`:</li>
      <ul>
	<li>Has rank at most `k`, since the rank of a product is at most the minimum of the ranks</li>
	<li>Is in the rowspace of `S^TA`</li>
	<li>Is within a factor `1+epsilon` of the smallest distance from `A` of any rank `k` matrix</li>
      </ul>
    </ul>
  </li>
  
  
  
   <li class="slide">
    <h1>Best Low-Rank Approximation: <br/>Two Pass Algorithm</h1>
    <ul>
      <li>We can't use `A_k(S^TA_k)^{: - :}S^TA` without finding `A_k` first</li>
      <li>Instead:</li>
      <ul>
	<li>Maintain `S^TA`</li>
	<li>Project: find the closest matrix `hat A` to `A` in the rowspace of `S^TA`</li>
	<li>Approximate: find the best rank `k` approximation to `hat A`</li>
      </ul>
      <li>But this takes two passes over `A`</li>
    </ul>
  </li>
   
  <li class="slide">
    <h1>Nearly Best Nearly-Low-Rank Approximation</h1>
    
    <ul>
      <li>Suppose `R` is a `d times m` sign matrix (recall `A` is `n times d`)</li>
      <li>By regression results transposed, the columnspace of `AR` contains
	a nearly best rank-`k` approximation to `A`</li>
      <li>That is, `hat X` minimizing `|| AR X - A ||` has
	`|| AR hat X - A|| le (1+epsilon) ||A - A_k||`</li>
      <span class="incremental">
      <li>Apply regression results with `A->AR` and `B->A`,<br/>
	and `X'` minimizing `||S^T(ARX - A)||`</li>
      <li>Then `X' = (S^TAR)^{: - :}S^TA` has <br/>
	`|| AR X' - A|| le (1+epsilon)||AR hat X - A|| le (1+epsilon)^2||A - A_k||`</li>
      <ul><li>Since `AR` has rank `k epsilon^{-1}`,
	`S` must be `n times m'`, with `m'=k epsilon^{-2}`</li></ul>
      </span>
    </ul>
  </li>
  
  <li class="slide">
    <h1>Nearly Best Nearly-Low-Rank Algorithm</h1>
    <ul>
      <li>An algorithm: maintain `AR` and `S^TA`, return
	`ARX' = AR (S^TAR)^{: - :} S^TA`</li>
      <ul>
	<li>Rank is `k//epsilon`</li>
	<li>Distance to `A` is at most `(1+epsilon)^2||A - A_k||`</li>
      </ul>
      <li>This approximation to `A` is interesting in its own right</li>
      <ul>
	<li>No SVD required, only the pseudo-inverse of a matrix of constant size</li>
      </ul>
      
    </ul>
  </li>
  
  <li class="slide">
    <h1>Nearly Best Low-Rank Approximation</h1>
    
    We still haven't found a good rank `k` matrix
    <ul>
      <li>To do this, we find the
	best rank-`k` approximation to <br/>
	`AR(S^TAR)^{: - :} S^TA` in the columnspace of `AR`</li>
      <li>Uses sketches `AR` and `S^TA`
	that are bigger than our lower bounds require, w.r.t. `epsilon`</li>
      <ul>
	<li>We have to make `m` and `m'` bigger to prove same approximation bound</li>
      </ul>
      <li>When `A` is given a column at a time, or a row at a time, we can do better</li>
    </ul>
  </li>  
    
    
  <li class="slide">
    <h1>Concluding Remarks</h1>
    
    <ul>
      <li>Space bounds are tight for product, regression</li>
      <ul><li>Faster update times?</li></ul>
      <li>Space bounds are not tight w.r.t. `epsilon` for low-rank approximation</li>
      <ul><li>Upper bounds are at fault, probably</li>
	<li>We have better upper bounds for restricted cases</li></ul>
      <li>The entry-wise `r`-norm of the error matrix `Lambda` can also be bounded</li>
      <ul><li>This implies a bound on `||Lambda||_{"max"}` in terms of `||A||_{1->2}` and `||B||_{1->2}`</li></ul>
      <li>Other projection matrices besides sign matrices?</li>
      <li>For what other problems is the full power of the JL transform not needed?</li>
    </ul>
</li>



    
         
         
</ol>
</body>
</html>
