<h1>Continuous submodularity: Non-convex structure with guaranteed optimization</h1>
<p><em>An Bian, 2016-12-28</em></p>
<p>Recently, we have been working on optimizing a class of non-convex objectives that share a celebrated and general
structure called <em>continuous submodularity</em>. Submodularity is well known as a classical structure in
combinatorial optimization; it turns out that continuous submodularity is also a common non-convex structure for many continuous
objectives, with strong guarantees for both minimization and maximization.</p>
<p>For two very recent papers in this area, see <a href="https://arxiv.org/abs/1511.00394">the one</a>
from Francis Bach on minimization, and
<a href="https://arxiv.org/pdf/1606.05615.pdf">the one</a> from us on maximization.
This post aims to: i) explain how to recognize submodular continuous-functions;
ii) summarize the current results on optimizing submodular continuous-functions; iii)
discuss open problems in this new area.</p>
<h2 id="generic-submodular-functions">Generic submodular functions</h2>
<p>To have a better understanding of the submodularity of both set-functions
and continuous-functions, let us first of all give a <em>generic</em> view on the submodular functions.</p>
<p>The domain of a “generic” submodular function $f: \cal X\rightarrow \mathbb R$ is a Cartesian product of subsets of $\mathbb{R}$: $\cal X = \prod_{i=1}^n \cal X_i$, where each $\cal X_i$ is a compact subset of $\mathbb R$. One can define a <em>lattice</em> over $\cal X$ by taking the “join” operation $\vee$ to be the coordinate-wise maximum and the “meet” operation $\wedge$ to be the coordinate-wise minimum.</p>
<p>By considering different realizations of $\cal X_i$, we can recover different submodular
functions:</p>
<ul>
<li>$\cal X_i = \{0, 1\}$: submodular set-function;</li>
<li>$\cal X_i = \{0, 1, …, k_i -1\}, k_i>2$, $k_i\in \mathbb Z$: submodular integer-lattice-function;</li>
<li>$\cal X_i = [a, b]$ is an interval: submodular continuous-function.</li>
</ul>
<p>The submodularity of all of them can be defined as:</p>
<blockquote>
<p><strong>Submodularity and submodular functions:</strong>
A function $f$ is submodular if, for all $(x, y)$ in the domain, it holds that <script type="math/tex">f(x) + f(y) \geq f(x\vee y) + f(x\wedge y)</script>.</p>
</blockquote>
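As a concrete sanity check, the lattice inequality above can be verified numerically. The following sketch uses an illustrative function of my own choosing (not from the papers): $f(x_1, x_2) = -x_1 x_2$ on $[0,1]^2$, whose cross-partial derivative is $-1 \leq 0$, which makes it submodular:

```python
import random

def f(x):
    # f(x1, x2) = -x1 * x2 on [0, 1]^2; its cross-partial is -1 <= 0, so it is submodular
    return -x[0] * x[1]

def join(x, y):
    # "join": coordinate-wise maximum
    return [max(a, b) for a, b in zip(x, y)]

def meet(x, y):
    # "meet": coordinate-wise minimum
    return [min(a, b) for a, b in zip(x, y)]

random.seed(0)
for _ in range(10000):
    x = [random.random(), random.random()]
    y = [random.random(), random.random()]
    # submodularity: f(x) + f(y) >= f(x v y) + f(x ^ y)
    assert f(x) + f(y) >= f(join(x, y)) + f(meet(x, y)) - 1e-12
print("lattice inequality holds on all sampled pairs")
```

Of course, passing random checks does not prove submodularity; the second-order characterization discussed below is how one verifies it rigorously.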
<p>It is well-known that for set-functions, submodularity is equivalent to the diminishing
returns (<strong>DR</strong>) property. However, this does not hold when generalized to generic
functions defined over $\cal X$:</p>
<blockquote>
<p><strong>DR property & DR-submodular functions</strong>: Let $\chi_i$ be the $i^\text{th}$ characteristic vector.
$f$ satisfies the DR property if $\forall a\leq b\in \cal X$, for any coordinate $i$, $\forall k\in \mathbb{R}_+$ s.t. $k\chi_i+a$ and $k\chi_i+b$
are still in $\cal X$, it holds
<script type="math/tex">f(k\chi_i+a) - f(a) \geq f(k\chi_i+b) - f(b)</script>. <br />
Such a function $f$ is called a DR-submodular function.</p>
</blockquote>
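To make the definition concrete, here is a small numerical check with an illustrative function of my own choosing: $f(x_1, x_2) = x_1 + x_2 - x_1 x_2$ on $[0,1]^2$ is DR-submodular (all of its second partial derivatives are non-positive), so the gain from adding $k\chi_i$ to the smaller point $a$ is at least the gain from adding it to the larger point $b$:

```python
import random

def f(x):
    # f(x1, x2) = x1 + x2 - x1*x2, a "probabilistic coverage" style function;
    # every second partial derivative is <= 0, so it is DR-submodular on [0, 1]^2
    return x[0] + x[1] - x[0] * x[1]

random.seed(1)
for _ in range(10000):
    a = [random.uniform(0, 0.5), random.uniform(0, 0.5)]
    b = [a[0] + random.uniform(0, 0.4), a[1] + random.uniform(0, 0.4)]  # a <= b
    i = random.randrange(2)
    k = random.uniform(0, 0.1)  # small enough that a + k*chi_i and b + k*chi_i stay in [0, 1]^2
    ak = list(a); ak[i] += k
    bk = list(b); bk[i] += k
    # DR: the marginal gain along coordinate i shrinks as the base point grows
    assert f(ak) - f(a) >= f(bk) - f(b) - 1e-12
print("DR inequality holds on all sampled triples")
```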
<p>One immediate observation is that $\nabla f(a)\geq \nabla f(b)$ whenever $a\leq b$ (if $f$ is differentiable), so the gradient of a differentiable DR-submodular function is an <em>antitone</em> mapping.</p>
<p>Both submodular and DR-submodular functions are prevalent in real-world applications.
So far there are naturally <em>three questions</em>:</p>
<p>Q1. For generic functions defined over $\cal X$, submodularity and DR are not equivalent; what is the connection between them?</p>
<p>Q2. For the submodularity of generic functions defined over $\cal X$, is there an equivalent diminishing-returns-style property to characterize it?</p>
<p>Q3. What can we say about optimizing submodular and DR-submodular continuous-functions?</p>
<p>These questions will be answered in the following.</p>
<h2 id="characterization-of-generic--submodular-functions">Characterization of generic submodular functions</h2>
<p>First of all, we give a positive answer to question Q2 by proposing the <em>weak DR</em> property:</p>
<blockquote>
<p><strong>weak DR:</strong> $f$ satisfies the weak DR property if $\forall a\leq b\in \cal X$, for any
coordinate $i\in \{i’| a_{i’} = b_{i’} \}$, $\forall k\in \mathbb{R}_+$ s.t. $k\chi_i+a$ and $k\chi_i+b$ are still in $\cal X$, it holds
<script type="math/tex">f(k\chi_i+a) - f(a) \geq f(k\chi_i+b) - f(b)</script>.</p>
</blockquote>
<p>and show that</p>
<blockquote>
<p><strong>Lemma</strong>: For a generic function $f$, weak DR $\Leftrightarrow$ submodularity.</p>
</blockquote>
<p>As for question Q1, it is now clear that DR-submodular functions are a subclass of submodular functions.
Furthermore, it can be shown that:</p>
<blockquote>
<p><strong>Lemma</strong>: submodularity + coordinate-wise concavity $\Leftrightarrow$ DR.</p>
</blockquote>
<p><img src="/images/cont-submodularity/submodular.png" style="float:left;width:35%" />
The class of submodular continuous-functions contains a subset of both the convex
and the concave functions; see the figure on the left for an illustration. For detailed
examples, one can refer to the corresponding sections of the two papers above.</p>
<p>The characterizations of submodular and DR-submodular continuous-functions can be
compared with those of convex functions; they are summarized
in the following tables. These properties make it very easy to recognize the
submodularity of a continuous-function.</p>
<p><img src="/images/cont-submodularity/table1.png" alt="Table 1" style="width:100%" />
<img src="/images/cont-submodularity/table2.png" alt="Table 2" /></p>
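One of these characterizations (for twice-differentiable $f$) is second order: $f$ is submodular iff all off-diagonal entries of its Hessian are non-positive, and DR-submodular iff <em>all</em> entries are non-positive (the diagonal condition being exactly coordinate-wise concavity). Here is a rough numerical sketch of that test via finite differences, with an illustrative function of my own choosing; sampled points can only suggest, not prove, the property:

```python
def numerical_hessian(f, x, h=1e-4):
    """Approximate the Hessian of f at x by central finite differences."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shift(si, sj):
                y = list(x)
                y[i] += si * h
                y[j] += sj * h
                return f(y)
            H[i][j] = (shift(1, 1) - shift(1, -1) - shift(-1, 1) + shift(-1, -1)) / (4 * h * h)
    return H

def looks_submodular(f, points, tol=1e-6):
    # submodular: H[i][j] <= 0 for all i != j at every sampled point
    return all(H[i][j] <= tol
               for x in points
               for H in [numerical_hessian(f, x)]
               for i in range(len(x)) for j in range(len(x)) if i != j)

def looks_dr_submodular(f, points, tol=1e-6):
    # DR-submodular: every Hessian entry <= 0 (adds coordinate-wise concavity)
    return all(H[i][j] <= tol
               for x in points
               for H in [numerical_hessian(f, x)]
               for i in range(len(x)) for j in range(len(x)))

pts = [[0.2, 0.7], [0.5, 0.5], [0.9, 0.1]]
g = lambda x: x[0] ** 2 - x[0] * x[1]  # cross-partial -1, but convex along x1
print(looks_submodular(g, pts), looks_dr_submodular(g, pts))  # → True False
```

The example function $g$ illustrates the gap between the two classes: it is submodular but not DR-submodular, since its Hessian $\begin{pmatrix} 2 & -1 \\ -1 & 0 \end{pmatrix}$ has a positive diagonal entry.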
<p>For question Q3, please see the following.</p>
<h2 id="what-we-can-say-about-optimizing-submodular-continuous-functions-so-far">What can we say about optimizing submodular continuous-functions so far?</h2>
<p>Here I just summarize the current results on minimizing and maximizing
submodular continuous-functions from the above two papers. It is noteworthy that there are plenty of open problems in this new area.</p>
<ul>
<li>
<p>Submodular continuous-functions over “box” constraints can be minimized to arbitrary precision in polynomial time using the discretization + continuous-extension
method of <a href="https://arxiv.org/abs/1511.00394">Bach 2015</a>.</p>
</li>
<li>
<p>Maximizing a monotone DR-submodular continuous-function over general down-closed convex
constraints is NP-hard. The submodular Frank-Wolfe algorithm gives a $(1-1/e)$-approximation guarantee with a sublinear “convergence” rate (<a href="https://arxiv.org/pdf/1606.05615.pdf">Bian et al. 2016</a>).</p>
</li>
<li>
<p>Maximizing a non-monotone submodular continuous-function over “box”
constraints is NP-hard. The generalized DoubleGreedy algorithm gives a $1/3$-approximation guarantee (<a href="https://arxiv.org/pdf/1606.05615.pdf">Bian et al. 2016</a>).</p>
</li>
</ul>
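To give a flavor of the maximization result, here is a minimal sketch of the submodular Frank-Wolfe variant from Bian et al. 2016 in my own simplified rendering: it starts at $0$, takes $K$ fixed-size steps of $1/K$ toward a linear-oracle vertex, and never shrinks earlier progress. For simplicity the down-closed polytope is taken to be a box $\{x : 0 \leq x \leq \bar u\}$, so the linear maximization oracle decomposes coordinate-wise; over a general down-closed polytope it would be an LP. The objective below is an illustrative monotone DR-submodular function, not one from the paper:

```python
def submodular_frank_wolfe(grad_f, upper, K=100):
    """Frank-Wolfe variant for monotone DR-submodular maximization over a box.

    grad_f: gradient oracle of the (monotone DR-submodular) objective
    upper:  the box upper bound \bar u (lower bound is 0)
    K:      number of fixed-size 1/K steps
    """
    n = len(upper)
    x = [0.0] * n
    for _ in range(K):
        g = grad_f(x)
        # linear oracle over the box: pick upper[i] where the partial
        # derivative is positive, else 0 (monotone f keeps g >= 0)
        v = [upper[i] if g[i] > 0 else 0.0 for i in range(n)]
        # fixed step 1/K toward v, never moving backward
        x = [x[i] + v[i] / K for i in range(n)]
    return x

# illustrative objective: f(x) = sum_i log(1 + x_i), monotone and DR-submodular
grad = lambda x: [1.0 / (1.0 + xi) for xi in x]
x_hat = submodular_frank_wolfe(grad, upper=[1.0, 2.0, 0.5], K=200)
print([round(v, 3) for v in x_hat])  # → [1.0, 2.0, 0.5]: the box corner, since the gradient stays positive
```

On this separable example the maximizer is trivially the box corner; the interesting behavior of the algorithm shows up only for non-separable objectives and non-box down-closed polytopes.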
<h2 id="open-problems">Open problems</h2>
<p>Continuous submodularity is a very general structure in the non-convex realm.
The characterizations, especially the second order
properties, give a very convenient way to recognize a submodular/DR-submodular
objective in real-world applications. So in terms of <em>new applications</em>, I think
there are many more non-convex objectives waiting to be discovered, as has happened
for submodular set-functions.</p>
<p>In terms of <em>theory</em>, there are lots of interesting open problems. To name a
few:</p>
<ul>
<li>
<p>For minimization, how can the algorithm be made faster and more scalable? How can the gradient information be utilized properly?</p>
</li>
<li>
<p>What can one say about constrained minimization?</p>
</li>
<li>
<p>For maximization, the projected gradient method works well in experiments;
is it possible to prove approximation guarantees for it?</p>
</li>
<li>
<p>For maximizing a non-monotone submodular continuous-function over “box”
constraints, can the worst-case guarantee or the hardness results be
improved?</p>
</li>
</ul>
<hr />
<p>Hopefully you will find that the non-convex problem you are working
on turns out to be a submodular/DR-submodular one!</p>

<h2 id="i-start-to-use-jekyll">I start to use Jekyll! (2016-12-18)</h2>
<p>I have decided to use the fantastic Jekyll.<br />
This website is based on an open-source Jekyll theme called
<a href="https://github.com/barryclark/jekyll-now">jekyll-now</a>.</p>