There are many ways to do this:
- Use `Scalar` with `MPI_COMM_SELF`. An example is given at the bottom.
 
- Subclass `Scalar` or `GenericTensor` so that it does not collect the value across ranks, and use `void assemble(GenericTensor& A, const Form& a);`. For the former, just take `Scalar.cpp` and strip out the MPI communication.
 
- Use

      void assemble_[cells|exterior_facets|interior_facets](
          GenericTensor& A,
          const Form& a, UFC& ufc,
          std::shared_ptr<const MeshFunction<std::size_t> > domains,
          std::vector<double>* values);

  and collect `values` on each rank separately. Some extra memory will be needed. (A sketch of this approach is appended at the very bottom.)
 
- Define the form using a measure which is local to each rank and assemble separately for each rank. This will not get rid of communication and will require some extra memory to store a cell/facet function.
- Prepare local submeshes on `MPI_COMM_SELF` and assemble there. A lot of extra memory is needed.
EDIT: Example implementation of option 1.

Functional.ufl:

```python
element = FiniteElement("Lagrange", triangle, 1)
f = Coefficient(element)
I = f*dx
forms = [I]
```
main.cpp:

```cpp
#include <dolfin.h>
#include "Functional.h"

using namespace dolfin;

// Coefficient equal to the MPI rank on each process, so the local
// integrals are easy to tell apart
class Source : public Expression
{
  void eval(Array<double>& values, const Array<double>& x) const
  {
    values[0] = MPI::rank(MPI_COMM_WORLD);
  }
};

int main()
{
  UnitSquareMesh mesh(32, 32);
  Source f;
  Functional::Form_I I(mesh, f);

  // Initialize the Scalar with a tensor layout on MPI_COMM_SELF so that
  // the final apply() does not sum the value across ranks
  TensorLayout tl(MPI_COMM_SELF, {}, 0, 0, {}, false);
  Scalar local_value;
  local_value.init(tl);

  // Each rank now gets the integral over the cells it owns
  assemble(local_value, I);

  info("Rank %d, integral %f", MPI::rank(MPI_COMM_WORLD),
       local_value.get_scalar_value());

  return 0;
}
```
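The header `Functional.h` is generated from the form file with `ffc -l dolfin Functional.ufl`. After compiling `main.cpp` against DOLFIN and running under `mpirun`, each rank should report a different value, namely its rank number times the area of the cells it owns, since the coefficient is constant and equal to the rank on each process.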
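For comparison, below is a minimal, untested sketch of the third option, reusing the `Functional.h` generated above but with a `Constant` integrand. It assumes the `Assembler::assemble_cells` signature quoted in the list and that the `values` vector has to be pre-sized to the local number of cells; treat it as a sketch, not a verified implementation.

```cpp
#include <dolfin.h>
#include "Functional.h"

using namespace dolfin;

int main()
{
  UnitSquareMesh mesh(32, 32);
  Constant f(1.0);  // with f = 1 the local integral is just the local area
  Functional::Form_I I(mesh, f);

  // Per-cell values of the functional; assumed to require pre-sizing to
  // the number of cells owned by this rank
  std::vector<double> cell_values(mesh.num_cells());

  // The tensor argument is required by the interface, but for a functional
  // with a 'values' array the per-cell results go into 'cell_values'
  Scalar dummy;
  UFC ufc(I);

  Assembler assembler;
  assembler.assemble_cells(dummy, I, ufc, I.cell_domains(), &cell_values);

  // Sum the locally owned cell contributions -- no reduction across ranks
  double local_integral = 0.0;
  for (std::size_t i = 0; i < cell_values.size(); ++i)
    local_integral += cell_values[i];

  info("Rank %d, local integral %f", MPI::rank(MPI_COMM_WORLD),
       local_integral);

  return 0;
}
```

The per-cell breakdown kept in `cell_values` is also what one would want for things like cell-wise error indicators, at the cost of one extra double per local cell.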