Tpetra_DistributionLowerTriangularBlock.hpp
// @HEADER
// ***********************************************************************
//
//          Tpetra: Templated Linear Algebra Services Package
//                 Copyright (2008) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact Michael A. Heroux (maherou@sandia.gov)
//
// ************************************************************************
// @HEADER

#ifndef __TPETRA_DISTRIBUTORLOWERTRIANGULARBLOCK_HPP
#define __TPETRA_DISTRIBUTORLOWERTRIANGULARBLOCK_HPP

// Needed by DistributionLowerTriangularBlock
#include "Tpetra_Distributor.hpp"

// Needed by LowerTriangularBlock operator
#include "Tpetra_Core.hpp"
#include "Tpetra_CrsMatrix.hpp"
#include "Tpetra_Map.hpp"
#include "Tpetra_Operator.hpp"
#include "Tpetra_Vector.hpp"

namespace Tpetra
{

template <typename gno_t, typename scalar_t>
class DistributionLowerTriangularBlock : public Distribution<gno_t,scalar_t> {
// Seher Acer's lower-triangular block decomposition for triangle counting.
// See also: LowerTriangularBlockOperator below, which allows this distribution
// to be used in Tpetra SpMV.
//
// Requirements:
//   The matrix must be square (an undirected graph).
//   The number of processors must satisfy np = q(q+1)/2 for some q.
//
// Only the lower-triangular portion of the matrix is stored.
// Processors are arranged logically as follows:
//   0
//   1 2
//   3 4 5
//   ...
//
// The lower-triangular part of the matrix is stored in a 2D distribution.
// For example, the dense 7x7 lower-triangular matrix below would be assigned
// to processors according to the numbers shown:
//   0 |   |
//   00|   |
//   ---------
//   11|2  |
//   11|22 |
//   11|222|
//   ---------
//   33|444|5
//   33|444|55
//   ...
// (Note that we expect the matrix to be sparse. For dense matrices,
// CrsMatrix is the wrong tool.)
//
// Matrix rows are assigned to processor rows greedily to roughly balance
//   (# nonzeros in processor row / # processors in processor row)
// across processor rows.
// The same cuts are used to divide rows and columns among processors
// (that is, every processor owns a square block).
//
// The lower-triangular algorithm:
// 1. Distribute all matrix entries via a 1D linear distribution.
//    (This initial distribution is needed to avoid storing the entire
//    matrix on one processor, while providing the per-row nonzero counts
//    needed in step 2.)
// 2. (Optional) Sort rows in decreasing order with respect to the number
//    of nonzeros per row.
// 3. Find "chunk cuts": divisions in row assignments such that
//    (# nonzeros in processor row / # processors in processor row) is
//    roughly equal for all processor rows.
// 4. Send nonzeros to their new processor assignments.
//
// Known issues: (TODO)
// - The sorting in step 2 and the computation of chunk cuts in step 3 are
//   currently done in serial and require O(number of rows) storage on each
//   processor. More effort could parallelize this computation, but parallel
//   load-balancing algorithms are more appropriate in Zoltan2 than Tpetra.
// - The sorting in step 2 renumbers the rows (assigns new global ordinals to
//   the rows) to make them contiguous, as needed in Acer's triangle-counting
//   algorithm.
//   (Acer's algorithm relies on local indexing from the chunk boundaries to
//   find neighbors needed for communication.)
//   The class currently provides a permutation matrix P describing the
//   reordering. Thus, the matrix stored in the lower-triangular block
//   distribution is actually P^T A P -- the row and column permutation of
//   the matrix A in the Matrix Market file.
//   The fact that a permuted matrix is stored complicates use of the matrix
//   in algorithms other than Acer's triangle counting. For SpMV with the
//   vector numbered according to the Matrix Market numbering, for example,
//   P^T must be applied to the vector before SpMV, and P must be applied to
//   the result of SpMV. See LowerTriangularBlockOperator for how this
//   permutation matrix is used.
//
// Before addressing these issues, we will decide (TODO):
// - Is this Distribution general enough to be in Tpetra?
// - Should we, instead, have a separate package for distributions (one that
//   could use Zoltan2 and Tpetra without a circular dependence)?
// - Or should we allow users (such as the triangle-counting algorithm) to
//   provide their own distributions (e.g., LowerTriangularBlock) that
//   inherit from Tpetra's Distribution class?
// For now, we will push this Distribution into Tpetra, but we will revisit
// this decision.

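As a standalone illustration of the layout described above, the chunk count q and the owner of a chunk (m, n) can be computed as follows. This is a hedged sketch, not Tpetra code; `chunksFromProcs` and `procFromBlock` are hypothetical names mirroring the constructor's nChunks computation and `procFromChunks` below.

```cpp
#include <cassert>
#include <cmath>

// Recover q (the number of chunk rows) from np; returns -1 when np is not
// a triangular number q(q+1)/2. Mirrors the constructor's nChunks loop.
inline int chunksFromProcs(int np) {
  int npnp = 2 * np;
  int q = static_cast<int>(std::sqrt(static_cast<float>(npnp)));
  while (q * (q + 1) < npnp) q++;
  return (q * (q + 1) == npnp) ? q : -1;
}

// Rank owning chunk (m, n), n <= m, when chunk rows are numbered
// 0 / 1 2 / 3 4 5 / ... (row-major over the lower triangle).
inline int procFromBlock(int m, int n) { return m * (m + 1) / 2 + n; }
```

For example, np = 6 gives q = 3 chunk rows, and the third chunk row (m = 2) holds ranks 3, 4, 5.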
public:
  using Distribution<gno_t,scalar_t>::me;
  using Distribution<gno_t,scalar_t>::np;
  using Distribution<gno_t,scalar_t>::comm;
  using Distribution<gno_t,scalar_t>::nrows;
  using typename Distribution<gno_t,scalar_t>::NZindex_t;
  using typename Distribution<gno_t,scalar_t>::LocalNZmap_t;

  using map_t = Tpetra::Map<>;
  using matrix_t = Tpetra::CrsMatrix<scalar_t>;

  DistributionLowerTriangularBlock(size_t nrows_,
                 const Teuchos::RCP<const Teuchos::Comm<int> > &comm_,
                 const Teuchos::ParameterList &params) :
                 Distribution<gno_t,scalar_t>(nrows_, comm_, params),
                 initialDist(nrows_, comm_, params),
                 sortByDegree(false), permMatrix(Teuchos::null),
                 redistributed(false), chunksComputed(false), nChunks(0)
  {
    // Find the smallest q such that q(q+1) >= 2*np; q is the number of chunks
    int npnp = 2 * np;
    nChunks = int(std::sqrt(float(npnp)));
    while (nChunks * (nChunks + 1) < npnp) nChunks++;

    TEUCHOS_TEST_FOR_EXCEPTION(nChunks * (nChunks+1) != npnp, std::logic_error,
                               "Number of processors np = " << np <<
                               " must satisfy np = q(q+1)/2 for some q" <<
                               " for LowerTriangularBlock distribution");
    nChunksPerRow = double(nChunks) / double(nrows);

    const Teuchos::ParameterEntry *pe = params.getEntryPtr("sortByDegree");
    if (pe != NULL) sortByDegree = pe->getValue<bool>(&sortByDegree);

    pe = params.getEntryPtr("readPerProcess");
    if (pe != NULL) redistributed = pe->getValue<bool>(&redistributed);

    if (me == 0) std::cout << "\n LowerTriangularBlock Distribution: "
                           << "\n     np      = " << np
                           << "\n     nChunks = " << nChunks
                           << std::endl;
  }

  enum DistributionType DistType() { return LowerTriangularBlock; }

  bool areChunksComputed() { return chunksComputed; }

  Teuchos::Array<gno_t> getChunkCuts() {
    if (chunksComputed)
      return chunkCuts;
    else {
      throw std::runtime_error("Error: Requested chunk cuts have not "
                               "been computed yet.");
    }
  }

  // Return whether this rank owns vector entry i.
  // TODO: for now, use the same vector distribution as 1DLinear;
  // TODO: think about the best distribution of vectors
  inline bool VecMine(gno_t i) { return initialDist.VecMine(i); }

  // Return whether this rank owns nonzero (i,j).
  // The vector map and row map are the same in the 1D distribution,
  // but keep only the lower-triangular entries.
  bool Mine(gno_t i, gno_t j) {
    if (redistributed) {
      if (j > i) return false;  // Don't keep any upper-triangular entries
      else return (procFromChunks(i,j) == me);
    }
    else
      return initialDist.Mine(i,j);
  }

  inline bool Mine(gno_t i, gno_t j, int p) { return Mine(i,j); }

  // Redistribute according to the chunk-based row distribution
  void Redistribute(LocalNZmap_t &localNZ)
  {
    // Chunking and sorting are done serially for now;
    // we need to gather per-row information from each processor.
    // TODO: think about a parallel implementation

    gno_t myFirstRow = initialDist.getFirstRow(me);
    gno_t nMyRows = initialDist.getNumRow(me);
    Teuchos::Array<gno_t> nnzPerRow(nMyRows, 0);

    Teuchos::Array<int> rcvcnt(np);
    Teuchos::Array<int> disp(np);
    for (int sum = 0, p = 0; p < np; p++) {
      int prows = initialDist.getNumRow(p);
      rcvcnt[p] = prows;
      disp[p] = sum;
      sum += prows;
    }

    // If sortByDegree is requested, we first need to sort with respect to
    // ALL entries in the matrix (not just the lower-triangular entries):
    // a decreasing sort by the number of entries per row in the global
    // matrix. Generate permuteIndex for the sorted rows.

    Teuchos::Array<gno_t> permuteIndex;  // This is the inverse permutation
    Teuchos::Array<gno_t> sortedOrder;   // This is the original permutation

    Teuchos::Array<gno_t> globalRowBuf;
    // TODO Dunno why there isn't a Teuchos::gatherAllv;
    // TODO for now, compute on rank 0 and broadcast
    if (me == 0) {
      globalRowBuf.resize(nrows, 0);  // TODO: Ick!  Need parallel
    }

    if (sortByDegree) {
      // Compute nnzPerRow; the distribution is currently 1D and includes
      // all nonzeros
      for (auto it = localNZ.begin(); it != localNZ.end(); it++) {
        gno_t I = it->first.first;
        nnzPerRow[I-myFirstRow]++;
      }

      Teuchos::gatherv<int,gno_t>(nnzPerRow.getRawPtr(), nMyRows,
                                  globalRowBuf.getRawPtr(),
                                  rcvcnt.getRawPtr(), disp.getRawPtr(),
                                  0, *comm);

      permuteIndex.resize(nrows);  // TODO: Ick!  Need parallel
      sortedOrder.resize(nrows);   // TODO: Ick!  Need parallel

      if (me == 0) {  // TODO: do on all procs once we have allgatherv

        for (size_t i = 0; i != nrows; i++) sortedOrder[i] = i;

        std::sort(sortedOrder.begin(), sortedOrder.end(),
                  [&](const size_t& a, const size_t& b) {
                    return (globalRowBuf[a] > globalRowBuf[b]);
                  });

        // Compute the inverse permutation; it is more useful for our needs
        for (size_t i = 0; i < nrows; i++) {
          permuteIndex[sortedOrder[i]] = i;
        }
      }

      Teuchos::broadcast<int,gno_t>(*comm, 0, permuteIndex(0,nrows));
      // Ick!  Use a directory  TODO

      // Sorting changes the global IDs associated with rows/columns.
      // To make this distribution applicable beyond triangle counting
      // (e.g., in a Tpetra operator), we need a way to map from the
      // original global IDs and back again.
      // Create a permutation matrix for use in the operator, using the
      // default Tpetra layout.
      Teuchos::Array<gno_t> myRows;
      for (size_t i = 0; i < nrows; i++) {
        if (VecMine(i)) myRows.push_back(i);
      }

      Tpetra::global_size_t dummy =
              Teuchos::OrdinalTraits<Tpetra::global_size_t>::invalid();
      Teuchos::RCP<const map_t> permMap =
              rcp(new map_t(dummy, myRows(), 0, comm));

      permMatrix = rcp(new matrix_t(permMap, 1));  // one nz / row in permMatrix

      Teuchos::Array<gno_t> cols(1);
      Teuchos::Array<scalar_t> vals(1);  vals[0] = 1.;

      for (size_t i = 0; i < permMap->getNodeNumElements(); i++) {
        gno_t gid = permMap->getGlobalElement(i);
        cols[0] = permuteIndex[gid];
        permMatrix->insertGlobalValues(gid, cols(), vals());
      }

      permMatrix->fillComplete(permMap, permMap);
    }

    // Now, to determine the chunks, we care only about the number of
    // nonzeros in the lower-triangular matrix.
    // Compute nnzPerRow; the distribution is currently 1D
    nnzPerRow.assign(nMyRows, 0);
    size_t nnz = 0;
    for (auto it = localNZ.begin(); it != localNZ.end(); it++) {
      gno_t I = (sortByDegree ? permuteIndex[it->first.first]
                              : it->first.first);
      gno_t J = (sortByDegree ? permuteIndex[it->first.second]
                              : it->first.second);
      if (J <= I) {  // Lower-triangular part
        nnzPerRow[it->first.first - myFirstRow]++;
        nnz++;
      }
    }

    // TODO Dunno why there isn't a Teuchos::gatherAllv;
    // TODO for now, compute on rank 0 and broadcast

    Teuchos::gatherv<int,gno_t>(nnzPerRow.getRawPtr(), nMyRows,
                                globalRowBuf.getRawPtr(),
                                rcvcnt.getRawPtr(), disp.getRawPtr(),
                                0, *comm);

    Teuchos::Array<int>().swap(rcvcnt);  // no longer needed
    Teuchos::Array<int>().swap(disp);    // no longer needed

    size_t gNnz;
    Teuchos::reduceAll(*comm, Teuchos::REDUCE_SUM, 1, &nnz, &gNnz);

    chunkCuts.resize(nChunks+1, 0);

    if (me == 0) {  // TODO: when we have allgatherv, can do this on all procs
                    // TODO: or, better, implement a parallel version

      // Determine chunk cuts.
      // Chunk row k holds k+1 processors, so it should receive roughly
      // (k+1) * target nonzeros.
      size_t target = gNnz / np;  // target nnz per processor
      size_t targetRunningTotal = 0;
      size_t currentRunningTotal = 0;
      gno_t I = gno_t(0);
      for (int chunkCnt = 0; chunkCnt < nChunks; chunkCnt++) {
        targetRunningTotal = (target * (chunkCnt+1));
        currentRunningTotal = 0;
        while (I < static_cast<gno_t>(nrows)) {
          size_t nextNnz = (sortByDegree ? globalRowBuf[sortedOrder[I]]
                                         : globalRowBuf[I]);
          if (currentRunningTotal + nextNnz <= targetRunningTotal) {
            currentRunningTotal += nextNnz;
            I++;
          }
          else
            break;
        }
        chunkCuts[chunkCnt+1] = I;
      }
      chunkCuts[nChunks] = static_cast<gno_t>(nrows);
    }

    // Free memory associated with globalRowBuf
    Teuchos::Array<gno_t>().swap(globalRowBuf);

    Teuchos::broadcast<int,gno_t>(*comm, 0, chunkCuts(0,nChunks+1));
    chunksComputed = true;

    //std::cout << comm->getRank() << " KDDKDD chunkCuts: ";
    //for (int kdd=0; kdd <= nChunks; kdd++) std::cout << chunkCuts[kdd] << " ";
    //std::cout << std::endl;

    // Determine the new owner of each nonzero; buffers for sending
    Teuchos::Array<gno_t> iOut(localNZ.size());
    Teuchos::Array<gno_t> jOut(localNZ.size());
    Teuchos::Array<scalar_t> vOut(localNZ.size());
    Teuchos::Array<int> pOut(localNZ.size());

    size_t sendCnt = 0;
    for (auto it = localNZ.begin(); it != localNZ.end(); it++) {
      iOut[sendCnt] = (sortByDegree ? permuteIndex[it->first.first]
                                    : it->first.first);
      jOut[sendCnt] = (sortByDegree ? permuteIndex[it->first.second]
                                    : it->first.second);
      if (jOut[sendCnt] <= iOut[sendCnt]) {  // keep only lower-triangular entries
        vOut[sendCnt] = it->second;
        pOut[sendCnt] = procFromChunks(iOut[sendCnt], jOut[sendCnt]);

        //std::cout << comm->getRank()
        //          << " KDDKDD IJ ("
        //          << it->first.first << "," << it->first.second
        //          << ") permuted to ("
        //          << iOut[sendCnt] << "," << jOut[sendCnt]
        //          << ") being sent to " << pOut[sendCnt]
        //          << std::endl;
        sendCnt++;
      }
    }

    // Free memory associated with localNZ and permuteIndex
    LocalNZmap_t().swap(localNZ);
    if (sortByDegree) Teuchos::Array<gno_t>().swap(permuteIndex);

    // Use a Distributor to send nonzeros to their new processors.
    Tpetra::Distributor plan(comm);
    size_t nrecvs = plan.createFromSends(pOut(0,sendCnt));
    Teuchos::Array<gno_t> iIn(nrecvs);
    Teuchos::Array<gno_t> jIn(nrecvs);
    Teuchos::Array<scalar_t> vIn(nrecvs);

    // TODO: With more clever packing, could do only one round of communication
    plan.doPostsAndWaits<gno_t>(iOut(0,sendCnt), 1, iIn());
    plan.doPostsAndWaits<gno_t>(jOut(0,sendCnt), 1, jIn());
    plan.doPostsAndWaits<scalar_t>(vOut(0,sendCnt), 1, vIn());

    // Put the received nonzeros in the map
    for (size_t n = 0; n < nrecvs; n++) {
      NZindex_t nz(iIn[n], jIn[n]);
      localNZ[nz] = vIn[n];
    }

    redistributed = true;
  }

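The chunk-cut selection performed on rank 0 above can be sketched in isolation. This is an illustrative serial sketch, not Tpetra code; `computeChunkCutsSketch` is a hypothetical name, and it reproduces the key property that chunk row k, which holds k+1 processors, absorbs roughly (k+1) times the per-processor nonzero target.

```cpp
#include <vector>

// Greedy chunk-cut sketch: advance the cut for chunk row k until that
// chunk holds roughly (k+1) * (total nnz / np) nonzeros.
inline std::vector<long> computeChunkCutsSketch(
    const std::vector<long>& nnzPerRow, int nChunks, int np) {
  long total = 0;
  for (long c : nnzPerRow) total += c;
  const long target = total / np;        // target nnz per processor
  const long nrows = static_cast<long>(nnzPerRow.size());
  std::vector<long> cuts(nChunks + 1, 0);
  long row = 0;
  for (int k = 0; k < nChunks; k++) {
    const long goal = target * (k + 1);  // chunk row k holds k+1 processors
    long running = 0;
    while (row < nrows && running + nnzPerRow[row] <= goal)
      running += nnzPerRow[row++];
    cuts[k + 1] = row;
  }
  cuts[nChunks] = nrows;                 // last chunk takes the remaining rows
  return cuts;
}
```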

  Teuchos::RCP<matrix_t> getPermutationMatrix() const { return permMatrix; }

private:
  // Initially distribute nonzeros with a 1D linear distribution
  Distribution1DLinear<gno_t,scalar_t> initialDist;

  // Flag indicating whether the matrix should be reordered and renumbered
  // in decreasing order of the number of nonzeros per row in the full matrix
  bool sortByDegree;

  // Permutation matrix, built only when sortByDegree = true
  Teuchos::RCP<matrix_t> permMatrix;

  // Flag indicating whether redistribution has occurred yet.
  // This is true
  // i)  after Tpetra performs the redistribution, or
  // ii) when Tpetra reads already-distributed nonzeros via the
  //     readPerProcess function
  bool redistributed;

  // If we read already-distributed nonzeros from per-process files,
  // this remains false until a triangle-counting code actually computes
  // the chunks when the need arises.
  bool chunksComputed;

  int nChunks;           // in the np = q(q+1)/2 layout, nChunks = q
  double nChunksPerRow;
  Teuchos::Array<gno_t> chunkCuts;

  // Return the chunk index m such that chunkCuts[m] <= I < chunkCuts[m+1]
  int findIdxInChunks(gno_t I) {
    int m = I * nChunksPerRow;  // initial guess
    while (I < chunkCuts[m]) m--;
    while (I >= chunkCuts[m+1]) m++;
    return m;
  }

  // Return the rank owning nonzero (I,J) in the triangular chunk layout
  int procFromChunks(gno_t I, gno_t J) {
    int m = findIdxInChunks(I);
    int n = findIdxInChunks(J);
    int p = m*(m+1)/2 + n;
    return p;
  }
};



// Tpetra::Operator that works with the DistributionLowerTriangularBlock

template <typename scalar_t,
          class Node = ::Tpetra::Details::DefaultTypes::node_type>
class LowerTriangularBlockOperator :
      public Tpetra::Operator<scalar_t, Tpetra::Map<>::local_ordinal_type,
                                        Tpetra::Map<>::global_ordinal_type,
                                        Node>
{
public:
  using lno_t = Tpetra::Map<>::local_ordinal_type;
  using gno_t = Tpetra::Map<>::global_ordinal_type;
  using map_t = Tpetra::Map<>;
  using import_t = Tpetra::Import<>;
  using export_t = Tpetra::Export<>;
  using vector_t = Tpetra::Vector<scalar_t>;
  using mvector_t = Tpetra::MultiVector<scalar_t>;
  using matrix_t = Tpetra::CrsMatrix<scalar_t>;
  using dist_t = Tpetra::DistributionLowerTriangularBlock<gno_t, scalar_t>;

  LowerTriangularBlockOperator(
    const Teuchos::RCP<const matrix_t> &lowerTriangularMatrix_,
    const dist_t &dist)
  : lowerTriangularMatrix(lowerTriangularMatrix_),
    permMatrix(dist.getPermutationMatrix())
  {
    // LowerTriangularBlockOperator requires the range map and domain map
    // to be the same.  Check that here.
    TEUCHOS_TEST_FOR_EXCEPTION(
      !lowerTriangularMatrix->getRangeMap()->isSameAs(
        *lowerTriangularMatrix->getDomainMap()),
      std::logic_error,
      "The domain and range maps of the LowerTriangularBlock matrix "
      "must be the same");

    // Extract the diagonal

    vector_t diagByRowMap(lowerTriangularMatrix->getRowMap());
    lowerTriangularMatrix->getLocalDiagCopy(diagByRowMap);
    diag = Teuchos::rcp(new vector_t(lowerTriangularMatrix->getRangeMap()));
    Tpetra::Export<> exporter(lowerTriangularMatrix->getRowMap(),
                              lowerTriangularMatrix->getRangeMap());
    diag->doExport(diagByRowMap, exporter, Tpetra::ADD);
  }


  void apply(const mvector_t &x, mvector_t &y, Teuchos::ETransp mode,
             scalar_t alpha, scalar_t beta) const
  {
    scalar_t ZERO = Teuchos::ScalarTraits<scalar_t>::zero();
    scalar_t ONE = Teuchos::ScalarTraits<scalar_t>::one();
    if (alpha == ZERO) {
      if (beta == ZERO) y.putScalar(ZERO);
      else y.scale(beta);
      return;
    }

    if (permMatrix == Teuchos::null) {

      // Multiply by the lower triangle
      lowerTriangularMatrix->apply(x, y, Teuchos::NO_TRANS, alpha, beta);

      // Multiply by the upper triangle
      lowerTriangularMatrix->apply(x, y, Teuchos::TRANS, alpha, ONE);

      // Subtract out the duplicated diagonal terms
      y.elementWiseMultiply(-alpha, *diag, x, ONE);
    }
    else {

      // With sorting, the LowerTriangularBlock distribution stores (P^T A P)
      // in the CrsMatrix, for permutation matrix P.
      // Thus, apply must compute
      //   y = P (beta (P^T y) + alpha (P^T A P) (P^T x))

      mvector_t xtmp(x.getMap(), x.getNumVectors());
      mvector_t ytmp(y.getMap(), y.getNumVectors());

      permMatrix->apply(x, xtmp, Teuchos::TRANS);
      if (beta != ZERO) permMatrix->apply(y, ytmp, Teuchos::TRANS);

      // Multiply by the lower triangle
      lowerTriangularMatrix->apply(xtmp, ytmp, Teuchos::NO_TRANS, alpha, beta);

      // Multiply by the upper triangle
      lowerTriangularMatrix->apply(xtmp, ytmp, Teuchos::TRANS, alpha, ONE);

      // Subtract out the duplicated diagonal terms
      ytmp.elementWiseMultiply(-alpha, *diag, xtmp, ONE);

      permMatrix->apply(ytmp, y, Teuchos::NO_TRANS);
    }
  }
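The identity behind apply() above can be checked on a tiny dense example. This is an illustrative sketch, not Tpetra code; `lowerTriSymSpMV` is a hypothetical name. For a symmetric matrix A whose stored part L is the lower triangle (diagonal included), A*x = L*x + L^T*x - diag(A).*x, because the diagonal contributes to both the NO_TRANS and TRANS products.

```cpp
// Dense 3x3 sketch of: multiply by the lower triangle, multiply by its
// transpose, then subtract the doubled diagonal contribution.
inline void lowerTriSymSpMV(const double A[3][3], const double x[3],
                            double y[3]) {
  for (int i = 0; i < 3; i++) y[i] = 0.0;
  for (int i = 0; i < 3; i++)
    for (int j = 0; j <= i; j++) {  // visit stored (lower) entries only
      y[i] += A[i][j] * x[j];       // L * x
      y[j] += A[i][j] * x[i];       // L^T * x
    }
  for (int i = 0; i < 3; i++)
    y[i] -= A[i][i] * x[i];         // remove the doubled diagonal
}
```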

  Teuchos::RCP<const map_t> getDomainMap() const {
    return lowerTriangularMatrix->getDomainMap();
  }

  Teuchos::RCP<const map_t> getRangeMap() const {
    return lowerTriangularMatrix->getRangeMap();
  }

  bool hasTransposeApply() const { return true; }  // Symmetric matrix

private:
  const Teuchos::RCP<const matrix_t> lowerTriangularMatrix;
  const Teuchos::RCP<const matrix_t> permMatrix;
  Teuchos::RCP<vector_t> diag;
};


}
#endif