5 years ago · ea134420cf
--- a/report.md
+++ b/report.md
@@ -428,6 +428,38 @@ end for
 
 ##### Parallel Algorithm Description
 
+###### PCAM Design
+
+1) Partitioning
+
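+From a functional standpoint, the algorithm above divides mainly into the tasks `it,ft,gt,ot,ct,ht`.
+
+In terms of workload, the matrix multiply-add operations account for most of the running time: `it,ft,gt` and `ot` carry the same amount of work, while `ct,ht` carry somewhat less.
+
+Each matrix multiply-add can be partitioned further using block matrix multiplication.
+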
+![Partitioning](img/1.svg)
+
+2) Communication
+
+Computing `it,ft,gt,ot` requires `xt,ht-1`; `ct` requires `ft,it,gt` (plus `ct-1` from the previous step); and `ht` requires `ot,ct`.
+
+![Communication](img/2.svg)
+
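+3) Agglomeration
+
+Because `ct,ht` involve little work but require a relatively large amount of data to be transferred, `ct,ht` are merged with `it`.
+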
+![Agglomeration](img/3.svg)
+
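+4) Mapping
+
+The groups `it,ct,ht; ft; gt; ot` are assigned to 4 processors, one group per processor. With this mapping the 4 processors have very similar workloads, and relatively little data needs to be transferred.
+
+When each of `it,ct,ht; ft; gt; ot` computes its matrix multiplication, the matrix is split further into blocks, which are distributed to the processors in that group's own communicator.
+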
+![Mapping](img/4.svg)
+
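 ###### Dependency Analysis
 
 From the formulas and the serial pseudocode above, the dependencies are clear: the outputs $c_t,h_t$ are flow-dependent on $c_{(t-1)},f_t,g_t,i_t,o_t$, and $i_t,f_t,g_t,o_t$ are in turn flow-dependent on $h_{(t-1)}$. Because this computation iterates over time, different time steps $t$ cannot be computed in parallel. Instead, $i_t,f_t,g_t,o_t$ can be computed in parallel with one another, and since each of them contains a matrix multiply-add, that computation can also be parallelized with block matrices.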
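The partitioning step of the PCAM design above (splitting each gate's matrix multiply-add by block matrix multiplication) can be sketched in C. This is a minimal serial sketch, not the report's code: it assumes each gate computes z = W·[xt; ht-1] + b with W stored row-major, and it uses the simplest blocking, contiguous row blocks of W. The names `gate_matvec_block`, `gate_matvec_blocked`, and `block_rows` are hypothetical. Each row block writes only its own slice of z, so the blocks are independent and could be handed to different processors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: computes rows [row_lo, row_hi) of z = W * v + b,
 * where W has `cols` columns and is stored row-major. A row block touches
 * only its own slice of z, so different blocks can run on different processors. */
void gate_matvec_block(const float *W, const float *v, const float *b,
                       float *z, size_t cols, size_t row_lo, size_t row_hi)
{
    assert(row_lo <= row_hi);
    for (size_t r = row_lo; r < row_hi; ++r) {
        float acc = b[r];                    /* start from the bias */
        for (size_t c = 0; c < cols; ++c)
            acc += W[r * cols + c] * v[c];   /* multiply-add */
        z[r] = acc;
    }
}

/* Serial driver: covers all rows with blocks of `block_rows` rows.
 * In a parallel version each block would be assigned to a different process. */
void gate_matvec_blocked(const float *W, const float *v, const float *b,
                         float *z, size_t rows, size_t cols, size_t block_rows)
{
    assert(block_rows > 0);
    for (size_t lo = 0; lo < rows; lo += block_rows) {
        size_t hi = lo + block_rows < rows ? lo + block_rows : rows;
        gate_matvec_block(W, v, b, z, cols, lo, hi);
    }
}
```

In an MPI version, each process would call `gate_matvec_block` only on its own row range instead of looping over every block.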
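The agglomeration decision (merging `ct,ht` into `it`) can be checked with a rough cost model. The sketch below assumes the standard LSTM cell equations with input size nx and hidden size nh, assumes `xt` is already available on every process, and ignores the traffic caused by the further block split inside each group. The struct `lstm_cost` and the function `lstm_agglomeration_cost` are hypothetical. It counts multiply-adds per task and the floats exchanged per time step, once with `ct,ht` merged into `it` and once with `ct,ht` as a separate fifth task.

```c
#include <assert.h>
#include <stddef.h>

/* Rough per-task costs under the stated assumptions (hypothetical model). */
typedef struct {
    size_t gate_work;     /* multiply-adds of one gate task: W * [xt; ht-1]  */
    size_t merged_work;   /* multiply-adds of the merged {it, ct, ht} task   */
    size_t comm_merged;   /* floats exchanged per step, ct,ht merged into it */
    size_t comm_separate; /* floats exchanged per step, ct,ht on their own   */
} lstm_cost;

lstm_cost lstm_agglomeration_cost(size_t nx, size_t nh)
{
    assert(nh > 0);
    lstm_cost k;
    /* nh rows of length nx + nh; the bias add is ignored. */
    k.gate_work = nh * (nx + nh);
    /* ct = ft*ct-1 + it*gt (2 per element) and ht = ot*tanh(ct)
       (1 per element, tanh not counted): about 3 extra per element. */
    k.merged_work = k.gate_work + 3 * nh;
    /* Merged: ft, gt, ot each send nh values to the merged task, which
       sends ht (nh values) back to each of them for the next step. */
    k.comm_merged = 3 * nh + 3 * nh;
    /* Separate: it, ft, gt, ot each send nh values to the {ct, ht} task,
       which sends ht back to all four gate tasks. */
    k.comm_separate = 4 * nh + 4 * nh;
    return k;
}
```

For nx = nh = 128, each of `ft`, `gt`, `ot` costs 32768 multiply-adds and the merged `{it, ct, ht}` task costs 33152 (about 1.2% more), while merging cuts the per-step traffic from 8·nh to 6·nh floats under these assumptions.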
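For the second half of the mapping step (distributing a group's matrix blocks over the processors in its communicator), one common choice is a balanced 1-D block distribution of rows. A sketch with a hypothetical helper `block_range`: in an MPI program, `rank` and `p` would come from `MPI_Comm_rank` and `MPI_Comm_size` on the group's communicator, which could be created with `MPI_Comm_split` using the group index (0 to 3) as the color. This is one possible design, not necessarily the one used in the report.

```c
#include <assert.h>

/* Hypothetical helper: rows [*lo, *hi) of an n-row matrix owned by `rank`
 * out of `p` processes in one group's communicator. The first n % p ranks
 * get one extra row, so block sizes differ by at most one. */
void block_range(int n, int p, int rank, int *lo, int *hi)
{
    assert(n >= 0 && p > 0 && rank >= 0 && rank < p);
    int base = n / p;   /* rows every rank gets            */
    int extra = n % p;  /* leftover rows, one per low rank */
    *lo = rank * base + (rank < extra ? rank : extra);
    *hi = *lo + base + (rank < extra ? 1 : 0);
}
```

For example, 10 rows over 4 processes gives the ranges [0,3), [3,6), [6,8), and [8,10).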
@@ -566,7 +598,7 @@ void sigmoid(float *x, float *y, int n) {
 | Assigned program no. | 1       | 1     | 1    |
 | Assigned program     | closure | gauss | fft  |
 
-### closure
+### closure-MPI
 
 ##### Performance Results
 
@@ -576,7 +608,9 @@ void sigmoid(float *x, float *y, int n) {
 
 
 
-### gauss
+### closure-Hybrid-omp-mpi
+
+### gauss-MPI
 
 ##### Performance Results
 
@@ -609,5 +643,5 @@ void sigmoid(float *x, float *y, int n) {
 
 *See the attachment for the complete code.*
 
-### fft
+### fft-MPI