Introduction

This post is mainly excerpted and translated from Ang Li's PhD dissertation.

1. PERFECT

Power Efficiency Revolution For Embedded Computing Technologies

http://hpc.pnl.gov/PERFECT/

Application Domains | Kernels
PERFECT Application 1 | Discrete Wavelet Transform, 2D Convolution, Histogram Equalization
Space Time Adaptive Processing | System Solver, Inner Product, Outer Product
Synthetic Aperture Radar | Interpolation 1, Interpolation 2, Back Projection (Non-Fourier SAR)
Wide Area Motion Imaging | Debayer, Image Registration, Change Detection
Required Kernels | Sort, FFT 1D, FFT 2D
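Histogram Equalization, one of the kernels listed above, remaps gray levels through the image's cumulative histogram so that intensities spread across the full range. The sketch below is a plain sequential reference version under simple assumptions (a flat list of 8-bit integer pixels and the common CDF-based mapping); it is not the PERFECT suite's own implementation, and the name `histogram_equalize` is hypothetical.

```python
def histogram_equalize(pixels, levels=256):
    """Equalize a flat list of integer gray levels in the range [0, levels)."""
    if not pixels:
        return []
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution function over the gray levels.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    n = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if cdf_min == n:  # a single gray level: nothing to spread out
        return list(pixels)

    # Map each level so the cumulative counts cover [0, levels - 1];
    # levels below the darkest present pixel are clamped to 0.
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (n - cdf_min))) for c in cdf]
    return [lut[p] for p in pixels]
```

For example, `histogram_equalize([50, 100, 150, 200])` stretches four evenly spaced levels to `[0, 85, 170, 255]`.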

2. AxBench

A Multiplatform Benchmark Suite for Approximate Computing

One of the goals of AxBench is to provide a diverse set of applications to further facilitate research and development in approximate computing.

http://ieeexplore.ieee.org/abstract/document/7755728/

Download:

http://axbench.org/

Benchmark | Platform | Domain | Quality Metric
binarization | GPU | Image Processing | Image Diff
blackscholes | CPU, GPU | Finance | Avg. Relative Error
brent-kung | ASIC | Arithmetic Computation | Avg. Relative Error
canneal | CPU | Optimization | Avg. Relative Error
convolution | GPU | Machine Learning | Avg. Relative Error
fastwalsh | GPU | Signal Processing | Image Diff
fft | CPU | Signal Processing | Avg. Relative Error
fir | ASIC | Signal Processing | Avg. Relative Error
forwardk2j | CPU, ASIC | Robotics | Avg. Relative Error
inversek2j | CPU, GPU, ASIC | Robotics | Avg. Relative Error
jmeint | CPU, GPU | 3D Gaming | Miss Rate
jpeg | CPU | Image Processing | Image Diff
kmeans | CPU, ASIC | Machine Learning | Image Diff
kogge-stone | ASIC | Arithmetic Computation | Avg. Relative Error
laplacian | GPU | Image Processing | Image Diff
meanfilter | GPU | Machine Vision | Image Diff
neural network | ASIC | Machine Learning | Avg. Relative Error
newton-raph | GPU | Numerical Analysis | Avg. Relative Error
sobel | CPU, GPU, ASIC | Image Processing | Image Diff
srad | GPU | Medical Imaging | Image Diff
wallace-tree | ASIC | Arithmetic Computation | Avg. Relative Error
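The Quality Metric column compares the output of an approximate run against the output of the precise run. A minimal sketch of the three metrics follows, assuming flat lists of outputs. The exact normalization used by AxBench's own scripts (for example, how Image Diff treats color channels, or how a zero reference value is handled) is an assumption here, and the function names are hypothetical.

```python
def avg_relative_error(exact, approx):
    """Mean of |approx - exact| / |exact| over all output elements."""
    if len(exact) != len(approx) or not exact:
        raise ValueError("outputs must be non-empty and of equal length")
    total = 0.0
    for e, a in zip(exact, approx):
        # Assumption: fall back to absolute error when the reference value is 0.
        total += abs(a - e) / abs(e) if e != 0 else abs(a - e)
    return total / len(exact)


def image_diff(exact_pixels, approx_pixels, max_value=255):
    """Mean absolute per-pixel difference normalized to [0, 1] (assumed definition)."""
    if len(exact_pixels) != len(approx_pixels) or not exact_pixels:
        raise ValueError("images must be non-empty and of equal size")
    diff = sum(abs(a - e) for e, a in zip(exact_pixels, approx_pixels))
    return diff / (max_value * len(exact_pixels))


def miss_rate(exact, approx):
    """Fraction of discrete outputs (e.g. jmeint's intersect / no-intersect) that disagree."""
    if len(exact) != len(approx) or not exact:
        raise ValueError("outputs must be non-empty and of equal length")
    return sum(e != a for e, a in zip(exact, approx)) / len(exact)
```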

3. Rodinia

http://rodinia.cs.virginia.edu/

Download page:

http://lava.cs.virginia.edu/Rodinia/download_links.htm

Applications | Dwarves | Domains | Parallel Model | Incre. Ver.
Leukocyte | Structured Grid | Medical Imaging | CUDA, OMP, OCL
Heart Wall | Structured Grid | Medical Imaging | CUDA, OMP, OCL
MUMmerGPU | Graph Traversal | Bioinformatics | CUDA, OMP
CFD Solver¹ | Unstructured Grid | Fluid Dynamics | CUDA, OMP, OCL
LU Decomposition | Dense Linear Algebra | Linear Algebra | CUDA, OMP, OCL
HotSpot | Structured Grid | Physics Simulation | CUDA, OMP, OCL
Back Propagation | Unstructured Grid | Pattern Recognition | CUDA, OMP, OCL
Needleman-Wunsch | Dynamic Programming | Bioinformatics | CUDA, OMP, OCL
Kmeans | Dense Linear Algebra | Data Mining | CUDA, OMP, OCL
Breadth-First Search¹ | Graph Traversal | Graph Algorithms | CUDA, OMP, OCL
SRAD | Structured Grid | Image Processing | CUDA, OMP, OCL
Streamcluster¹ | Dense Linear Algebra | Data Mining | CUDA, OMP, OCL
Particle Filter | Structured Grid | Medical Imaging | CUDA, OMP, OCL
PathFinder | Dynamic Programming | Grid Traversal | CUDA, OMP, OCL
Gaussian Elimination | Dense Linear Algebra | Linear Algebra | CUDA, OCL
k-Nearest Neighbors | Dense Linear Algebra | Data Mining | CUDA, OMP, OCL
LavaMD² | N-Body | Molecular Dynamics | CUDA, OMP, OCL
Myocyte | Structured Grid | Biological Simulation | CUDA, OMP, OCL
B+ Tree | Graph Traversal | Search | CUDA, OMP, OCL
GPUDWT | Spectral Method | Image/Video Compression | CUDA, OCL
Hybrid Sort | Sorting | Sorting Algorithms | CUDA, OCL
Hotspot3D | Structured Grid | Physics Simulation | CUDA, OCL, OMP | Hotspot for 3D IC
Huffman | Finite State Machine | Lossless Data Compression | CUDA, OCL

Ang Li's classification:

Application | Description | Domain | CUDA | OpenCL | OpenMP
backprop | Perceptron back propagation | Neural Network | Yes | Yes | Yes
bfs | Breadth-first search | Graph Algorithm | Yes | Yes | Yes
b+tree | B+tree operations | Searching | Yes | Yes | Yes
leukocyte | Detect leukocytes in blood vessel video | Medical Imaging | Yes | Yes | Yes
heartwall | Track mouse heart movement in response to a stimulus | Medical Imaging | Yes | No | Yes
cfd | Finite-volume solver for the 3D Euler equations of fluid flow | Fluid Dynamics | Yes | Yes | Yes
lud | Calculate the solutions of a set of linear equations | Linear Algebra | Yes | Yes | Yes
hotspot | Estimate processor temperature | Physical Simulation | Yes | Yes | Yes
nw | Optimization method for DNA sequence alignment | Bioinformatics | Yes | Yes | Yes
kmeans | Clustering algorithm | Data Mining | Yes | Yes | Yes
srad | Speckle-reducing anisotropic diffusion | Image Processing | Yes | Yes | Yes
streamcluster | Find medians to assign points to the nearest centers | Data Mining | Yes | Yes | Yes
particlefilter | Locate an object based on noisy measurements and its path | Medical Imaging | Yes | Yes | Yes
pathfinder | Dynamic programming to find a path on a 2D grid | Grid Traversal | Yes | Yes | Yes
gaussian | Solve for the variables in a linear system | Linear Algebra | Yes | Yes | No
nn | Find k-nearest neighbors in an unstructured data set | Data Mining | Yes | Yes | Yes
lavaMD | Calculate particle potential and relocation in 3D | Molecular Dynamics | Yes | Yes | Yes
myocyte | Simulate the behavior of cardiac muscle cells | Biological Simulation | Yes | Yes | Yes
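The nw entry (Needleman-Wunsch) shows why dynamic-programming codes map well to GPUs: each cell of the score matrix depends only on its upper, left, and upper-left neighbors, so all cells on one anti-diagonal are independent of each other. The sketch below fills the matrix in that wavefront order. It assumes a simple linear scoring scheme (match +1, mismatch -1, gap -1) instead of a substitution matrix such as BLOSUM62, and `needleman_wunsch` is a hypothetical name, not Rodinia's code.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of strings a and b, filled anti-diagonal by anti-diagonal."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row/column: aligning a prefix against nothing costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Anti-diagonal d holds the cells with i + j == d; they only read diagonals
    # d - 1 and d - 2, so a GPU can compute all of them in parallel.
    for d in range(2, rows + cols - 1):
        for i in range(max(1, d - cols + 1), min(rows - 1, d - 1) + 1):
            j = d - i
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(diag, up, left)
    return score[rows - 1][cols - 1]
```

A sequential row-by-row fill gives the same result; the wavefront order is used here only to make the available parallelism visible.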

4. Parboil

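Parboil emphasizes throughput-oriented streaming applications. Every application comes with a baseline CUDA implementation as well as an optimized one.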

Application | Description | Domain | CUDA | OpenCL | C
bfs | Breadth-first search | Graph Algorithm | Yes | Yes | Yes
cutcp | Compute Coulombic potential for a 3D grid | Molecular Dynamics | Yes | Yes | Yes
histogram | Compute a 2D saturating histogram with a maximum of 256 bins | Data Mining | Yes | Yes | Yes
lbm | Fluid dynamics simulation using the Lattice-Boltzmann Method | Fluid Dynamics | Yes | Yes | Yes
mm | Dense matrix-matrix multiply | Linear Algebra | Yes | Yes | Yes
mri-gridding | Compute a regular data grid via weighted interpolation | Medical Imaging | Yes | Yes | Yes
mri-q | Compute the scanner configuration for calibration in 3D MRI | Medical Imaging | Yes | Yes | Yes
sad | Sum of absolute differences kernel in MPEG video encoders | Image Processing | Yes | Yes | Yes
spmv | Compute the product of a sparse matrix with a dense vector | Linear Algebra | Yes | Yes | Yes
stencil | An iterative Jacobi stencil operation on a regular 3D grid | Cellular Automata | Yes | Yes | Yes
tpacf | Analyze the spatial distribution of astronomical bodies | Data Mining | Yes | Yes | Yes
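The histogram entry describes a saturating histogram: a bin stops counting once it reaches its maximum instead of overflowing. A minimal sequential sketch follows, assuming 8-bit counters that saturate at 255 and input values already in the range 0-255. GPU versions parallelize the same update with atomics or per-block sub-histograms, which this sketch does not show; the function name is hypothetical.

```python
def saturating_histogram(image, num_bins=256, cap=255):
    """Count values of a 2D image (list of rows) into bins that saturate at `cap`."""
    bins = [0] * num_bins
    for row in image:
        for value in row:
            if not 0 <= value < num_bins:
                raise ValueError(f"value {value} outside 0..{num_bins - 1}")
            # Saturate instead of wrapping around like an unchecked 8-bit counter would.
            if bins[value] < cap:
                bins[value] += 1
    return bins
```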

5. SHOC

Measures the stability and performance of coprocessors such as GPUs and Xeon Phi.

Application | Description | Domain | CUDA | OpenCL | C
qtclustering | Group genes into high-quality clusters | Bioinformatics | Yes | No | No
s3d | Compute chemical reaction rates across a 3D grid | Simulation | Yes | Yes | No
scan | Parallel prefix sum of floating-point numbers | Data Mining | Yes | Yes | No
reduction | Sum reduction of floating-point numbers | Data Mining | Yes | Yes | No
md | Lennard-Jones potential computation | Molecular Dynamics | Yes | Yes | No
fft | Fast Fourier transform | Signal Processing | Yes | Yes | No
sgemm | Single-precision general matrix multiply | Linear Algebra | Yes | Yes | No
sort | Fast radix sort | Data Mining | Yes | Yes | No
stencil2d | Standard 2D 9-point stencil calculation | Cellular Automata | Yes | Yes | No
bfs | Breadth-first search | Graph Algorithm | Yes | Yes | No
spmv | Sparse matrix-vector multiplication | Linear Algebra | Yes | Yes | Yes
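The scan and reduction entries are closely related: a work-efficient parallel prefix sum first builds a reduction tree, then pushes partial sums back down it. The sketch below is a sequential Python rendering of the Blelloch-style up-sweep/down-sweep exclusive scan, one common GPU formulation; whether SHOC's kernel uses exactly this scheme is an assumption. Each inner `for` loop corresponds to one parallel step on a GPU, and the name `blelloch_scan` is hypothetical.

```python
def blelloch_scan(values):
    """Exclusive prefix sum via up-sweep/down-sweep; also returns the total (reduction)."""
    n = len(values)
    if n == 0:
        return [], 0
    size = 1
    while size < n:  # pad to a power of two with the identity element 0
        size *= 2
    a = list(values) + [0] * (size - n)

    # Up-sweep: a tree reduction; updates within one level are independent.
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):
            a[i] += a[i - stride]
        stride *= 2
    total = a[size - 1]  # the root now holds the sum of all elements

    # Down-sweep: clear the root, then propagate partial sums back down the tree.
    a[size - 1] = 0
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = a[i - stride]
            a[i - stride] = a[i]
            a[i] += left
        stride //= 2
    return a[:n], total
```

For example, `blelloch_scan([3, 1, 7, 0, 4, 1, 6, 3])` returns `([0, 3, 4, 11, 11, 15, 16, 22], 25)`.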

6. PolyBench

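Contains kernels translated from (un)structured loop nests. These loops were originally used to evaluate optimization tools based on the polyhedral model.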

Application | Description | Domain | CUDA | OpenCL | C
2dconv | 2D convolution | Linear Algebra | Yes | Yes | Yes
2mm | Two matrix multiplications | Linear Algebra | Yes | Yes | Yes
3dconv | 3D convolution | Linear Algebra | Yes | Yes | Yes
3mm | Three matrix multiplications | Linear Algebra | Yes | Yes | Yes
atax | Matrix transpose and vector multiplication | Linear Algebra | Yes | Yes | Yes
bicg | BiCG sub-kernel of the BiCGStab linear solver | Linear Algebra | Yes | Yes | Yes
corr | Correlation computation | Linear Algebra | Yes | Yes | Yes
covar | Covariance computation | Linear Algebra | Yes | Yes | Yes
fdtd2d | 2D finite-difference time-domain kernel | Simulation | Yes | Yes | Yes
gemm | Matrix multiply | Linear Algebra | Yes | Yes | Yes
gesummv | Scalar, vector, and matrix multiplication | Linear Algebra | Yes | Yes | Yes
gramschm | Gram-Schmidt process | Linear Algebra | Yes | Yes | Yes
mvt | Matrix-vector product and transpose | Linear Algebra | Yes | Yes | Yes
syr2k | Symmetric rank-2k operations | Linear Algebra | Yes | Yes | Yes
syrk | Symmetric rank-k operations | Linear Algebra | Yes | Yes | Yes
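PolyBench kernels are static affine loop nests: loop bounds and array subscripts are linear functions of the loop indices, which is what polyhedral tools need in order to analyze and transform them. As an illustration, the gemm entry computes C := alpha * A * B + beta * C. The loop nest below follows the structure of the sequential C reference kernel (assumed here), transcribed to Python on nested lists; it is not the GPU version.

```python
def gemm(alpha, beta, A, B, C):
    """In-place C := alpha * A * B + beta * C written as an affine loop nest."""
    ni, nk = len(A), len(B)
    nj = len(C[0])
    for i in range(ni):
        for j in range(nj):
            C[i][j] *= beta  # scale the existing C row
        for k in range(nk):
            for j in range(nj):
                C[i][j] += alpha * A[i][k] * B[k][j]  # accumulate row i of A times B
    return C
```

Every bound (`ni`, `nk`, `nj`) and subscript (`A[i][k]`, `B[k][j]`, `C[i][j]`) is affine in the loop indices, so the loop nest fits the polyhedral model.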

7. Mars

Data-mining benchmarks implemented with MapReduce.

Application | Description | Domain | CUDA | OpenCL | C
sm | Find the position of a string in a file | Data Mining | Yes | No | No
ii | Build an inverted index for links in HTML files | Data Mining | Yes | No | No
ss | Compute pairwise similarity scores for documents | Data Mining | Yes | No | No
mm | Multiply two matrices | Linear Algebra | Yes | No | No
pvc | Count distinct page views from web logs | Data Mining | Yes | No | No
pvr | Find the top ten hottest pages in the web logs | Data Mining | Yes | No | No
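Mars expresses each workload as a map function that emits key/value pairs and a reduce function that combines all values sharing a key. A sequential sketch of that model follows, using pvc as the example. The log format (one "URL client-IP" pair per line) and the reading of "distinct page view" as a unique (URL, client) pair are assumptions, and none of the function names come from the Mars API.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Sequential MapReduce: map every record, group values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):  # map phase
            groups[key].append(value)      # group-by-key (shuffle) phase
    return {key: reducer(key, values) for key, values in groups.items()}  # reduce phase


def page_view_map(line):
    # Assumed log format: "<url> <client_ip>" per line.
    url, client = line.split()
    yield url, client


def page_view_reduce(url, clients):
    # Assumption: a distinct page view is a unique (url, client) pair.
    return len(set(clients))
```

Usage: `map_reduce(["/a 1.1.1.1", "/a 1.1.1.1", "/a 2.2.2.2", "/b 1.1.1.1"], page_view_map, page_view_reduce)` yields `{"/a": 2, "/b": 1}`.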

8. Lonestar

Focuses on irregular applications, mainly those whose behavior is data-dependent and topology-dependent.

Application | Description | Domain | CUDA | OpenCL | C
bfs | Breadth-first search | Graph Algorithm | Yes | No | No
bh | Simulate gravitational forces with the Barnes-Hut algorithm | Simulation | Yes | No | No
dc | Lossless compression of double-precision floating-point data | Signal Processing | Yes | No | No
dmr | Mesh refinement algorithm from computational geometry | Image Processing | Yes | No | No
pta | Andersen's flow- and context-insensitive points-to analysis | Graph Algorithm | Yes | No | No
sp | Heuristic SAT solver based on Bayesian inference | Graph Algorithm | Yes | No | No
sssp | Shortest paths in a directed graph with weighted edges | Graph Algorithm | Yes | No | No
tsp | Traveling salesman problem | Graph Algorithm | Yes | No | No
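In these codes the amount and location of work depend on the input. A data-driven worklist formulation of the sssp entry illustrates this: a node is reprocessed only when its distance improves, so the work per step depends on the graph's topology and edge weights. The sketch below is a sequential Python version of that idea (a worklist variant of Bellman-Ford, assuming non-negative edge weights); it is not LonestarGPU's kernel, and `sssp_worklist` is a hypothetical name.

```python
import math
from collections import deque


def sssp_worklist(num_nodes, edges, source):
    """Data-driven single-source shortest paths; assumes non-negative edge weights."""
    adjacency = [[] for _ in range(num_nodes)]
    for u, v, w in edges:
        adjacency[u].append((v, w))

    dist = [math.inf] * num_nodes
    dist[source] = 0
    worklist = deque([source])
    queued = [False] * num_nodes
    queued[source] = True

    while worklist:
        u = worklist.popleft()
        queued[u] = False
        # Relax outgoing edges; only nodes whose distance improved are re-queued,
        # so how much work remains depends on the graph and its weights.
        for v, w in adjacency[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                if not queued[v]:
                    worklist.append(v)
                    queued[v] = True
    return dist
```

Unreachable nodes keep the distance `math.inf`.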

9. CUDA SDK

Application | Description | Domain | CUDA | OpenCL | C
bilateralFilter | Edge-preserving non-linear smoothing filter | Image Processing | Yes | Yes | Yes
binomialOption | Evaluate option call price using the binomial model | Computational Finance | Yes | Yes | Yes
BlackScholes | Evaluate option call price using the Black-Scholes model | Computational Finance | Yes | Yes | Yes
convolutionFFT2D | 2D convolutions using FFT | Image Processing | Yes | Yes | Yes
dct8x8 | Discrete cosine transform for blocks of 8 by 8 pixels | Image Processing | Yes | Yes | Yes
dxtc | High-quality DXT compression | Image Processing | Yes | Yes | Yes
dwtHaar1D | 1D discrete Haar wavelet decomposition | Image Processing | Yes | Yes | Yes
eigenvalues | Eigenvalues of a tridiagonal symmetric matrix | Linear Algebra | Yes | Yes | Yes
fastWalshTransform | Hadamard-ordered Fast Walsh transform | Linear Algebra | Yes | Yes | Yes
FDTD3d | Finite-difference time-domain progression stencil | Cellular Automata | Yes | Yes | -
grabcutNPP | GrabCut approach using the 8-neighborhood | Graph Algorithm | Yes | Yes | Yes
histogram | 64-bin and 256-bin histograms | Data Mining | Yes | Yes | Yes
imageDenoising | Image denoising using KNN and NLM | Image Processing | Yes | Yes | Yes
lineOfSight | A simple line-of-sight algorithm | Graphic Application | Yes | Yes | Yes
Mandelbrot | Render Mandelbrot or Julia sets interactively | Graphic Application | Yes | Yes | Yes
matrixMul | Matrix multiplication | Linear Algebra | Yes | Yes | Yes
mergeSort | Merge sort algorithm | Data Mining | Yes | Yes | No
MersenneTwister | The Mersenne Twister random number generator | Signal Processing | Yes | Yes | Yes
MonteCarlo | Evaluate option call price using a Monte Carlo approach | Computational Finance | Yes | Yes | Yes
nbody | All-pairs gravitational n-body simulation | Simulation | Yes | Yes | Yes
oceanFFT | Simulate an ocean height field | Simulation | Yes | Yes | Yes
reduction | Compute the sum of a large array of values | Data Mining | Yes | Yes | No
scalarProd | Calculate scalar products of input vector pairs | Linear Algebra | Yes | Yes | Yes
scan | Parallel prefix sum | Data Mining | Yes | Yes | Yes
SobelFilter | Sobel edge detection filter for 8-bit monochrome images | Image Processing | Yes | Yes | Yes
SobolQRNG | Sobol quasirandom sequence generator | Computational Finance | Yes | Yes | Yes
transpose | Matrix transpose | Linear Algebra | Yes | Yes | Yes
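The BlackScholes and MonteCarlo samples both price European call options. For reference, the closed-form Black-Scholes call price (no dividends) is C = S N(d1) - K e^(-rT) N(d2), with d1 = (ln(S/K) + (r + sigma^2/2) T) / (sigma sqrt(T)) and d2 = d1 - sigma sqrt(T), where N is the standard normal CDF. A scalar Python sketch follows; the CUDA sample evaluates this formula for many options in parallel, but the code below is an illustration, not a port of that sample, and its function names are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, years, rate, volatility):
    """Closed-form price of a European call option on a non-dividend-paying asset."""
    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * volatility ** 2) * years) / (volatility * sqrt_t)
    d2 = d1 - volatility * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)
```

For S = K = 100, T = 1 year, r = 5%, and sigma = 20%, this gives a call price of about 10.45.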

10. GPGPU-Sim

Application | Description | Domain | CUDA | OpenCL | C
aes | AES encryption and decryption of files in CUDA | Cryptography | Yes | No | No
dc | A discontinuous Galerkin time-domain solver | Simulation | Yes | No | No
lps | 3D Laplace solver | Computational Finance | Yes | No | No
lib | Monte Carlo simulation of the LIBOR (London Interbank Offered Rate) Market Model | Computational Finance | Yes | No | No
mum | Pairwise local sequence alignment for DNA strings | Bioinformatics | Yes | No | No
nn | Convolutional neural network to recognize handwritten digits | Machine Learning | Yes | No | No
nqu | N-Queens solver | Simulation | Yes | No | No
ray | Ray tracing (rendering graphics with near photo-realism) | Graphic Application | Yes | No | No
sto | Sliding-window implementation of the MD5 algorithm | Data Mining | Yes | No | No
wp | Accelerate part of the Weather Research and Forecasting (WRF) Model | Simulation | Yes | No | No
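The nqu entry is a backtracking search: how many branches get explored depends on which partial placements survive, so the work is hard to balance across threads. A compact sequential sketch below counts N-Queens solutions, using bitmasks for occupied columns and diagonals. The bitmask formulation is a common technique chosen here for brevity, not necessarily how the GPGPU-Sim benchmark's CUDA code works, and `count_n_queens` is a hypothetical name.

```python
def count_n_queens(n):
    """Count placements of n non-attacking queens on an n x n board (bitmask backtracking)."""
    full = (1 << n) - 1

    def place(cols, diag_left, diag_right):
        if cols == full:  # every row already has a queen
            return 1
        count = 0
        free = full & ~(cols | diag_left | diag_right)
        while free:
            bit = free & -free  # lowest free column in the current row
            free ^= bit
            # Shift the diagonal masks so they mark attacked squares in the next row.
            count += place(cols | bit, ((diag_left | bit) << 1) & full, (diag_right | bit) >> 1)
        return count

    return place(0, 0, 0)
```

For example, `count_n_queens(8)` returns 92, the well-known number of solutions on a standard chessboard.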

Reference

https://www.findhao.net/easycoding/2304.html