GPU Benchmark Suites Overview (Repost)
Introduction
This article is mainly excerpted and translated from Ang Li's PhD dissertation.
1. PERFECT
Power Efficiency Revolution for Embedded Computing Technologies
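Application Domain | Kernel |
---|---|
PERFECT Application 1 | Discrete Wavelet Transform |
PERFECT Application 1 | 2D Convolution |
PERFECT Application 1 | Histogram Equalization |
Space Time Adaptive Processing | System Solver |
Space Time Adaptive Processing | Inner Product |
Space Time Adaptive Processing | Outer Product |
Synthetic Aperture Radar | Interpolation 1 |
Synthetic Aperture Radar | Interpolation 2 |
Synthetic Aperture Radar | Back Projection (Non-Fourier SAR) |
Wide Area Motion Imaging | Debayer |
Wide Area Motion Imaging | Image Registration |
Wide Area Motion Imaging | Change Detection |
Required Kernels | Sort |
Required Kernels | FFT 1D |
Required Kernels | FFT 2D |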
2. AxBench
A Multiplatform Benchmark Suite for Approximate Computing
One of the goals of AxBench is to provide a diverse set of applications to further facilitate research and development in approximate computing.
http://ieeexplore.ieee.org/abstract/document/7755728/
Download link
Benchmark | Platform | Domain | Quality Metric |
---|---|---|---|
binarization | GPU | Image Processing | Image Diff |
blackscholes | CPU, GPU | Finance | Avg. Relative Error |
brent-kung | ASIC | Arithmetic Computation | Avg. Relative Error |
canneal | CPU | Optimization | Avg. Relative Error |
convolution | GPU | Machine Learning | Avg. Relative Error |
fastwalsh | GPU | Signal Processing | Image Diff |
fft | CPU | Signal Processing | Avg. Relative Error |
fir | ASIC | Signal Processing | Avg. Relative Error |
forwardk2j | CPU, ASIC | Robotics | Avg. Relative Error |
inversek2j | CPU, GPU, ASIC | Robotics | Avg. Relative Error |
jmeint | CPU, GPU | 3D Gaming | Miss Rate |
jpeg | CPU | Image Processing | Image Diff |
kmeans | CPU, ASIC | Machine Learning | Image Diff |
kogge-stone | ASIC | Arithmetic Computation | Avg. Relative Error |
laplacian | GPU | Image Processing | Image Diff |
meanfilter | GPU | Machine Vision | Image Diff |
neural network | ASIC | Machine Learning | Avg. Relative Error |
newton-raph | GPU | Numerical Analysis | Avg. Relative Error |
sobel | CPU, GPU, ASIC | Image Processing | Image Diff |
srad | GPU | Medical Imaging | Image Diff |
wallace-tree | ASIC | Arithmetic Computation | Avg. Relative Error |
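
The three quality metrics in the table above can be sketched as follows. This is a minimal illustration under stated assumptions, not AxBench's own code: the function names, the `eps` guard against division by zero, and the choice of a normalized root-mean-square pixel difference for "Image Diff" are all assumptions made for the example.

```python
import math


def avg_relative_error(exact, approx, eps=1e-12):
    """Mean of |approx - exact| / |exact| over all output elements."""
    total = 0.0
    for e, a in zip(exact, approx):
        # eps avoids division by zero for exact values of 0 (assumption).
        total += abs(a - e) / max(abs(e), eps)
    return total / len(exact)


def miss_rate(exact, approx):
    """Fraction of discrete decisions that differ from the exact run."""
    misses = sum(1 for e, a in zip(exact, approx) if e != a)
    return misses / len(exact)


def image_diff(exact, approx, max_value=255.0):
    """Root-mean-square pixel difference, normalized to the range [0, 1]."""
    squared = sum((a - e) ** 2 for e, a in zip(exact, approx))
    return math.sqrt(squared / len(exact)) / max_value
```

Each function takes the output of a precise run and the output of an approximate run as flat sequences of equal length, and a lower value means the approximation is closer to the precise result.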
3. Rodinia
http://rodinia.cs.virginia.edu/
Download page:
http://lava.cs.virginia.edu/Rodinia/download_links.htm
Applications | Dwarves | Domains | Parallel Model | Incre. Ver. |
---|---|---|---|---|
Leukocyte | Structured Grid | Medical Imaging | CUDA, OMP, OCL | √ |
Heart Wall | Structured Grid | Medical Imaging | CUDA, OMP, OCL | |
MUMmerGPU | Graph Traversal | Bioinformatics | CUDA, OMP | |
CFD Solver | Unstructured Grid | Fluid Dynamics | CUDA, OMP, OCL | |
LU Decomposition | Dense Linear Algebra | Linear Algebra | CUDA, OMP, OCL | √ |
HotSpot | Structured Grid | Physics Simulation | CUDA, OMP, OCL | |
Back Propagation | Unstructured Grid | Pattern Recognition | CUDA, OMP, OCL | |
Needleman-Wunsch | Dynamic Programming | Bioinformatics | CUDA, OMP, OCL | √ |
Kmeans | Dense Linear Algebra | Data Mining | CUDA, OMP, OCL | |
Breadth-First Search | Graph Traversal | Graph Algorithms | CUDA, OMP, OCL | |
SRAD | Structured Grid | Image Processing | CUDA, OMP, OCL | √ |
Streamcluster | Dense Linear Algebra | Data Mining | CUDA, OMP, OCL | |
Particle Filter | Structured Grid | Medical Imaging | CUDA, OMP, OCL | |
PathFinder | Dynamic Programming | Grid Traversal | CUDA, OMP, OCL | |
Gaussian Elimination | Dense Linear Algebra | Linear Algebra | CUDA, OCL | |
k-Nearest Neighbors | Dense Linear Algebra | Data Mining | CUDA, OMP, OCL | |
LavaMD | N-Body | Molecular Dynamics | CUDA, OMP, OCL | |
Myocyte | Structured Grid | Biological Simulation | CUDA, OMP, OCL | |
B+ Tree | Graph Traversal | Search | CUDA, OMP, OCL | |
GPUDWT | Spectral Method | Image/Video Compression | CUDA, OCL | |
Hybrid Sort | Sorting | Sorting Algorithms | CUDA, OCL | |
Hotspot3D | Structured Grid | Physics Simulation | CUDA, OCL, OMP | Hotspot for 3D IC |
Huffman | Finite State Machine | Lossless data compression | CUDA, OCL | |
Ang Li's classification:
Application | Description | Domain | CUDA | OpenCL | OpenMP |
---|---|---|---|---|---|
backprop | Perceptron back propagation | Neural Network | Yes | Yes | Yes |
bfs | Breadth first search | Graph Algorithm | Yes | Yes | Yes |
b+tree | B+tree operation | Searching | Yes | Yes | Yes |
leukocyte | Detect leukocytes in blood vessel video | Medical Imaging | Yes | Yes | Yes |
heartwall | Tracks the mouse heart movement by stimulus | Medical Imaging | Yes | No | Yes |
cfd | Finite volume solver for 3D Euler equations for flow | Fluid Dynamics | Yes | Yes | Yes |
lud | Calculate the solutions of a set of linear equations | Linear Algebra | Yes | Yes | Yes |
hotspot | Estimate processor temperature | Physical Simulation | Yes | Yes | Yes |
nw | Optimization method for DNA sequence alignments | Bioinformatics | Yes | Yes | Yes |
kmeans | Clustering algorithm | Data Mining | Yes | Yes | Yes |
srad | Speckle reducing anisotropic diffusion | Image Processing | Yes | Yes | Yes |
streamcluster | Finds medians to assign points to nearest centers | Data Mining | Yes | Yes | Yes |
particlefilter | Estimate an object's location based on noise and path | Medical Imaging | Yes | Yes | Yes |
pathfinder | Dynamic programming to find a path on a 2D grid | Grid Traversal | Yes | Yes | Yes |
gaussian | Solving variables in a linear system | Linear Algebra | Yes | Yes | No |
nn | Find k-nearest neighbors from an unstructured data set | Data Mining | Yes | Yes | Yes |
lavaMD | Calculate particle potential and relocation in 3D | Molecular Dynamics | Yes | Yes | Yes |
myocyte | Simulate the behavior of a cardiac heart muscle cell | Biological Simulation | Yes | Yes | Yes |
4. Parboil
Parboil emphasizes throughput-oriented streaming applications. Each application comes with a baseline CUDA implementation and an optimized one.
Application | Description | Domain | CUDA | OpenCL | C |
---|---|---|---|---|---|
bfs | Breadth-first-search | Graph Algorithm | Yes | Yes | Yes |
cutcp | Compute Coulombic potential for a 3D grid | Molecular Dynamics | Yes | Yes | Yes |
histogram | Compute 2D saturating histogram with maximum 256 bins | Data Mining | Yes | Yes | Yes |
lbm | Fluid dynamics simulation using the Lattice-Boltzmann Method | Fluid Dynamics | Yes | Yes | Yes |
mm | Dense matrix-matrix multiply | Linear Algebra | Yes | Yes | Yes |
mri-gridding | Compute regular data grid via weighted interpolation | Medical Imaging | Yes | Yes | Yes |
mri-q | Compute scanner configuration for calibration in 3D MRI | Medical Imaging | Yes | Yes | Yes |
sad | Sum of absolute differences kernel in MPEG video encoders | Image Processing | Yes | Yes | Yes |
spmv | Compute the product of a sparse matrix with a dense vector | Linear Algebra | Yes | Yes | Yes |
stencil | An iterative Jacobi stencil operation on a regular 3D grid | Cellular Automation | Yes | Yes | Yes |
tpacf | Analyze the spatial distribution of astronomical bodies | Data Mining | Yes | Yes | Yes |
5. SHOC
Measures the stability and performance of coprocessors such as GPUs and Xeon Phi.
Application | Description | Domain | CUDA | OpenCL | C |
---|---|---|---|---|---|
qtclustering | Group genes into high quality clusters | Bioinformatics | Yes | No | No |
s3d | Compute chemical reaction rate across a 3D grid | Simulation | Yes | Yes | No |
scan | Parallel prefix sum of floating point numbers | Data Mining | Yes | Yes | No |
reduction | Sum reduction operation of floating point numbers | Data Mining | Yes | Yes | No |
md | Lennard-Jones potential computations | Molecular Dynamics | Yes | Yes | No |
fft | Fast Fourier transform | Signal Processing | Yes | Yes | No |
sgemm | Single precision general matrix multiply | Linear Algebra | Yes | Yes | No |
sort | Fast radix sort program | Data Mining | Yes | Yes | No |
stencil2d | Standard 2d 9 points stencil calculation | Cellular Automation | Yes | Yes | No |
bfs | Breadth-first-search | Graph Algorithm | Yes | Yes | No |
spmv | Sparse matrix vector multiplication | Linear Algebra | Yes | Yes | Yes |
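
As an illustration of the `scan` entry above, here is a minimal sketch of an inclusive parallel prefix sum in the Hillis-Steele style. It is not taken from the benchmark suite: the function name is hypothetical, and the data-parallel steps are emulated sequentially in plain Python, with a list copy standing in for the barrier a GPU kernel would use between steps.

```python
def inclusive_scan(values):
    """Hillis-Steele style inclusive prefix sum, emulated sequentially."""
    data = list(values)
    n = len(data)
    offset = 1
    while offset < n:
        # Snapshot of the previous step; on a GPU this corresponds to a
        # barrier (or double buffering) between consecutive steps.
        prev = data[:]
        # On a GPU, each index i would be handled by its own thread.
        for i in range(offset, n):
            data[i] = prev[i] + prev[i - offset]
        offset *= 2
    return data
```

For an input of length n the loop runs about log2(n) steps, which is the property that makes this formulation attractive on wide parallel hardware even though it performs more additions than a sequential running sum.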
6. Polybench
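Contains kernels converted from [un]structured loop nests. These loops were previously used to evaluate optimization tools based on the polyhedral model.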
Application | Description | Domain | CUDA | OpenCL | C |
---|---|---|---|---|---|
2dconv | 2D convolution | Linear Algebra | Yes | Yes | Yes |
2mm | 2 matrix multiply | Linear Algebra | Yes | Yes | Yes |
3dconv | 3D convolution | Linear Algebra | Yes | Yes | Yes |
3mm | 3 matrix multiply | Linear Algebra | Yes | Yes | Yes |
atax | Matrix transpose and vector multiplication | Linear Algebra | Yes | Yes | Yes |
bicg | Bicg kernel for BiCGStab linear solver | Linear Algebra | Yes | Yes | Yes |
corr | Correlation computation | Linear Algebra | Yes | Yes | Yes |
covar | Covariance computation | Linear Algebra | Yes | Yes | Yes |
fdtd2d | 2D finite difference time domain kernel | Simulation | Yes | Yes | Yes |
gemm | matrix multiply | Linear Algebra | Yes | Yes | Yes |
gesummv | Scalar vector and matrix multiplication | Linear Algebra | Yes | Yes | Yes |
gramschm | Gram-Schmidt process | Linear Algebra | Yes | Yes | Yes |
mvt | Matrix vector product and transpose | Linear Algebra | Yes | Yes | Yes |
syr2k | Symmetric rank-2k operations | Linear Algebra | Yes | Yes | Yes |
syrk | Symmetric rank-k operations | Linear Algebra | Yes | Yes | Yes |
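
To make the loop-nest character of these kernels concrete, the following is a minimal sketch of the computation behind the `atax` row, y = Aᵀ(Ax), written as explicit nested loops. It is an illustration in plain Python, not the suite's code; the function name and the list-of-lists matrix layout are assumptions.

```python
def atax(A, x):
    """Compute y = A^T (A x) with explicit loop nests."""
    rows, cols = len(A), len(x)
    y = [0.0] * cols
    for i in range(rows):
        # tmp is the i-th element of A x.
        tmp = 0.0
        for j in range(cols):
            tmp += A[i][j] * x[j]
        # Accumulate row i's contribution to A^T (A x).
        for j in range(cols):
            y[j] += A[i][j] * tmp
    return y
```

All loop bounds depend only on the matrix dimensions and all array accesses are affine in the loop indices, which is what makes kernels of this shape suitable inputs for polyhedral optimization tools.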
7. Mars
A data-mining benchmark suite implemented with MapReduce.
Application | Description | Domain | CUDA | OpenCL | C |
---|---|---|---|---|---|
sm | Find the position of a string in a file | Data Mining | Yes | No | No |
ii | Build inverted index for links in HTML files | Data Mining | Yes | No | No |
ss | Compute pair-wise similarity score for docs | Data Mining | Yes | No | No |
mm | Multiply two matrices | Linear Algebra | Yes | No | No |
pvc | Count distinct page views from web logs | Data Mining | Yes | No | No |
pvr | Find the top ten hottest pages in the web log | Data Mining | Yes | No | No |
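
The MapReduce structure these benchmarks share can be sketched with a toy page-view counter, loosely in the spirit of the `pvc` row. This is a sequential illustration only and does not use the Mars API: every function name here is hypothetical, and the log format `"<client> <url>"` is an assumption made for the example.

```python
from collections import defaultdict


def map_phase(records, map_fn):
    """Apply map_fn to every record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(map_fn(record))
    return pairs


def group_by_key(pairs):
    """Group values by key, as the shuffle step between map and reduce would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reduce_fn):
    """Apply reduce_fn to each key and its list of values."""
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def page_view_map(line):
    # Hypothetical log format: "<client> <url>".
    _client, url = line.split()
    return [(url, 1)]


def page_view_reduce(_url, counts):
    return sum(counts)


def count_page_views(log_lines):
    pairs = map_phase(log_lines, page_view_map)
    return reduce_phase(group_by_key(pairs), page_view_reduce)
```

Calling `count_page_views(["a /index", "b /index", "a /about"])` groups the emitted pairs by URL and sums them. In a GPU implementation the map and reduce calls for different records and keys would run in parallel; here they run one after another.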
8. Lonestar
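Focuses on irregular applications, chiefly those whose behavior depends on the input data and on the topology of the data structure.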
Application | Description | Domain | CUDA | OpenCL | C |
---|---|---|---|---|---|
bfs | Breadth first search | Graph Algorithm | Yes | No | No |
bh | Simulate the gravitational forces in Barnes-Hut algorithm | Simulation | Yes | No | No |
dc | Lossless compression upon double-precision FP data | Signal Processing | Yes | No | No |
dmr | Mesh refinement algorithm from computational geometry | Image Processing | Yes | No | No |
pta | Andersen’s flow/context-insensitive points-to analysis | Graph Algorithm | Yes | No | No |
sp | Heuristic SAT-solver based on Bayesian inference | Graph Algorithm | Yes | No | No |
sssp | Shortest path in a directed graph with weighted edges | Graph Algorithm | Yes | No | No |
tsp | Traveling salesman problem | Graph Algorithm | Yes | No | No |
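
To show why such workloads are called irregular, here is a minimal sketch of a level-synchronous breadth-first search, as in the `bfs` row, over a graph in compressed sparse row (CSR) form. It is not the suite's implementation: the function name and the parameter names `row_offsets` and `column_indices` are assumptions, and the per-level work that a GPU would spread across threads is emulated with ordinary loops.

```python
def bfs_levels(row_offsets, column_indices, source):
    """Return the BFS level of every node, or -1 if it is unreachable."""
    num_nodes = len(row_offsets) - 1
    level = [-1] * num_nodes
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        next_frontier = []
        # On a GPU, the nodes of the current frontier are processed in parallel.
        for node in frontier:
            # The number of edges per node varies, so work per thread is uneven.
            for edge in range(row_offsets[node], row_offsets[node + 1]):
                neighbor = column_indices[edge]
                if level[neighbor] == -1:
                    level[neighbor] = depth + 1
                    next_frontier.append(neighbor)
        frontier = next_frontier
        depth += 1
    return level
```

Both the size of each frontier and the memory locations it touches are known only at run time, since they depend on the graph's structure. That is the data-dependent and topology-dependent behavior described above.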
9. CUDA SDK
Application | Description | Domain | CUDA | OpenCL | C |
---|---|---|---|---|---|
bilateralFilter | Edge-preserving non-linear smoothing filter | Image Processing | Yes | Yes | Yes |
binomialOption | Evaluate option call price using binomial model | Computational Finance | Yes | Yes | Yes |
BlackScholes | Evaluate option call price using Black-Scholes model | Computational Finance | Yes | Yes | Yes |
convolutionFFT2D | 2D convolutions using FFT | Image Processing | Yes | Yes | Yes |
dct8x8 | Discrete cosine transform for blocks of 8 by 8 pixels | Image Processing | Yes | Yes | Yes |
dxtc | High quality DXT compression | Image Processing | Yes | Yes | Yes |
dwtHaar1D | 1D discrete Haar wavelet decomposition | Image Processing | Yes | Yes | Yes |
eigenvalues | Eigenvalues of a tridiagonal symmetric matrix | Linear Algebra | Yes | Yes | Yes |
fastWalshTransform | Hadamard-ordered Fast Walsh transform | Linear Algebra | Yes | Yes | Yes |
FDTD3d | Finite differences time domain progression stencil | Cellular Automation | Yes | Yes | |
grabcutNPP | GrabCut approach using the 8 neighborhood | Graph Algorithm | Yes | Yes | Yes |
histogram | 64/256 bin histogram | Data Mining | Yes | Yes | Yes |
imageDenoising | Using KNN and NLM for image denoising | Image Processing | Yes | Yes | Yes |
lineOfSight | A simple line-of-sight algorithm | Graphic Application | Yes | Yes | Yes |
Mandelbrot | Mandelbrot or Julia sets interactively | Graphic Application | Yes | Yes | Yes |
matrixMul | Matrix multiplication | Linear Algebra | Yes | Yes | Yes |
mergeSort | Merge Sort algorithm | Data Mining | Yes | Yes | No |
MersenneTwister | The Mersenne Twister random number generator | Signal Processing | Yes | Yes | Yes |
MonteCarlo | Evaluate option call price using Monte Carlo approach | Computational Finance | Yes | Yes | Yes |
nbody | All-pairs gravitational n-body simulation | Simulation | Yes | Yes | Yes |
oceanFFT | Simulate an Ocean height field | Simulation | Yes | Yes | Yes |
reduction | Compute the sum of a large array of values | Data Mining | Yes | Yes | No |
scalarProd | Calculate scalar products of input vector pairs | Linear Algebra | Yes | Yes | Yes |
scan | Parallel prefix sum | Data Mining | Yes | Yes | Yes |
SobelFilter | Sobel edge detection filter for 8-bit monochrome images | Image Processing | Yes | Yes | Yes |
SobolQRNG | Sobol Quasirandom Sequence Generator | Computational Finance | Yes | Yes | Yes |
transpose | Matrix transpose | Linear Algebra | Yes | Yes | Yes |
10. GPGPU-Sim
Application | Description | Domain | CUDA | OpenCL | C |
---|---|---|---|---|---|
aes | AES algorithm in CUDA to encrypt and decrypt files | Cryptography | Yes | No | No |
dg | A discontinuous Galerkin time-domain solver | Simulation | Yes | No | No |
lps | 3D Laplace Solver | Computational Finance | Yes | No | No |
lib | Monte Carlo simulation in London-interbank-offered-rate Model | Computational Finance | Yes | No | No |
mum | Pairwise local sequence alignment for DNA string | Bioinformatics | Yes | No | No |
nn | Convolutional neural network to recognize handwritten digits | Machine Learning | Yes | No | No |
nqu | The N-Queen solver | Simulation | Yes | No | No |
ray | Ray-tracing (rendering graphics with near photo-realism) | Graphic Application | Yes | No | No |
sto | Sliding-window implementation of the MD5 algorithm | Data Mining | Yes | No | No |
wp | Accelerate part of the Weather Research and Forecast Model (WRF) | Simulation | Yes | No | No |