GPU_benchmark说明(转)
Introduction
本文内容主要系摘录翻译自Ang Li的博士毕业论文。
1.Perfect
Power Efficiency Revolution for Embedded Computing
| Application Domains | Kernels |
|---|---|
| PERFECT Application 1 | Discrete Wavelet Transform |
| 2D Convolution | |
| Histogram Equalization | |
| Space Time Adaptive Processing | System Solver |
| Inner Product | |
| Outer Product | |
| Synthetic Aperture Radar | Interpolation 1 |
| Interpolation 2 | |
| Back Projection (Non-Fourier SAR) | |
| Wide Area Motion Imaging | Debayer |
| Image Registration | |
| Change Detection | |
| Required Kernels | Sort |
| FFT 1D | |
| FFT 2D |
2. AxBench
A Multiplatform Benchmark Suite for Approximate Computing
One of the goals of AxBench is to provide a diverse set of applications to further facilitate research and development in approximate computing.
http://ieeexplore.ieee.org/abstract/document/7755728/
下载地址
| benchmark | platform | domain | Quality Metric |
|---|---|---|---|
| binarization | GPU | Image Processing | Image Diff |
| blackscholes | CPU, GPU | Finance | Avg. Relative Error |
| brent-kung | ASIC | Arithmetic Computation | Avg. Relative Error |
| canneal | CPU | Optimization | Avg. Relative Error |
| convolution | GPU | Machine Learning | Avg. Relative Error |
| fastwalsh | GPU | Signal Processing | Image Diff |
| fft | CPU | Signal Processing | Avg. Relative Error |
| fir | ASIC | Signal Processing | Avg. Relative Error |
| forwardk2j | CPU, ASIC | Robotics | Avg. Relative Error |
| inversek2j | CPU, GPU, ASIC | Robotics | Avg. Relative Error |
| jmeint | CPU, GPU | 3D Gaming | Miss Rate |
| jpeg | CPU | Image Processing | Image Diff |
| kmeans | CPU, ASIC | Machine Learning | Image Diff |
| kogge-stone | ASIC | Arithmetic Computation | Avg. Relative Error |
| laplacian | GPU | Image Processing | Image Diff |
| meanfilter | GPU | Machine Vision | Image Diff |
| neural network | ASIC | Machine Learning | Avg. Relative Error |
| newton-raph | GPU | Numerical Analysis | Avg. Relative Error |
| sobel | CPU, GPU, ASIC | Image Processing | Image Diff |
| srad | GPU | Medical Imaging | Image Diff |
| wallace-tree | ASIC | Arithmetic Computation | Avg. Relative Error |
3. Rodinia
http://rodinia.cs.virginia.edu/
下载页面:
http://lava.cs.virginia.edu/Rodinia/download_links.htm
| Applications | Dwarves | Domains | Parallel Model | Incre. Ver. |
| Leukocyte | Structured Grid | Medical Imaging | CUDA, OMP, OCL | √ |
| Heart Wall | Structured Grid | Medical Imaging | CUDA, OMP, OCL | |
| MUMmerGPU | Graph Traversal | Bioinformatics | CUDA, OMP | |
| CFD Solver1 | Unstructured Grid | Fluid Dynamics | CUDA, OMP, OCL | |
| LU Decomposition | Dense Linear Algebra | Linear Algebra | CUDA, OMP, OCL | √ |
| HotSpot | Structured Grid | Physics Simulation | CUDA, OMP, OCL | |
| Back Propagation | Unstructured Grid | Pattern Recognition | CUDA, OMP, OCL | |
| Needleman-Wunsch | Dynamic Programming | Bioinformatics | CUDA, OMP, OCL | √ |
| Kmeans | Dense Linear Algebra | Data Mining | CUDA, OMP, OCL | |
| Breadth-First Search1 | Graph Traversal | Graph Algorithms | CUDA, OMP, OCL | |
| SRAD | Structured Grid | Image Processing | CUDA, OMP, OCL | √ |
| Streamcluster1 | Dense Linear Algebra | Data Mining | CUDA, OMP, OCL | |
| Particle Filter | Structured Grid | Medical Imaging | CUDA, OMP, OCL | |
| PathFinder | Dynamic Programming | Grid Traversal | CUDA, OMP, OCL | |
| Gaussian Elimination | Dense Linear Algebra | Linear Algebra | CUDA, OCL | |
| k-Nearest Neighbors | Dense Linear Algebra | Data Mining | CUDA, OMP, OCL | |
| LavaMD2 | N-Body | Molecular Dynamics | CUDA, OMP, OCL | |
| Myocyte | Structured Grid | Biological Simulation | CUDA, OMP, OCL | |
| B+ Tree | Graph Traversal | Search | CUDA, OMP, OCL | |
| GPUDWT | Spectral Method | Image/Video Compression | CUDA, OCL | |
| Hybrid Sort | Sorting | Sorting Algorithms | CUDA, OCL | |
| Hotspot3D | Structured Grid | Physics Simulation | CUDA, OCL, OMP | Hotspot for 3D IC |
| Huffman | Finite State Machine | Lossless data compression | CUDA, OCL |
Ang Li的分类:
| Application | Description | Domain | CUDA | OpenCL | OpenMP |
|---|---|---|---|---|---|
| backprop | Perceptron back propagation | Neural Network | Yes | Yes | Yes |
| bfs | Breadth first search | Graph Algorithm | Yes | Yes | Yes |
| b+tree | B+tree Operation | Searching Yes | Yes | Yes | |
| leukocyte | Detect leukocytes in blood vessel video | Medical Imaging | Yes | Yes | Yes |
| heartwall | Tracks the mouse heart movement by stimulus | Medical Imaging | Yes | No | Yes |
| cfd | Finite volume solver for 3D Euler equations for flow | Fluid Dynamics | Yes | Yes | Yes |
| lud | Calculate the solutions of a set of linear equations | Linear Algebra | Yes | Yes | Yes |
| hotspot | Estimate processor temperature | Physical Simulation | Yes | Yes | Yes |
| nw | Optimization method for DNA sequence alignments | Bioinformatics | Yes | Yes | Yes |
| kmeans | Clustering algorithm | Data Mining | Yes | Yes | Yes |
| srad | Speckle reducing anisotropic diffusion | Image Processing | Yes | Yes | Yes |
| streamcluster | Finds medians to assign points to nearest centers | Data Mining | Yes | Yes | Yes |
| particlefilter | Locate object location based on Noise and path | Medical Imaging | Yes | Yes | Yes |
| pathfinder | Dynamic programming to find a path on a 2D grid Grid | Traversal | Yes | Yes | Yes |
| gaussian | Solving variables in a linear system | Linear Algebra | Yes | Yes | No |
| nn | Find k-nearest neighbors from an unstructured data set | Data Mining | Yes | Yes | Yes |
| lavaMD | Calculate particle potential and relocation in 3D | Molecular Dynamics | Yes | Yes | Yes |
| myocyte | Simulate the behavior of cardiac hear muscle cell | Biological Simulation | Yes | Yes | Yes |
4. Parboil
Parboil强调面向吞吐量的流媒体应用。其中的每个应用都有原生的CUDA应用和优化过的应用。
| Application | Description | Domain | CUDA | OpenCL | C |
|---|---|---|---|---|---|
| bfs | Breadth-first-search | Graph Algorithm | Yes | Yes | Yes |
| cutcp | Compute Coulombic potential for a 3D grid | Molecular Dynamics | Yes | Yes | Yes |
| histogram | Compute 2D saturating histogram with maximum 256 bins | Data Mining | Yes | Yes | Yes |
| lbm | Fluid dynamics simulation using Lattice-Bolzmann Method | Fluid Dynamics | Yes | Yes | Yes |
| mm | Dense matrix-matrix multiply | Linear Algebra | Yes | Yes | Yes |
| mri-gridding | Compute regular data grid via weighted interpolation | Medical Imaging | Yes | Yes | Yes |
| mir-q | Compute scanner configuration for calibration in 3D MRI | Medical Imaging | Yes | Yes | Yes |
| sad | Sum of absolute differences kernel in MPEG video encoders | Image Processing | Yes | Yes | Yes |
| spmv | Compute the product of a sparse matrix with a dense vector | Linear Algebra | Yes | Yes | Yes |
| stencil | An iterative Jacobi stencil operation on a regular 3D grid | Cellular Automation | Yes | Yes | Yes |
| tpacf | Analyze the spatial distribution of astronomical bodies | Data Mining | Yes | Yes | Yes |
5. Shoc
测量协处理的稳定性和性能,such as GPUs, Xeon-Phi, etc。
| Application | Description | Domain | CUDA | OpenCL | C |
|---|---|---|---|---|---|
| qtclustering | Group genes into high quality clusters | Bioinformatics | Yes | No | No |
| s3d | Compute chemical reaction rate across a 3D grid | Simulation | Yes | Yes | No |
| scan | Parallel prefix sum of floating point numbers | Data Mining | Yes | Yes | No |
| reduction | Sum reduction operation of floating point numbers | Data Mining | Yes | Yes | No |
| md | Lennard-Jones potential computations | Molecular Dynamics | Yes | Yes | No |
| fft | Fast Fourier transform | Signal Processing | Yes | Yes | No |
| sgemm | Single precision general matrix multiply | Linear Algebra | Yes | Yes | No |
| sort | Fast radix sort program | Data Mining | Yes | Yes | No |
| stencil2d | Standard 2d 9 points stencil calculation | Cellular Automation | Yes | Yes | No |
| bfs | Breadth-first-search | Graph Algorithm | Yes | Yes | No |
| spmv | Sparse matrix vector multiplication | Linear Algebra | Yes | Yes | Yes |
6. Polybench
包含从[非]结构循环嵌套转换的Kernel。这些循环以前用于评估基于多面体模型的优化工具。
| Application | Description | Domain | CUDA | OpenCL | C |
|---|---|---|---|---|---|
| 2dconv | 2D convolution | Linear Algebra | Yes | Yes | Yes |
| 2mm | 2 matrix multiply | Linear Algebra | Yes | Yes | Yes |
| 3dconv | 3D convolution | Linear Algebra | Yes | Yes | Yes |
| 3mm | 3 matrix multiply | Linear Algebra | Yes | Yes | Yes |
| atax | Matrix transpose and vector multiplication | Linear Algebra | Yes | Yes | Yes |
| bicg | Bicg kernel for BiCGStab linear solver | Linear Algebra | Yes | Yes | Yes |
| corr | Correlation computation | Linear Algebra | Yes | Yes | Yes |
| covar | Covariance computation | Linear Algebra | Yes | Yes | Yes |
| fdtd2d | 2D finite difference time domain kernel | Simulation | Yes | Yes | Yes |
| gemm | matrix multiply | Linear Algebra | Yes | Yes | Yes |
| gesummv | Scalar vector and matrix multiplication | Linear Algebra | Yes | Yes | Yes |
| gramschm | Gram-schmidt process | Linear Algebra | Yes | Yes | Yes |
| mvt | Matrix vector product and transpose | Linear Algebra | Yes | Yes | Yes |
| syr2k | Symmetric rank-2k operations | Linear Algebra | Yes | Yes | Yes |
| syrk | Symmetric rank-k operations | Linear Algebra | Yes | Yes | Yes |
7. Mars
用map reduce实现的data-mining的benchmark。
| Application | Description | Domain | CUDA | OpenCL | C |
|---|---|---|---|---|---|
| sm | Find the position of a string in a file | Data Mining | Yes | No | No |
| ii | Build inverted index for links in HTML files | Data Mining | Yes | No | No |
| ss | Compute pair-wise similarity score for docs | Data Mining | Yes | No | No |
| mm | Multiply two matrices | Linear Algebra | Yes | No | No |
| pvc | Count distinct page views from web logs | Data Mining | Yes | No | No |
| pvr | Find the top ten hottest pages in the web log | Data Mining | Yes | No | No |
8. Longstar
关注于不规则的应用,主要是数据依赖和拓扑依赖。
| Application | Description | Domain | CUDA | OpenCL | C |
|---|---|---|---|---|---|
| bfs | Breadth first search | Graph Algorithm | Yes | No | No |
| bh | Simulate the gravitational forces in Barnes-Hut | algorithm Simulation | Yes | No | No |
| dc | Lossless compression upon double-precision FP data | Signal Processing | Yes | No | No |
| dmr | Meshrefinement algorithm from computational geometry | Image Processing | Yes | No | No |
| pta | Andersen’s flow/context-insensitive points-to analysis | Graph Algorithm | Yes | No | No |
| sp | Heuristic SAT-solver based on BaYesian inference | Graph Algorithm | Yes | No | No |
| sssp | Shortest path in a directed graph with weighted edges | Graph Algorithm | Yes | No | No |
| tsp | Traveling salesman problem | Graph Algorithm | Yes | No | No |
9. CUDA SDK
| Application | Description | Domain | CUDA | OpenCL | C |
|---|---|---|---|---|---|
| bilateralFilter | Edge-preserving non-linear smoothing filter | Image Processing | Yes | Yes | Yes |
| binomialOption | Evaluate option call price using binomial model | Computational Finance | Yes | Yes | Yes |
| BlackScholes | Evaluate option call price using Black-Scholes model | Computational Finance | Yes | Yes | Yes |
| convolutionFFT2D | 2D convolutions using FFT | Image Processing | Yes | Yes | Yes |
| dct8x8 | Discrete cosine transform for blocks of 8 by 8 pixels | Image Processing | Yes | Yes | Yes |
| dxtc | High quality DXT compression | Image Processing | Yes | Yes | Yes |
| dwtHaar1D | 1D discrete Haar wavelet decomposition | Image Processing | Yes | Yes | Yes |
| eigenvalues | Eigenvalues of a tridiagonal symmetric matrix | Linear Algebra | Yes | Yes | Yes |
| fastWalshTransform | Hadamard-ordered Fast Walsh transform | Linear Algebra | Yes | Yes | Yes |
| FDTD3d | Finite differences | time domain progression stencil | Cellular Automation | Yes | Yes |
| grabcutNPP | GrabCut approach using the 8 neighborhood | Graph Algorithm | Yes | Yes | Yes |
| histogram | 64/256 bin histogram | Data Mining | Yes | Yes | Yes |
| imageDenoising | Using KNN and NLM for image denoising | Image Processing | Yes | Yes | Yes |
| lineOfSight | A simple line-of-sight algorithm | Graphic Application | Yes | Yes | Yes |
| Mandelbrot | Mandelbrot or Julia sets interactively | Graphic Application | Yes | Yes | Yes |
| matrixMul | Matrix multiplication | Linear Algebra | Yes | Yes | Yes |
| mergeSortv | Merge Sort algorithm | Data Mining | Yes | Yes | No |
| MersenneTwister | The Mersenne Twister random number generator | Signal Processing | Yes | Yes | Yes |
| MonteCarlo | Evaluate option call price using Monte Carlo approach | Computational Finance | Yes | Yes | Yes |
| nbody | All-pairs gravitational n-body simulation | Simulation | Yes | Yes | Yes |
| oceanFFT | Simulate an Ocean height field | Simulation | Yes | Yes | Yes |
| reduction | Compute the sum of a large arrays of values | Data Mining | Yes | Yes | No |
| scalarProd | Calculate scalar products of input vector pairs | Linear Algebra | Yes | Yes | Yes |
| scan | Parallel prefix sum | Data Mining | Yes | Yes | Yes |
| SobelFilter | Sobel edge detection filter for 8-bit monochrome images | Image Processing | Yes | Yes | Yes |
| SobolQRNG | Sobol Quasirandom Sequence Generator | Computational Finance | Yes | Yes | Yes |
| transpose | Matrix transpose | Linear Algebra | Yes | Yes | Yes |
10. GPGPU-Sim
| Application | Description | Domain | CUDA | OpenCL | C |
|---|---|---|---|---|---|
| aes | AES algorithm in CUDA to encrypt and decrypt files | Cryptography | Yes | No | No |
| dc | A discontinuous Galerkin time-domain solver | Simulation | Yes | No | No |
| lps | 3D Laplace Solver | Computational Finance | Yes | No | No |
| lib | Monte Carlo simulation in London-interbank-offered-rate Model | Computational Finance | Yes | No | No |
| mum | Pairwise local sequence alignment for DNA string | Bioinformatics | Yes | No | No |
| nn | Convolutional neural network to recognize handwritten digits | Machine Learning | Yes | No | No |
| nqu | The N-Queen solver | Simulation | Yes | No | No |
| ray | Ray-tracing (rendering graphics with near photo-realism) | Graphic Application | Yes | No | No |
| sto | Sliding-window implementation of the MD5 algorithm | Data Mining | Yes | No | No |
| wp | Accelerate part of the Weather Research and Forecast Model (WRF) | Simulation | Yes | No | No |