README
Copyright (c) 2012, 2013 University of Delaware
Contact: Scott Grauer-Gray [email protected]
         Killian [email protected]
         Cavazos [email protected]
This benchmark suite is partially derived from the PolyBench benchmark suite developed by Louis-Noel Pouchet [email protected] and available at http://www.cse.ohio-state.edu/~pouchet/software/polybench/
If using this work, please cite the following paper:

Scott Grauer-Gray, Lifan Xu, Robert Searles, Sudhee Ayalasomayajula, and John Cavazos.
Auto-tuning a High-Level Language Targeted to GPU Codes. Proceedings of Innovative Parallel Computing (InPar '12), 2012.
Paper available at http://www.eecis.udel.edu/~grauerg/
Convolution:
  2DCONV  3DCONV

Linear Algebra:
  2MM  3MM  ATAX  BICG  DOITGEN  GEMM  GESUMMV  GRAMSCHMIDT  LU  MVT  SYR2K  SYRK

Datamining:
  CORRELATION  COVARIANCE

Stencils:
  ADI  FDTD-2D  JACOBI-1D  JACOBI-2D
The CUDA, OpenCL, and HMPP codes are based on PolyBench 2.0 (with the exception of convolution, which is not part of PolyBench 2.0). The OpenACC codes are based on PolyBench 3.2.
CUDA:
OpenCL:
HMPP:
Parameters such as the input sizes, data type, and threshold for GPU-CPU output comparison can be modified using constants within the codes. After modifying, run 'make clean' and then 'make' on the relevant code for the modifications to take effect in the resulting executable.
NOTES ABOUT PARAMETERS:
DATA_TYPE:
By default, the DATA_TYPE used in these codes is float; it can be changed to double by changing the DATA_TYPE typedef to 'double'.
PERCENT_DIFF_ERROR_THRESHOLD:
The PERCENT_DIFF_ERROR_THRESHOLD is the percent difference (0.0-100.0) by which the GPU and CPU results are allowed to differ while still being considered 'matching'; this parameter can be adjusted for each code in the input code file.
OPENACC INFO:
** To compile an OpenACC version using HMPP Workbench / CAPS Compiler (CUDA target):
$> hmpp --codelet-required --openacc-target=CUDA gcc -O2 -I./utilities -I./linear-algebra/kernels/gemm/gemm utilities/polybench.c linear-algebra/kernels/gemm/gemm.c -o gemm_acc
OR
$> capsmc --codelet-required --openacc-target=CUDA gcc -O2 -I./utilities -I./linear-algebra/kernels/gemm/gemm utilities/polybench.c linear-algebra/kernels/gemm/gemm.c -o gemm_acc
With the OPENCL target:

$> hmpp --codelet-required --openacc-target=OPENCL gcc -O2 -I./utilities -I./linear-algebra/kernels/gemm/gemm utilities/polybench.c linear-algebra/kernels/gemm/gemm.c -o gemm_acc
OR
$> capsmc --codelet-required --openacc-target=OPENCL gcc -O2 -I./utilities -I./linear-algebra/kernels/gemm/gemm utilities/polybench.c linear-algebra/kernels/gemm/gemm.c -o gemm_acc
** To generate the reference output of a benchmark:
$> ./gemm_acc 2>gemm_ref.out
Additional options (e.g., POLYBENCH_TIME, POLYBENCH_PAPI) are all passed as macro definitions at compile time (e.g., -Dname_of_the_option).
** To compile a benchmark with PAPI support:
$> gcc -O3 -I utilities -I linear-algebra/kernels/atax utilities/polybench.c linear-algebra/kernels/atax/atax.c -DPOLYBENCH_PAPI -lpapi -o atax_papi
** To specify which counter(s) to monitor:
Edit utilities/papi_counters.list, and add 1 line per event to monitor. Each line (including the last one) must finish with a ',' and both native and standard events are supported.
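For example, a papi_counters.list that monitors total cycles and L1 data-cache misses (both standard PAPI preset events) would contain:

```
PAPI_TOT_CYC,
PAPI_L1_DCM,
```

Note the trailing ',' on every line, including the last one.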
The whole kernel is run one time per counter (no multiplexing), and there is no sampling used for the counter value.
With kernels that have an execution time on the order of a few tens of milliseconds, it is critical to validate any performance number by repeating the experiment several times. A companion script is available to perform reasonable performance measurement of a PolyBench benchmark:
$> gcc -O3 -I utilities -I linear-algebra/kernels/atax utilities/polybench.c linear-algebra/kernels/atax/atax.c -DPOLYBENCH_TIME -o atax_time
$> ./utilities/time_benchmark.sh ./atax_time
This script will run the benchmark five times (it must be a PolyBench compiled with -DPOLYBENCH_TIME), eliminate the two extremal times, and check that the deviation of the three remaining times does not exceed a given threshold, set to 5%.
It is also possible to use POLYBENCH_CYCLE_ACCURATE_TIMER to use the Time Stamp Counter instead of gettimeofday() to monitor the number of elapsed cycles.
** To generate a preprocessed (macro-free) version of each benchmark:
(from the root of the archive:)
$> PARGS='-I utilities -DPOLYBENCH_TIME';
$> for i in `cat utilities/benchmark_list`; do ./utilities/create_cpped_version.sh "$i" "$PARGS"; done
This creates, for each benchmark file 'xxx.c', a new file 'xxx.preproc.c'. The PARGS variable in the above example can be set to the desired configuration, for instance to create a full C99 version (parametric arrays):
$> PARGS='-I utilities -DPOLYBENCH_USE_C99_PROTO';
$> for i in `cat utilities/benchmark_list`; do ./utilities/create_cpped_version.sh "$i" "$PARGS"; done
The codes are based on PolyBench codes that can be parallelized on the GPU; the original PolyBench codes are available at http://www.cse.ohio-state.edu/~pouchet/software/polybench/.
Acknowledgement: This work was funded in part by the U.S. National Science Foundation through the NSF Career award 0953667 and the Defense Advanced Research Projects Agency through the DARPA Computer Science Study Group (CSSG).