Using CUDA 4.0 on a Mac mini
I tried out CUDA 4.0. The operating environment is listed below.
SDK installation
- Download the driver and toolkit packages from the site below and install them.
CUDA Toolkit 4.0 | NVIDIA Developer
- cudadriver-4.0.19-macos.dmg
- cudatoolkit_4.0.17_macos.pkg
- cudatools_4.0.17_macos.pkg
- gpucomputingsdk_4.0.17_macos.pkg
Sample programs
- Build the samples and query the device information.
$ cd /Developer/GPU\ Computing/C/
$ make
$ ./bin/darwin/release/deviceQuery
[deviceQuery] starting...
./bin/darwin/release/deviceQuery Starting...
 CUDA Device Query (Runtime API) version (CUDART static linking)
Found 1 CUDA Capable device(s)
Device 0: "GeForce 320M"
  CUDA Driver Version / Runtime Version          4.0 / 4.0
  CUDA Capability Major/Minor version number:    1.2
  Total amount of global memory:                 253 MBytes (265027584 bytes)
  ( 6) Multiprocessors x ( 8) CUDA Cores/MP:     48 CUDA Cores
  GPU Clock Speed:                               0.95 GHz
  Memory Clock rate:                             1064.00 Mhz
  Memory Bus Width:                              128-bit
  Max Texture Dimension Size (x,y,z)             1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers        1D=(8192) x 512, 2D=(8192,8192) x 512
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 16384
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Concurrent copy and execution:                 Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Concurrent kernel execution:                   No
  Alignment requirement for Surfaces:            Yes
  Device has ECC support enabled:                No
  Device is using TCC driver mode:               No
  Device supports Unified Addressing (UVA):      No
  Device PCI Bus ID / PCI location ID:           5 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.0, CUDA Runtime Version = 4.0, NumDevs = 1, Device = GeForce 320M
[deviceQuery] test results...
PASSED
Press ENTER to exit...
- Small as the machine is, it apparently offers a GPU with 48 CUDA cores.
- As usual, I start by checking basic memory transfer performance. The first test uses a 32-bit build.
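As a quick sanity check on the deviceQuery figures, the core count and the memory size can be recomputed from the raw numbers in the output. This is only a sketch of the arithmetic; the variable names are made up for illustration and the values are copied from the output above.

```python
# Values copied from the deviceQuery output in this post.
multiprocessors = 6            # "( 6) Multiprocessors"
cores_per_mp = 8               # "( 8) CUDA Cores/MP"
global_mem_bytes = 265027584   # "Total amount of global memory"

# Total CUDA cores = multiprocessors x cores per multiprocessor.
total_cores = multiprocessors * cores_per_mp

# Convert bytes to MBytes using 1 MByte = 1024 * 1024 bytes.
global_mem_mbytes = global_mem_bytes / (1024 * 1024)

print(total_cores)                  # 48
print(f"{global_mem_mbytes:.2f}")   # 252.75
```

The product matches the reported 48 CUDA cores, and 252.75 rounds to the 253 MBytes shown by deviceQuery.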
$ make
$ file ./bin/darwin/release/bandwidthTest
./bin/darwin/release/bandwidthTest: Mach-O executable i386
$ ./bin/darwin/release/bandwidthTest
[bandwidthTest] starting...
./bin/darwin/release/bandwidthTest Starting...
Running on...
 Device 0: GeForce 320M
 Quick Mode
 Host to Device Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     597.0
 Device to Host Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     618.4
 Device to Device Bandwidth, 1 Device(s)
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     6136.2
[bandwidthTest] test results...
PASSED
Press ENTER to exit...
- Next, build and run it as a 64-bit binary.
$ make x86_64=1
$ file ./bin/darwin/release/bandwidthTest
./bin/darwin/release/bandwidthTest: Mach-O 64-bit executable x86_64
$ ./bin/darwin/release/bandwidthTest
[bandwidthTest] starting...
./bin/darwin/release/bandwidthTest Starting...
Running on...
 Device 0: GeForce 320M
 Quick Mode
 Host to Device Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     597.2
 Device to Host Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     751.1
 Device to Device Bandwidth, 1 Device(s)
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     6126.6
[bandwidthTest] test results...
PASSED
Press ENTER to exit...
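- Interestingly, only "Device to Host Bandwidth" differs between the 32-bit and 64-bit builds (618.4 MB/s vs. 751.1 MB/s); the other two figures are essentially unchanged.
- Next, run it with pinned memory (using the 64-bit build).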
$ ./bin/darwin/release/bandwidthTest --memory=pinned
[bandwidthTest] starting...
./bin/darwin/release/bandwidthTest Starting...
Running on...
 Device 0: GeForce 320M
 Quick Mode
 Host to Device Bandwidth, 1 Device(s), Pinned memory
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     1121.0
 Device to Host Bandwidth, 1 Device(s), Pinned memory
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     3119.9
 Device to Device Bandwidth, 1 Device(s)
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     6132.9
[bandwidthTest] test results...
PASSED
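- The hardware and the CUDA driver differ from last time, so a direct comparison is not possible. Still, it was unexpected that a desktop machine (underpowered though it is) reaches only about half the performance of the MacBook Air.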
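To put the three bandwidthTest runs side by side, the ratios between them can be worked out from the figures printed above. The snippet below is only a sketch of that arithmetic; the dictionary layout, the short keys (`h2d`, `d2h`, `d2d`) and the `ratio` helper are made up for illustration and are not part of the CUDA SDK.

```python
# Bandwidth figures in MB/s, copied from the bandwidthTest runs in this post.
# h2d = Host to Device, d2h = Device to Host, d2d = Device to Device.
results = {
    "32-bit paged":  {"h2d": 597.0,  "d2h": 618.4,  "d2d": 6136.2},
    "64-bit paged":  {"h2d": 597.2,  "d2h": 751.1,  "d2d": 6126.6},
    "64-bit pinned": {"h2d": 1121.0, "d2h": 3119.9, "d2d": 6132.9},
}


def ratio(run, baseline, direction):
    """Return the bandwidth of `run` divided by that of `baseline`."""
    return results[run][direction] / results[baseline][direction]


for direction in ("h2d", "d2h", "d2d"):
    by_bits = ratio("64-bit paged", "32-bit paged", direction)
    by_pinning = ratio("64-bit pinned", "64-bit paged", direction)
    print(f"{direction}: 64-bit/32-bit = {by_bits:.2f}x, "
          f"pinned/paged = {by_pinning:.2f}x")
```

With these numbers, the 64-bit build is about 1.21x faster than the 32-bit build for Device to Host only, while pinned memory improves Host to Device by about 1.88x and Device to Host by about 4.15x. Device to Device bandwidth stays essentially the same across all three runs.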
Related