Installing CUDA on Scientific Linux 6 (probably the same for CentOS 6)

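Installing CUDA on Scientific Linux 6.
If you update Scientific Linux carelessly, the nVidia driver can stop working.
When that happens, recompile the device driver (that is, rebuild the nvidia kernel module for the new kernel, e.g. by re-running the NVIDIA installer).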

This write-up is based on the following Scientific Linux version (earlier versions worked the same way for me):
Linux 2.6.32-220.7.1.el6.x86_64

--
The nVidia files I installed are these three:

NVIDIA-Linux-x86_64-295.20.run
cudatoolkit_4.1.28_linux_64_rhel6.x.run
gpucomputingsdk_4.1.28_linux.run

--

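On machines with an nVidia GPU, Scientific Linux 6.2 and CentOS 6.2 sometimes cannot even boot the installation disc.
I fell right into that trap.
My machine has reasonably capable integrated Intel graphics (lspci reports it as the 2nd Generation Core processor's integrated graphics controller).
On top of that, an nVidia GTS450 card is plugged in.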

There is plenty of information on the Web, but...

The quickest way is:
1) Remove the nVidia card and install Scientific Linux.
2) Delete nouveau.ko.
3) Put the nVidia card back in.
That's all.

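If all you want is to get the system to boot,
add nouveau.modeset=0 to the kernel options at boot time,
which disables kernel mode setting in the module called nouveau.
(To make this permanent, append nouveau.modeset=0 to the end of the options on the kernel line in /boot/grub/grub.conf.)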

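However, that alone is not enough: when the X Window server starts, the nouveau module gets loaded, X fails to start properly, and you have no GUI.
(If the Linux installation and network setup are already finished, you can log in remotely from another machine with ssh or similar, so you can deal with it from there.)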

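There seem to be various ways to keep it away from X, such as putting it on a blacklist, but do we even need nouveau in the first place???
So I deleted nouveau.ko.
(Actually, I mv'd it to a completely unrelated directory, so that I can put it back if it turns out to be needed later.)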
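After nouveau.ko has been moved away and the machine rebooted, it is worth confirming that nouveau really is not loaded. From the shell, lsmod | grep nouveau does the job; lsmod reads /proc/modules, where each line starts with the module name followed by a space. The same check as a small C sketch (the helper names line_names_module and module_loaded are my own, not from any library):

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if a /proc/modules line describes module `name`.
   Each line starts with the module name followed by a space, so
   "nv" does not match a line for "nvidia". */
int line_names_module(const char *line, const char *name)
{
    size_t len = strlen(name);
    return strncmp(line, name, len) == 0 && line[len] == ' ';
}

/* Return 1 if `name` is currently loaded, 0 if not,
   -1 if /proc/modules cannot be opened. */
int module_loaded(const char *name)
{
    char line[512];
    int found = 0;
    FILE *fp = fopen("/proc/modules", "r");

    if (fp == NULL)
        return -1;
    while (!found && fgets(line, sizeof line, fp) != NULL)
        found = line_names_module(line, name);
    fclose(fp);
    return found;
}
```

module_loaded("nouveau") should return 0 once the module is gone; -1 means /proc/modules could not be opened (for example, when not running on Linux).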

Also, the nVidia driver cannot be installed while the X server is running, so it is easier to log in from another machine via ssh and do the installation from there.

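Below are the steps for installing the nVidia software.
(A '#' prompt means the command is run as root; a '%' prompt means it is run as a normal user. The '%' is there because I like csh-family shells.)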

1. Delete nouveau.ko.

2. # reboot

The system boots normally.
3. Log in and become root.

4. # init 2
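(This changes the run level and stops the X Window server.
If you were working on the X Window console, it exits abruptly; in that case, log in again on a text console.
It is easier if you are logged in remotely from another machine.)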

5. # sh NVIDIA-Linux-x86_64-290.10.run

You will be asked various questions; just go along with all of them.

It runs in text mode.
You are asked to accept the license: select "Accept" with the Tab key and press [Return].

It asks whether to install the 32-bit OpenGL libraries; choose as you like.
I installed them, and they went in without any conflicts.

If /etc/X11/xorg.conf exists, it asks whether to modify xorg.conf.
Choose to modify it.

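The installation of NVIDIA-Linux-x86_64-290.10.run will probably finish normally.
However, on machines with two or more video cards, the X server often fails to start on the nvidia card.
(In my case, the X server did not start.)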

6. Edit /etc/X11/xorg.conf to tell the X server where the card sits on the PCI bus
6.1 Running # lspci | grep VGA (or similar) shows the video devices on the PCI bus:


# lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
01:00.0 VGA compatible controller: nVidia Corporation Device 1245 (rev a1)

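6.2 Edit /etc/X11/xorg.conf and specify BusID in the Device section for the nVidia card.
Write it as follows (assuming lspci shows the nVidia card at 01:00.0, i.e. PCI bus 1, device 0, function 0). Note that lspci prints these numbers in hexadecimal, while BusID expects decimal.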


--
Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID "PCI:1:0:0" 
EndSection
--
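The step from the lspci slot to the BusID value is easy to get wrong once the numbers go past 9, because lspci prints bus, device and function in hexadecimal while xorg.conf's BusID takes decimal, in the form PCI:bus:device:function. A small C sketch of the conversion (lspci_to_busid is a hypothetical helper name; it assumes the usual short lspci slot format, optionally with the 0000: domain prefix that lspci -D adds):

```c
#include <stdio.h>
#include <string.h>

/* Convert an lspci slot such as "01:00.0" (hexadecimal bus:device.function)
   into an xorg.conf BusID string such as "PCI:1:0:0" (decimal).
   Text after the slot (e.g. the rest of an lspci line) is ignored.
   Returns 0 on success, -1 if the slot cannot be parsed or `out` is too small. */
int lspci_to_busid(const char *slot, char *out, size_t outlen)
{
    unsigned bus, dev, func;
    int n;

    /* lspci -D prefixes the PCI domain; this sketch only handles domain 0000. */
    if (strncmp(slot, "0000:", 5) == 0)
        slot += 5;

    if (sscanf(slot, "%x:%x.%x", &bus, &dev, &func) != 3)
        return -1;
    /* PCI limits: 256 buses, 32 devices per bus, 8 functions per device. */
    if (bus > 0xff || dev > 0x1f || func > 0x7)
        return -1;

    n = snprintf(out, outlen, "PCI:%u:%u:%u", bus, dev, func);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

Given the nVidia line from the lspci output above (01:00.0 VGA compatible controller: ...), it produces PCI:1:0:0, the value used in the Device section.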

6.3 Run # X & and check that the X server starts.
The X11 startup log is
/var/log/Xorg.0.log
If it fails, read that log carefully and work out the cause.
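In /var/log/Xorg.0.log each message carries a marker such as (II) for information, (WW) for warnings and (EE) for errors, so the (EE) lines are the place to start. A plain grep '(EE)' also matches the marker legend near the top of the log, which lists (EE) in the middle of a line. The C sketch below (hypothetical helper names) reports only lines whose first marker is (EE), after skipping an optional [timestamp] prefix:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if an Xorg log line is an error entry: after optional leading
   whitespace and an optional "[   12.345]" timestamp, it starts with "(EE)". */
int is_error_line(const char *line)
{
    const char *p = line;

    while (isspace((unsigned char)*p))
        p++;
    if (*p == '[') {
        const char *close = strchr(p, ']');
        if (close == NULL)
            return 0;
        p = close + 1;
        while (isspace((unsigned char)*p))
            p++;
    }
    return strncmp(p, "(EE)", 4) == 0;
}

/* Print every error line of the log to `out` (skipped if out is NULL)
   and return how many there were. Lines longer than the buffer are read
   in pieces, so a continuation piece is judged on its own. */
int xorg_error_lines(FILE *log, FILE *out)
{
    char line[4096];
    int count = 0;

    while (fgets(line, sizeof line, log) != NULL) {
        if (is_error_line(line)) {
            count++;
            if (out != NULL)
                fputs(line, out);
        }
    }
    return count;
}
```

Calling xorg_error_lines on the opened /var/log/Xorg.0.log with stdout as the second argument prints the error lines and returns how many there were.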

7. # init 5
If the X server starts up properly, you're done.
If you want to be thorough, reboot.

-- Once X Window is working

8. # sh cudatoolkit_4.1.28_linux_64_rhel6.x.run

This installs the CUDA development kit into /usr/local/cuda.
The CUDA compiler is /usr/local/cuda/bin/nvcc.

9. Switch back to a normal user and run
% sh gpucomputingsdk_4.1.28_linux.run

The sample programs and related files are extracted into ~/NVIDIA_GPU_Computing_SDK/.

-- Time for a short break --

Around now,
take a look at the system menu bar.
Under "System" -> "Preferences"
a new "NVIDIA X Server Settings" entry has appeared.
Launch it and, whoa, it looks pretty cool ♪

--

Now I want to check whether programs actually compile.

10. Run make on the sample programs and so on in ~/NVIDIA_GPU_Computing_SDK/.
% cd ~/NVIDIA_GPU_Computing_SDK/
% make
Just go ahead and run make (lol).
I have a feeling make simply succeeds, but I may actually have done various things along the way...
(my memory is hazy...)

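If make fails:
・Some of the symbolic links for the shared libraries in /usr/local/cuda/lib* may be missing.
Create the symbolic links by hand.
・You need development libraries that people who are not into 3D graphics normally don't install, such as the X11 development files, GLUT and the Mesa GL development package.
# yum install mesa-libGL-devel.x86_64
and so on, until every required library is installed.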

11. If all of the above works,

~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release/ will be full of executables.
They can be run in batch.

~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery is a genuinely useful command.

12. Run ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery.


  % ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery
[deviceQuery] starting...

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Found 1 CUDA Capable device(s)

Device 0: "GeForce GTS 450"
  CUDA Driver Version / Runtime Version          4.2 / 4.1
  CUDA Capability Major/Minor version number:    2.1
  Total amount of global memory:                 1024 MBytes (1073414144 bytes)
  ( 4) Multiprocessors x (48) CUDA Cores/MP:     192 CUDA Cores
  GPU Clock Speed:                               1.57 GHz
  Memory Clock rate:                             1804.00 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 262144 bytes
  Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 65535
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and execution:                 Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Concurrent kernel execution:                   Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support enabled:                No
  Device is using TCC driver mode:               No
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.2, CUDA Runtime Version = 4.1, NumDevs = 1, Device = GeForce GTS 450
[deviceQuery] test results...
PASSED

> exiting in 3 seconds: 3...2...1...done!

This shows information about the device.
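Some of the deviceQuery numbers can be reproduced by hand, which is handy when reading the same report for another card. The 192 CUDA cores are 4 multiprocessors times 48 cores each (the per-multiprocessor count for compute capability 2.1), the theoretical memory bandwidth follows from the memory clock and bus width (the factor 2 for double data rate follows the theoretical-bandwidth calculation in NVIDIA's CUDA C Best Practices Guide), and the driver version (4.2) has to be at least the runtime version (4.1). A C sketch of that arithmetic; the function names are hypothetical and the core-count table deliberately covers only Fermi and Kepler:

```c
/* CUDA cores per multiprocessor. Only Fermi (2.x) and Kepler (3.x)
   are covered in this sketch; anything else returns -1. */
int cores_per_sm(int major, int minor)
{
    if (major == 2 && minor == 0) return 32;
    if (major == 2 && minor == 1) return 48;   /* e.g. the GTS 450 */
    if (major == 3) return 192;
    return -1;
}

/* Theoretical peak memory bandwidth in GB/s (10^9 bytes per second):
   clock (MHz) x 10^6 x bus width in bytes x 2 for double data rate. */
double peak_bandwidth_gbs(double mem_clock_mhz, int bus_width_bits)
{
    return mem_clock_mhz * 1e6 * (bus_width_bits / 8.0) * 2.0 / 1e9;
}

/* cudaDriverGetVersion() and cudaRuntimeGetVersion() report versions as
   1000 * major + 10 * minor, e.g. 4020 for 4.2. */
void decode_cuda_version(int version, int *major, int *minor)
{
    *major = version / 1000;
    *minor = (version % 1000) / 10;
}

/* A program built against a given runtime needs a driver at least that new. */
int driver_supports_runtime(int driver_version, int runtime_version)
{
    return driver_version >= runtime_version;
}
```

With the GTS 450 values above (capability 2.1, 4 multiprocessors, 1804 MHz, 128-bit) this gives 4 × 48 = 192 cores, matching the report, and a theoretical peak of about 57.7 GB/s.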

If you have got this far, the installation has most likely succeeded.

--

A somewhat flashier demo. Since I was made to install OpenGL and GLUT, I want to see something come of that too.

% ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release/postProcessGL

It does some kind of post-processing with CUDA.
Right-click to bring up a menu.

---

Next, write a toy program and compile it.

In my .cshrc I have something like the following:


 setenv LD_LIBRARY_PATH  '/usr/local/cuda/lib:/usr/local/cuda/lib64'

It may not be strictly necessary, but I have it there anyway. (For CUDA, I prefer not to use ldconfig, so the library path goes into the environment instead.)


-- Makefile


CUDA_INSTALL_PATH = /usr/local/cuda

NVCC       := $(CUDA_INSTALL_PATH)/bin/nvcc 

all: aa dd-sync dd-nosync dd-simple

dd-sync: dd.cu
	$(NVCC) -o dd-sync dd.cu -DSYNC_SYNC

dd-nosync: dd.cu
	$(NVCC) -o dd-nosync dd.cu


dd-simple: dd-simple.cu
	$(NVCC) -o dd-simple dd-simple.cu -DSYNC_SYNC


aa: aa.cu
	$(NVCC) -o aa  aa.cu

bb: bb.cu
	$(NVCC) -o bb  bb.cu
--


-- dd.cu --

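#include <stdio.h>
#include <stdlib.h>

/* XXX is only printed below; the SYNC_SYNC macro the Makefile passes is not used in this listing. */
#define XXX 800000000

/* Abort with file, line and message if a CUDA runtime call fails. */
#define CHECK(call)                                                 \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,      \
                    cudaGetErrorString(err_));                      \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

/* N threads in a single block; must not exceed the device limit
   (1024 threads per block on the GTS 450 above). */
const int N = 192;

/* Despite the name, each thread doubles one element: B[i] = A[i] * 2 */
__global__ void VecAdd(float* A, float* B)
{
   int i = threadIdx.x;
   B[i] = A[i] * 2.0f;
}

int main(int argc, char* argv[])
{
	int count;
	float A[N];
	float B[N];
	float *gA;
	float *gB;
	int i;

	// No. of devices
	CHECK(cudaGetDeviceCount(&count));
	printf("deviceCount=%d\n",count);

	printf("loop=%d\n",XXX);

	// malloc on GPU
	CHECK(cudaMalloc((void**)&gA, N*sizeof(float)));
	CHECK(cudaMalloc((void**)&gB, N*sizeof(float)));

	// initialize array on host
	for(i=0;i<N;i++){
		A[i] = (float)i;
	}

	// host -> GPU
	CHECK(cudaMemcpy( gA, A, N*sizeof(float), cudaMemcpyHostToDevice));

	// VecAdd on GPU (1 block, 1 dimension, N threads)
	VecAdd<<<1, N>>>(gA,gB);
	CHECK(cudaGetLastError());   // catches launch errors such as too many threads per block

	// result GPU -> host (this cudaMemcpy waits for the kernel to finish)
	CHECK(cudaMemcpy( B, gB, N*sizeof(float), cudaMemcpyDeviceToHost));

	for(i=0;i<N;i++){
		printf("%d:%f,%f\n",i,A[i],B[i]);
	}

	CHECK(cudaFree(gA));
	CHECK(cudaFree(gB));
	return 0;
}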
--

Build this with make, and it runs.
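dd.cu launches a single block of N = 192 threads and indexes with threadIdx.x alone, which only works while N stays within the device's limit of 1024 threads per block from the deviceQuery output. For larger arrays the usual pattern is several blocks plus a bounds check. The sketch below emulates that indexing on the host in plain C so it can be checked without a GPU, and adds a CPU reference check for the values dd.cu prints. The function names, and the VecDouble kernel shown only in a comment, are hypothetical:

```c
/* On the GPU, the multi-block indexing would look like this
 * (sketch only, not compiled here):
 *
 *   __global__ void VecDouble(const float *A, float *B, int n)
 *   {
 *       int i = blockIdx.x * blockDim.x + threadIdx.x;
 *       if (i < n) B[i] = A[i] * 2.0f;
 *   }
 *   VecDouble<<<blocks_for(n, 256), 256>>>(gA, gB, n);
 */

/* Number of blocks so that blocks * threads_per_block >= n
   (ceiling division); threads_per_block must be > 0. */
unsigned blocks_for(unsigned n, unsigned threads_per_block)
{
    return (n + threads_per_block - 1) / threads_per_block;
}

/* Host emulation of the kernel above: the outer and inner loops stand in
   for blockIdx.x and threadIdx.x, and the i < n guard skips the unused
   threads of a partially filled last block. */
void vec_double_emulated(const float *a, float *b, unsigned n,
                         unsigned threads_per_block)
{
    unsigned grid = blocks_for(n, threads_per_block);

    for (unsigned block = 0; block < grid; block++) {
        for (unsigned thread = 0; thread < threads_per_block; thread++) {
            unsigned i = block * threads_per_block + thread;
            if (i < n)
                b[i] = a[i] * 2.0f;
        }
    }
}

/* Compare results (e.g. copied back from the GPU) with the CPU reference
   b[i] == a[i] * 2. Multiplying by 2 is exact in IEEE floating point
   (barring overflow), so exact comparison is safe here.
   Returns the first mismatching index, or -1 if all match. */
long first_mismatch(const float *a, const float *b, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        if (b[i] != a[i] * 2.0f)
            return (long)i;
    return -1;
}
```

For dd.cu itself, blocks_for(192, 1024) is 1, which is why the single-block launch is enough there.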

-- That's about it.

The system used this time

--
% uname -a
Linux hinano 2.6.32-220.7.1.el6.x86_64 #1 SMP Tue Mar 6 15:45:33 CST 2012 x86_64 x86_64 x86_64 GNU/Linux

--
% gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC)

--
Hardware

nvidia GTS450

-CPU
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz
stepping : 7
cpu MHz : 1600.000
cache size : 6144 KB

たけおかぼちぼち日記
