[eside-ghost] Beowulf cluster with Debian
Iker Castaños Chavarri
hackercasta at esdebian.org
Tue Mar 18 10:30:34 CET 2008
Hi,
I have recompiled LAM in the home directory, which the master shares with
the nodes over NFS and which the nodes mount as their own home. I have
generated the keys, and the machines now log in to each other over ssh
without asking for a password. I have added the directory where LAM lives
to the PATH, but now neither recon nor lamboot works from the nodes any
more. This is what I get:
lam@nodo1:~$ recon -v lamhosts
-bash: /home/lam/bin/recon: Permiso denegado
That is "Permission denied", yet the permissions are OK and the owner is
lam, as on all the PCs.
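
A minimal sketch of how this could be narrowed down, assuming a Linux node
with /proc/mounts. diagnose_exec is a hypothetical helper, not part of LAM.
One possibility worth ruling out (an assumption, nothing in the output
confirms it) is that the NFS home is mounted with the noexec option on the
nodes, which gives "Permission denied" even when the mode bits and owner
look right.

```shell
#!/bin/sh
# Hypothetical helper (not part of LAM): narrow down a "Permission denied"
# on a binary whose permissions look correct in `ls -l`.
diagnose_exec() {
    file=$1
    if [ ! -e "$file" ]; then
        echo "missing: $file"
        return 1
    fi
    # Mount point of the filesystem holding the file (column 6 of `df -P`).
    mp=$(df -P "$file" 2>/dev/null | awk 'NR==2 {print $6}')
    if [ -n "$mp" ] && [ -r /proc/mounts ]; then
        # A "noexec" option here would block execution regardless of mode bits.
        awk -v m="$mp" '$2 == m {print "mount options: " $4}' /proc/mounts
    fi
    if [ -x "$file" ]; then
        echo "executable: $file"
    else
        echo "not executable by the current user: $file"
        return 1
    fi
}
```

On nodo1 it would be called as: diagnose_exec /home/lam/bin/recon
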
From the master it does run, but it gives me the following:
lam@master:~$ recon -v lamhosts
n-1<3128> ssi:boot:base:linear: booting n0 (master)
n-1<3128> ssi:boot:base:linear: booting n1 (nodo1)
ERROR: LAM/MPI unexpectedly received the following on stderr:
bash: tkill: command not found
-----------------------------------------------------------------------------
LAM failed to execute a LAM binary on the remote node "nodo1".
Since LAM was already able to determine your remote shell as "tkill",
it is probable that this is not an authentication problem.
*** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
*** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
*** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
*** MAILING LIST.
LAM tried to use the remote agent command "rsh"
to invoke the following command:
rsh nodo1 -n tkill -N -v
This can indicate several things. You should check the following:
- The LAM binaries are in your $PATH
- You can run the LAM binaries
- The $PATH variable is set properly before your
.cshrc/.profile exits
Try to invoke the command listed above manually at a Unix prompt.
You will need to configure your local setup such that you will *not*
be prompted for a password to invoke this command on the remote node.
No output should be printed from the remote node before the output of
the command is displayed.
When you can get this command to execute successfully by hand, LAM
will probably be able to function properly.
-----------------------------------------------------------------------------
n-1<3128> ssi:boot:base:linear: Failed to boot n1 (nodo1)
n-1<3128> ssi:boot:base:linear: aborted!
As I read it, it is telling me that bash does not recognize tkill, and come
on, of course tkill can be run XD
After this long spiel I have sent you, if anyone can lend me a hand (and
not around my neck, hehe) I would be very grateful.
Regards.
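
The manual check that the LAM message asks for can be sketched as follows.
check_remote_cmd is a hypothetical helper, and "ssh nodo1" as the agent is
an assumption based on the ssh keys mentioned earlier (the log shows LAM
itself using rsh). The remote command runs in a non-interactive shell, so a
PATH that is only set in a file read by interactive or login shells may not
be visible there, even though tkill works when typed at a prompt on the node.

```shell
#!/bin/sh
# Hypothetical helper: ask a remote agent whether a command is on the PATH
# of the non-interactive shell it starts, which is how LAM invokes tkill.
check_remote_cmd() {
    agent=$1   # remote-shell command, e.g. "ssh nodo1" (assumed)
    cmd=$2     # command to look up, e.g. tkill
    # $agent is left unquoted on purpose so "ssh nodo1" splits into words.
    if $agent "command -v $cmd" >/dev/null 2>&1; then
        echo "found: $cmd"
    else
        echo "not found in non-interactive PATH: $cmd"
        return 1
    fi
}

# Example (needs the real node, so it is left commented out):
#   check_remote_cmd "ssh nodo1" tkill
# The LAM documentation also describes a LAMRSH environment variable for
# choosing the remote agent; untested here, check it against your version:
#   export LAMRSH="ssh -x"
```

If the lookup fails while an interactive login on nodo1 finds tkill, the
place where PATH is set is the likely suspect.
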