Best Practices

Firewall Configuration for Grid Services

The recommended setup for the top-level router or for the local firewalls is to allow outbound connectivity and to deny all inbound connections by default.

Depending on the type of grid element, a known set of ports must be opened in order to provide the correct service access from the internet.
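
As a rough illustration of the default-deny policy described above, the following sketch prints (it does not apply) a minimal iptables rule set. The function name print_firewall_rules is hypothetical, and the ports passed to it are placeholders: the real list depends on the grid element and must be taken from the service documentation.

```shell
#!/bin/sh
# Sketch only: prints iptables commands, it does not run them.
# Arguments are the TCP ports to open (site-specific, placeholders here).
print_firewall_rules() {
    # Keep loopback and replies to outbound traffic working.
    echo "iptables -A INPUT -i lo -j ACCEPT"
    echo "iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT"
    # Open only the ports required by the services on this node.
    for port in "$@"; do
        echo "iptables -A INPUT -p tcp --dport $port -j ACCEPT"
    done
    # Set the default policies last, so the accept rules are already in place.
    echo "iptables -P INPUT DROP"
    echo "iptables -P FORWARD DROP"
    echo "iptables -P OUTPUT ACCEPT"
}
```

For example, print_firewall_rules 8443 2811 lists the rules for a node that should accept inbound TCP on those two ports; review the output before applying it, since a wrong rule set can cut off remote access.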

The following wiki pages have been prepared by the EGI-CSIRT in collaboration with the service developers. They explain the TCP/UDP ports needed by the various grid services and include many other useful notes for securing the nodes:

Further information (possibly somewhat outdated) can also be found at the following URL:

Collecting job information on a computing element

Suppose the following scenario: you receive an advisory saying that suspicious activity has been registered from a given IP address.

You want to investigate whether this incident has implications for your site as well, and in particular for your Grid farm. In this case you also want to take action to prevent further problems for you and possibly for the whole grid infrastructure.

Step 1 : Find out whether that IP has shown up at your site.

  • This can be done by looking at the appropriate logs on the router, NAT devices, etc.
  • If the connection is still active, netstat or lsof can reveal it (though it may be tedious to check every machine of the farm)
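
The check for a still-active connection can be sketched as a small filter. The helper name connections_to is hypothetical; it assumes the column layout of netstat -tn output (foreign address in the fifth column) and handles plain IPv4 addresses only.

```shell
#!/bin/sh
# connections_to IP
# Reads netstat-style lines on stdin and prints those whose foreign
# address (5th column) is exactly IP, whatever the remote port.
connections_to() {
    awk -v ip="$1" '
        $1 ~ /^(tcp|udp)/ {
            addr = $5
            sub(/:[0-9*]+$/, "", addr)   # strip the trailing :port
            if (addr == ip) print
        }'
}
```

A possible use on a worker node is netstat -tunp | connections_to 192.168.66.1, where the -p column shows the owning process. Comparing the whole address avoids false hits such as 192.168.66.10, which a plain grep for the string would also match.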

Step 2 : Once the connection has been found, some analysis can be made on the active process and on the executable which is running, but this goes beyond the scope of this section.

For the moment, let's suppose that the process holding the connection is owned by a user from the pool accounts. This means that it is running on a worker node and that it comes from a Grid job which has been submitted to your site, so let's look at which information can be gathered.

Step 3 : First of all, let's dig into the home directory on the worker node.

CREAM CE

  • You should find the base directory of the running job, something of the form: home_cream_XXYYYYYYY, together with some error and output files.
  • Inside that directory, you will find the CREAM job wrapper and another directory with the name of the CREAM JOB-ID, of the form CREAMXXYYYYYYY.
    • The CREAM job wrapper contains a lot of information on the job like:
      • The Grid Job ID
      • The executable and the command line
      • The files which have been transferred to the WN (input_file_dest[x]), together with the source host (input_transfer_command[x])
      • The files that are supposed to be transferred back at the end of the job (output_file_dest[x]), together with the destination host (output_transfer_command[x])
    • Inside the CREAMXXYYYYYYY directory you should find:
      • Executable and scripts
      • Input and output files
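
To pull the transfer-related entries out of the job wrapper quickly, something like the following may help. The function name wrapper_transfers is hypothetical, and the sketch assumes only that the variable names listed above appear literally in the wrapper, followed by an index in square brackets.

```shell
#!/bin/sh
# wrapper_transfers FILE
# Prints, with line numbers, the wrapper lines that mention the input/output
# file destinations and the corresponding transfer commands.
wrapper_transfers() {
    grep -n -E '(input|output)_(file_dest|transfer_command)\[' -- "$1"
}
```

The line numbers make it easier to jump to the surrounding context in the wrapper, where the source and destination hosts are spelled out.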

LCG CE

  •  

Step 4 : Now, let's move to the Computing Element and gather some more useful information.

CREAM CE

  • The main log file on a CREAM CE is: /opt/glite/var/log/glite-ce-cream.log. Now that we have the CREAM job ID (CREAMXXYYYYYYY), we can grep for that string inside the log file and find:
    • Date and time of the submission
    • DN of the submitting user
    • Grid job ID
    • WMS or UI used for the submission (if the UI is not present, it is always possible to ask the administrators of the WMS for further information, given the Grid job ID)
    • Batch system job ID and Worker Node
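
Since the submission may be older than the current log file, it can be convenient to search the rotated copies in the same pass. This is a minimal sketch: the function name cream_job_events is hypothetical, and whether rotated copies exist (and how they are named) depends on the local log rotation setup.

```shell
#!/bin/sh
# cream_job_events JOB_ID LOGFILE...
# Prints every line mentioning JOB_ID from the given log files.
# -F treats the ID as a fixed string, -h omits the file name prefix.
cream_job_events() {
    job_id=$1
    shift
    grep -h -F -- "$job_id" "$@"
}
```

With CREAM_LOG set to the path of the main CREAM log, a call such as cream_job_events CREAMXXYYYYYYY "$CREAM_LOG"* would cover the current file and any rotated copies sharing its prefix.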

Now that we have information about the user and the WMS/UI, we can try to limit the possibility for the user to access our resources.

  • First of all, you should ban the user on the CE (procedure for CREAM) and on the SE (procedure for StoRM)

LCG CE

  • The log files to check for useful information are:
    • /var/log/messages
    • /var/log/globus-gatekeeper.log
    • /var/log/globus-gridftp.log

/var/log/messages contains the information needed to identify the IP address of the host that submitted the job (WMS/UI), the Grid job ID and the LRMS (batch system) job ID.

The following basic grep commands extract the forensic information:

# grep -B 5 '192.168.66.1' /var/log/messages
May 19 10:53:13 myce-host GRAM gatekeeper[10892]: Authorized as local user: pilinfngrid002
May 19 10:53:13 myce-host GRAM gatekeeper[10892]: Authorized as local uid: 7402
May 19 10:53:13 myce-host GRAM gatekeeper[10892]:           and local gid: 2604
May 19 10:53:13 myce-host GRAM gatekeeper[10892]: "/C=IT/O=Organization/OU=Personal Certificate/L=Dep/CN=name surname" mapped to pilinfngrid002 (7402/2604)
May 19 10:53:13 myce-host GRAM gatekeeper[10892]: JMA 2011/05/19 10:53:13 GATEKEEPER_JM_ID 2011-05-19.10:53:13.0000010892.0000000000 has EDG_WL_JOBID 'https://wms.mydomain:9000/jN_w94CRcwp67aoc1jE62w'
May 19 10:53:17 myce-host GRAM gatekeeper[10923]: Got connection 192.168.66.1 at Thu May 19 10:53:17 2011


# grep -B 5 "192.168.66.1" /var/log/messages |grep JOBID |grep https |grep '10:53'
May 19 10:53:13 myce-host GRAM gatekeeper[10892]: JMA 2011/05/19 10:53:13 GATEKEEPER_JM_ID 2011-05-19.10:53:13.0000010892.0000000000 has EDG_WL_JOBID 'https://wms.mydomain:9000/jN_w94CRcwp67aoc1jE62w'

# grep '2011-05-19.10:53:13.0000010892' /var/log/messages |grep GRAM_SCRIPT_JOB_ID
May 19 10:53:23 gridit-ce-001 gridinfo[10896]: JMA 2011/05/19 10:53:23 GATEKEEPER_JM_ID 2011-05-19.10:53:13.0000010892.0000000000 has GRAM_SCRIPT_JOB_ID 1305795203:lcgpbs:internal_4108231704:10896.1305795193 manager type lcgpbs

# grep 1305795203:lcgpbs /var/log/messages |grep 'batch'
May 19 10:56:09 gridit-ce-001 gridinfo: [29894-12987] Submitted job 1305795203:lcgpbs:internal_4108231704:10896.1305795193 to batch system lcgpbs with ID 365289.gridit-ce-001.cnaf.infn.it
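
The chain of greps above (IP address, then GATEKEEPER_JM_ID, then GRAM_SCRIPT_JOB_ID, then batch system ID) can be sketched as a single function. The name trace_job and the output format are hypothetical; the sketch assumes the syslog message layout shown in the excerpts above and uses grep -w so that 192.168.66.1 does not also match 192.168.66.10.

```shell
#!/bin/sh
# trace_job IP LOGFILE
# For each gatekeeper job logged within 5 lines before a connection from IP,
# prints the gatekeeper ID, the Grid job ID, the GRAM job ID and the batch ID.
trace_job() {
    ip=$1
    log=$2
    grep -B 5 -w -F -- "$ip" "$log" |
        sed -n 's/.*GATEKEEPER_JM_ID \([^ ]*\) has EDG_WL_JOBID.*/\1/p' |
        sort -u |
    while read -r jm_id; do
        # Grid job ID: the quoted URL after EDG_WL_JOBID.
        grid_id=$(grep -F -- "GATEKEEPER_JM_ID $jm_id has EDG_WL_JOBID" "$log" |
            sed -n "s/.*EDG_WL_JOBID '\([^']*\)'.*/\1/p" | head -n 1)
        # GRAM job ID: the token after GRAM_SCRIPT_JOB_ID.
        gram_id=$(grep -F -- "GATEKEEPER_JM_ID $jm_id has GRAM_SCRIPT_JOB_ID" "$log" |
            sed -n 's/.*GRAM_SCRIPT_JOB_ID \([^ ]*\).*/\1/p' | head -n 1)
        # Batch system ID: only searchable once the GRAM job ID is known.
        batch_id=
        if [ -n "$gram_id" ]; then
            batch_id=$(grep -F -- "Submitted job $gram_id to batch system" "$log" |
                sed -n 's/.* with ID \([^ ]*\).*/\1/p' | head -n 1)
        fi
        printf 'jm_id=%s\ngrid_job=%s\ngram_job=%s\nbatch_job=%s\n' \
            "$jm_id" "$grid_id" "$gram_id" "$batch_id"
    done
}
```

A call such as trace_job 192.168.66.1 /var/log/messages prints one block of four lines per matching job. The 5-line window is inherited from the manual grep and is a heuristic: on a busy CE, unrelated jobs can fall inside it, so the timestamps should still be checked by hand.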


/var/log/globus-gatekeeper.log contains almost the same information as the one logged in syslog:

# grep -A 5 '192.168.66.1' /var/log/globus-gatekeeper.log
Mapping service "LCMAPS" returned local user "pilinfngrid002"
 PID: 18611 -- Notice: 5: Authorized as local user: pilinfngrid002
 PID: 18611 -- Notice: 5: "/C=IT/O=Organization/OU=Personal Certificate/L=Dep/CN=name surname" mapped to pilinfngrid002 (7402/2604)
JMA 2011/05/17 17:20:16 GATEKEEPER_JM_ID 2011-05-17.17:20:10.0000018533.0000000000 mapped to pilinfngrid002 (7402, 2604)
JMA 2011/05/17 17:20:16 GATEKEEPER_JM_ID 2011-05-17.17:20:15.0000018611.0000000000 mapped to pilinfngrid002 (7402, 2604)


The GridFTP logs (/var/log/globus-gridftp.log and, as in the example below, /var/log/gridftp-session.log) can be queried to find the gsiftp transfers to the WN where the job is executed:

# grep -A 2 pilinfngrid002 /var/log/gridftp-session.log
[13166] Thu May 19 10:56:11 2011 :: User pilinfngrid002 successfully authorized.
[13166] Thu May 19 10:56:11 2011 :: Starting to transfer "/home/pilinfngrid002/.lcgjm/globus-cache-export.a13003/cache_export_dir.tar".
[13166] Thu May 19 10:56:11 2011 :: Finished transferring "/home/pilinfngrid002/.lcgjm/globus-cache-export.a13003/cache_export_dir.tar".
[13166] Thu May 19 10:56:11 2011 :: Closed connection from mywn.mydomain:34778
[13167] Thu May 19 10:56:10 2011 :: Server started in inetd mode.
--
[13185] Thu May 19 10:56:12 2011 :: User pilinfngrid002 successfully authorized.
[13185] Thu May 19 10:56:12 2011 :: Starting to transfer "/home/pilinfngrid002/.globus/.gass_cache/local/md5/2f/8bfd103a6730f021d37d1eb5e6d9fe/md5/cd/9fff555f63d9b5b6fb378485004b87/data".
[13185] Thu May 19 10:56:12 2011 :: Finished transferring "/home/pilinfngrid002/.globus/.gass_cache/local/md5/2f/8bfd103a6730f021d37d1eb5e6d9fe/md5/cd/9fff555f63d9b5b6fb378485004b87/data".
[13185] Thu May 19 10:56:12 2011 :: Closed connection from mywn.mydomain:34791
[13188] Thu May 19 10:56:11 2011 :: Server started in inetd mode.
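
Because grep -A 2 cuts each session after a fixed number of lines, a filter keyed on the bracketed process ID can give a more complete picture. The following is a sketch under the assumption that the session log keeps the line format shown above; the function name gridftp_sessions and the file/peer output labels are hypothetical.

```shell
#!/bin/sh
# gridftp_sessions USER LOGFILE
# Prints the files transferred and the remote peers for every session that
# was authorized as USER. Sessions are identified by the process ID in
# square brackets at the start of each line; PIDs can be reused over time,
# so check the timestamps when reading a long log.
gridftp_sessions() {
    awk -v user="$1" '
        { pid = ""; if (match($0, /^\[[0-9]+\]/)) pid = substr($0, 2, RLENGTH - 2) }
        pid == "" { next }
        index($0, ":: User " user " successfully authorized.") { auth[pid] = 1; next }
        !(pid in auth) { next }
        match($0, /Starting to transfer "/) {
            path = substr($0, RSTART + RLENGTH)
            sub(/"\.$/, "", path)      # drop the closing quote and full stop
            print "file " path
        }
        match($0, /Closed connection from /) {
            print "peer " substr($0, RSTART + RLENGTH)
        }
    ' "$2"
}
```

For instance, gridftp_sessions pilinfngrid002 /var/log/gridftp-session.log lists each transferred path followed by the host and port that closed the session.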


The job sandbox has been transferred to the WN mywn.mydomain. The directory of the running job, containing the executable, is:
/home/pilinfngrid002/.globus/.gass_cache/local/md5/2f/8bfd103a6730f021d37d1eb5e6d9fe/md5/cd/9fff555f63d9b5b6fb378485004b87/data

Once you have the user DN and its local mapping, you can query your batch system for more detailed information on the jobs executed by the mapped local user.

If you need to ban the user DN, please follow the procedure for the lcg-CE.