Detailed Information

Register new site in GRID: detailed steps

 

1) Signing the Memorandum of Understanding
In order to participate in the Italian Grid Infrastructure, the site has to accept the rules of behaviour described in the Memorandum of Understanding (MoU): pdf/doc version. The document has to be signed by the person in charge of the site (the Grid Local Coordinator for INFN sites) and sent by fax to INFN-CNAF. For the moment, the Italian Grid MoU takes the place of the Service Level Description of the EGI Infrastructure.

2) Site mailing list
Each site has to define a mailing list such as grid-prod@<site-domain>, which is needed for communication between the Italian Grid Operation Center and the site. This mailing list has to include all the site-managers responsible for the site. It will be added to the Italian production sites mailing list and used to send information, updates and problem notifications.

3) Registration on GOC-DB
With the help of the Italian Grid Operation Center, the site has to register its resources in a central database named GOC-DB. The information to be entered in the database and sent by email to the Italian Grid managers (it-roc-managers <at> lists.infn.it) is the following (fields marked with * are mandatory):

ROC

Country

Timezone

Production Status

Domain: DNS domain used by the machines at this site (eg: cnaf.infn.it)

Short Name *: Generic name for the site (alphanumeric, dot, dash and underscore)

Official Name: Official name (Alphanumeric and basic punctuation), e.g. "Mysite, University of Mytown, Mycountry"

Home URL: Site web homepage if any

GIIS URL: SITE-BDII ldap url, e.g. ldap://sibilla.cnaf.infn.it:2170/mds-vo-name=infn-cnaf,o=grid

IP Range: (a.b.c.d/e.f.g.h), for the case of no firewall: 0.0.0.0/255.255.255.255

Location: An increasing resolution ending with Country (Town, City, Country), e.g. Soho, London, United Kingdom (Alphanumeric and basic punctuation)

Latitude: To get your site latitude: http://itouchmap.com/latlong.html (+/-a.b)

Longitude: To get your site longitude: http://itouchmap.com/latlong.html (+/-a.b)

Description: (Alphanumeric and basic punctuation)

E-Mail *: Generic contact used for broadcasts, notifications and general purpose contact. The mailing list must include all the site-managers responsible for that site. A suggested e-mail alias would take the form of grid-prod@site-domain

Contact Telephone Number *: Generic phone contact (numbers, optional +, dots spaces or dashes)

Emergency Telephone Number: phone contact for emergency procedures (numbers, optional +, dots spaces or dashes)

Security Contact E-mail *: (CSIRT E-Mail) A mailing list for security information communication. The mailing list must be closed and its archives not published. A suggested e-mail alias would take the form of grid-sec@site-domain

Security Contact Telephone Number *: (CSIRT Telephone Number) (numbers, optional +, dots spaces or dashes). CSIRT = Computer Security Incident Response Team for the site

Alarm E-Mail: (for LCG Tier-1 sites) (valid email format)

Helpdesk E-Mail: endpoint to the local helpdesk for direct ticketing from GGUS. If not set, the ROC/NGI helpdesk contact will be used instead (valid email format)
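
The format constraints in the list above can be checked locally before the data is sent to the Italian Grid managers. The following Python sketch is only an illustration: the function name, the dictionary keys and the regular expressions are assumptions derived from the field descriptions on this page, not the validation that GOC-DB itself applies, and the contact values in the example are placeholders.

```python
import ipaddress
import re
from urllib.parse import urlparse

# Patterns derived from the field descriptions on this page (assumptions,
# not the rules GOC-DB itself enforces).
SHORT_NAME_RE = re.compile(r"[A-Za-z0-9._-]+")      # alphanumeric, dot, dash, underscore
PHONE_RE = re.compile(r"\+?[0-9. -]+")              # digits, optional leading +, dots, spaces, dashes
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # loose shape check only

# Fields marked with * in the list above; the key names are hypothetical.
MANDATORY = ("short_name", "email", "contact_phone",
             "security_email", "security_phone")
PHONE_FIELDS = ("contact_phone", "emergency_phone", "security_phone")
EMAIL_FIELDS = ("email", "security_email", "alarm_email", "helpdesk_email")


def check_site_record(record):
    """Return a list of problems found in a dict of site registration data."""
    problems = [f"{key}: mandatory field is missing"
                for key in MANDATORY if not record.get(key)]

    name = record.get("short_name")
    if name and not SHORT_NAME_RE.fullmatch(name):
        problems.append("short_name: use only alphanumerics, dot, dash, underscore")

    for key in PHONE_FIELDS:
        if record.get(key) and not PHONE_RE.fullmatch(record[key]):
            problems.append(f"{key}: use only digits, optional +, dots, spaces, dashes")

    for key in EMAIL_FIELDS:
        if record.get(key) and not EMAIL_RE.fullmatch(record[key]):
            problems.append(f"{key}: not a valid e-mail format")

    ip_range = record.get("ip_range")
    if ip_range:
        # Expected form a.b.c.d/e.f.g.h: two dotted quads separated by "/".
        parts = ip_range.split("/")
        try:
            if len(parts) != 2:
                raise ValueError(ip_range)
            for part in parts:
                ipaddress.IPv4Address(part)
        except ValueError:
            problems.append("ip_range: expected the form a.b.c.d/e.f.g.h")

    giis = record.get("giis_url")
    if giis:
        parsed = urlparse(giis)
        if parsed.scheme != "ldap" or not parsed.hostname:
            problems.append("giis_url: expected an ldap:// URL")

    return problems


# Example record; the contact values are placeholders, not real contacts.
site = {
    "short_name": "MY-SITE",
    "email": "grid-prod@example.org",
    "contact_phone": "+00 000 0000000",
    "security_email": "grid-sec@example.org",
    "security_phone": "+00 000 0000000",
    "ip_range": "0.0.0.0/255.255.255.255",
    "giis_url": "ldap://sibilla.cnaf.infn.it:2170/mds-vo-name=infn-cnaf,o=grid",
}
print(check_site_record(site))
```

An empty list means that no problem was found. Such a check catches only formatting mistakes; the content of each field still has to be agreed with the Italian Grid Operation Center.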


4) Site-manager registration
Once this registration is completed, the site-managers can register themselves on GOC-DB by requesting an administrator account for their own site. In this phase the site status is "Candidate" and monitoring is not active yet.

5) Updating of the site services list
At this point the site-manager is responsible for keeping the site information in the central database up to date: all the services (APEL, CE, CREAM, SITE-BDII, SRM, etc.) provided by the site have to be registered, and this list has to be updated after every change. If the site provides them, central services such as WMS, BDII, LFC, VOMS, etc. also have to be registered on GOC-DB. The overall list of services declared on GOC-DB defines the list of services that will be monitored. The results of this monitoring are used to compute the monthly availability and reliability level of the site. Once the site-manager has entered this information, the Grid Operation Center can change the site status to "uncertified" and turn on monitoring for the preliminary functional tests: the site will appear in the "test" section of the GSTAT monitoring tool (the new instance, GSTAT2.0, is already working and will replace the old one).

6) Email contacts
It is necessary to define the following email contacts:

- mailing list of the site CSIRT (Computer Security Incident Response Team), in the form grid-sec@<site-domain>. This mailing list has to contain: (a) at least one site-manager, (b) the site Security Officer (see next point) and (c) if present, the person in charge of the site computing center (such a person is usually not a site-manager). This contact is the "CSIRT email" field of GOC-DB.
IMPORTANT: the archives of this mailing list must not be publicly accessible; moreover, free subscription to this mailing list has to be forbidden, otherwise the site will not be certified.
- site Security Officer email contact: this person is responsible for security matters related to the site, has to be a site-manager already registered on GOC-DB, and has to make a specific request for the Security Officer role.

7) Helpdesk registration
The site-managers have to register themselves in the Italian Helpdesk system, where they will manage the tickets assigned to the support group of their own site.

8) Site monitoring
The following VOs have to be enabled on the site resources: infngrid, dteam and ops (necessary for running the tests). To ensure rapid execution of the test jobs, these VOs must have a higher priority than the other VOs enabled on the site.

9) Site certification
The Italian Grid Operation Center will announce the start of the site certification period by opening a ticket to the site in the Italian helpdesk.
Once the certification tests have been passed, the site status will change to "certified" and the site will be added to the production BDII. From this point on the site is monitored continuously, and the results of the monitoring tests are used to compute the site's monthly availability/reliability figures: the minimum availability allowed is 70%, while the minimum reliability is 75%. Production sites with availability lower than 50% for three consecutive months will be suspended.
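
The thresholds above can be expressed as a small check on the monthly figures. This is a minimal Python sketch: the function names are hypothetical, and the suspension rule is read here as "the three most recent months are all below 50%", which is an interpretation of the sentence above rather than an official definition.

```python
MIN_AVAILABILITY = 70.0         # percent, minimum monthly availability
MIN_RELIABILITY = 75.0          # percent, minimum monthly reliability
SUSPENSION_AVAILABILITY = 50.0  # percent, suspension threshold
SUSPENSION_MONTHS = 3           # number of consecutive months


def below_targets(availability, reliability):
    """Return True if a month's figures miss either minimum."""
    return availability < MIN_AVAILABILITY or reliability < MIN_RELIABILITY


def should_suspend(monthly_availability):
    """Return True if the latest three monthly availabilities are all below 50%.

    monthly_availability is a list of percentages, oldest month first.
    """
    recent = monthly_availability[-SUSPENSION_MONTHS:]
    return (len(recent) == SUSPENSION_MONTHS
            and all(value < SUSPENSION_AVAILABILITY for value in recent))


# Example with made-up figures for four months.
history = [82.0, 49.0, 40.5, 30.0]
print(below_targets(availability=68.0, reliability=80.0))
print(should_suspend(history))
```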

10) Accounting
The Italian Grid provides its own accounting infrastructure, which allows the accounting data of the various sites to be collected and aggregated. These data (aggregated and anonymized so as not to show sensitive information) are then accessible on the HLRmon web portal (registration is required). These data are also sent to and centrally collected on the EGEE accounting portal, which holds the data of all the sites belonging to the EGI infrastructure.
To be part of the accounting infrastructure, the site has to send its usage records to a database (HLR), which can be hosted by the site itself. Alternatively, a "multi-site" HLR (provided by INFN-PADOVA and INFN-CATANIA) can be used for this purpose. The following information has to be provided by means of a ticket:

- grid queues names, in the format:

   - gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-cert for lcg-CE

   - cremino.cnaf.infn.it:8443/cream-pbs-cert for CREAM-CE

- non grid queues names, in the format: hostname:queue

- name, surname and personal certificate DN of each site-manager

- certificate DN of each computing element provided by the site
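
Before opening the ticket, the queue names can be checked against the three formats listed above. The following Python sketch is only an illustration: the function name and the returned keys are hypothetical, and splitting the last part of a grid queue name into a batch system and a queue is an assumption based on the two examples on this page.

```python
import re

# Patterns for the three formats listed above. Splitting the last path
# component into "<batch system>-<queue>" is an assumption based on the
# two examples given on this page.
_HOST = r"(?P<host>[A-Za-z0-9.-]+)"
LCG_CE_RE = re.compile(_HOST + r":(?P<port>\d+)/jobmanager-(?P<lrms>[^-]+)-(?P<queue>.+)")
CREAM_CE_RE = re.compile(_HOST + r":(?P<port>\d+)/cream-(?P<lrms>[^-]+)-(?P<queue>.+)")
LOCAL_RE = re.compile(_HOST + r":(?P<queue>[^:/\s]+)")


def parse_queue(name):
    """Classify a queue name and split it into its parts.

    Returns a dict with the keys kind, host, port, lrms and queue
    (port and lrms are None for non-grid queues).
    Raises ValueError if the name matches none of the three formats.
    """
    for kind, pattern in (("lcg-CE", LCG_CE_RE), ("CREAM-CE", CREAM_CE_RE)):
        match = pattern.fullmatch(name)
        if match:
            return {"kind": kind, "host": match["host"],
                    "port": int(match["port"]),
                    "lrms": match["lrms"], "queue": match["queue"]}
    match = LOCAL_RE.fullmatch(name)
    if match:
        return {"kind": "non-grid", "host": match["host"], "port": None,
                "lrms": None, "queue": match["queue"]}
    raise ValueError(f"unrecognised queue name: {name!r}")


# The examples from the list above.
for queue_name in ("gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-cert",
                   "cremino.cnaf.infn.it:8443/cream-pbs-cert",
                   "hostname:queue"):
    print(parse_queue(queue_name))
```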

11) On the MyEGI portal it is possible to display the results of the tests launched on the production sites by the Nagios monitoring tool.

12) If the availability or reliability figures are lower than 70% or 75% respectively, sites have to provide an explanation (see here).
In order to maintain a good availability and reliability level, it is advisable for the site-manager to check the test results daily. If a site is planning a scheduled intervention, or if a problem cannot be solved quickly, it is suggested to declare a scheduled downtime on GOC-DB to avoid lowering the reliability figure.
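
The effect of a declared downtime on the reliability figure can be illustrated with a short Python sketch. The formulas below are an assumption: they follow the commonly used definitions in which availability is the fraction of known time the site was working, and reliability additionally excludes scheduled downtime from the reference period. This page does not spell the formulas out, and the function names and the figures in the example are made up.

```python
def availability(hours_ok, hours_total, hours_unknown=0.0):
    """Percentage of the known time in which the site passed the tests."""
    return 100.0 * hours_ok / (hours_total - hours_unknown)


def reliability(hours_ok, hours_total, hours_scheduled_down=0.0, hours_unknown=0.0):
    """Like availability, but scheduled downtime is removed from the reference period."""
    return 100.0 * hours_ok / (hours_total - hours_scheduled_down - hours_unknown)


# A 720-hour month with a 48-hour intervention (made-up figures).
hours_total = 720.0
hours_ok = hours_total - 48.0

# Intervention not declared on GOC-DB: it counts against reliability.
print(round(reliability(hours_ok, hours_total), 2))

# Intervention declared as scheduled downtime: reliability is not lowered.
print(round(reliability(hours_ok, hours_total, hours_scheduled_down=48.0), 2))

# Availability is the same in both cases under these definitions.
print(round(availability(hours_ok, hours_total), 2))
```

Under these assumed definitions a declared downtime protects the reliability figure but not the availability figure, so long interventions can still bring a site below the availability minimum.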