Transcript Document
The LHCb Way of Computing: the approach to its organisation and development
John Harvey, CERN / LHCb. DESY Seminar, January 15th, 2001

Slide 2: Talk Outline
- Brief introduction to the LHCb experiment: requirements on data rates and CPU capacities
- Scope and organisation of the LHCb Computing Project: importance of reuse and a unified approach
- Data processing software: importance of architecture-driven development and software frameworks
- DAQ system: simplicity and maintainability of the architecture; importance of industrial solutions
- Experiment Control System: unified approach to controls; use of commercial software
- Summary

Overview of LHCb Experiment

Slide 4: The LHCb Experiment
- Special-purpose experiment to measure precisely CP asymmetries and rare decays in B-meson systems
- Operating at the most intensive source of Bu, Bd, Bs and Bc, i.e. the LHC at CERN
- LHCb plans to run with an average luminosity of 2x10^32 cm^-2 s^-1:
  - events dominated by single pp interactions, easy to analyse
  - detector occupancy is low
  - radiation damage is reduced
- High-performance trigger based on:
  - high-pT leptons and hadrons (Level 0)
  - detached decay vertices (Level 1)
- Excellent particle identification for charged particles: K/pi separation over ~1 GeV/c < p < 100 GeV/c

Slide 5: The LHCb Detector
- At high energies b- and bbar-hadrons are produced in the same forward cone (polar angles of the b- and bbar-hadrons calculated using PYTHIA)
- Detector is a single-arm spectrometer with one dipole
- theta_min = ~15 mrad (beam pipe and radiation), theta_max = ~300 mrad (cost optimisation)

Slide 6: LHCb Detector Layout [figure]

Slide 8: Typical Interesting Event [figure]

The LHCb Collaboration
- 49 institutes, 513 members
- Countries: Brazil, Finland, France, Germany, Italy, Netherlands, PRC, Poland, Romania, Russia, Spain, Switzerland, UK, Ukraine

LHCb in numbers
- Expected rate from inelastic p-p collisions is ~15 MHz
- Total b-hadron production rate is ~75 kHz
- Branching ratios of interesting channels range between 10^-5 and 10^-4, giving an interesting physics rate of ~5 Hz
- Bunch crossing rate: 40 MHz
- Level 0 accept rate: 1 MHz
- Level 1 accept rate: 40 kHz
- Level 2 accept rate: 5 kHz
- Level 3 accept rate: 200 Hz
- Number of channels: 1.1 M
- Event size: 150 kB
- Readout rate: 40 kHz
- Event building bandwidth: 6 GB/s
- Data rate to storage: 50 MB/s
- Total raw data per year: 125 TB
- Total ESD per year: 100 TB
- Simulation data per year: 350 TB
- Level 2/3 CPU: 35 kSI95
- Reconstruction CPU: 50 kSI95
- Analysis CPU: 10 kSI95
- Simulation CPU: 500 kSI95

Slide 11: Timescales
- LHCb experiment approved in September 1998
- Construction of each component scheduled to start after approval of the corresponding Technical Design Report (TDR):
  - Magnet, Calorimeter and RICH TDRs submitted in 2000
  - Trigger and DAQ TDRs expected January 2002
  - Computing TDR expected December 2002
- Expect nominal luminosity (2x10^32 cm^-2 s^-1) soon after LHC turn-on:
  - exploit physics potential from day 1
  - smooth operation of the whole data acquisition and data processing chain will be needed very quickly after turn-on
- Locally tuneable luminosity allows a long physics programme; must cope with a long life-cycle of ~15 years
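Returning to the "LHCb in numbers" table above, the quoted bandwidths follow almost directly from the trigger rates and the event size. A minimal back-of-the-envelope cross-check in C++ (the numbers are taken from the slide, the arithmetic is my own and not part of the talk):

// Cross-check of the "LHCb in numbers" figures (illustrative only).
#include <cstdio>

int main() {
    const double readoutRate_Hz = 40e3;   // Level-1 accept / readout rate
    const double eventSize_B    = 150e3;  // 150 kB per event
    const double storageRate_Hz = 200;    // Level-3 accept rate

    // 40 kHz x 150 kB = 6 GB/s, matching the quoted event-building bandwidth.
    std::printf("event building: %.1f GB/s\n", readoutRate_Hz * eventSize_B / 1e9);

    // 200 Hz x 150 kB = 30 MB/s of raw data; the quoted 50 MB/s to storage
    // presumably also includes reconstruction output.
    std::printf("raw data to storage: %.0f MB/s\n", storageRate_Hz * eventSize_B / 1e6);
    return 0;
}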
LHCb Computing: Scope and Organisation

Slide 13: Requirements and Resources
- More stringent requirements:
  - enormous number of items to control: scalability
  - inaccessibility of detector and electronics during data taking: reliability
  - intense use of software in triggering (Levels 1, 2, 3): quality
  - many orders of magnitude more data and CPU: performance
- Experienced manpower very scarce:
  - staffing levels falling
  - technology evolving very quickly (hardware and software)
  - rely very heavily on very few experts (1 or 2): a bootstrap approach
- The problem: a more rigorous approach is needed, but this is more manpower intensive and must be undertaken under conditions of dwindling resources

Slide 14: Importance of Reuse
- Put extra effort into building high-quality components; become more efficient by extracting more use out of these components (reuse)
- Many obstacles to overcome:
  - too broad functionality / lack of flexibility in components
  - proper roles and responsibilities not defined (e.g. architect)
  - organisational: reuse requires a broad overview to ensure a unified approach, yet we tend to split into separate domains, each independently managed
  - cultural: we don't trust others to deliver what we need, fear dependency on others, fail to share information with others, and developers fear loss of creativity
- Reuse is a management activity: the right organisation must be provided to make it happen

Slide 15: Traditional Project Organisation
[Diagram: Online System (DAQ Hardware, DAQ Software, Detector Control System) and Offline System (Simulation, Analysis, Event Display, Detector Description), with components such as Event Display, Detector Description and Message System duplicated in several places]

Slide 16: A Process for Reuse
- Manage: plan, initiate, track, coordinate; set priorities and schedules, resolve conflicts
- Build: develop architectural models; choose integration standards; engineer reusable components
- Support: support development; manage and maintain components; validate, classify, distribute; document, give feedback
- Assemble: design application; find and specialise components; develop missing components; integrate components
- Inputs are the requirements (and existing software and hardware); outputs are the delivered systems

Slide 17: LHCb Computing Project Organisation
[Organisation chart: National Computing Board, Computing Steering Group and Technical Review oversee the sub-projects: Software Development Support (code management, release management, tools, training, documentation, web); Simulation; Reconstruction; Analysis; GAUDI Framework (architecture specification, detector description, visualisation, GEANT4, XML, ...); Controls Framework (architecture specification, SCADA, OPC, ...); DAQ System (timing and fast control, readout unit, event builder, event filter farm); DAQ Framework (architecture specification, simulation model, TTC, NP, NIC, ...); Experiment Control System (detector controls, safety system, run control system); Distributed Computing Facilities (CPU farms, data storage, computing model, production tools, GRID); Operations. Legend: C = Computing Coordinator, RC = Regional Centre Representative, M = Project Manager, A = Software Architect, E = Project Engineer]
Data Processing Software

Slide 19: Software architecture
Definition of [software] architecture [1]:
- the set of significant decisions about the organization of the software system
- the selection of the structural elements and their interfaces which compose the system
- their behaviour: the collaboration among the structural elements
- the composition of these structural and behavioural elements into progressively larger subsystems
- the architectural style that guides this organization
The architecture is the blueprint (architecture description document).
[1] I. Jacobson et al., "The Unified Software Development Process", Addison-Wesley, 1999

Slide 20: Software Framework
Definition of [software] framework [2,3]:
- a kind of micro-architecture that codifies a particular domain
- provides the suitable knobs, slots and tabs that permit clients to customise it for specific applications within a given range of behaviour
- a framework realizes an architecture
- a large OO system is constructed from several cooperating frameworks
- the framework is real code
- the framework should be easy to use and should provide a lot of functionality
[2] G. Booch, "Object Solutions", Addison-Wesley, 1996
[3] E. Gamma et al., "Design Patterns", Addison-Wesley, 1995

Slide 21: Benefits
Having an architecture and a framework gives:
- a common vocabulary, better specifications of what needs to be done, better understanding of the system
- low coupling between concurrent developments and smooth integration
- organization of the development
- robustness and resilience to change (change-tolerant)
- fostering of code reuse (architecture -> framework -> applications)

Slide 22: What's the scope?
- Each LHC experiment needs a framework to be used in its event data processing applications: physics/detector simulation, high level triggers, reconstruction, analysis, event display, data quality monitoring, ...
- The experiment framework will incorporate other frameworks: persistency, detector description, event simulation, visualization, GUI, etc.

Slide 23: Software Structure
- One main framework, plus various specialized frameworks: visualization, persistency, interactivity, simulation, etc.
- A series of widely used basic libraries: STL, CLHEP, etc.
- Applications (high level triggers, reconstruction, simulation, analysis) are built on top of the frameworks and implement the required physics algorithms
- Layering: applications on frameworks, frameworks on toolkits, toolkits on foundation libraries

Slide 24: GAUDI Object Diagram
[Diagram: the Application Manager coordinates the Message Service, JobOptions Service, Particle Properties Service and other services; the Event Data Service (Transient Event Store), Detector Data Service (Transient Detector Store) and Histogram Service (Transient Histogram Store) are each connected through a Persistency Service and Converters to data files; an Event Selector feeds the event loop; Algorithms operate on the transient stores]

Slide 25: GAUDI Architecture: Design Criteria
- Clear separation between data and algorithms
- Three basic types of data: event, detector, statistics
- Clear separation between persistent and transient data
- Computation-centric architectural style
- User code encapsulated in a few specific places: algorithms and converters
- All components have well-defined interfaces and are as generic as possible
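To make these design criteria concrete, here is a minimal sketch of the pattern the GAUDI slides describe: user code confined to algorithms, event data reached only through a transient store, and an application manager driving a fixed initialize/execute/finalize cycle. This is my own illustration, not actual GAUDI code; all class and method names below (IAlgorithm, EventStore, TrackCountAlgorithm, ...) are invented for the example.

// Toy illustration of the "algorithms + transient store + application manager"
// pattern; not the real GAUDI interfaces.
#include <cstddef>
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Transient event store: algorithms only see data through this service,
// never the persistent representation (converters would fill it).
struct EventStore {
    std::map<std::string, std::vector<double>> data;
};

// The framework drives every algorithm through the same small interface.
class IAlgorithm {
public:
    virtual ~IAlgorithm() = default;
    virtual bool initialize() = 0;             // called once, before the event loop
    virtual bool execute(EventStore& evt) = 0; // called for every event
    virtual bool finalize() = 0;               // called once, at the end
};

// User code lives in concrete algorithms such as this one.
class TrackCountAlgorithm : public IAlgorithm {
public:
    bool initialize() override { m_total = 0; return true; }
    bool execute(EventStore& evt) override {
        m_total += evt.data["RawHits"].size();  // read event data from the store
        return true;
    }
    bool finalize() override {
        std::cout << "hits seen: " << m_total << '\n';
        return true;
    }
private:
    std::size_t m_total = 0;
};

// A minimal "application manager": schedules the algorithms over the events.
int main() {
    std::vector<std::unique_ptr<IAlgorithm>> algs;
    algs.push_back(std::make_unique<TrackCountAlgorithm>());

    for (auto& a : algs) a->initialize();
    for (int i = 0; i < 3; ++i) {              // stand-in for the event selector loop
        EventStore evt;
        evt.data["RawHits"] = {1.0, 2.0, 3.0};
        for (auto& a : algs) a->execute(evt);
    }
    for (auto& a : algs) a->finalize();
    return 0;
}

The point of the structure is the one made on the slides: algorithms never know how data were made persistent, so persistency technology and physics code can evolve independently.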
Slide 26: Status
- Sept '98: project started; GAUDI team assembled
- Nov 25 '98: one-day architecture review (goals, architecture design document, URD, scenarios; chair, recorder, architect, external reviewers)
- Feb 8 '99: GAUDI first release (v1); first software week with presentations and tutorial sessions; plan for second release; expand the GAUDI team to cover new domains (e.g. analysis toolkits, visualisation)
- Nov '00: GAUDI v6
- Nov '00: BRUNEL v1, a new reconstruction program based on GAUDI; supports C++ algorithms (tracking) and wrapped FORTRAN, with the FORTRAN gradually being replaced

Slide 27: Collaboration with ATLAS
- ATLAS is now also contributing to the development of GAUDI: open-source style, experiment-independent web and release area
- Other experiments are also using GAUDI: HARP, GLAST, OPERA
- Since we cannot provide all the functionality ourselves, we rely on contributions from others; examples: scripting interface, data dictionaries, interactive analysis, etc.
- Encouragement to put more quality into the product; better testing in different environments (platforms, domains, ...); shared long-term maintenance
- Gaudi developers mailing list: tilde-majordom.home.cern.ch/~majordom/news/gaudi-developers/index.html

Data Acquisition System

Slide 29: Trigger/DAQ Architecture
[Diagram: the LHCb detector (VDET, TRACK, RICH, ECAL, HCAL, MUON) is sampled at 40 MHz, about 40 TB/s into the front-end electronics. The Level-0 trigger (fixed latency 4.0 us) accepts 1 MHz; the Level-1 trigger (variable latency < 1 ms) accepts 40 kHz. Front-end multiplexers (FEM) and front-end links feed the Readout Units (RU) at about 1 TB/s; the RUs feed the Readout Network (6 GB/s), which delivers complete events to the Sub-Farm Controllers (SFC). The CPU farm runs the Level-2 (variable latency ~10 ms) and Level-3 (~200 ms) event filters, writing 50 MB/s to storage. Timing and Fast Control (L0, L1), a throttle path, and Control and Monitoring span the whole system.]

Slide 30: Event Building Network
Requirements:
- 6 GB/s sustained bandwidth
- scalable: ~120 inputs (RUs) and ~120 outputs (SFCs)
- commercial and affordable (COTS, commodity?)
Readout protocol:
- pure push-through protocol of complete events to one CPU of the farm
- destination assignment follows an identical algorithm in all RUs (belonging to one partition), based on the event number
- simple hardware and software; no central control, hence perfect scalability
- full flexibility for high-level trigger algorithms
- larger bandwidth needed (about +50%) compared with phased event building
- buffer overflows avoided via a 'throttle' to the trigger
- only static load balancing between RUs and SFCs
[Diagram: a possible switch configuration, e.g. four Foundry BigIron 15000 switches with 60 x 1 GbE ports each on the RU and SFC sides, interconnected by 12 x 10 GbE links]

Slide 31: Readout Unit using Network Processors
- Based on the IBM NP4GS3 network processor: 4 x 1 Gb/s full-duplex Ethernet MACs, 16 RISC processors at 133 MHz, up to 64 MB external RAM; used in routers
- RU structure: four FEM inputs on GbE ports feed one NP4GS3; a switch bus connects it to a second NP4GS3 driving the output to the readout network; a credit-card PC (CC-PC) provides the ECS (Ethernet) interface
- RU functions: event building and formatting, about 7.5 us/event, ~200 kHz event rate
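The readout protocol on the Event Building Network slide boils down to every RU computing the destination SFC independently, as a pure function of the event number, so that no central event manager is needed and load balancing is purely static. A minimal sketch of that idea; the modulo rule is my assumption for illustration, the slide only requires that the algorithm be identical in all RUs of a partition and based on the event number.

// Illustrative destination assignment for the push-through readout protocol.
#include <cstdint>
#include <vector>

struct Partition {
    std::vector<int> sfcIds;   // SFCs taking part in this partition (~120 in total)
};

// Static load balancing: event N always goes to the same SFC, with no feedback
// from the farm; back-pressure is handled separately via the throttle.
int destinationSfc(const Partition& p, std::uint64_t eventNumber) {
    return p.sfcIds[eventNumber % p.sfcIds.size()];
}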
Slide 32: Sub-Farm Controller (SFC)
- A 'standard' PC (CPU, memory, PCI bridge) with three network interfaces: a smart NIC to the Readout Network (GbE, ~50 MB/s in), a NIC to the subfarm network (GbE, ~50 MB/s out) and a control NIC to the controls network (Fast Ethernet, ~0.5 MB/s)
- Smart NIC based on the Alteon Tigon 2: dual R4000-class processor running at 88 MHz, up to 2 MB memory, GigE MAC and link-level interface, PCI interface; handles ~90 kHz event fragments/s
- Development environment: GNU C cross compiler with a few special features to support the hardware; source-level remote debugger

Slide 33: Control Interface to Electronics
- Select a reduced number of solutions for interfacing front-end electronics to LHCb's control system:
  - no radiation (counting room): Ethernet to a credit-card PC on the modules
  - low-level radiation (cavern): 10 Mbit/s custom serial link over LVDS twisted pair, with an SEU-immune, antifuse-based FPGA interface chip
  - high-level radiation (inside detectors): the CCU control system made for the CMS tracker; radiation hard, SEU immune, with bypass
- Provide support (hardware and software) for the integration of the selected solutions
[Diagram: a credit-card PC master reached over Ethernet, a serial slave and a PC master, providing JTAG, I2C and parallel bus access to the front-end modules]

Experiment Control System

Slide 35: Control and Monitoring
[Diagram: the same Trigger/DAQ architecture as slide 29, with the Control and Monitoring system spanning all components from the front-end electronics to the CPU farm and storage]

Slide 36: Experimental Control System
- The Experiment Control System will be used to control and monitor the operational state of the detector, of the data acquisition and of the experimental infrastructure
- Detector controls: high and low voltages, crates, cooling and ventilation, gas systems, etc.; alarm generation and handling
- DAQ controls: run control; setup and configuration of all readout components (FE, trigger, DAQ, CPU farm, trigger algorithms, ...)

Slide 37: System Requirements
- Common control services across the experiment:
  - system configuration services: coherent information in a database
  - distributed information system: control data archival and retrieval
  - error reporting and alarm handling
  - data presentation: status displays, trending tools, etc.
  - expert system to assist the shift crew
- Objectives: easy to operate (a shift crew of 2-3 should run the complete experiment); easy to adapt to new conditions and requirements
- This implies integration of the DCS with the control of the DAQ and of data quality monitoring

Slide 38: Integrated System: trending charts
[Screenshot: trending charts combining DAQ and slow-control quantities]

Slide 39: Integrated System: error logger
[Screenshot: the ALEPH error logger (ERRORS + MONITOR + ALARM) showing DAQ and slow-control messages in one place, e.g. "1_missing_Source(s)", "Trigger protocol error (TMO_Wait_No_Busy)", "VME CRATE fault in: SideA Low"]

Slide 40: Scale of the LHCb Control System
- Parameters:
  - detector control: O(10^5) parameters
  - FE electronics: a few parameters x 10^6 readout channels
  - Trigger and DAQ: O(10^3) DAQ objects x O(10^2) parameters
  - this implies a high-level description of the control components (devices/channels)
- Infrastructure: 100-200 control PCs and several hundred credit-card PCs; by itself a sizeable (Ethernet) network

Slide 41: LHCb Controls Architecture
[Diagram: a supervision layer (SCADA on servers and user stations, linked by LAN/WAN to storage for the configuration DB, archives and log files, and to other systems such as the LHC and safety); a process management layer (controllers/PLCs, VME, OPC servers, communication); and a field management layer (fieldbuses, devices and the experimental equipment)]
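Before turning to the commercial SCADA layer, a hypothetical sketch of the device/channel description and alarm handling that the ECS slides call for (O(10^5) detector-control parameters, alarm generation, archiving). This is not the PVSS or SCADA API, just an illustration of the idea; all names and limits are invented.

// Hypothetical device/channel monitoring with simple alarm limits.
#include <iostream>
#include <string>
#include <vector>

enum class AlarmLevel { Ok, Warning, Error };

// One monitored parameter of a device, e.g. the measured voltage of an HV channel.
struct Channel {
    std::string name;
    double value;
    double warnLow, warnHigh;   // outside this band: warning
    double errLow,  errHigh;    // outside this band: error

    AlarmLevel evaluate() const {
        if (value < errLow  || value > errHigh)  return AlarmLevel::Error;
        if (value < warnLow || value > warnHigh) return AlarmLevel::Warning;
        return AlarmLevel::Ok;
    }
};

// A device groups channels; a real system would build a hierarchy of these
// (sub-detector -> crate -> board -> channel) and archive every reading.
struct Device {
    std::string name;
    std::vector<Channel> channels;

    void scan() const {
        for (const auto& c : channels)
            if (c.evaluate() != AlarmLevel::Ok)
                std::cout << "ALARM " << name << '/' << c.name
                          << " value=" << c.value << '\n';
    }
};

int main() {
    Device hvBoard{"VDET/HV/Board07",
                   {{"ch00", 1530.0, 1400.0, 1600.0, 1300.0, 1700.0},
                    {"ch01", 1850.0, 1400.0, 1600.0, 1300.0, 1700.0}}};
    hvBoard.scan();   // reports ch01 as out of range
    return 0;
}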
Slide 42: Supervisory Control And Data Acquisition (SCADA)
- Used virtually everywhere in industry, including very large and mission-critical applications
- A toolkit including: a development environment; a set of basic SCADA functionality (e.g. HMI, trending, alarm handling, access control, logging/archiving, scripting); networking/redundancy management facilities for distributed applications
- Flexible and open architecture: multiple communication protocols supported; support for the major Programmable Logic Controllers (PLCs), but not VME; a powerful Application Programming Interface (API); Open Database Connectivity (ODBC); OLE for Process Control (OPC)

Slide 43: Benefits/Drawbacks of SCADA
Benefits:
- standard framework, hence a homogeneous system
- support for large distributed systems
- buffering against changes in technology, operating systems, platforms, etc.
- saving of development effort (50-100 man-years)
- stability and maturity: available immediately
- support and maintenance, including documentation and training
- reduction of work for the end users
Drawbacks:
- not tailored exactly to the end application
- risk of the company going out of business
- the company may develop unwanted features
- licences have to be paid for

Slide 44: Commercial SCADA system chosen
- Major evaluation effort: the technology survey looked at ~150 products
- PVSS II chosen, from the Austrian company ETM; device oriented, Linux and NT support
- The contract foresees: unlimited usage by members of all institutes participating in LHC experiments; a 10-year maintenance commitment; training provided by the company, to be paid by the institutes
- Licences available from CERN from October 2000
- PVSS II will be the basis for the development of the control systems of all four LHC experiments (Joint Controls Project)

Slide 45: Controls Framework
- LHCb aims to distribute a framework together with the SCADA system, to:
  - reduce to a minimum the work to be performed by the sub-detector teams
  - ensure the work can be easily integrated despite being performed in multiple locations
  - ensure a consistent and homogeneous DCS
- Engineering tasks for the framework:
  - definition of the system architecture (distribution of functionality)
  - modelling of standard device behaviour
  - development of configuration tools
  - templates and symbol libraries, e.g. power supply, rack, etc.
  - support for system partitioning (uses an FSM)
  - guidelines on the use of colours, fonts, page layout, naming, ...
  - guidelines for alarm priority levels, access control levels, etc.
- First prototype released end 2000

Slide 46: Application Architecture
[Diagram: a hierarchical control tree with the ECS at the top; a DCS branch (Vertex, Tracker, Muon, ... each with HV, Temp, Gas nodes) and a DAQ branch (Vertex, Tracker, Muon, ... each with FE and RU nodes); interfaces to the LHC and the Safety system]

Slide 47: Run Control
[Screenshot: run control panel]

Slide 48: Summary
- Organisation has important consequences for cohesion, maintainability and the manpower needed to build the system
- Architecture-driven development maximises common infrastructure and results in systems that are more resilient to change
- Software frameworks maximise the level of reuse and simplify distributed development by many application builders
- Use of industrial components (hardware and software) can reduce development effort significantly
- The DAQ is designed with simplicity and maintainability in mind
- Maintain a unified approach, e.g. the same basic infrastructure for detector controls and DAQ controls
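As a closing illustration before the extra slides: the FSM-based partitioning of the controls framework (slide 45) and the run-control tree of slides 46-47 can be pictured as a hierarchy of small state machines, with commands flowing down and summarised states flowing up. The sketch below is hypothetical; the states, commands and node names are invented and do not reflect the actual LHCb FSM definitions.

// Hypothetical FSM-based control hierarchy (ECS -> DCS/DAQ -> sub-systems).
#include <algorithm>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

enum class State { NotReady, Configuring, Ready, Running, Error };

class ControlNode {
public:
    explicit ControlNode(std::string name) : m_name(std::move(name)) {}

    void addChild(std::shared_ptr<ControlNode> c) { m_children.push_back(std::move(c)); }

    // Commands propagate down the tree (e.g. ECS -> DAQ -> Tracker -> RU).
    void command(const std::string& cmd) {
        if      (cmd == "CONFIGURE" && m_state == State::NotReady) m_state = State::Ready;
        else if (cmd == "START"     && m_state == State::Ready)    m_state = State::Running;
        else if (cmd == "STOP"      && m_state == State::Running)  m_state = State::Ready;
        for (auto& c : m_children) c->command(cmd);
    }

    // The summarised state goes up: Error in any child dominates the parent view.
    State summary() const {
        State worst = m_state;
        for (const auto& c : m_children)
            worst = std::max(worst, c->summary());   // Error is the "largest" state
        return worst;
    }

private:
    std::string m_name;
    State m_state = State::NotReady;
    std::vector<std::shared_ptr<ControlNode>> m_children;
};

int main() {
    auto ecs = std::make_shared<ControlNode>("ECS");
    auto daq = std::make_shared<ControlNode>("DAQ");
    auto dcs = std::make_shared<ControlNode>("DCS");
    ecs->addChild(daq);
    ecs->addChild(dcs);

    ecs->command("CONFIGURE");
    ecs->command("START");
    std::cout << "running: " << (ecs->summary() == State::Running) << '\n';
    return 0;
}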
Extra Slides

Slide 51: Typical Interesting Event [figure]

Slide 53: LHCb Collaboration
- France: Clermont-Ferrand, CPPM Marseille, LAL Orsay
- Germany: Tech. Univ. Dresden, KIP Univ. Heidelberg, Phys. Inst. Univ. Heidelberg, MPI Heidelberg
- Italy: Bologna, Cagliari, Ferrara, Firenze, Frascati, Genova, Milano, Univ. Roma I (La Sapienza), Univ. Roma II (Tor Vergata)
- Netherlands: NIKHEF
- Poland: Cracow Inst. Nucl. Phys., Warsaw Univ.
- Spain: Univ. Barcelona, Univ. Santiago de Compostela
- Switzerland: Univ. Lausanne, Univ. Zürich
- UK: Univ. Bristol, Univ. Cambridge, Univ. Edinburgh, Univ. Glasgow, IC London, Univ. Liverpool, Univ. Oxford, RAL
- CERN
- Brazil: UFRJ
- China: IHEP (Beijing), Tsinghua Univ. (Beijing)
- Romania: IFIN-HH Bucharest
- Russia: BINR (Novosibirsk), INR, ITEP, Lebedev Inst., IHEP, PNPI (Gatchina)
- Ukraine: Inst. Phys. Tech. (Kharkov), Inst. Nucl. Research (Kiev)

Requirements on Data Rates and Computing Capacities

Slide 55: LHCb Technical Design Reports
- First TDR submitted January 2000, recommended by the LHCC March 2000, approved by the RB April 2000
- Two further TDRs submitted September 2000 and recommended November 2000

Slide 56: Defining the architecture
Issues to take into account: object persistency, user interaction, data visualization, computation, scheduling, run-time type information, plug-and-play facilities, networking, security

Slide 57: Architectural Styles
General categorization of systems [2]:
- user-centric: focus on the direct visualization and manipulation of the objects that define a certain domain
- data-centric: focus on preserving the integrity of the persistent objects in the system
- computation-centric: focus on the transformation of the objects that are interesting to the system
Our applications have elements of all three. Which one dominates?

Slide 58: Getting Started
- First crucial step was to appoint an architect, ideally with skills as OO mentor, domain specialist, leader and visionary
- Started with a small design team of ~6 people, including developers, a librarian and a use-case analyst
- Control activities through visibility and self-discipline; meet regularly, in the beginning every day, now once per week
- Collect user requirements and scenarios and use them to validate the design
- Establish the basic design criteria for the overall architecture: architectural style, flow of control, specification of interfaces

Slide 59: Development Process
- Incremental approach to development: a new release every few (~4) months; software workshops timed to coincide with new releases
- The development cycle is user-driven: users define the priority of what goes into the next release; ideally they use what is produced and give rapid feedback; frameworks must do a lot and be easy to use
- Strategic decisions taken following a thorough review (~1 per year)
- Releases accompanied by complete documentation: presentations, tutorials, URD, reference documents, user guides, examples

Slide 60: Possible migration strategies
[Diagram: three strategies for migrating from the FORTRAN-based SICb program to the C++ Gaudi framework: (1) fast translation of the FORTRAN into C++, (2) wrapping the FORTRAN SICb inside Gaudi, (3) ?; the timeline runs through a framework development phase, a transition phase, a hybrid phase and a consolidation phase]
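Wrapping FORTRAN, as in the migration strategies above and in BRUNEL later, amounts in practice to calling the existing routines from a C++ algorithm through their C binding. A hypothetical sketch of the mechanism: the routine name SICBTRK and its argument list are invented for the example; only the technique (extern "C", compiler-dependent trailing underscore, arguments passed by reference) reflects how such wrapping is commonly done.

// Illustrative wrapper around a legacy FORTRAN routine (name and arguments invented).
#include <vector>

extern "C" {
    // FORTRAN: SUBROUTINE SICBTRK(NHITS, HITS, NTRACKS)
    void sicbtrk_(int* nhits, float* hits, int* ntracks);
}

// The wrapper gives the rest of the C++ code a clean interface and hides the
// by-reference calling convention of the legacy code.
int runLegacyTracking(std::vector<float>& hits) {
    int nhits = static_cast<int>(hits.size());
    int ntracks = 0;
    sicbtrk_(&nhits, hits.data(), &ntracks);
    return ntracks;
}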
Slide 61: How to proceed?
- Physics goal: to be able to run new tracking pattern recognition algorithms written in C++ in production, together with the standard FORTRAN algorithms, in time to produce useful results for the RICH TDR
- Software goal: to allow software developers to become familiar with GAUDI and to encourage the development of new software algorithms in C++
- Approach: choose strategy 3; start with the migration of the reconstruction and analysis code; simulation will follow later

Slide 62: New Reconstruction Program: BRUNEL
Benefits of the approach:
- a unified development and production environment
- as soon as C++ algorithms are proven to do the right thing, they can be brought into production in the official reconstruction program
- early exposure of all developers to the Gaudi framework
- increasing functionality of the OO 'DST': as more and more of the event data become available in Gaudi, it will become more and more attractive to perform analysis with Gaudi
- a smooth transition to a C++-only reconstruction

Slide 63: Integrated System: databases
[Diagram: the Slow Control Database (SCDevType, SCDevice, SCChannel, SCCrate, SCDetector) and the Readout System Database (VMECrate, VMEModule, ModuleType, VICCable, VSBCable) are linked to each other and to the detector description, so that, for example, "the power supply on that VME crate" can be resolved across both]

Slide 64: Frontend Electronics
- Data buffering for the Level-0 latency
- Data buffering for the Level-1 latency
- Digitization and zero suppression
- Front-end multiplexing onto the front-end links
- Push of data to the next higher stage of the readout (DAQ)

Slide 65: Timing and Fast Control
[Diagram: the LHC clock (BC and BCR) is fanned out through the TTC system; the Level-0 and Level-1 trigger systems (with an optional local trigger) feed the Readout Supervisor; L0 and L1 throttle switches and a TFC switch support partitioning; TTCtx modules and optical couplers distribute the clock and trigger decisions to TTCrx receivers on the front-end electronics (FE chips, L1 buffers, ADCs, DSPs) of each sub-detector; throttle ORs collect back-pressure from the front-ends]
Functions:
- provide a common and synchronous clock to all components needing it
- provide the Level-0 and Level-1 trigger decisions
- provide commands synchronous in all components (resets)
- provide trigger hold-off capabilities in case buffers are getting full
- provide support for partitioning (switches, ORs)

Slide 66: IBM NP4GS3 Features
- 4 x 1 Gb/s full-duplex Ethernet MACs
- 16 special-purpose RISC processors at 133 MHz with 2 hardware threads each
- Each group of 4 processors (8 threads) shares 3 co-processors for special functions: tree search, memory move, etc.
- Integrated 133 MHz PowerPC processor
- Up to 64 MB external RAM
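Referring back to the Frontend Electronics slide, the size of the "Level-0 latency" buffer follows from two numbers already quoted in the talk: the 40 MHz bunch crossing rate and the fixed Level-0 latency of 4.0 us from the Trigger/DAQ architecture slide. The arithmetic below is my own back-of-the-envelope check, not a figure from the talk.

// Sizing the Level-0 front-end pipeline (illustrative arithmetic only).
#include <cstdio>

int main() {
    const double bunchCrossingRate_Hz = 40e6;
    const double level0Latency_s      = 4.0e-6;   // fixed Level-0 latency
    // 40e6 * 4.0e-6 = 160 bunch crossings must be buffered per channel.
    std::printf("pipeline depth: %.0f bunch crossings\n",
                bunchCrossingRate_Hz * level0Latency_s);
    return 0;
}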
Slide 67: Event Building Network Simulation
- Simulated technology: Myrinet (nominal 1.28 Gb/s, Xon/Xoff flow control)
- Switches: ideal crossbar, 8x8 maximum size (currently), wormhole routing, source routing, no buffering inside the switches
- Software used: the Ptolemy discrete event framework
- Realistic traffic patterns: variable event sizes, event building traffic
[Diagram: a trigger signal drives RU data generators with buffers and NICs (LANai); fragments cross a composite switching network to NICs and fragment assemblers in the SFCs; a throttle path runs back to the trigger]

Slide 68: Event Building Activities
- Tested NIC event building; simulated a switching fabric of the size suitable for LHCb
- Results show that the switching network could be implemented, provided buffers are added between the levels of switches
- Studied Myrinet; currently focussing on xGb Ethernet and studying smart NICs (see Niko's talk)
- A possible switch configuration for LHCb with roughly today's technology (still to be simulated): e.g. four Foundry BigIron 15000 switches with 60 x 1 GbE ports each on the RU and SFC sides, interconnected by 12 x 10 GbE links; multiple paths between sources and destinations
[Plot: efficiency relative to installed bandwidth versus switch size, with and without 256 kB FIFOs]

Slide 69: Network Simulation Results
- The results do not depend strongly on the specific technology (Myrinet), but rather on its characteristics (flow control, buffering, internal speed, etc.)

  Switch size | FIFO size | Switching levels | Efficiency
  8x8         | n/a       | 1                | 52.5%
  32x32       | 0         | 2                | 37.3%
  32x32       | 256 kB    | 2                | 51.8%
  64x64       | 0         | 2                | 38.5%
  64x64       | 256 kB    | 2                | 51.4%
  96x96       | 0         | 3                | 27.6%
  96x96       | 256 kB    | 3                | 50.7%
  128x128     | 0         | 3                | 27.5%
  128x128     | 256 kB    | 3                | 51.5%

- FIFO buffers between the switching levels allow the scalability to be recovered
- ~50% efficiency is a "law of nature" for these characteristics

Slide 70: Alteon Tigon 2 Features
- Dual R4000-class processor running at 88 MHz
- Up to 2 MB memory
- GigE MAC and link-level interface
- PCI interface
- Development environment: GNU C cross compiler with a few special features to support the hardware; source-level remote debugger

Slide 71: Controls System
- A common, integrated controls system: the same system covers both detector controls and DAQ controls
- Detector controls: high voltage, low voltage, crates, alarm generation and handling, etc.
- DAQ controls: run control; setup and configuration of all components (FE, trigger, DAQ, CPU farm, trigger algorithms, ...)
- Consequent and rigorous separation of the controls and DAQ data paths
- Scale: ~100-200 control PCs and many hundreds of credit-card PCs, most likely on Ethernet; by itself a sizeable network
- Connected via LAN/WAN to storage (master configuration DB, archives, log files) and to other systems (LHC, safety, ...)
[Diagram: control PCs (CPC) and PLCs attached to the sub-detectors and experimental equipment, and to the readout system (ROC)]
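A rough consistency check of the simulation results above, combining the ~50% usable-bandwidth "law of nature" with the ~120 x 1 GbE input configuration quoted for the event building network. The arithmetic is my own and not from the talk.

// Usable event-building bandwidth at ~50% efficiency (illustrative only).
#include <cstdio>

int main() {
    const int    nPorts         = 120;     // ~120 RUs feeding the readout network
    const double portSpeed_GBps = 1.0 / 8; // 1 Gb/s = 0.125 GB/s
    const double efficiency     = 0.50;    // from the simulation results table

    // 120 * 0.125 GB/s * 0.5 = 7.5 GB/s usable, which still covers the required
    // 6 GB/s sustained bandwidth, consistent with the ~+50% overhead quoted for
    // the push-through protocol.
    const double usable_GBps = nPorts * portSpeed_GBps * efficiency;
    std::printf("usable bandwidth: %.1f GB/s (required: 6 GB/s)\n", usable_GBps);
    return 0;
}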