Networking for ATLAS Remote Farms
Richard Hughes-Jones
The University of Manchester
DataGrid WP7 – Dante Tests on the GÉANT Core
End-2-End Measurements from the 4th Year VLBI Project at Manchester
New TCP stacks – the effect on throughput
Some Simple Network Tests CERN-Manchester
Meeting on ATLAS Remote Farms. Copenhagen 11 May 2004
R. Hughes-Jones Manchester
DataGrid WP7 – Dante Tests on the GÉANT Core
Set-up
Supermicro PC in:
London GEANT PoP
Amsterdam GEANT PoP
Smartbits in:
London GEANT PoP
Frankfurt GEANT PoP
Long link
UK-SE-DE2-IT-CH-FR-BE-NL
Short Link
UK-FR-BE-NL
Tests GÉANT Core: UDP throughput
UDP Throughput
London-Amsterdam
Available BW to packet on wire
Then 1/t
Wire rate 998 Mbit/s
for packets > 1400 bytes
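The 1/t fall-off and the line-rate plateau can be illustrated with a small calculation. This is only a sketch: the 66-byte per-frame overhead (preamble, Ethernet header, FCS, inter-frame gap, IP and UDP headers) and the 1 Gbit/s line rate are assumptions, and the function names are made up for illustration rather than taken from the test tool used for the slides.

```python
# Assumed per-frame overhead for a UDP/IPv4 packet on Gigabit Ethernet (bytes):
# preamble+SFD 8, Ethernet header 14, FCS 4, inter-frame gap 12, IP 20, UDP 8.
OVERHEAD_BYTES = 8 + 14 + 4 + 12 + 20 + 8  # = 66
LINE_RATE_BPS = 1e9  # assumed 1 Gbit/s link


def time_on_wire_us(payload_bytes, line_rate_bps=LINE_RATE_BPS):
    """Time one frame occupies the wire, in microseconds."""
    return (payload_bytes + OVERHEAD_BYTES) * 8 / line_rate_bps * 1e6


def offered_wire_rate_mbps(payload_bytes, spacing_us, line_rate_bps=LINE_RATE_BPS):
    """Wire rate for a given frame spacing; capped once spacing < time on wire."""
    spacing = max(spacing_us, time_on_wire_us(payload_bytes, line_rate_bps))
    # bits per microsecond is numerically equal to Mbit/s
    return (payload_bytes + OVERHEAD_BYTES) * 8 / spacing


if __name__ == "__main__":
    for spacing in (5, 10, 15, 20, 40):
        print(spacing, "us ->", round(offered_wire_rate_mbps(1472, spacing), 1), "Mbit/s")
```

Under these assumptions a 1472-byte packet occupies the wire for about 12.3 µs, so any requested spacing below that is limited by the link, and above it the rate falls as 1/spacing.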
[Figure uk-nl_20tg4-hs-w100_01Oct03: received wire rate (Mbit/s) against spacing between frames (µs), one curve per packet size from 50 to 1472 bytes.]
Dips in BW linked to packet loss
SysKonnect NIC int. per packet
CPU load important
[Figure: % packet loss against spacing between frames (µs), one curve per packet size from 50 to 1472 bytes.]
Packet Loss
None for large packets
Tests GÉANT Core: Packet re-ordering
Effect of Packet size
London-Amsterdam
Packets at 10 µs – line speed
10,000 sent
Packet Loss ~ 0.1%
[Figure "Packet re-order uk-nl 10,000 BE sent wait 10 us 01 Oct 03": out of order % against packet size (bytes).]
Re-order Distribution
[Figure "Packet re-order uk-nl 10,000 sent wait 10 us": no. of packets against length out-of-order, for 1400, 1401 and 1402 byte packets.]
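As an illustration of how loss and out-of-order percentages like those quoted for this test could be derived from received sequence numbers, here is a minimal sketch. It assumes one common definition (a packet is out of order if it arrives after a packet with a higher sequence number); the slides do not state the definition actually used, and `reorder_stats` is a hypothetical helper.

```python
def reorder_stats(received_seq, n_sent):
    """Return loss and out-of-order percentages for a list of received sequence numbers."""
    highest = -1        # highest sequence number seen so far
    out_of_order = 0
    seen = set()
    for seq in received_seq:
        if seq in seen:
            continue    # ignore duplicates
        seen.add(seq)
        if seq < highest:
            out_of_order += 1   # arrived after a later packet
        else:
            highest = seq
    lost = n_sent - len(seen)
    return {
        "lost_pct": 100.0 * lost / n_sent,
        "out_of_order_pct": 100.0 * out_of_order / len(seen) if seen else 0.0,
    }


if __name__ == "__main__":
    # 10 packets sent (0..9); 8 and 9 lost; 2 and 5 arrive late.
    print(reorder_stats([0, 1, 3, 2, 4, 6, 5, 7], n_sent=10))
```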
Tests GÉANT Core: Packet re-ordering
Effect of LBE background
Amsterdam-London
BE Test flow
Packets at 10 µs – line speed
10,000 sent
Packet Loss ~ 0.1%
[Figure "UDP 1472 bytes NL-UK-lbexxx_7nov03": % out of order against total offered rate (Gbit/s); legend entries: hstcp, Standard TCP, line speed, 90% line speed.]
Re-order Distributions:
[Figure "Packet re-order 1472 bytes uk-nl 21 Oct 03 10,000 sent wait 10 us": no. of packets against length out-of-order, one series per LBE level from 0 % to 80 % lbe.]
[Figure "Packet re-order 1400 bytes uk-nl 21 Oct 03 10,000 sent wait 10 us": no. of packets against length out-of-order, one series per LBE level from 0 % to 80 % lbe.]
Tests GÉANT Core: Packet Jitter
Amsterdam-London
BE Test flow
Packet spacing 80 µs
BE Test flow + Background: 60% BE 1.4Gbit + 40% LBE 780Mbit
IPPremium Test flow
IPPremium Test flow + Background
[Figure, four histograms of frequency against packet jitter (µs): Flow BE, background none; Flow BE, background 60% BE 1.4Gbit + 40% LBE 780Mbit; Flow IPP, background none; Flow IPP, background 60% BE 1.4Gbit + 40% LBE 780Mbit.]
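A jitter histogram of the kind shown for these flows can be built from receive timestamps. The sketch below assumes that jitter is taken as the spacing between successive packet arrivals, which is one plausible reading of the plots; the bin width and the helper name `interarrival_histogram` are illustrative choices, not taken from the slides.

```python
from collections import Counter


def interarrival_histogram(arrival_us, bin_us=1):
    """Histogram of gaps between successive arrival times (all values in µs)."""
    gaps = [b - a for a, b in zip(arrival_us, arrival_us[1:])]
    # Map each gap to the lower edge of its bin.
    hist = Counter(int(g // bin_us) * bin_us for g in gaps)
    return dict(sorted(hist.items()))


if __name__ == "__main__":
    # Packets sent every 80 µs; arrivals wobble by a few µs.
    arrivals = [0, 80, 160, 245, 320]
    print(interarrival_histogram(arrivals, bin_us=5))
```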
Tests GÉANT Core: 1-way Delay
Amsterdam-London
IPPremium Test flow
Packet spacing 80 µs
BE Test flow + Background: 60% BE 1.4Gbit + 40% LBE 780Mbit
[Figure, three plots of 1-way latency (µs) against packet number: Flow IPP, background none; Flow IPP, background 60% BE 1.4Gbit + 40% LBE 780Mbit; Flow BE, background 60% BE 1.4Gbit + 40% LBE 780Mbit.]
VLBI Project: Test Topology
SURFnet
Manchester
JIVE
Dwingeloo
Jodrell
SuperJANET4
Adam Mathews
Steve O’Toole
Univ of Manchester
VLBI Project: Throughput
Manchester to Dwingeloo
2.0 GHz Xeon
1.2 GHz PIII
[Figure "Gnt5-DwMk5 11Nov03/DwMk5-Gnt5 13Nov03-1472bytes": received wire rate (Mbit/s) against spacing between frames (µs), for Gnt5-DwMk5 and DwMk5-Gnt5.]
Re-ordering vs Offered Load
[Figure "Gnt5-DwMk5 11Nov03-1472 bytes": % packet loss against spacing between frames (µs), for Gnt5-DwMk5 and DwMk5-Gnt5.]
[Figure "Gnt5-DwMk5 11Nov03 1472 bytes": % kernel CPU on the sender against spacing between frames (µs).]
[Figure "Gnt5-DwMk5 11Nov03 1472 bytes": % kernel CPU on the receiver against spacing between frames (µs).]
VLBI Project: Jitter & 1-way Delay
1472 byte Packets man -> JIVE
FWHM 22 µs (B2B 3 µs )
[Figure "1472 bytes w=50 jitter Gnt5-DwMk5 28Oct03": N(t) against jitter (µs), shown on linear and logarithmic scales.]
1-way Delay – note the packet loss (points with 0 1-way delay)
[Figure "1472 bytes w12 Gnt5-DwMk5 21Oct03": 1-way delay (µs) against packet number, shown for the full run and for packets 2000 to 3000.]
VLBI Project: Packet Loss – Long Range Effects?
Aggregated Variance Method
Divide time series length N into blocks of size m
Calc mean of each section Xm(k), k = 1 … N/m
Calc variance VXm of these Xm(k)
Vary m, the size of the blocks
Plot on log-log & fit slope β
Hurst parameter H: β = 2H - 2
[Figure: aggregate-variance Log10( X(m) ) against sub-sample size Log10( m ), with fitted line y = -0.355x + 2.8826.]
Measure:
β = -0.355 which gives H = 0.822
H = 0.5: no long range dependence
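The aggregated variance steps listed on this slide can be sketched in a few lines. This is a minimal illustration, not the analysis code behind the slide: the function names are hypothetical, the fit is a plain least-squares line, and the demo data are independent random numbers (for which the slope should come out near -1, i.e. H near 0.5).

```python
import math
import random


def aggregated_variance_points(series, block_sizes):
    """For each block size m, return (log10 m, log10 variance of the block means)."""
    points = []
    for m in block_sizes:
        n_blocks = len(series) // m
        if n_blocks < 2:
            continue
        means = [sum(series[k * m:(k + 1) * m]) / m for k in range(n_blocks)]
        grand = sum(means) / n_blocks
        var = sum((x - grand) ** 2 for x in means) / n_blocks
        if var > 0:
            points.append((math.log10(m), math.log10(var)))
    return points


def fit_slope(points):
    """Least-squares slope of y against x for a list of (x, y) points."""
    xbar = sum(p[0] for p in points) / len(points)
    ybar = sum(p[1] for p in points) / len(points)
    num = sum((x - xbar) * (y - ybar) for x, y in points)
    den = sum((x - xbar) ** 2 for x, _ in points)
    return num / den


def hurst_from_slope(beta):
    """Invert beta = 2H - 2."""
    return 1.0 + beta / 2.0


if __name__ == "__main__":
    rng = random.Random(1)
    series = [rng.random() for _ in range(20000)]  # independent samples, no memory
    beta = fit_slope(aggregated_variance_points(series, [1, 2, 4, 8, 16, 32, 64, 128]))
    print("beta =", round(beta, 3), "H =", round(hurst_from_slope(beta), 3))
    print("slide value: H =", hurst_from_slope(-0.355))
```

With the measured slope of -0.355 the same inversion gives H = 0.8225, matching the value quoted on the slide.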
Traffic Flows
Manchester – NetNorthWest - SuperJANET Access links
Two 1 Gbit/s
Access links:
SJ4 to GÉANT
GÉANT to SURFnet
High Performance TCP – DataTAG
Different TCP stacks tested on the DataTAG Network
128 ms round trip time
Drop 1 in 10^6
High-Speed: rapid recovery
Scalable: very fast recovery
Standard: recovery would take ~ 10 mins
High Performance TCP – MB-NG
Drop 1 in 25,000
Rtt 6.2 ms
Recover in 1.6 s
Standard
HighSpeed
Scalable
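The recovery times quoted for standard TCP (about 10 minutes at 128 ms rtt, 1.6 s at 6.2 ms rtt) are consistent with a simple additive-increase estimate. The sketch below assumes a 1 Gbit/s path and 1500-byte packets, neither of which is stated on the slides, and uses the textbook model in which the congestion window halves on a loss and then grows by one packet per round trip.

```python
def standard_tcp_recovery_s(link_bps, rtt_s, packet_bytes=1500):
    """Estimated time for standard TCP to regain full rate after a single loss."""
    # Window (in packets) needed to fill the pipe: bandwidth * rtt / packet size.
    cwnd_pkts = link_bps * rtt_s / (8 * packet_bytes)
    # After the loss the window is cwnd/2 and grows by 1 packet per rtt.
    return (cwnd_pkts / 2) * rtt_s


if __name__ == "__main__":
    # Assumed 1 Gbit/s paths; the rtt values are the ones on the slides.
    print("DataTAG, rtt 128 ms:", round(standard_tcp_recovery_s(1e9, 0.128) / 60, 1), "min")
    print("MB-NG,   rtt 6.2 ms:", round(standard_tcp_recovery_s(1e9, 0.0062), 2), "s")
```

Because the estimate scales with the square of the rtt, the long DataTAG path is where the High-Speed and Scalable stacks matter most.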
Some Network Tests
TCP Request – Response
Zero stats -> OK done
Request event -> Send event data
Request-Response time (Histogram)
Get remote statistics -> Send statistics: CPU load & no. int, 1-way delay
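The request-response timing loop can be sketched with plain sockets. This is an illustration only: the 4-byte length request, the loopback demo and all function names are assumptions made for the example, and the statistics exchange (zeroing and collecting CPU load and interrupt counts) is left out.

```python
import socket
import struct
import threading
import time


def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed")
        buf.extend(chunk)
    return bytes(buf)


def serve_one(listener):
    """Responder: for each 4-byte request, send back that many bytes of data."""
    conn, _ = listener.accept()
    with conn:
        while True:
            try:
                (size,) = struct.unpack("!I", recv_exact(conn, 4))
            except ConnectionError:
                return
            conn.sendall(b"\0" * size)


def measure(host, port, response_bytes, n_requests, spacing_s=0.0):
    """Requester: time each request-response pair, in microseconds."""
    latencies_us = []
    with socket.create_connection((host, port)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(n_requests):
            t0 = time.perf_counter()
            sock.sendall(struct.pack("!I", response_bytes))
            recv_exact(sock, response_bytes)
            latencies_us.append((time.perf_counter() - t0) * 1e6)
            time.sleep(spacing_s)  # request spacing
    return latencies_us


def run_demo(response_bytes=100_000, n_requests=20):
    """Run requester and responder on the loopback interface."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_one, args=(listener,), daemon=True)
    worker.start()
    latencies = measure("127.0.0.1", port, response_bytes, n_requests)
    worker.join(timeout=5)
    listener.close()
    return latencies


if __name__ == "__main__":
    lat = run_demo()
    print("min/ave/max us:", round(min(lat)), round(sum(lat) / len(lat)), round(max(lat)))
```

The returned list is what would be binned into the request-response time histogram.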
Lab Test: TCP Request-Response Histograms
PC – router – PC
BE Test flow
Request spacing 0 µs
Request spacing 10 ms
[Figure, six histograms of N(t) against latency (µs), all labelled man02-3_7may04: 0.5M, 1.0M and 2.0M bytes, each with request spacing 0 µs and with 10 ms ("w 10ms"). Latency axes span 4280 to 4460 µs for 0.5M bytes, 8580 to 8760 µs for 1.0M bytes and 17080 to 17260 µs for 2.0M bytes.]
Man-CERN: TCP Request-Response Latency
DataTAG PC – backup link
BE Tests
Request spacing 0 µs
Win size 2.5Mbytes
[Figure w05gva-gnt5_7May04_TCP: latency (µs) against message length (bytes), showing min, ave and max time.]
Compare with UDP latency
Large differences
[Figure w05gva-gnt5_7May04_TCP: latency (µs) against message length (bytes), showing ave time and req-resp UDP latency.]
Rtt of 20 ms
delay*bw = 2.5 Mbytes
1Mbyte data = 690 pkts
interesting bursts !
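The delay*bw and packet-count figures on this slide follow from short arithmetic. The sketch below assumes a 1 Gbit/s path and a 1448-byte TCP payload per packet (1500-byte MTU less IP and TCP headers with timestamps); both are assumptions chosen because they reproduce the quoted numbers, not values stated on the slide.

```python
def bandwidth_delay_product_bytes(link_bps, rtt_s):
    """Bytes in flight needed to keep the path full."""
    return link_bps * rtt_s / 8


def packets_for_message(message_bytes, payload_bytes=1448):
    """Number of packets (possibly fractional) needed to carry a message."""
    return message_bytes / payload_bytes


if __name__ == "__main__":
    bdp = bandwidth_delay_product_bytes(1e9, 0.020)  # assumed 1 Gbit/s, rtt 20 ms
    print("delay*bw =", bdp / 1e6, "Mbytes")
    print("1 Mbyte  =", round(packets_for_message(1_000_000), 1), "packets")
```

This is also why a 2.5 Mbyte window is used for the tests: a smaller window could not keep a 20 ms path full at that rate.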
Man-CERN: UDP Throughput & Packet Loss
DataTAG PC – backup link
BE Tests
Throughput
[Figure w05gva-gnt5_7May04_UDP: received wire rate (Mbit/s) against spacing between frames (µs), one curve per packet size from 50 to 1472 bytes.]
Packet loss
[Figure w05gva-gnt5_7May04_UDP: % packet loss against spacing between frames (µs), one curve per packet size from 50 to 1472 bytes.]
Traffic Flows
Manchester – NetNorthWest - SuperJANET Access links
Link to PC in M/c
Access links:
1 GE Man to NNW
Total Man to NNW
NNW to
SuperJANET4
VLBI Project: Packet Loss – Is it Poisson?
Divide time series of packets
into 1000 slices of 50 packets
Total lost packets 1410
Average number / slice = 1.4
Calc Poisson Probability
$P(n, \mu) = \dfrac{\mu^{n} e^{-\mu}}{n!}$
[Figure: N(n) against n, the number lost in a sub-sample; legend entries: run12b, 1, 1.3, 1.4, 1.8.]
Curves close but not exact
Could be more than 1 process
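The comparison with a Poisson distribution can be reproduced from the numbers on this slide. The sketch below is illustrative and the helper names are made up: it computes the Poisson probability for n losses in a slice and the expected number of slices with each loss count, taking the mean as 1410 lost packets over 1000 slices.

```python
import math


def poisson_pmf(n, mu):
    """P(n, mu) = mu^n * exp(-mu) / n!"""
    return mu ** n * math.exp(-mu) / math.factorial(n)


def expected_slices(total_lost, n_slices, max_n):
    """Expected number of slices containing n = 0..max_n lost packets."""
    mu = total_lost / n_slices
    return [n_slices * poisson_pmf(n, mu) for n in range(max_n + 1)]


if __name__ == "__main__":
    for n, count in enumerate(expected_slices(1410, 1000, 8)):
        print(n, round(count, 1))
```

Measured counts per slice can then be laid over these expected values to judge how close the loss process is to Poisson.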
Traffic QoS Classes on GÉANT Backbone
Normal Traffic + Radio Astronomy Data + Less Than Best Effort
Max Throughput on 2.5 G PoS
2.0 Gbit/s
Normal Traffic + Less Than Best Effort
2.0 Gbit/s
Normal Traffic + Radio Astronomy Data
500 Mbit/s
Normal Traffic
Some Measurements made during ER2002
[Figure, No LBE: num_badorder (no. out of order) and num_lost (no. lost) against transfer number.]
[Figure, With 1.8Gbit LBE: num_badorder (no. out of order) and num_lost (no. lost) against transfer number.]