Transcript - Way2MCA

Message Passing

Inter Process Communication

• Original sharing (shared-data approach): processes P1 and P2 access a common shared-memory region.

• Copy sharing (message-passing approach): a copy of the data is sent from P1 to P2. This is the basic IPC mechanism in distributed systems.

Desirable Features of a Good MPS

• Simplicity – clean & simple semantics, so that users need not worry about system or network aspects
• Uniform semantics – the same semantics for local and remote communication
• Efficiency – aim to reduce the number of messages exchanged
• Reliability – cope with failure problems & guarantee delivery of messages; also handle duplicate messages

• Correctness – needed for group communication: atomicity, ordered delivery, survivability
• Flexibility – users have the flexibility to choose & specify the type & level of reliability & correctness required
• Security – secure end-to-end communication
• Portability – the message-passing system & the applications using it should be portable

Message Structure

 A block of information formatted by a sending process such that it is meaningful to the receiving process.

 Various issues (who is the sender/receiver, what if a node crashes, what if the receiver is not ready, etc.) have to be dealt with.

A message consists of a fixed-length header followed by variable-size data:
• Addresses – sending process address, receiving process address
• Sequence number or message ID
• Structural information – type (actual data or pointer to the data), number of bytes/elements
• Actual data or a pointer to the data

Synchronization

 Synchronization is achieved through the communication primitives:
– Blocking
– Non-blocking
 Both types of semantics can be used on both the send & receive primitives.

 Complexities in synchronization:
– With a non-blocking receive, how does the receiver know when the message has arrived in the message buffer?
• Polling
• Interrupt
– With blocking primitives, a send/receive could remain blocked forever if the receiver/sender crashes or the message is lost; a timeout is used to avoid indefinite blocking.
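The blocking/non-blocking distinction above can be sketched with a shared message buffer (a minimal Python illustration; the names and timings are invented, not part of the original system):

```python
import queue
import threading
import time

# Shared message buffer between a sender and a receiver thread.
buf = queue.Queue()

def blocking_receive(timeout):
    """Suspend until a message arrives; give up after `timeout` seconds
    so a crashed sender cannot block the receiver forever."""
    try:
        return buf.get(timeout=timeout)
    except queue.Empty:
        return None        # timed out

def nonblocking_receive():
    """Return immediately; the receiver must poll (or be interrupted)."""
    try:
        return buf.get_nowait()
    except queue.Empty:
        return None        # no message yet

early = nonblocking_receive()   # nothing sent yet: polling finds no message

def sender():
    time.sleep(0.05)
    buf.put("hello")            # non-blocking send into the buffer

threading.Thread(target=sender).start()
msg = blocking_receive(timeout=1.0)   # suspends, resumes when "hello" arrives
print(early, msg)
```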

Synchronous Communication

 When both send and receive primitives use blocking semantics.

Sender's execution / receiver's execution (timeline):
– The sender executes Send(msg) and its execution is suspended (blocked state).
– The receiver executes Receive(msg); if the message has not yet arrived, its execution is suspended.
– On arrival of the msg, the receiver's execution resumes and it returns an ack with Send(ack).
– On receipt of the ack, the sender's execution is resumed.

Synchronous vs. Asynchronous Communication

Synchronous Communication

– Advantages
• Simple & easy to implement
• Reliable
– Disadvantages
• Limits concurrency
• Can lead to communication deadlock
• Less flexible compared to asynchronous communication
• Hardware is more expensive

Asynchronous Communication

– Advantages
• Doesn't require synchronization of both communication sides
• Cheap: timing is not as critical as for synchronous transmission, so the hardware can be made cheaper
• Set-up is very fast, well suited for applications where messages are generated at irregular intervals
• Allows more parallelism
– Disadvantages
• Large relative overhead: a high proportion of the transmitted bits are used purely for control purposes and thus carry no useful information
• Not very reliable

Buffering

 Null buffer (no buffering)
 Single-message buffer
 Unbounded-capacity buffer
 Finite-bound (multiple-message) buffer

Null Buffer

 Involves a single copy operation (the message goes directly from the sender to the receiver).

 Can be implemented in the following ways:
– The sender sends only when it receives an acknowledgement from the receiver, i.e., when the receiver has executed 'receive'; otherwise the sender remains blocked.
– After executing 'send', the sender waits for an acknowledgement; if it is not received within a timeout period, the sender assumes the message was discarded & resends it.

 Not suitable for asynchronous transmission: the receiver is blocked until the entire message is transferred over the network.


Single Message Buffer

 Used in synchronous communication.
 A single message buffer is kept on the receiver's side.

 The message buffer may be in the kernel's or the receiver's address space.
 Transfer involves two copy operations.

Unbounded Capacity Buffer

 Used in asynchronous communication.

 As sender does not wait for receiver to be ready, all unreceived messages can be stored for later delivery.

 Practically impossible, since buffer capacity is always finite.

Finite Bound Buffer

 Used in asynchronous communication .

A multiple-message buffer (also called a mailbox or port) at the receiver holds messages Msg 1 … Msg n from the sending processes.

 Buffer overflow is possible. It can be dealt with in two ways:
– Unsuccessful communication
• The message transfer fails when there is no more buffer space. Less reliable.

– Flow-controlled communication • Sender is blocked until the receiver accepts some messages, creating space in buffer. This requires some synchronization, thus not truly asynchronous.

 The message buffer may be in the kernel's or the receiver's address space.
 Extra overhead for buffer management.
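Flow-controlled communication with a finite-bound buffer can be sketched with a bounded queue that blocks the sender while the buffer is full (an illustrative Python sketch; the buffer size and names are assumptions):

```python
import queue
import threading

# Finite-bound buffer of N messages; put() blocks the sender while the
# buffer is full, until the receiver's get() frees a slot.
N = 3
mailbox = queue.Queue(maxsize=N)

def sender(n_msgs):
    for i in range(n_msgs):
        mailbox.put(f"msg {i}")   # blocks when N messages are pending

t = threading.Thread(target=sender, args=(10,))
t.start()

received = []
for _ in range(10):
    received.append(mailbox.get())  # accepting a message creates space
t.join()
print(len(received), received[0])
```

This is why flow control is not truly asynchronous: the sender's progress is synchronized with the receiver's consumption rate.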

Multidatagram Messages

 Maximum transfer unit (MTU) – the maximum amount of data that can be transmitted at a time.

 Packet (datagram) – message data + control information.

 Single datagram message - Messages smaller than MTU of the network can be sent in a single packet (datagram).

 Multidatagram messages - Messages larger than MTU have to be fragmented and sent in multiple packets.

 Disassembling the packets of a multidatagram message and reassembling them in sequence on the receiver's side is the responsibility of the message-passing system.
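Fragmentation and in-sequence reassembly of a multidatagram message can be sketched as follows (the toy MTU value and tuple-based header are chosen for illustration; a real system works on network packets):

```python
import random

# Each packet carries (total_packets, position, data), matching the two
# extra header fields described above.
MTU = 4  # bytes of payload per packet (illustrative)

def fragment(message: bytes):
    chunks = [message[i:i + MTU] for i in range(0, len(message), MTU)]
    return [(len(chunks), pos, data) for pos, data in enumerate(chunks)]

def reassemble(packets):
    total = packets[0][0]
    assert len(packets) == total, "packet(s) missing"
    ordered = sorted(packets, key=lambda p: p[1])  # restore the sequence
    return b"".join(data for _, _, data in ordered)

pkts = fragment(b"hello, distributed world")
random.shuffle(pkts)                 # packets may arrive out of order
print(reassemble(pkts))
```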

Encoding and Decoding of message data

 The structure of program objects should be preserved when they are transmitted from the sender's address space to the receiver's address space. This is difficult because:
• An absolute pointer value loses its meaning when transferred from one address space to another (e.g., a tree built from pointers).
• It is necessary to send object-type information as well: there must be some way for the receiver to identify which program object is stored where in the message buffer & how much space each program object occupies.

• Encoding – program objects are converted into stream data by the sender.
• Decoding – reconstruction of the program objects from the message data by the receiver.

 Representations used for encoding & decoding:
 Tagged representation
– The type of each program object is encoded in the message along with its value
– The quantity of data transferred is larger
– The time taken to encode/decode the data is longer
 Untagged representation
– The message data contains only the program objects. The receiving process must have prior knowledge of how to decode the data, as it is not self-describing.
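A tagged representation can be sketched as follows: each object is encoded as (type tag, length, value), so the receiver can decode without prior knowledge of the layout. The tags and wire format here are invented for illustration:

```python
import struct

# Toy tagged representation: 1-byte type tag, 4-byte length, then value.
TAG_INT, TAG_STR = 1, 2

def encode(objects):
    out = b""
    for obj in objects:
        if isinstance(obj, int):
            payload = struct.pack("!q", obj)          # 8-byte signed int
            out += struct.pack("!BI", TAG_INT, len(payload)) + payload
        else:
            payload = obj.encode("utf-8")
            out += struct.pack("!BI", TAG_STR, len(payload)) + payload
    return out

def decode(data):
    objects, i = [], 0
    while i < len(data):
        tag, length = struct.unpack_from("!BI", data, i)  # read tag header
        i += 5
        payload = data[i:i + length]
        i += length
        objects.append(struct.unpack("!q", payload)[0] if tag == TAG_INT
                       else payload.decode("utf-8"))
    return objects

print(decode(encode([42, "balance", -7])))
```

An untagged scheme would drop the tag byte and rely on both sides agreeing on the layout in advance, trading self-description for fewer transferred bytes.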

Process Addressing

 Explicit addressing
• Send(process_id, msg)
• Receive(process_id, msg)
 Implicit addressing
• Send_any(service_id, msg) // functional addressing
• Receive_any(process_id, msg)

Methods for Process Addressing

 machine_id@local_id
– machine address @ identifier of the receiving process
– Local ids need to be unique on only one machine
– Does not support process migration

 machine_id@local_id@machine_id
– machine on which the process was created @ its local process identifier @ last known location of the process
– Link-based addressing: link information is left on the previous node
– A mapping table is maintained by the kernel for all processes created on another node but currently running on this node

– The current location of the receiving process is sent to the sender, which caches it

– Drawbacks
• The overhead of locating a process is large if the process has migrated many times
• It is not possible to locate a process if an intermediate node is down

 Both methods are location non-transparent.

Location Transparent Process Addressing

 Centralized process-identifier allocator
– Uses a global counter
– Not reliable & not scalable
 Two-level naming scheme
– High-level machine-independent name, low-level machine-dependent name
– A name server maintains the mapping table
– The kernel of the sending machine obtains the low-level name of the receiving process from the name server and also caches it
– When a process migrates, only its low-level name changes
– Used in functional addressing
– Not scalable & not reliable

Failure Handling

Loss of request msg
– The sender sends a request; the request message is lost before it reaches the receiver.

Loss of response msg
– The sender sends a request; the receiver executes it successfully and sends a response, but the response message is lost.

Unsuccessful execution of the request
– The sender sends a request; the receiver crashes while executing it and is later restarted, so execution of the request is unsuccessful.

Four message reliable IPC protocol

Client–server message sequence: Request → Acknowledgment → Reply → Acknowledgment.
– The client sends the request and blocks; the server's kernel returns an acknowledgment.
– After executing the request, the server sends the reply; the client resumes, and its kernel returns an acknowledgment of the reply.

Three message reliable IPC protocol

Client–server message sequence: Request → Reply → Acknowledgment.
– The client sends the request and blocks; the server's reply itself acts as the acknowledgment of the request.
– The client's kernel returns an acknowledgment of the reply.

Two message reliable IPC protocol

Client–server message sequence: Request → Reply.
– The client sends the request and blocks until the reply arrives; no separate acknowledgment messages are used.

Fault Tolerant Communication

Client–server timeline:
– The client sends a request and starts a timeout. The request message is lost, so the timeout expires and the client retransmits the request.
– The server crashes during execution of the retransmitted request (unsuccessful execution); on the next timeout the client retransmits again.
– The restarted server executes the request successfully, but the response message is lost; on the next timeout the client retransmits once more.
– The server executes the request again, successfully, and this time the response message reaches the client.

This gives at-least-once semantics: the request may be executed more than once.

Idempotency

 Repeatability
 An idempotent operation produces the same result, without any side effects, no matter how many times it is performed with the same arguments.

A non-idempotent example:

debit(amount)
  if (balance ≥ amount) {
    balance = balance - amount;
    return ("Success", balance);
  } else
    return ("Failure", balance);
end;

Duplicate-execution scenario (server balance = 1000):
– The client sends the request debit(100); the server processes the debit routine, balance = 1000 - 100 = 900, and returns (success, 900), but the response is lost.
– The client times out and retransmits the request; the server processes the debit routine again, balance = 900 - 100 = 800, and returns (success, 800).
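The duplicate-debit scenario can be reproduced in a few lines, showing that debit is non-idempotent (a sketch: the client/server split and the lost response are simulated by simply calling the routine twice):

```python
# The debit routine above transcribed into Python.
balance = 1000

def debit(amount):
    global balance
    if balance >= amount:
        balance -= amount
        return ("Success", balance)
    return ("Failure", balance)

first = debit(100)    # server replies (Success, 900); the response is lost
second = debit(100)   # retransmitted request is executed again
print(first, second, balance)
```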

Handling Duplicate Request

 Using the timeout-based retransmission of request , the server may execute the same request message more than once.

 If the execution is non-idempotent, its repeated execution will destroy the consistency of information.

 Exactly-once semantics is used, which ensures that only one execution of the server's operation is performed.

 Use a unique identifier for every request that the client makes, and set up a reply cache in the kernel's address space on the server machine to cache replies.

Reply-cache scenario (server balance = 1000; the reply cache maps Req-1 → (success, 900)):
– The client sends request-1 debit(100). The server checks the reply cache for request-1, finds no match, processes the request (balance = 1000 - 100 = 900), saves the reply in the cache, and returns (success, 900); the response is lost.
– The client times out and retransmits request-1. The server checks the reply cache, finds a match, extracts the cached reply, and returns (success, 900) without re-executing the debit.
– The client receives the response (success, 900); the balance remains 900.
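The reply-cache mechanism can be sketched as follows (a minimal illustration; handle_request and the request-identifier format are invented names):

```python
# Reply cache keyed by a unique request identifier; a duplicate request
# returns the cached reply instead of re-executing the debit.
balance = 1000
reply_cache = {}

def debit(amount):
    global balance
    if balance >= amount:
        balance -= amount
        return ("Success", balance)
    return ("Failure", balance)

def handle_request(req_id, amount):
    if req_id in reply_cache:        # duplicate: extract the cached reply
        return reply_cache[req_id]
    reply = debit(amount)            # new request: execute, then cache
    reply_cache[req_id] = reply
    return reply

first = handle_request("req-1", 100)  # reply (Success, 900) is lost
retry = handle_request("req-1", 100)  # retransmission hits the cache
print(first, retry, balance)
```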

 Ques. Which of the following operations are idempotent?
i. Read_next_record(filename)
ii. Read_record(filename, record_no)
iii. Append_record(filename, record)
iv. Write_record(filename, after_record_n, record)
v. Seek(filename, position)
vi. Add(integer1, integer2)
vii. Increment(variable_name)

Handling lost and out-of-sequence packets in multidatagram messages

 Stop-and-wait protocol
– Acknowledge each packet separately
– High communication overhead
 Blast protocol
– Single acknowledgement for all packets. What if:
• packets are lost in communication?
• packets are received out of sequence?
– Use a bitmap to identify the packets of the message.
– The header has two extra fields: the total number of packets, and the position of this packet in the complete message.
– Selective repeat is implemented for unreceived packets.
– If the receiver sends (5, 01001), the sender sends back the 1st & 4th packets again.

Group Communication • One to many • Many to one • Many to many

One to Many

 Multicast communication
 Broadcast communication

 Open group
– Any process can send a message to the group as a whole. Example: a group of replicated servers.

 Closed group
– Only members of the group can send messages to the group. Example: a collection of processors doing parallel processing.

Group Management

 Centralized group server
– Creates & deletes groups dynamically & allows processes to join or leave a group
– Poor reliability & scalability

 Distributed approach
– Open group: an outsider can send a message to all group members announcing its presence
– Closed groups also have to be open with respect to joining

Group Addressing

 Two-level naming scheme
– High-level group name
• ASCII name, independent of the location of the processes in the group
• Used by user applications
– Low-level group name
• Multicast address / broadcast address
• When one-to-one communication (unicast) is used to implement group communication, the low-level name is a list of the machine identifiers of all machines belonging to the group; packets sent = number of machines in the group
 A centralized group server maintains the mapping between the two levels of names.

Multicast

 Multicast is asynchronous communication
– The sending process can't wait for a response from all receivers
– The sending process may not be aware of all receivers
 Unbuffered multicast / buffered multicast
 Send-to-all semantics
– The message is sent to each process of the multicast group
 Bulletin-board semantics
– The message is addressed to a channel that acts like a bulletin board
– A receiving process copies the message from the channel
– The relevance of a message to a receiver depends on its state
– Messages not accepted within a certain time after transmission may no longer be useful

Flexible Reliability in Multicast

 0-reliable
 1-reliable
 m-out-of-n reliable
 All-reliable
 Atomic multicast
– All-or-nothing property
– Required for all-reliable semantics
– Involves repeated retransmissions by the sender
– What if the sender/receiver crashes or goes down?
• Include in the message a message identifier & a field to indicate an atomic multicast
• The receiver also performs an atomic multicast of the message

Group Communication Primitives

 send
 send_group
– Simplifies the design & implementation of group communication
– Indicates whether to use the name server or the group server
– Can include an extra parameter to specify the degree of reliability or atomicity

Many to one Communication

 Multiple senders – one receiver.
 Selective receiver
– Accepts messages from one unique sender
 Non-selective receiver
– Accepts messages from any sender of a specified group

Many-to-many Communication

Ordered message delivery

– All messages are delivered to all receivers in an order acceptable to the application
– Requires message sequencing

(Example with no ordering constraint: S1 sends m1 and S2 sends m2; R1 delivers m1 then m2 while R2 delivers m2 then m1.)

Absolute Ordering

 Messages are delivered to all receivers in the exact order in which they were sent.
 Uses global timestamps as message identifiers, together with a sliding-window protocol.

(Example: S1 sends m1 at t1 and S2 sends m2 at t2 with t1 < t2; both R1 and R2 deliver m1 before m2.)

Consistent Ordering

 All messages are delivered to all receiver processes in the same order.
 This order may be different from the order in which the messages were sent.

(Example: S1 sends m1 at t1 and S2 sends m2 at t2 with t1 < t2; both R1 and R2 deliver m2 before m1 – consistent, though not absolute, order.)

 Centralized algorithm
– The kernels of the sending machines send messages to a single receiver (the sequencer), which assigns a sequence number to each message and then multicasts it.

 Distributed algorithm
– The sender assigns a temporary sequence number larger than all previous sequence numbers & sends the message to the group.
– Each member returns a proposed sequence number. Member i calculates it as max(Fmax, Pmax) + 1 + i/N, where
Fmax : largest final sequence number of any message this member has received so far
Pmax : largest sequence number proposed so far by this member
– The sender selects the largest proposed sequence number & sends it to all members in a commit message.
– Committed messages are delivered to the application programs in order of their final sequence numbers.
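The proposal step of this distributed (ISIS-style) algorithm can be sketched as follows (the member states are illustrative values):

```python
# Each member i of N proposes max(F_max, P_max) + 1 + i/N; the sender
# commits the largest proposal as the final sequence number.
N = 4

def propose(i, f_max, p_max):
    """Proposal of member i (1-based): f_max is the largest final
    sequence number seen, p_max the largest this member has proposed."""
    return max(f_max, p_max) + 1 + i / N

state = {1: (5, 5), 2: (5, 6), 3: (4, 5), 4: (5, 5)}   # (F_max, P_max)
proposals = {i: propose(i, f, p) for i, (f, p) in state.items()}
final_seq = max(proposals.values())   # sent to all members in the commit
print(proposals, final_seq)
```

The i/N term makes every proposal unique, so ties between members cannot occur and all members agree on a total order.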

Causal Ordering

 If two message-sending events are causally related (i.e., there is any possibility that the second message was influenced by the first one), the messages must be delivered in that order to all receivers.

 Two message-sending events are said to be causally related if they are correlated by the happened-before relation.

(Figure: senders S1 and S2 multicast messages m1, m2, m3 to receivers R1, R2, R3, which deliver them respecting causal order.)

 The happened-before relation (→) satisfies the following conditions:
– If a & b are events in the same process & a occurs before b, then a → b.
– If a is the event of sending a message by one process & b is the event of receipt of the same message by another process, then a → b.
– If a → b & b → c, then a → c.

CBCAST Protocol

Each process maintains a vector with one component per group member. A message from sender i carries the sender's vector S; a receiver with vector R delivers the message only if

S[i] = R[i] + 1 and S[j] <= R[j] for all j ≠ i

Example: the vectors of processes A, B, C, D are (3,2,5,1), (3,2,5,1), (2,2,5,1) and (3,2,4,1). Process A sends a new message carrying the vector S = (4,2,5,1):
– B delivers the message: 4 = 3 + 1, and the remaining components satisfy S[j] <= R[j].
– C delays it: S[1] = C[1] + 1 is not satisfied (4 ≠ 2 + 1).
– D delays it: S[3] <= D[3] is not satisfied (5 > 4).
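The CBCAST delivery test can be sketched directly from the condition above (indices are 0-based here, so sender A is index 0):

```python
# Delivery condition: S[i] = R[i] + 1 and S[j] <= R[j] for all j != i,
# where S is the vector carried by the message and R the receiver's vector.
def deliverable(S, R, i):
    return S[i] == R[i] + 1 and all(
        S[j] <= R[j] for j in range(len(S)) if j != i)

S = [4, 2, 5, 1]                         # A's new message (sender index 0)
print(deliverable(S, [3, 2, 5, 1], 0))   # B: deliver
print(deliverable(S, [2, 2, 5, 1], 0))   # C: delay (missing A's previous msg)
print(deliverable(S, [3, 2, 4, 1], 0))   # D: delay (missing a 3rd-process msg)
```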

4.3BSD Unix IPC Mechanism

 Network independent.
 Uses sockets as communication end points.

 Two-level naming scheme for naming communication end points: a socket has a high-level string name and a low-level, communication-domain-dependent name.

 Flexible. Provides sockets with different communication semantics.

 Supports broadcast facility if underlying network supports it.

IPC Primitives

 socket() creates a new socket of a certain socket type, identified by an integer number, and allocates system resources to it.

 bind() is typically used on the server side, and associates a socket with a socket address structure, i.e. a specified local port number and IP address.

 connect() is used in connection based communication by a client process to request a connection establishment between its socket & socket of server process.

 listen() is used on the server side in connection based communication to listen to its socket for client requests.

 accept() is used on the server side. It accepts an incoming attempt from a remote client to create a new TCP connection.

Read/ Write Primitives

 read / write – connection-based communication
 recvfrom / sendto – connectionless communication

TCP/IP Socket Calls for Connection

Server:
– socket() – create socket
– bind() – bind local IP address of socket to port
– listen() – place socket in passive mode, ready to accept requests
– accept() – blocks until a connection from a client arrives; takes the next request from the queue (or waits), then forks and creates a new socket for the client connection
– recv() – receive request; process the request
– send() – send reply
– close() – close socket

Client:
– socket() – create socket
– connect() – issue connection request to server
– send() / recv() – transfer message strings (send/recv or read/write)
– close() – close socket
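The TCP call sequence above can be sketched with Python's socket module, which wraps the same BSD primitives (port 0 asks the OS for a free port; the uppercase-echo service is invented for illustration):

```python
import socket
import threading

def server(listener):
    conn, _ = listener.accept()      # accept(): blocks until a client connects
    data = conn.recv(1024)           # recv(): receive the request
    conn.send(data.upper())          # process request, send() the reply
    conn.close()

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # socket()
srv.bind(("127.0.0.1", 0))                                # bind()
srv.listen(1)                                             # listen(): passive mode
port = srv.getsockname()[1]
t = threading.Thread(target=server, args=(srv,))
t.start()

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # socket()
cli.connect(("127.0.0.1", port))                          # connect()
cli.send(b"hello")                                        # send() the request
reply = cli.recv(1024)                                    # recv() the reply
cli.close()                                               # close()
t.join()
srv.close()
print(reply)
```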

UDP/IP Socket Calls for Connection

Server:
– socket() – create socket
– bind() – bind local IP address of socket to port
– recvfrom() – blocks until a datagram is received from a client; also yields the sender's address along with the sender's datagram; process the request
– sendto() – send the reply to the sender's address
– close() – close socket

Client:
– socket() – create socket
– sendto() – specify the server's address and send the datagram
– recvfrom() – receive the reply
– close() – close socket
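The UDP (connectionless) call sequence can be sketched similarly; recvfrom() yields the sender's address, which the server passes back to sendto() when replying (the reverse-the-bytes service is invented for illustration):

```python
import socket
import threading

def server(sock):
    data, addr = sock.recvfrom(1024)   # blocks until a datagram arrives
    sock.sendto(data[::-1], addr)      # process request, reply to the sender

srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)    # socket()
srv.bind(("127.0.0.1", 0))                                # bind()
port = srv.getsockname()[1]
t = threading.Thread(target=server, args=(srv,))
t.start()

cli = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)    # socket()
cli.sendto(b"ping", ("127.0.0.1", port))                  # sendto()
reply, _ = cli.recvfrom(1024)                             # recvfrom()
cli.close()
t.join()
srv.close()
print(reply)
```

Note that no listen()/accept()/connect() steps appear: datagram sockets carry addressing information on every message instead of per-connection state.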