|
1
|
- Filip Aben & Kristof Van Laerhoven
|
|
2
|
|
|
3
|
- Networks become dynamic:
- Hardware failures
- congestion, bottlenecks
- Changes in network:
- load levels
- traffic patterns
- network topology
- Current routing protocols lack flexibility !!!
|
|
4
|
- Traditional algorithms
- Bellman Ford routing (‘58: Bellman)
- (Weighted) Shortest Path (‘59: Dijkstra)
- Q-routing (‘94: Boyan & Littman)
- + Predictive Q-routing (‘96: Choi & Yeung)
- + Dual reinforcement Q-routing (‘97: Kumar & Miikkulainen)
- + Confidence-based Q-routing (‘98: Kumar)
- + Probabilistic Q-routing (‘98: Kumar)
- Agent-driven routing (‘98: Snyers et al.)
|
|
5
|
- shortest path routing (‘62 Floyd)
- static routing
- no flexibility...
- weighted shortest path (‘59 Dijkstra)
- Cost Adjacency Matrix Ù state of network
- global routing
- usually impossible to implement
|
|
6
|
- Bellman Ford routing:
- approach: Distance Vector Routing
- Cost table, Routing table, update rules
- local routing: sub-optimal paths
- periodic updates of tables from neighbours
- most commonly used
- reacts prompt to good news, slow to bad news
- exchange of large tables => overhead
|
|
7
|
- Q-learning (‘89: Watkins)
- Reinforcement Learning algorithm
- states, actions
- policy for choosing actions
- Q-routing:
- for each node x: Q-table Qx of Q-values
- Q-value: Qx(y,d): estimated time ( x Ù y Ù d )
|
|
8
|
|
|
9
|
- find minimum from table Qx
®
Qx(y,d)
- send packet to y
- y returns qy and minimum from Qy ® Qy(z,d)
- update Qx(y,d):
- Qx(y,d) := Qx(y,d) + hf.( Qy(z,d) + qy
+ d - Qx(y,d)
)
- hf Î[0,1] learning rate
- qy = time spend in queue of node y
- d = transmission
delay x ®
y
|
|
10
|
|
|
11
|
- Is unable to detect if initial high Q-value should decrease
- => no more exploration for that link!
- Improvements are possible:
- more exploration
- better exploration
|
|
12
|
- Improvement of quality of exploration
- Confidence-value: Cx(y,d) Î[0,1]
- (0 = no, 1 = full ) confidence in Qx(y,d)
- update rule C-values:
- Cx(y,d) = Cx(y,d) + hf.(Cy(z,d) - Cx(y,d)
)
- update rule Q-values:
- Qx(y,d) = Qx(y,d) + hf.( Qy(z,d)+qy+d - Qx(y,d) )
- with hf
containing Cx(y,d) and Cy(z,d)
|
|
13
|
- Improvement of quantity of exploration
- adds backward exploration
- sends Qy(x,s) with packet
- Qy(x,s) = Qy(x,s) + hb.(Qx(z,s) +qx+d - Qy(x,s))
- also possible with Confidence-based:
- Þ Confidence-based
Dual Reinforcement Q-routing (CDR Q-routing)
|
|
14
|
|
|
15
|
- forward:
- Qx(y,d) = Qx(y,d) + hf.(Qy(z,d) +qy+d - Qx(y,d))
- Cx(y,d) = Cx(y,d)
+ hf.(Cy(z,d)
- Cx(y,d) )
- backward:
- Qy(x,s) = Qy(x,s) + hb.(Qx(z,s) +qx+d - Qy(x,s))
- Cy(x,s) = Cy(x,s) + hb.(Cx(z,s) - Cy(x,s)
)
- hf = hf(Cx(y,d),
Cy(z,d) )
- hb = hb(Cy(x,s),
Cx(z,s) )
|
|
16
|
- If load decreases, Q-routing still cannot adapt!
- solutions:
- keeping track of the rate of change
- Ô Predictive
Q-routing
- ‘random’ decisions
- Ô
Probabilistic Q-routing:
|
|
17
|
- = Q-routing with stochastic routingprocess
- Q-values treated as random values with Gauss distribution
- Qx(y,d) = mean of Gauss distribution
- also:
- Probabilistic Confidence-based Q-routing:
- Cx(y,d) Õ variance of Gauss distribution
- Probabilistic Confidence-based Dual Reinforcement Q-routing
|
|
18
|
- 4 values:
- Qx(y,d) = estimated time for
x Ù y Ù d
- Bx(y,d) = best estimated time for x Ù y Ù d
- Rx(y,d) = recovery rate for
x Ù y Ù d
- Ux(y,d) = last update-time for this path
- and also...
- Dual Reinforcement Predictive Q-routing
- Confidence-based Dual Reinforcement Predictive Q-routing
|
|
19
|
|
|
20
|
|
|
21
|
- Q-routing: Probabilistic Q-routing...
- Flexibility becomes more important
- complexity and size of current networks Ú
- different (more) data
- more use of networks
- new developments: mobile networks
- trade-off:
- less control overhead Ö flexibility
|