Notes
Slide Show
Outline
1
Network Routing using
Q-learning: an overview
  • Filip Aben & Kristof Van Laerhoven
2
Networks & routing
3
Why ?
  • Networks become dynamic:
    • Hardware failures
    • congestion, bottlenecks
  • Changes in network:
    • load levels
    • traffic patterns
    • network topology
  • Current routing protocols lack flexibility !!!
4
Overview
  • Traditional algorithms
    • Bellman Ford routing (‘58: Bellman)
    • (Weighted) Shortest Path (‘59: Dijkstra)
  • Q-routing (‘94: Boyan & Littman)
    • + Predictive Q-routing (‘96: Choi & Yeung)
    • + Dual reinforcement Q-routing (‘97: Kumar & Miikkulainen)
    • + Confidence-based Q-routing (‘98: Kumar)
    • + Probabilistic Q-routing (‘98: Kumar)
  • Agent-driven routing (‘98: Snyers et al.)
5
Traditional algorithms
  • shortest path routing (‘62 Floyd)
    • static routing
    • no flexibility...
  • weighted shortest path (‘59 Dijkstra)
    • Cost Adjacency Matrix Ù state of network
    • global routing
    • usually impossible to implement
6
Traditional algorithms
  • Bellman Ford routing:
    • approach: Distance Vector Routing
    • Cost table, Routing table, update rules
    • local routing: sub-optimal paths
    • periodic updates of tables from neighbours
    • most commonly used
    • reacts prompt to good news, slow to bad news
    • exchange of large tables => overhead
7
Q-routing
  • Q-learning (‘89: Watkins)
    • Reinforcement Learning algorithm
    • states, actions
    • policy for choosing actions
  • Q-routing:
    • for each node x: Q-table Qx of Q-values
    • Q-value: Qx(y,d): estimated time ( x Ù y Ù d )
8
Q-routing
9
"find minimum from table Qx"
  • find minimum from table Qx     ® Qx(y,d)
  • send packet to y
  • y returns qy and minimum from Qy ® Qy(z,d)
  • update Qx(y,d):
    • Qx(y,d) := Qx(y,d) + hf.( Qy(z,d) + qy + d - Qx(y,d) )


    • hf Î[0,1] learning rate
    • qy = time spend in queue of node y
    • d = transmission delay x ® y
10
 
11
Evaluation Q-routing
  • Is unable to detect if initial high Q-value should decrease
  • => no more exploration for that link!
  • Improvements are possible:
    • more exploration
    • better exploration

12
Confidence-based Q-routing
  • Improvement of quality of exploration
  • Confidence-value: Cx(y,d) Î[0,1]
    • (0 = no, 1 = full ) confidence in Qx(y,d)
  • update rule C-values:
  • Cx(y,d) = Cx(y,d) + hf.(Cy(z,d) - Cx(y,d) )
  • update rule Q-values:
  • Qx(y,d) = Qx(y,d) + hf.( Qy(z,d)+qy+d - Qx(y,d) )
  • with hf containing Cx(y,d) and Cy(z,d)
13
Dual Reinforcement Q-routing
  • Improvement of quantity of exploration
  • adds backward exploration
    • sends Qy(x,s) with packet
    • Qy(x,s) = Qy(x,s) + hb.(Qx(z,s) +qx+d - Qy(x,s))
  • also possible with Confidence-based:
  • Þ Confidence-based Dual Reinforcement Q-routing (CDR Q-routing)
14
 
15
CDR Q-routing
  • forward:
  • Qx(y,d) = Qx(y,d) + hf.(Qy(z,d) +qy+d - Qx(y,d))
  •   Cx(y,d) = Cx(y,d) + hf.(Cy(z,d) - Cx(y,d) )
  • backward:
  • Qy(x,s) = Qy(x,s) + hb.(Qx(z,s) +qx+d - Qy(x,s))
  • Cy(x,s) = Cy(x,s) + hb.(Cx(z,s) - Cy(x,s) )


  • hf = hf(Cx(y,d), Cy(z,d) )
  • hb = hb(Cy(x,s), Cx(z,s) )
16
Problems with CDRQ-routing
  • If load decreases, Q-routing still cannot adapt!
  • solutions:
    • keeping track of the rate of change
    • Ô Predictive Q-routing
    • ‘random’ decisions
    • Ô Probabilistic Q-routing:
17
Probabilistic Q-routing
  • = Q-routing with stochastic routingprocess
  • Q-values treated as random values with Gauss distribution
  • Qx(y,d) = mean of Gauss distribution
  • also:
    • Probabilistic Confidence-based Q-routing:
      • Cx(y,d) Õ variance of Gauss distribution
    • Probabilistic Confidence-based Dual Reinforcement Q-routing

18
Predictive Q-routing
  • 4 values:
    • Qx(y,d) = estimated time for  x Ù y Ù d
    • Bx(y,d) = best estimated time for  x Ù y Ù d
    • Rx(y,d) = recovery rate for  x Ù y Ù d
    • Ux(y,d) = last update-time for this path
  • and also...
    • Dual Reinforcement Predictive Q-routing
    • Confidence-based Dual Reinforcement Predictive Q-routing
19
 
20
 
21
Conclusions
  • Q-routing: Probabilistic Q-routing...
  • Flexibility becomes more important
    • complexity and size of current networks Ú
      • different (more) data
      • more use of networks
    • new developments: mobile networks
  • trade-off:
  • less control overhead Ö flexibility