Princeton S* Network Systems: Coordination in Distributed Systems (ZooKeeper)

  • rp · 8 months ago
    Interesting that you mentioned WAN. Did you run into any issues with WAN?
  • Jeff Terrace · 8 months ago
    Only that when deploying over a WAN, there is a lot of redundant cross-datacenter traffic when there are multiple ZooKeeper nodes in each datacenter. I also didn't evaluate its performance over a WAN, but I'd imagine that latency becomes an issue.
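To make the redundant-traffic point concrete, here is a rough back-of-the-envelope sketch. It assumes a simplified model that is not ZooKeeper's actual code: the leader sends one copy of each proposal to every other server (star), versus one copy per remote datacenter that a local relay then fans out (tree). Acknowledgements are ignored, and the datacenter names and the functions `wan_messages_star` and `wan_messages_tree` are made up for illustration.

```python
def wan_messages_star(servers_per_dc, leader_dc):
    """Cross-datacenter messages per proposal when the leader sends
    one copy directly to every server outside its own datacenter."""
    return sum(n for dc, n in servers_per_dc.items() if dc != leader_dc)


def wan_messages_tree(servers_per_dc, leader_dc):
    """Cross-datacenter messages per proposal when the leader sends a
    single copy to each remote datacenter and a relay forwards it locally."""
    return sum(1 for dc, n in servers_per_dc.items()
               if dc != leader_dc and n > 0)


# Hypothetical layout: three datacenters with three servers each.
layout = {"east": 3, "west": 3, "europe": 3}

star = wan_messages_star(layout, "east")
tree = wan_messages_tree(layout, "east")
print(f"star: {star} WAN messages, tree: {tree} WAN messages")
```

Under these assumptions the star layout sends one WAN message per remote server, while the tree sends one per remote datacenter, so the gap grows with the number of servers in each datacenter.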
  • benjamin reed · 8 months ago
    hey jeff! i'm glad you like zookeeper. i agree that zookeeper needs more WAN support; it's high on our wish list. zookeeper currently assumes a data center network where all nodes have the same latency and bandwidth between them. (in our large data centers that isn't really true, but it's close enough.) this assumption causes us problems in two dimensions:

    1) as you mention, consensus traffic has a star topology with the coordinator at the center. since WAN links have higher latency and lower bandwidth than the data center network, we should really build a tree from the coordinator and minimize the traffic over the WAN links.

    2) clients treat all servers equally, so in WAN applications, clients usually are given a list of just the local zookeeper servers to connect to, but if those servers go down, it would be nice to connect to a server in another data center to keep things going.

    we are hoping to fix this weakness of zookeeper. (i would be super happy to commit a patch from you :)
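As an illustration of the second point, the fallback could be approximated outside the client today. The sketch below assumes the application picks the server list before it opens a session: local servers if any of them is reachable, otherwise the servers in another datacenter. The function `choose_connect_string`, the `is_reachable` probe, and the host names are hypothetical; only the comma-separated `host:port` connect-string format is what ZooKeeper clients accept.

```python
from typing import Callable, Sequence


def choose_connect_string(
    local: Sequence[str],
    remote: Sequence[str],
    is_reachable: Callable[[str], bool],
) -> str:
    """Return a ZooKeeper-style connect string ("host:port,host:port").

    Prefer the local datacenter; fall back to the remote list only when
    no local server answers the (caller-supplied) reachability probe.
    """
    if any(is_reachable(server) for server in local):
        return ",".join(local)
    return ",".join(remote)


# Hypothetical hosts, for illustration only.
local_servers = ["zk1.east.example:2181", "zk2.east.example:2181"]
remote_servers = ["zk1.west.example:2181", "zk2.west.example:2181"]

# Simulate an outage of the whole local datacenter.
down = set(local_servers)
connect_string = choose_connect_string(
    local_servers, remote_servers, lambda s: s not in down
)
print(connect_string)
```

This only chooses a list at connection time. It does not move a live session back to the local datacenter once those servers recover, which is part of why support inside the client would be the better fix.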