WedTM's Blog

Web developer, father, husband, and awesome dude. All rolled into one.

Twitter's Real-Time URL Fetcher Using Cassandra and Memcached

nosql:

Twitter’s real-time URL fetcher, code-named SpiderDuck, is an excellent example of how NoSQL databases fit into the architecture of today’s systems:

Metadata Store: This is a Cassandra-based distributed hash table that stores page metadata and resolution information keyed by URL, as well as fetch status for every URL recently encountered by the system. This store serves clients across Twitter that need real-time access to URL metadata.
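To make the idea of a store "keyed by URL" concrete, here is a minimal sketch in Python. It is not SpiderDuck's code and it does not talk to Cassandra: a plain dict stands in for the distributed hash table, and every name in it (`MetadataStore`, `UrlRecord`, `FetchStatus`, the field names) is a hypothetical one chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class FetchStatus(Enum):
    """Hypothetical fetch states; the real system's states are not documented here."""
    PENDING = "pending"
    FETCHED = "fetched"
    FAILED = "failed"


@dataclass
class UrlRecord:
    """One row per URL: fetch status, resolution info, and page metadata."""
    url: str
    status: FetchStatus = FetchStatus.PENDING
    resolved_url: Optional[str] = None
    metadata: Dict[str, str] = field(default_factory=dict)


class MetadataStore:
    """In-memory stand-in for a distributed hash table keyed by URL."""

    def __init__(self) -> None:
        self._rows: Dict[str, UrlRecord] = {}

    def mark_pending(self, url: str) -> UrlRecord:
        # Create the row on first sight of a URL; keep the existing row otherwise.
        return self._rows.setdefault(url, UrlRecord(url=url))

    def record_fetch(self, url: str, resolved_url: str,
                     metadata: Dict[str, str]) -> UrlRecord:
        # Store where the URL resolved to and what the page looked like.
        record = self.mark_pending(url)
        record.status = FetchStatus.FETCHED
        record.resolved_url = resolved_url
        record.metadata = dict(metadata)
        return record

    def get(self, url: str) -> Optional[UrlRecord]:
        # Clients read by URL; unknown URLs return None.
        return self._rows.get(url)
```

A client would call `store.get(url)` and inspect `status` before trusting `resolved_url` or `metadata`. In a real deployment the dict would be replaced by reads and writes against a Cassandra table whose row key is the URL.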

SpiderDuck also uses Memcached:

Memcached: This is a distributed cache used by the fetchers to temporarily store robots.txt files.
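The robots.txt caching described above can be sketched roughly as follows. This is an illustration under stated assumptions, not how SpiderDuck does it: a dict with expiry timestamps stands in for Memcached, the `RobotsCache` class, its one-hour default TTL, and the injected `fetch_robots` callable are all hypothetical, and parsing is done with the standard library's `urllib.robotparser`.

```python
import time
from typing import Callable, Dict, Tuple
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class RobotsCache:
    """Caches robots.txt bodies per host for a limited time (stand-in for Memcached)."""

    def __init__(self, fetch_robots: Callable[[str], str],
                 ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_robots      # hypothetical: returns robots.txt text for a host
        self._ttl = ttl_seconds         # assumed expiry; the real value is not stated
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def _robots_txt(self, host: str) -> str:
        now = self._clock()
        entry = self._entries.get(host)
        if entry is not None and entry[0] > now:
            return entry[1]             # cache hit, still fresh
        body = self._fetch(host)        # cache miss or expired: fetch again
        self._entries[host] = (now + self._ttl, body)
        return body

    def allowed(self, user_agent: str, url: str) -> bool:
        # Parse the cached robots.txt and ask whether this agent may fetch the URL.
        host = urlsplit(url).netloc
        parser = RobotFileParser()
        parser.parse(self._robots_txt(host).splitlines())
        return parser.can_fetch(user_agent, url)
```

Each fetcher would call `cache.allowed(agent, url)` before requesting a page, so that repeated URLs on the same host cost one robots.txt download per TTL window rather than one per fetch.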

Figure: SpiderDuck architecture (Cassandra, Memcached)

Original title and link: Twitter’s Real-Time URL Fetcher Using Cassandra and Memcached (NoSQL database © myNoSQL)
