Vishal Shah - #V's Blog

Redis + Lua for processing JSON values

I love Redis. Its simplicity and attention to minimalism are striking, and I find myself right at home when working with it. It’s no surprise that Redis is one of my favorite open-source projects.

Well, you might have heard that Redis 2.6 RC was just released and it has native Lua support! Now you might think, what the hell? But trust me, this is awesome. Think of it as the PL/SQL (well, kinda…) of Redis.

For things that traditionally required pulling objects out of Redis, for example processing Redis objects and summarizing the results, generating stats, or calculating means/averages, you can now do it all right in Redis. That’s right, in Lua. Lua scripting is powerful. And thankfully Lua scripts run atomically: no other script or command is executed while a script is running.

Plus, Redis’ Lua support includes some of the most popular Lua libraries - base lib, table lib, string lib, math lib, debug lib, cjson lib & cmsgpack lib.

The one I am excited about is cjson! Yes, that means you can now “near-natively” process JSON in Redis. I say near-natively, since it still has to go through the Lua runtime. But that’s OK. It’s an extension of Redis, the way I look at it. And this is better than native JSON support, because now the sky is the limit. You want native YAML support? Well, if a YAML lib is included (it isn’t right now, but fingers crossed), you can get that via the same Lua-based scripting system. It’s quite an ingenious way to cheat: build more with less.

A use case where we need such processing is aggregate value calculation. We store stringified JSON as values in Redis. An example is poll stats. If I were to calculate demographic-level stats or just a total count of values, I would have to batch fetch the entire JSON values and calculate the stats in my app layer (Node.js, for example). With Lua, I can parse (decode) the JSON, loop through the matching keys/objects, calculate the stats and return them all from Redis, without ever going back to the app layer. This is fantastic. (If this interests you, be sure to see some of the PS’s below.)
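To make the poll stats idea concrete, here is a minimal sketch in Python. Treat it as an illustration under assumptions, not a definitive implementation: the `poll_totals` helper, the key names and the `votes` field layout are all hypothetical, and `client` is assumed to be a redis-py-style client whose `eval(script, numkeys, *keys)` forwards to the Redis EVAL command. The Lua script decodes each stored JSON value with cjson, sums the per-option counts inside Redis, and hands back a single JSON string.

```python
import json

# Lua script run inside Redis via EVAL. Assumes each key holds a JSON object
# like {"votes": {"yes": 3, "no": 1}} (a hypothetical layout for poll stats).
POLL_TOTALS_LUA = """
local totals = {}
for _, key in ipairs(KEYS) do
  local raw = redis.call('get', key)
  if raw then
    local poll = cjson.decode(raw)
    for option, count in pairs(poll.votes) do
      totals[option] = (totals[option] or 0) + count
    end
  end
end
return cjson.encode(totals)
"""


def poll_totals(client, keys):
    """Sum vote counts across the given poll keys without fetching the values.

    `client` is assumed to expose eval(script, numkeys, *keys), as redis-py does.
    """
    raw = client.eval(POLL_TOTALS_LUA, len(keys), *keys)
    if isinstance(raw, bytes):  # redis-py returns bytes unless decode_responses=True
        raw = raw.decode("utf-8")
    return json.loads(raw)
```

With a real connection this would be called as something like `poll_totals(redis.Redis(), ["poll:1", "poll:2"])`. Only the summed totals cross the wire, not the full JSON values.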

Here are some code examples. (Note: you use the EVAL command to execute Lua scripts. The Lua runtime is also sandboxed and not all Lua libraries are available; for example, io is not available, as it simply would not make sense.)

eval "return cjson.decode(cjson.encode('{v_rocks:true,whos_v:\"http://www.vishalshah.org\"}'))" 0

"{v_rocks:true,whos_v:\"http://www.vishalshah.org\"}"

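Here’s another example where we store the string {v_rocks:true,whos_v:"http://www.vishalshah.org"} after encoding it via cjson, and then retrieve it by decoding via the same. Note that cjson.encode is handed a Lua string here, so what lands in Redis is a JSON string literal, which is why GET shows the extra quoting and escaping.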

redis 127.0.0.1:6379> eval "redis.call('set', 'vmeta', cjson.encode('{v_rocks:true,whos_v:\"http://www.vishalshah.org\"}'))" 0

(nil)

redis 127.0.0.1:6379> get vmeta

"\"{v_rocks:true,whos_v:\\\"http:\\/\\/www.vishalshah.org\\\"}\""

redis 127.0.0.1:6379> eval "return cjson.decode(redis.call('get', 'vmeta'))" 0

"{v_rocks:true,whos_v:\"http://www.vishalshah.org\"}"

Voila!

PS0. You still have to JSON stringify/parse from the app layer when storing JSON values in Redis. However, this is best done at the Redis driver level. If you know of a Redis driver that does not have JSON stringify/parse wrappers, make sure you add them by extending/forking the driver yourself.

PS1. More than JSON support specifically, I am excited about Lua in general. Stats calculations, log-like processing, filters, and other checks can now happen in Redis.

PS2. Be careful: Lua scripts run atomically. If you have a long-running Lua script, other Redis commands will wait and hence your backend’s processing will slow down. With great power comes great responsibility. One way to cheat is to run Lua scripts on slaves that have less load, or on staging servers running from the most recent Redis dumps, so you can safely play around!

Vishal

    • #redis
    • #nosql
    • #architecture
  • 3 weeks ago

Web Scale, Online-Offline Architecture Pattern/Template for Tiny to Large Scale Products

Having read about and practiced many, many architectural styles and patterns myself, I think I have something rather interesting to share.

I have identified a very standard architecture pattern that most online-offline systems can use (“online-offline system” is a term I made up :) for a system that supports both online/realtime and offline processing).

It’s not new or revolutionary. It just works. Digg follows most of that standard/canned architecture: app server, caching layers, storage, messaging/queues, and async/offline Hadoop processing.

Such an architecture supports everything from small to large use cases and products. It supports processing large amounts of data via Hadoop, while quick data queries use the cache and datastore. The app server is where the “online” business logic is. Offline business logic is in Hadoop.

It’s so generic and balanced that you can scale individual layers/components without affecting the rest of the system. It also fits in our SOA, where apps on the top are nothing but service end-points with no UI, or they can be java/node/php… apps with a UI.

I am not proposing that this is the one stack everybody should use, but it’s sort of an architecture pattern that works for many problems. You can skip components as you wish and add them later. For example, for a mobile app/service we are currently building, we have no need for offline processing or a dedicated messaging layer, hence the Hadoop and messaging layers go away! We can add them in the future if we want to, to add lots of stats, processing, etc.

So, you can scale it back to just the app server, caching and data store if you need to, which is not that exciting or revealing, but you know you can scale it to a ridiculous extent by adding the async messaging and offline MapReduce/Hadoop processing layers. And bingo!!
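A toy sketch of the online/offline split described above, in Python. Everything here is an in-memory stand-in and all names are hypothetical: the `queue` list plays the messaging layer, the `cache` dict plays the caching layer/datastore, and `offline_batch` plays the role a Hadoop job would have in a real deployment.

```python
from collections import Counter

queue = []   # stand-in for the messaging/queue layer
cache = {}   # stand-in for the caching layer / datastore


def handle_request(user, page):
    """Online path: do the cheap work now, defer the rest via the queue."""
    queue.append({"user": user, "page": page})
    # Quick query served from precomputed data; 0 until the batch job has run.
    return cache.get(f"views:{page}", 0)


def offline_batch():
    """Offline path: drain queued events and publish aggregate counts."""
    counts = Counter(event["page"] for event in queue)
    queue.clear()
    for page, n in counts.items():
        key = f"views:{page}"
        cache[key] = cache.get(key, 0) + n
```

The point of the sketch is the shape, not the storage: the request handler only appends and does a key lookup, so either layer can be swapped or scaled without the other one noticing.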

Here’s the Digg diagram. I chose not to build my own diagram because I am lazy. Digg’s diagram has most of it. Don’t get too carried away with their arrows. The important things to note are the components and layers and the purpose they serve.

— Vishal

    • #architecture
  • 3 months ago

Managing User Presence, Software Caches, Counters, Sessions among other things using Redis

I have seen so many developers and architects choose the wrong strategy for managing user presence, caches, stats and session information.

My simple advice that works: use Redis + EXPIRE. It’s blazing fast, stupidly simple and very scalable using file syncing and replication and/or sharding.
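Here is roughly what “Redis + expire” can look like for user presence, as a minimal Python sketch. The key naming, the 60 second window and both helper functions are assumptions for illustration; `client` is assumed to be a redis-py-style client exposing `set(name, value, ex=seconds)` and `exists(name)`.

```python
PRESENCE_TTL_SECONDS = 60  # assumed window; tune to your heartbeat interval


def mark_online(client, user_id, ttl=PRESENCE_TTL_SECONDS):
    """Record a heartbeat; the key vanishes on its own if heartbeats stop."""
    client.set(f"presence:{user_id}", "1", ex=ttl)


def is_online(client, user_id):
    """A user is online while their presence key has not expired."""
    return bool(client.exists(f"presence:{user_id}"))
```

There is no cleanup job and no “logged out” bookkeeping: a user who stops sending heartbeats simply ages out when the key expires.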

Don’t try to kludge it in the old-school way with SQL, custom in-memory hash tables, or memcached used inappropriately.

As a software architect, the hardest thing to do is pick the right tool for the job while balancing complexity, cost, performance and learning. And if there is one tool I never forget and keep coming back to, it is Redis, an intentionally simple but superb artifact of the KISS principle. On top of that, its genius lies in its beautiful command language, which is easy to operate and learn and, more importantly, easy to build upon for very-simple-to-very-complex systems based on data models that use KV alone. It takes a while to get used to, but KV > complex data systems for most situations, especially for session management.

You will find yourself using Redis for counters, stats and caches, all without the cold start problem, since Redis fsyncs (file syncs) its in-memory representation to disk, so you can always lose a Redis instance and bring another back up without breaking a sweat.

And the icing on the cake? It complements rather than competes with other NoSQL systems, especially persistent ones. I get amused when people talk about or search for redis vs cassandra vs riak vs …

The closest thing to Redis that you probably know is memcached.

Vishal

    • #architecture
  • 4 months ago

API Design Best Practices. How to attain API Awesomeness.

API Design Best Practices

APIs can be fundamentally important for an organization. Contrary to popular belief, however, they are not just for external clients and developers; they can also play an important role in building system-wide applications and in moving towards an API-driven architectural style (more on that in a separate blog post).

So what are the good qualities to think about while designing APIs? In other words, I want to discuss how to design APIs that are well designed and scalable (in their use), along with some thoughts on related matters.

Qualities of good APIs

  • Less is more
    • The less the API set, the better.
      • There are some obvious benefits here. Less means less to build, support and maintain. It’s also supposedly easier for API consumers and developers to understand, and easy to build off as well.
      • Sometimes there is a need for more powerful, custom APIs. In that case, my genius idea is to build another “set” of custom APIs on “top”.
        • Ok, so what does that mean? Well, let’s take an example. I have 3 simple APIs to offer to the rest of the world and to developers within my company. Now, I will offer another set of APIs that are built on “top” of these APIs with additional hooks and customizations. These APIs are designed for the power users, if you will. They are documented as such in a separate section. Now, what are the benefits of such an approach?
          • Simple: isolation. It allows us to “evolve” one set of APIs separately from the others. This might not seem obvious, but believe me when I say it’s tremendously powerful. I can add or deprecate a set of APIs, or add/remove params, without impacting my entire user base. This is awesome.
            • I don’t know of any organization, large or small, formally following this. They might have such a pattern, but it is often an artifact of iterations rather than planned. Simply following this rule will make you a better API designer, architect and developer.
  • Loosely coupled to clients, possibly RESTful, platform agnostic
    • Architects have learned the hard way the cost of building APIs tightly coupled to clients. And I understand why they did that. APIs were initially needed to solve a problem and offer services to a set of clients, so it’s natural for API designers to follow those clients’ needs. Wrong 8/10 times. Clients come & go. That’s a fact of life. Both internal & external clients change, because they are often product focused and no product is constant, at least not the successful ones. DON’T design your APIs for a client or a small set of clients. You will be amazed at the possibilities of beautifully designed APIs. Systems are better built with loose dependencies on APIs as opposed to tight binary dependencies.
    • REST is a very powerful pattern that the web/HTTP builds upon, and you cannot go wrong building RESTful APIs, but there are some gotchas. Don’t follow REST 100% to the letter, as chances are you will not have web scale routers and caching infrastructure. There is often something more needed. Also, some of the REST philosophy is hard to absorb. It’s OK. Start small, simple. Iterate from that point on. Learn from giants: Twitter, Facebook, LinkedIn, Google.
  • Performant
    • APIs are no good if they are not fast, simply put. Clients, often external, are dependent on them. Their performance and their user experience rest on your shoulders. And if there are 1000’s of clients, that’s a lot of responsibility. Don’t sweat it. In the beginning, only expose APIs that you know follow good algorithms, for example ones that do not involve an entire sweep of the database. APIs should mostly use keys to look up data, cache hard-to-calculate data, and avoid serving complex user-specific data without designing & planning for it. Caches are an API’s best friend. Take advantage of them. Assume clients will try to abuse your APIs. You have to be smart, accountable & resilient.
    • Keep responses small. Large responses are one of the biggest reasons some APIs are slower. Support pagination if there is more data. Simple. (See more tips in the response format tips below!)
    • Consider supporting binary response formats for extra performance. msgpack and protocol buffers are excellent for compacting your API responses while also supporting fast data parsing and loading, a great win!
  • Don’t trust your clients
    • This is a mistake novice designers & architects always make: trusting the clients to do the right thing. Wrong again. Never trust your clients. Even if they are internal clients. Because even if the intentions are good, a small client application bug can suffocate your APIs, which in turn has downstream impacts.
  • API servers are API servers
    • API servers are often aggregators. They don’t have a lot of business logic, but instead depend on other systems to pull and present the data to the clients, often from a variety of back-ends. If you are not doing this, in other words, if you are designing or building APIs on your app server, well, good luck. I will say, it’s a terrible idea. Even if your entire stack is hosted on one machine, try to isolate the API server as a separate module, with its own dependencies and code base. There should only be a binary/library/API dependency on other modules/system components. You will thank me for this.
  • Param Design
    • Keep your param list small, very small for that matter. Give good names, so that 9/10 times I get it without reading a lot of documentation. These are for obvious reasons. As a rule of thumb, always choose simplicity when possible. As API designers, this is the hardest aspect. As computer scientists, we are trained to solve complicated problems, but we are not trained enough on simplicity, the power of less, and all that. Just look at Apple products & user experiences.
    • Clearly mark off optional parameters and document them separately, because many developers might not even be interested in that much power. Also, on the flip side, clearly note your default values for optional params.
    • If you are designing HTTP APIs, follow the proper verbs. GET for data out. PUT/POST for data in or updates. DELETE for deleting data (use caution here, especially around authentication).
  • Response design & formats
    • Support only one response format to the extent possible. Don’t try to be cool and support multiple formats unless absolutely required. JSON these days offers a good balance of simplicity and client compatibility across the gamut of clients out there. I don’t prefer XML, but it works.
    • If you need to support more than one response format, try to isolate the view (the response template) from the data. This is in line with the MVC design pattern. This will make it very easy to support & maintain multiple formats. And it’s easy to debug issues as well. Please note this!
    • Keep responses small. Don’t include everything in the response.
    • I have a great tip that many don’t know. Use params to drive what to include in the response. This is a great way to give control to the clients and let them decide the tradeoff between performance and quantity. You should make this clear in the API documentation. Also, this can be designed as a privileged service, meaning only privileged (trusted, etc.) clients can choose to include certain items in the response, as they add to the response and/or cost more to calculate/generate.
  • Real-Time Processing for Statistics & Monitoring
    • Avoid real-time processing, to the extent possible, when it comes to calculating stats or monitoring. That’s because it’s often not simple to do that without compromising API performance, the integrity of the statistical data, and the ability to take other necessary actions.
    • I have a much better idea to share for that problem. Do the processing offline! What I mean by that is that you should log all the requests your systems get. Periodically process that data and calculate aggregate stats that the APIs can use to monitor and accept/deny/etc. activities from clients. This is amazingly powerful because it is very scalable. Offline systems can use Hadoop/MapReduce on streamed Scribe data and there you go, you can calculate the most sophisticated or the simplest of statistics, like API counts from a client or partner. Imagine doing that in real time, especially when you have 20 API servers and any one of them can serve an API request. You have no choice but to use a distributed storage system and/or cache locally and periodically sync with other peers.
    • Instead, doing it offline and updating the live API servers with that stat, or making it available via constant-time lookup (key lookups from caching systems or KV data stores, for example Redis), does wonders. Sure, there is a downside. You will lose a window of opportunity when the stat is not updated and you would still be serving clients. But you can run the offline batch processing as often as you like, for example every 15 minutes. That way, the worst you lose is a 30 minute window. And besides, you should be investigating DoS attacks among other things at the site level, not just for APIs, and those systems can help under critical attack or abuse situations.
  • Authentication
    • Coming soon!
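
As a rough illustration of the pagination and “use params to drive what to include in the response” tips above, here is a small Python sketch. The `shape_response` function and the `fields`, `page` and `per_page` parameter names are hypothetical, chosen only for this example; a real API would also cap `per_page` and check which clients may request expensive fields.

```python
def shape_response(records, fields=None, page=1, per_page=20):
    """Return one page of records, trimmed to the requested fields.

    fields: comma-separated names from a hypothetical `fields` query param;
            None means "return the default (full) record".
    """
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    items = records[start:start + per_page]
    if fields:
        wanted = [name.strip() for name in fields.split(",") if name.strip()]
        items = [{k: r[k] for k in wanted if k in r} for r in items]
    return {
        "page": page,
        "per_page": per_page,
        "total": len(records),
        "items": items,
    }
```

A client that only needs ids and names would ask for `fields="id,name"` and get a much smaller payload, while the default call still returns full records one page at a time.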

Vishal

    • #api
    • #architecture
    • #design
  • 5 months ago

About

I like designing & architecting things that help improve and simplify life in some way or another.

I have degrees in Computer Science & Mechanical Engineering and have studied Industrial Design in San Francisco, CA.
