Review of 3 Recent Internet of Things (IoT) Announcements

Working in the big data and analytics space, I’m always interested in the parts of the Internet of Things (IoT) that will produce more data, require more backend systems, and help users and customers get on with their day more easily.

The past week brought a few interesting announcements on Internet of Things topics. Here are a few that jumped out at me, either because they inspired me or because they left me wondering what they would really mean.

TL;DR? While IBM is “getting started” (oops, I meant “getting serious”) and Facebook has big plans to “take over”, Amazon comes out with a consumer-focused solution.

Amazon Dash Button – Concept and Article re: IoT use

I like this concept: “Just press and never run out”. It’s the Amazon Dash Button: http://amazon.com/… – intended to be stuck onto appliances, essentially retrofitting ones that don’t (yet) have ordering built in. Pressing the button orders a refill of the product, just like Amazon one-click ordering online.

Also read this Fortune.com article about how this fits into the overall vision for the Internet of Things (IoT) at Amazon: http://fortune.com/…

What would you put a button on?  Scary or useful?

Amazon announces Dash Buttons

Kafka Consumer – Simple Python Script and Tips

[UPDATE: Check out the Kafka Web Console that allows you to manage topics and see traffic going through your topics – all in a browser!]

When you’re pushing data into a Kafka topic, it’s always helpful to monitor the traffic using a simple consumer script. Here’s one I’ve been using: it subscribes to a given topic and prints each message as it arrives. It depends on the kafka-python module and takes a single argument, the topic name. Modify the script to point at your broker’s IP.

from kafka import KafkaClient, SimpleConsumer
from sys import argv

# Connect to the Kafka broker (change the IP/port to match your environment)
kafka = KafkaClient("10.0.1.100:6667")

# Consume the topic named on the command line, as part of the "my-group" group
consumer = SimpleConsumer(kafka, "my-group", argv[1])
consumer.max_buffer_size = 0  # unlimited fetch buffer (see below)
consumer.seek(0, 2)           # start from the latest offset (see below)

# Each message is an OffsetAndMessage tuple; .message.value is the payload
for message in consumer:
    print("OFFSET: " + str(message.offset) + "\t MSG: " + str(message.message.value))
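To try it out, save the script (as, say, kafka_consumer.py – the filename is mine, not from the original post) and pass the topic name as the argument:

python kafka_consumer.py my_messages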

Max Buffer Size

There are two lines I wanted to focus on in particular.  The first is the “max_buffer_size” setting:

consumer.max_buffer_size = 0

When subscribing to a topic with a large backlog of messages it hasn’t seen before, the consumer/client can max out its buffer and fail. Setting an unlimited buffer size (zero) allows it to take everything that is available.
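As a side note, the same limits can be set when constructing the consumer rather than on the attribute afterwards. A minimal sketch, assuming the old kafka-python SimpleConsumer signature (the buffer_size value here is just an illustration):

# buffer_size is the initial fetch size in bytes; max_buffer_size=None
# (rather than 0) is the constructor's way of saying "no upper limit",
# so the client can keep growing the buffer as needed
consumer = SimpleConsumer(kafka, "my-group", argv[1],
                          buffer_size=4096 * 8,
                          max_buffer_size=None)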

If you kill and restart the script, it will continue where it left off – at the last offset that was received. This is pretty cool, but in some environments it runs into trouble, so I changed the default behavior by adding another line.
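For reference, that resume behavior comes from offsets committed under the “my-group” consumer group. Here’s a minimal sketch of the knobs involved, again assuming the old kafka-python SimpleConsumer signature (the commit intervals are illustrative, not what my script uses):

# Offsets are committed per consumer group, so a restarted script picks up
# where "my-group" last committed; these settings control how often it commits
consumer = SimpleConsumer(kafka, "my-group", argv[1],
                          auto_commit=True,          # commit offsets (the default)
                          auto_commit_every_n=100,   # commit every 100 messages
                          auto_commit_every_t=5000)  # ...or every 5000 ms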

Offset Out of Range Error

As I regularly kill the servers running Kafka and the producers feeding it (yes, just for fun), things sometimes go a bit crazy. I’m not entirely sure why, but I got this error:

kafka.common.OffsetOutOfRangeError: FetchResponse(topic='my_messages', partition=0, error=1, highwaterMark=-1, messages=)

To fix it I added the “seek” setting:

consumer.seek(0, 2)

The call works like a file seek: the second argument says where to seek from (0 = the earliest available offset, 1 = the current offset, 2 = the latest). So seek(0, 0) restarts scanning from the first message, while seek(0, 2) starts from the most recent offset – letting you tap back into the stream at the latest moment.

Removing this line takes you back to the behavior mentioned earlier, where the consumer picks up from the last message it previously received. But if/when that breaks, you’ll want a line like this to save the day.
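If you’d rather keep the resume-from-last-offset default and only jump to the tail when things actually break, one option is to catch the error and re-seek. A minimal sketch (my own variation, not from the original script):

from kafka.common import OffsetOutOfRangeError

while True:
    try:
        for message in consumer:
            print("OFFSET: " + str(message.offset) + "\t MSG: " + str(message.message.value))
    except OffsetOutOfRangeError:
        # The stored offset no longer exists on the broker;
        # jump to the latest offset and tap back into the stream
        consumer.seek(0, 2)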


For more about Kafka on Hadoop, see Hortonworks’ excellent overview page.
