Apache Avro is a data serialization framework from the Hadoop ecosystem. The main tasks of Avro are:

  • Provide complex data structures
  • Provide a compact and fast binary data format
  • Provide a container file to persist data
  • Provide remote procedure calls (RPC)
  • Enable simple integration with dynamic languages
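To give a feel for what "compact" means here, the following is a minimal sketch of how the Avro specification encodes int and long values: a zig-zag mapping followed by a variable-length (base-128) encoding. The function names zigzag and encode_long are chosen for this illustration only and are not part of any Avro library.

```python
def zigzag(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one so that
    values close to zero (positive or negative) stay small."""
    return ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF


def encode_long(n: int) -> bytes:
    """Encode an int/long as zig-zag plus base-128 varint.
    Illustrative helper, not an official Avro API."""
    z = zigzag(n)
    out = bytearray()
    while z > 0x7F:
        # Emit the low 7 bits and set the continuation bit.
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


if __name__ == "__main__":
    for value in (0, -1, 1, 64, 1_000_000):
        print(value, encode_long(value).hex())
```

Small numbers take a single byte instead of the four or eight bytes of a fixed-width integer, which is one reason Avro files stay small.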

Avro schemas are defined in JSON and support the following types:

Elementary types

  • Null, Boolean, Int, Long, Float, Double, Bytes and String

Complex types

  • Record, Enum, Array, Map, Union and Fixed

The sample below demonstrates an Avro schema:

{
  "namespace": "person.avro",
  "type": "record",
  "name": "Person",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": ["int", "null"]},
    {"name": "street", "type": ["string", "null"]}
  ]
}

Table 4: An Avro schema
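To make the schema more concrete, here is a rough sketch of how a single Person record could be turned into Avro's binary encoding by hand, using only the Python standard library. It follows the encoding rules of the Avro specification for the few types this schema uses (record, union, string, int, null). The helper names encode_long, matches and encode_value, as well as the sample person, are made up for this example. In practice you would use an Avro library rather than code like this.

```python
import json

SCHEMA = json.loads("""
{
  "namespace": "person.avro",
  "type": "record",
  "name": "Person",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": ["int", "null"]},
    {"name": "street", "type": ["string", "null"]}
  ]
}
""")


def encode_long(n: int) -> bytes:
    """Zig-zag plus base-128 varint, as used for Avro int and long."""
    z = ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def matches(schema, value) -> bool:
    """Check whether a value fits one of the primitive types used here."""
    if schema == "null":
        return value is None
    if schema == "int":
        return isinstance(value, int) and not isinstance(value, bool)
    if schema == "string":
        return isinstance(value, str)
    return False


def encode_value(schema, value) -> bytes:
    """Encode a value for the subset of Avro types in the Person schema."""
    if isinstance(schema, list):
        # Union: write the index of the matching branch, then the value.
        for index, branch in enumerate(schema):
            if matches(branch, value):
                return encode_long(index) + encode_value(branch, value)
        raise TypeError(f"{value!r} matches no branch of {schema}")
    if isinstance(schema, dict) and schema["type"] == "record":
        # Record: the fields in schema order, with no names or separators.
        return b"".join(
            encode_value(field["type"], value[field["name"]])
            for field in schema["fields"]
        )
    if schema == "null":
        return b""  # null takes zero bytes
    if schema == "int":
        return encode_long(value)
    if schema == "string":
        data = value.encode("utf-8")
        return encode_long(len(data)) + data  # length prefix, then UTF-8 bytes
    raise NotImplementedError(f"type not covered by this sketch: {schema}")


person = {"name": "Ann", "age": 30, "street": None}  # hypothetical sample data
encoded = encode_value(SCHEMA, person)

if __name__ == "__main__":
    print(len(encoded), encoded.hex())
```

Field names never appear in the output, because the reader is expected to have the schema. That is why the encoded record is much shorter than the same data written as JSON.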
