The last few years have been exciting for telcos: 5G is the “next big thing” in communications. It promises ultra-high speed with low latency. Our internet speed will never be the same again. I worked in the telco business until recently, but I would say that the good times for telcos will soon be gone. Elon Musk will destroy this industry and shake it up entirely.

Why will Elon Musk disrupt the Telco industry?

Before we get to the answer, let’s first have a look at what one of his companies is currently “building”. You might have heard of SpaceX. Yes, these are the folks capable of launching rockets into orbit and landing them again. This significantly reduces the cost per launch. Even NASA relies on SpaceX. And it doesn’t stop there: Elon Musk is telling us how to get to the Moon (again) and even bring the first people to Mars. This is really visionary, isn’t it?

However, with all these Moon and Mars plans, there is one thing we tend to overlook: SpaceX is bringing a lot of satellites into orbit. Some of them are for other companies, but a significant number are for SpaceX itself. They have launched some 1,700 satellites and are already the largest operator of satellites. But what are these satellites for? Well, you might already guess it: for providing satellite-powered internet. Initially, the satellite network was only intended to cover selected large areas. However, the company (named “Starlink”) recently announced that it now offers global coverage.

One global network …

Wait, did I just write “global coverage”? That’s “insane”. One company can provide internet for each and every person on the planet, regardless of where they are. All 7.9 billion people in the world. That is a huge market to address! However, what is more impressive is the cost at which they can build this network. Right now, they have something like 1,700 satellites out there. Each Falcon 9 rocket (which they own!) can transport around 40 of these satellites. All together, the per-satellite cost for SpaceX would be around $300,000. According to Morgan Stanley, SpaceX might need well below 60 billion dollars to build a satellite internet of around 30,000 satellites. This is a far higher number than the 1,700 already up there. However, think about speed and latency: right now, with 1,700 satellites already out, Starlink is offering around 300 Mbit/s with 20 ms latency. This is already great compared to 4G, where you merely get up to 150 Mbit/s. Curious what’s possible once all 30k are out? I would expect that we get some 1 Gbit/s and a very low latency. Then it would be a strong competitor to 5G.

Again, the cost …

Morgan Stanley estimated the cost for this network to be around 60 billion USD. This sounds like a lot of money for Starlink to raise, but it isn’t. Let’s compare it to 5G again. Accenture estimates that the 5G network for the United States alone will cost some 275 billion USD! One market. Compare the 60 billion of Starlink, a global market addressing 7.9 billion people, with the U.S., where you can address 328 million people. It is more than 20 times the market, at a fraction of the cost! Good night, 5G.
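To make the comparison concrete, here is a small back-of-the-envelope calculation in Python. It uses only the figures quoted above; the variable names are my own, and the result is a rough cost per addressable person, not a business case.

```python
# Figures quoted in the text above (estimates, not measured data).
starlink_cost_usd = 60e9   # Morgan Stanley estimate for the Starlink network
us_5g_cost_usd = 275e9     # Accenture estimate for 5G in the United States
world_population = 7.9e9
us_population = 328e6

# How much larger is the global market than the U.S. market?
market_ratio = world_population / us_population

# Network cost divided by the number of people that can be addressed.
starlink_per_person = starlink_cost_usd / world_population
us_5g_per_person = us_5g_cost_usd / us_population

print(f"Market size ratio: {market_ratio:.1f}x")
print(f"Starlink: {starlink_per_person:.2f} USD per addressable person")
print(f"U.S. 5G:  {us_5g_per_person:.2f} USD per addressable person")
```

With these inputs, the global market is roughly 24 times the size of the U.S. market, at about 7.6 USD per addressable person for Starlink versus about 838 USD for 5G in the U.S.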

Internet of things via satellites rather than 5G

5G might not win the race for the future of IoT applications. Just think about autonomous cars: one key issue there is steady connectivity. 5G might not be available everywhere, or connectivity might be bad in many regions with a smaller population. It simply doesn’t pay off for telcos to build 5G everywhere. In contrast, Starlink will be everywhere. So large IoT applications will rather go for Starlink. Imagine Ford or Mercedes having one partner to negotiate with rather than 50 different telco providers around the globe for their setup. It makes things easier from a technical and commercial point of view.

Are Telcos doomed?

I would say: not yet. Starlink is at a very early stage and still in beta. There might be some issues coming up. However, telcos should definitely be afraid. I was in the business until recently, and most telco executives are not thinking much about Starlink. If they do, they laugh at it. But remember what happened in the automotive industry? Yep, we are all going electric now. Automotive executives were laughing at Tesla. A low-volume, niche player, they said. What is it now? Tesla is more valuable than any other automotive company in the world and produces cars at mass scale.

However, one thing is different: automotive companies could easily adapt to the new normal. Building a car is not just about the engine. It is also a lot about the process, the assembly lines and the like. All major car manufacturers now offer electric cars and can build them competitively with Tesla. As for Starlink vs. 5G, this will be different: telco companies can’t build rockets. Elon Musk will disrupt another industry, again!

This post is an off-topic post from my Big Data tutorials


In my previous post, I gave an introduction to Python Libraries for Data Engineering and Data Science. In this post, we will have a first look at NumPy, one of the most important libraries to work with in Python.

NumPy is one of the simplest libraries for working with data. It is often re-used by other libraries such as Pandas, so it is necessary to understand NumPy first. The focus of this library is on easy transformations of vectors, matrices and arrays, and it provides a lot of functionality for that. But let’s get our hands dirty with the library and have a look at it!

Before you get started, please make sure to have the Sandbox set up and ready.

Getting started with NumPy

First of all, we need to import the library. This works with the following import statement in Python:

import numpy as np

This should now give us access to the NumPy library. Let us first create a 2-dimensional array with 3 rows of 5 values each. In NumPy, this works with the “arange” function. We provide “15” as the number of items and then reshape the result to 3×5:

vals = np.arange(15).reshape(3,5)
vals

This should now give us an output array with 2 dimensions: 3 rows, each containing 5 values. The values range from 0 to 14:

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
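To double-check the structure of an array, it can help to inspect a few of its attributes. The following is a minimal sketch that rebuilds the same array and prints its number of dimensions, its shape and its size:

```python
import numpy as np

# Same array as above: the values 0..14 arranged in 3 rows and 5 columns.
vals = np.arange(15).reshape(3, 5)

print(vals.ndim)    # number of dimensions (axes)
print(vals.shape)   # (rows, columns)
print(vals.size)    # total number of elements
print(vals[1, 2])   # element in row 1, column 2 (indices start at 0)
```

Here, `ndim` is 2 and `shape` is `(3, 5)`, which confirms that the array has two dimensions with 3 rows and 5 columns.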

NumPy contains a lot of different constants and functions. To get pi, you simply import “pi” from numpy:

from numpy import pi
pi

We can now use PI for further work and calculations in Python.
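As a small illustration of such a calculation, the sketch below computes the area of a few circles. The radii are made-up example values:

```python
import numpy as np
from numpy import pi

# Hypothetical example radii, chosen only for illustration.
radii = np.array([1.0, 2.0, 3.0])

# Area of a circle is pi * r^2; the formula is applied to every radius at once.
areas = pi * radii**2

print(areas)
```

The imported `pi` is the same constant that is also available as `np.pi`, so both spellings can be used interchangeably.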

Simple Calculations with NumPy

Let’s create a new array with 5 values:

vl = np.arange(5)
vl

An easy calculation is raising values to a power. This works with the “**” operator, which is applied to each element of the array:

nv = vl**2
nv

Now, this should give us the following output:

array([ 0,  1,  4,  9, 16])

The same applies to “3”, if we want to raise every element of the array to the power of 3:

nn = vl**3
nn

And the output should be similar:

array([ 0,  1,  8, 27, 64])
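The other arithmetic operators behave in the same element-wise way. Here is a short sketch with a few more operations on the same kind of array; the result names are my own:

```python
import numpy as np

vl = np.arange(5)          # array([0, 1, 2, 3, 4])

doubled = vl * 2           # multiply every element by 2
shifted = vl + 10          # add 10 to every element
roots = np.sqrt(vl**2)     # square, then take the square root (float result)
combined = vl + vl**2      # arrays of equal shape are combined element by element

print(doubled)
print(shifted)
print(roots)
print(combined)
```

No explicit loop is needed for any of these calculations, which is one of the main reasons to use NumPy arrays instead of plain Python lists.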

Working with Random Numbers in NumPy

NumPy contains the function “random” to create random numbers. This function takes the shape of the array to fill with numbers. We use a 3×3 array:

nr = np.random.random((3,3))
nr *= 100
nr

Please note that random returns numbers between 0 (inclusive) and 1 (exclusive), so in order to create higher numbers we need to “stretch” them. We thus multiply by 100. The output should be something like this:

array([[90.30147522,  6.88948191,  6.41853222],
       [82.76187536, 73.37687372,  9.48770728],
       [59.02523947, 84.56571797,  5.05225463]])

Your numbers will be different, since we are working with random numbers here. We can do this as well with a 3-dimensional array:

n3d = np.random.random((3,3,3))
n3d *= 100
n3d

Here, too, your numbers will be different, but the overall “structure” should look like the following:

array([[[89.02863455, 83.83509441, 93.94264059],
        [55.79196044, 79.32574406, 33.06871588],
        [26.11848117, 64.05158411, 94.80789032]],

       [[19.19231999, 63.52128357,  8.10253043],
        [21.35001753, 25.11397256, 74.92458022],
        [35.62544853, 98.17595966, 23.10038137]],

       [[81.56526913,  9.99720992, 79.52580966],
        [38.69294158, 25.9849473 , 85.97255179],
        [38.42338734, 67.53616027, 98.64039687]]])
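If you need the same “random” numbers on every run, for example to compare your results with somebody else’s, you can seed a generator. The sketch below uses NumPy’s `default_rng` generator instead of `np.random.random`; the seed value 42 is an arbitrary choice of mine:

```python
import numpy as np

# A seeded generator produces the same sequence of numbers on every run.
rng = np.random.default_rng(seed=42)
a = rng.random((3, 3)) * 100            # floats in the range [0, 100)

# A second generator with the same seed yields exactly the same values.
b = np.random.default_rng(seed=42).random((3, 3)) * 100
print(np.array_equal(a, b))

# Whole numbers from 0 to 99 (the upper bound is exclusive).
ints = rng.integers(0, 100, size=(3, 3))
print(ints)
```

Without a seed, the generator behaves like the examples above and returns different numbers each time.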

Other means to work with Numbers in Python

NumPy provides several other options to work with data. There are several aggregation functions available that we can use. Let’s now look for the maximum value in the previously created array:

n3d.max()

In my example, this would return 98.64. You would get a different number, since the values are random. It is also possible to return the maximum along a specific axis of an array. To do so, we add the keyword “axis” to the “max” function:

n3d.max(axis=1)

This now returns the maximum along axis 1, i.e. for each of the three 3×3 blocks, the largest value in each column. In my example, the results look like this:

array([[89.02863455, 83.83509441, 94.80789032],
       [35.62544853, 98.17595966, 74.92458022],
       [81.56526913, 67.53616027, 98.64039687]])
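The “axis” keyword names the dimension that is collapsed by the aggregation. A small sketch with a deterministic array (my own example, not the random one above) makes this easier to see, because every axis has a different length:

```python
import numpy as np

# Deterministic 2x3x4 array with the values 0..23.
arr = np.arange(24).reshape(2, 3, 4)

# The axis passed to max() disappears from the shape of the result.
print(arr.max(axis=0).shape)   # axis 0 (length 2) is collapsed
print(arr.max(axis=1).shape)   # axis 1 (length 3) is collapsed
print(arr.max(axis=2).shape)   # axis 2 (length 4) is collapsed

# Maximum along axis 1: per block, the largest value in each column.
print(arr.max(axis=1))

# Maximum along axis 2: per block, the largest value in each row.
print(arr.max(axis=2))
```

The resulting shapes are `(3, 4)`, `(2, 4)` and `(2, 3)`: in each case, the axis given to `max` is the one that is removed.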

Another option is to calculate the sum. We can do this for the entire array, or per axis by providing the axis keyword:

n3d.sum(axis=1)

In the next sample, we make the data look prettier. This can be done by rounding the numbers to 2 decimal places:

n3d.round(2)
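The following sketch shows summing and rounding on a small array with made-up values. Note that “round” returns a new array and leaves the original untouched, so the result has to be assigned if you want to keep it:

```python
import numpy as np

# Made-up example values.
data = np.array([[1.234, 2.346],
                 [3.456, 4.567]])

total = data.sum()            # sum over all elements
col_sums = data.sum(axis=0)   # collapse the rows: one sum per column
row_sums = data.sum(axis=1)   # collapse the columns: one sum per row

rounded = data.round(2)       # new array; data itself is unchanged

print(total)
print(col_sums)
print(row_sums)
print(rounded)
```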

Iterating arrays in Python

Often, it is necessary to iterate over items. In NumPy, this can be achieved with the built-in iterator, which we get from the function “nditer”. This function takes the array to iterate over, and we can then use it in a for-each loop:

for val in np.nditer(n3d):
    print(val)

The above sample iterates over all values in the array and prints them. If we want to modify the items within the array, we need to set the parameter “op_flags” to “readwrite”. This enables us to modify the array while iterating over it. In the next sample, we iterate over each item and replace it with its remainder modulo 3:

n3d = n3d.round(0)

with np.nditer(n3d, op_flags=['readwrite']) as it:
    for i in it:
        i[...] = i % 3

n3d
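For simple element-wise operations like this one, the explicit loop is usually not necessary, because the operator can be applied to the whole array at once. The sketch below compares both approaches on a freshly generated array; the seed and the variable names are my own assumptions:

```python
import numpy as np

# Random 3x3x3 array, rounded to whole numbers (arbitrary seed for repeatability).
rng = np.random.default_rng(seed=0)
values = (rng.random((3, 3, 3)) * 100).round(0)

# Loop version, as in the text: modify each element in place.
looped = values.copy()
with np.nditer(looped, op_flags=['readwrite']) as it:
    for x in it:
        x[...] = x % 3

# Vectorized version: the modulo operator works on the whole array at once.
vectorized = values % 3

print(np.array_equal(looped, vectorized))
```

Both versions produce the same result. The vectorized form is shorter and typically faster, while “nditer” remains useful when the logic per element is more involved.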

These are the basics of NumPy. In our next tutorial, we will have a look at Pandas: a very powerful dataframe library.

If you liked this post, you might consider the tutorial about Python itself. It gives you great insight into the Python language for Spark. If you want to know more about Python, you should consider visiting the official page.

In some of my previous posts, I shared my thoughts on the data mesh architecture. The data mesh was originally introduced by Zhamak Dehghani in 2019 and is now enjoying huge popularity in the community. As one of the main ideas of the data mesh architecture is the distributed nature of data, it also leads to a domain-driven design of the data itself. A data circle enables this design.

What is a data circle?

A data circle is a data model that is tailored to the use-case domain. It should follow the approach of the architectural quantum from the microservice architecture. The domain model should contain only the information relevant for the purpose it is built for, and no additional data. Also, each circle could or should run within its own environment (e.g. a database). The technology should be selected for the best use of the data. A circle might easily be confused with a data mart that is built within the data warehouse. However, several data circles might not “live” within one (physical) data warehouse but use different technologies and be highly distributed.

Each company will have several data circles in place, each tailored to the specific needs of its use-cases. When modelling data with data circles, unnecessary information is skipped, as it will, at some point, be connectable with other data circles in the company. Whereas we previously built our data models in a very comprehensive way (e.g. via the data warehouse), we now build the data models in a distributed way.

Samples in the telco and financial industry

If we take for example a telco company, data circles might be:

  • The customer data circle: containing the most important customer data
  • The network data circle: containing information about the network
  • The CDR data circle: containing information about calls conducted

If we look at the insurance industry, data circles might be:

  • The customer data circle: containing the most important customer data
  • The claims data circle: containing the data about past claims
  • The health data circle: containing health-related information

If we focus back on the telco company, the data about the customer might be stored in a relational model within an RDBMS. However, network data might be stored in a graph for better spatial analysis. CDR data might be stored in a file-based setup. For each domain, the best technology is selected and the best model is designed. The same holds true for other industries.

Several data circles make up the design

Different business units will build their own data circles to fit their demands. This, however, makes it necessary to create a central repository that ties it all together: a hub connecting all the circles. The hub stores information about the connectivity of different circles. Imagine the network data model again: you might want to connect the network data with customer data. There must be a way to connect this data while still keeping its distributed aspects. The hub serves as a central data asset management tool and one-stop shop for employees within the company to find the data they need.

Circles connected via a hub

The data hub also allows users to connect to and analyse the data they want to access, for example with tools such as Jupyter. The hub also takes care of the connectivity to the data and thus provides an API for all users. A data hub is all about data governance.
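To illustrate the idea, here is a minimal sketch in Python of how a hub could keep track of circles and the attributes they share. Everything in it is hypothetical: the class names, the key names such as “customer_id” and “cell_id”, and the method “shared_keys” are my own assumptions for illustration and not part of any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class DataCircle:
    """Hypothetical description of one data circle."""
    name: str          # e.g. "customer"
    technology: str    # storage technology chosen for this domain
    keys: set[str]     # attributes that other circles can connect on


@dataclass
class DataHub:
    """Hypothetical central registry that knows all circles."""
    circles: dict[str, DataCircle] = field(default_factory=dict)

    def register(self, circle: DataCircle) -> None:
        self.circles[circle.name] = circle

    def shared_keys(self, first: str, second: str) -> set[str]:
        """Return the attributes on which two circles can be connected."""
        return self.circles[first].keys & self.circles[second].keys


# The telco circles from above, with assumed technologies and keys.
hub = DataHub()
hub.register(DataCircle("customer", "RDBMS", {"customer_id", "cell_id"}))
hub.register(DataCircle("network", "graph database", {"cell_id"}))
hub.register(DataCircle("cdr", "file-based storage", {"customer_id", "cell_id"}))

print(hub.shared_keys("customer", "network"))
```

In this sketch, the data itself stays in the individual circles. The hub only stores metadata about them, which is what allows the circles to remain distributed while still being discoverable and connectable.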

What’s next?

I recommend reading the other articles I’ve written about the data mesh architecture. It is fairly easy to get started with this architectural style, and data circles contribute to this.