Software Defined Networking in Supercomputing
Foreword
While I was in graduate school, I researched how emerging networking technologies can resolve networking issues in supercomputers. I learned a lot from that experience, and today I would like to share that research with you.
Background - Demand for Supercomputers
Scientists in many fields, including microbiology and astrophysics, routinely use powerful computers (supercomputers) to process vast amounts of data and solve computational problems. As their research advances, the amount of data grows, and so does the need for more powerful supercomputers.
Weather forecasting is perhaps the best-known application of supercomputers, and its significance in our daily lives is obvious. Meteorologists use supercomputers to simulate the weather and predict changes by processing large amounts of data gathered from satellites and other sources. A forecast is only useful if it is ready in time, so these simulations have to finish quickly, and the demand for more powerful supercomputers keeps growing.
For these reasons, many countries invest heavily in building more powerful supercomputers. The June 2022 list of the top 500 supercomputers in the world is available at the URL below.
https://www.top500.org/lists/top500/2022/06/
What is Cluster Computing?
The leading approach to building supercomputers today is Cluster Computing. The core idea is to take many computers with modest specifications, possibly no more powerful than the laptops on our desks, and connect them through a large network. Running all of those computers together on a single task delivers higher performance than any one of them could achieve alone.
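The split-and-combine structure behind this idea can be illustrated on a single machine. The following is only a minimal sketch, not a real cluster: worker threads stand in for cluster nodes, and the function names (`split_range`, `partial_sum_of_squares`, `cluster_sum_of_squares`) are made up for this illustration. Python threads do not actually speed up CPU-bound work, so the sketch shows how a task is divided and recombined, not the performance gain.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(total, parts):
    """Split range(total) into `parts` contiguous (start, stop) chunks."""
    base, extra = divmod(total, parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element each.
        stop = start + base + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def partial_sum_of_squares(chunk):
    """Work done by one simulated node: sum of i*i over its chunk."""
    start, stop = chunk
    return sum(i * i for i in range(start, stop))


def cluster_sum_of_squares(total, nodes):
    """Distribute the task over `nodes` workers and combine their results."""
    chunks = split_range(total, nodes)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    # Combining the partial results is the step that needs communication
    # between nodes in a real cluster.
    return sum(partials)
```

Calling `cluster_sum_of_squares(1000, 4)` gives the same answer as computing the sum on one worker; only the way the work is divided changes.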
Networking issues in Cluster Computing
As mentioned above, Cluster Computing relies on many computers to achieve higher performance. When numerous computers work on a single task, they need to communicate with each other: when one computer in the cluster completes its part, it shares the results with the other computers so that the entire task can be finished.
When many computers communicate with each other, they generate a great deal of network traffic. Network performance therefore becomes a bottleneck for the entire system, and when the underlying network performs poorly, the calculation takes longer to finish.
There are various approaches to this problem. One practical solution is to use an InfiniBand network, which offers much higher performance than a conventional network.
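To get a feel for how quickly traffic grows, consider the simplest worst case, in which every computer sends its result to every other computer. This is an assumption made for illustration; real applications use many different communication patterns. The helper names below are hypothetical.

```python
def all_to_all_messages(nodes):
    """Messages sent when every node sends one message to every other node."""
    return nodes * (nodes - 1)


def total_traffic_bytes(nodes, message_bytes):
    """Total bytes crossing the network in one all-to-all exchange."""
    return all_to_all_messages(nodes) * message_bytes
```

Under this assumption, 4 nodes exchange 12 messages, 64 nodes exchange 4,032, and 1,024 nodes exchange more than a million messages in a single round. The number of messages grows roughly with the square of the number of nodes, which is why the network so easily becomes the bottleneck.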
Software Defined Networking
In this research, we tried to improve network performance by using a relatively new concept in computer networking called Software Defined Networking (SDN). The main idea of SDN is to use software to control network traffic.
In a traditional network, devices do not coordinate with each other; each one simply forwards traffic to the destination it has been configured with. In SDN, by contrast, a controller manages those network devices and decides how they forward traffic. With this concept, we can control network traffic according to the network requirements to provide the best performance.
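The division of roles between the controller and the network devices can be sketched with a toy model. This is a simplified illustration, not a real SDN implementation: the `Switch` and `Controller` classes are hypothetical, and rules here match only on the destination host, whereas real SDN switches can match on many packet fields.

```python
class Switch:
    """A simulated switch: it only looks up rules that were installed in it."""

    def __init__(self, name):
        self.name = name
        self.flow_table = {}  # destination host -> output port

    def install_rule(self, destination, out_port):
        """Add or overwrite the forwarding rule for one destination."""
        self.flow_table[destination] = out_port

    def forward(self, destination):
        """Return the output port for a packet, or None if no rule matches."""
        return self.flow_table.get(destination)


class Controller:
    """A simulated SDN controller with a view of every registered switch."""

    def __init__(self):
        self.switches = {}

    def register(self, switch):
        self.switches[switch.name] = switch

    def set_path(self, destination, hops):
        """Install rules so traffic to `destination` follows `hops`.

        `hops` is a list of (switch_name, out_port) pairs.
        """
        for switch_name, out_port in hops:
            self.switches[switch_name].install_rule(destination, out_port)
```

For example, after registering switches named `"s1"` and `"s2"`, calling `controller.set_path("hostB", [("s1", 2), ("s2", 1)])` makes both switches forward traffic for `hostB` along the chosen path, and a later call can reroute that traffic without touching the switches by hand.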
SDN in Supercomputers
Since one of the biggest problems of Cluster Computing is poor network performance, we tried to solve it with SDN. The Network Controller, the management unit in SDN, lets us configure network devices using detailed knowledge of how a calculation task will affect network traffic.
In most cases, the communication patterns that occur during a task are predictable, so we can obtain that knowledge before the task starts. With it, we can configure the network devices in advance to route traffic in the way that provides the best performance.
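As a rough illustration of planning with a known communication pattern, the sketch below spreads predicted flows over alternative paths before the task starts. It is not the method used in my research; it is a simple greedy heuristic chosen for illustration, and the function name, the data layout, and the link names in the example are all assumptions. It also assumes at most one flow per source and destination pair and that every candidate path contains at least one link.

```python
def assign_flows(flows, paths):
    """Greedily place each flow on the candidate path with the least load.

    flows: list of (source, destination, expected_bytes)
    paths: dict mapping (source, destination) to a list of candidate paths,
           each path being a tuple of link names
    Returns (assignment, link_load).
    """
    link_load = {}   # link name -> bytes already planned on that link
    assignment = {}  # (source, destination) -> chosen path

    # Place the largest flows first so they get the emptiest links.
    for src, dst, size in sorted(flows, key=lambda f: f[2], reverse=True):
        candidates = paths[(src, dst)]
        # A path is only as good as its busiest link.
        best = min(
            candidates,
            key=lambda path: max(link_load.get(link, 0) for link in path),
        )
        for link in best:
            link_load[link] = link_load.get(link, 0) + size
        assignment[(src, dst)] = best
    return assignment, link_load
```

Given two flows toward the same destination and two uplinks, the heuristic puts each flow on a different uplink instead of letting both share one. A controller could then install the chosen paths in the network devices before the calculation begins.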
Research Result
My research focused on the network latency issues of the Message Passing Interface (MPI), the message-passing standard whose implementations are used by many high-performance computing applications that run on supercomputers. Here are some of the findings of my research. In the measurements shown in the graph, the SDN-based network performed better than the traditional network.
Conclusion
The above is just a summary. If you are interested in this topic and want to know more, please contact me (dashdavaa@brainyx.co). My papers are also available online. Here is my first paper on the topic, published in 2013.
https://link.springer.com/chapter/10.1007/978-3-642-54420-0_86