Build Your Own VoIP System, Part 1 – The Basics

This blog post is the first in a series describing how a VoIP system works and how the Sipwise Sip:provider Platform enables you to start various VoIP business models.

  • Part 1, which is provided in this post, gives you an introduction to how VoIP works.
  • Part 2 shows how you can set up a secure and self-hosted Skype-like VoIP system for free using the Sip:provider Platform within 30 minutes.
  • Part 3 is dedicated to the Sip:provider Platform acting as an SBC in front of existing VoIP systems.
  • Part 4 describes how you can operate a wholesale business with the Sip:provider Platform.
  • Part 5 shows how to enable Over-The-Top (OTT) services using Apple and Google Push Notification Services.


VoIP systems are often seen as complex communication infrastructures, even from a high-level perspective, but they’re not. VoIP is in fact complex in its details, but various projects have abstracted those details away and made it straightforward to use. It’s therefore easy to start a compelling voice/video communication system or service (which I’ll call a “VoIP system” or “VoIP service” throughout this post) from scratch. Still, it’s important to learn a few facts about it in order to choose the right base system for successfully running a VoIP service.

The Basics

VoIP simply means “Voice over IP”, a generic term for transporting real-time voice sessions over IP networks such as the Internet. However, it doesn’t define how this is done, and even the term “Voice” is a bit misleading, because the very same concept can also transport video and fax over an IP connection.

There are a couple of elements involved when you’re talking about a VoIP system.

First, there are SIP endpoints, which are the client instances of your customers. These could be software installed on your customers’ computers (popular choices are Jitsi, an open source and cross-platform communications client, or Bria, a commercial multi-platform client for Windows, iOS and Android). Other possibilities are hardware SIP phones such as those from SNOM or Polycom.

Besides the customer-facing endpoints, there are SIP gateways, which translate between VoIP and traditional fixed-line and mobile networks. They act much like customer-facing clients, but can usually handle many parallel calls. They are typically connected via multiple ISDN E1 or T1 lines, and sometimes an SS7 control layer is used on top.

How does SIP work?

In order to establish a communication session, you need a signalling protocol, which tells the involved parties who wants to communicate with whom and which media capabilities might be used (e.g. plain voice, voice/video, fax etc.). There are several protocols out there, such as Skype (a proprietary protocol) and H.323 (more or less obsolete since 2004), but the most important and nowadays most widespread one, and the one we’re concentrating on here, is SIP, the Session Initiation Protocol.

SIP Registrations

A very important part of VoIP is the registration of customer endpoints. When a customer starts their SIP client, the client tells the SIP server at which IP and port it is reachable in case there’s a call towards this customer.

The important part, besides the authentication (which uses HTTP digest authentication), is the Contact header, which indicates at which IP:port the customer is reachable.

So during start-up, the client tells the server the contact address at which it’s reachable for subsequent calls.
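To make the role of the Contact header concrete, here is a minimal Python sketch of what a registrar does with a REGISTER request. The message, the user alice, the domain example.org and the address 192.0.2.10 are made-up illustration values, and the helper names are hypothetical. Authentication, expiry handling and multiple contacts per user are deliberately left out.

```python
import re

# Hypothetical REGISTER request; every name, address and tag is made up.
REGISTER = (
    "REGISTER sip:example.org SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds\r\n"
    "From: <sip:alice@example.org>;tag=1928301774\r\n"
    "To: <sip:alice@example.org>\r\n"
    "Call-ID: a84b4c76e66710\r\n"
    "CSeq: 1 REGISTER\r\n"
    "Contact: <sip:alice@192.0.2.10:5060>\r\n"
    "Expires: 3600\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def parse_headers(message):
    """Map lower-cased header names to values, skipping the request line."""
    head = message.split("\r\n\r\n", 1)[0]
    headers = {}
    for line in head.split("\r\n")[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def uri_in_brackets(value):
    """Extract the URI from a value like '<sip:user@host>;param=x'."""
    match = re.search(r"<([^>]+)>", value)
    return match.group(1) if match else value.split(";")[0].strip()


def register(location_table, message):
    """Store 'public SIP URI -> current contact address' for one REGISTER."""
    headers = parse_headers(message)
    public_uri = uri_in_brackets(headers["to"])
    contact = uri_in_brackets(headers["contact"])
    location_table[public_uri] = contact
    return public_uri, contact


location_table = {}
public_uri, contact = register(location_table, REGISTER)
print(public_uri, "->", contact)
```

Running the sketch prints `sip:alice@example.org -> sip:alice@192.0.2.10:5060`: the server now knows where to send calls for alice. A real registrar would first challenge the request with digest authentication and would drop the binding once the Expires interval has passed.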

But what about real Phone Numbers?

Ok, so we learned that one user can contact another if the callee has registered up-front (telling the SIP service provider at which IP:port they’re reachable), and vice versa. But what about real phone numbers?

In order to receive calls from the PSTN (public switched telephone network), your SIP service provider needs to map a PSTN number to your SIP URI, i.e. it needs to know that your SIP URI is equivalent to, for example, +43 1 1001. If somebody calls 4311001 in the PSTN, the call is routed through the telephone network down to your service provider, which holds ownership of that number. The service provider is then responsible for translating the number to the corresponding SIP URI and routing the call to the IP:port where this user is registered.
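The two lookups described above (number to SIP URI, then SIP URI to registered contact) can be sketched in a few lines of Python. This is only an illustration under assumptions: the SIP URI sip:alice@example.org, the contact address and the function names are hypothetical, and a real provider would keep this data in a database and handle number formats far more carefully.

```python
def normalize(number):
    """Keep digits only, so '+43 1 1001' and '4311001' compare equal."""
    return "".join(ch for ch in number if ch.isdigit())


# Hypothetical provider data: which number belongs to which SIP URI ...
number_to_uri = {normalize("+43 1 1001"): "sip:alice@example.org"}
# ... and where that SIP URI is currently registered.
location_table = {"sip:alice@example.org": "sip:alice@192.0.2.10:5060"}


def route_pstn_call(dialed):
    """Return the contact address to send the call to, or None."""
    uri = number_to_uri.get(normalize(dialed))
    if uri is None:
        return None  # the number is not owned by this provider
    return location_table.get(uri)  # None if the user is not registered


print(route_pstn_call("4311001"))
```

With the sample data, the call to 4311001 resolves to `sip:alice@192.0.2.10:5060`. A number the provider does not own, or a user who is currently not registered, yields `None`, which a real system would turn into a rejection or a voicemail redirect.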

How about a Video Stream?

The important thing here is that any media stream in SIP signalling is negotiated end-to-end. This means that if alice calls bob, alice proposes a list of media sessions (e.g. audio with a specific list of codecs, and video with a specific list of other codecs), and bob compares this list with his own capabilities and then replies with a (potential) subset of alice’s offer. So if alice proposes an audio and video call, but bob doesn’t have a webcam, he’ll reply with a subset of alice’s offer that contains only the audio part. However, if bob has a webcam, he’ll reply with an answer telling alice that both audio and video streams are available.
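The negotiation between alice and bob can be sketched as a simple intersection of capabilities. The Python below is a simplified model, not real SDP handling: the codec names are just sample values and the `answer` function is a hypothetical helper. In actual SDP, a rejected stream stays in the answer with its port set to 0 instead of being dropped from the list.

```python
def answer(offer, capabilities):
    """Return the part of the offer that the callee supports.

    Both arguments map a media type ("audio", "video") to a list of
    codecs. The caller's preference order is kept in the answer.
    """
    result = {}
    for media, offered_codecs in offer.items():
        supported = capabilities.get(media, [])
        common = [codec for codec in offered_codecs if codec in supported]
        if common:
            result[media] = common
    return result


# Sample values only; real endpoints advertise their own codec lists.
alice_offer = {"audio": ["opus", "PCMA", "PCMU"], "video": ["H264", "VP8"]}
bob_without_webcam = {"audio": ["PCMA", "PCMU"]}
bob_with_webcam = {"audio": ["PCMA", "PCMU"], "video": ["VP8"]}

print(answer(alice_offer, bob_without_webcam))
print(answer(alice_offer, bob_with_webcam))
```

Without a webcam, bob's answer contains only the audio part (`{'audio': ['PCMA', 'PCMU']}`). With a webcam, it contains both audio and video (`{'audio': ['PCMA', 'PCMU'], 'video': ['VP8']}`).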


If you want to work with “VoIP”, you will most likely work with the SIP protocol. SIP allows you to do two-way, end-to-end communication, but you’ll need SIP clients to attach to a system like this. Do you need to pay for an external service in order to start a VoIP system? No!

What’s next?

The next post will describe how you can use the open source Sipwise Sip:provider CE to build a VoIP system from scratch within an hour. It’ll show how you can create a Skype-like service within your network using IPv4, IPv6, TLS and SRTP.

Follow us on Twitter and Facebook for updates and new posts.