Sunday, August 16, 2015

Linux Socket Programming in C++

This tutorial will show you how to send and receive data over a TCP/IP (IPv4 or IPv6) network using Linux and C++.

Contents

1. Before you read this tutorial. 
2. What is an (internet) socket?
3. Before we create a socket. 
4. Connecting to a server. 
5. Creating your own server. 

I. Before you read this tutorial.

This tutorial focuses on socket programming using C++, Linux and the Berkeley sockets API. This library was originally written for C, but in this tutorial I will use C++ where possible. If you prefer pure C++ you can take a look at the Boost.Asio library (cross-platform), which on Linux is in fact a C++ wrapper around the C Berkeley sockets API. If you don't know much about TCP/IP networking, I would advise you to read a little about that subject first, because I will not explain it here. I've tried to keep this tutorial short and simple. For more information, read the man pages of the system calls and functions used in this tutorial. When you've finished this tutorial you should be able to create a client and a server. I compiled the examples with GCC (g++).

II. What is an (internet) socket?

An internet socket is an endpoint of a TCP/IP network connection. You can see an internet socket as the entrance or exit of a tunnel. The socket (together with the operating system's network stack) takes care of the packets: outgoing data is split into network packets, and incoming packets are reassembled into a stream of bytes that your program can read.

The purpose of sockets is to simplify the network code of your C++ application. All you have to do is create a socket and connect it. Once your socket is connected, the socket takes care of the networking part, and all you have to worry about is sending data to, or receiving data from, the socket.


A socket can be a server or a client. Before a socket can be connected to another socket and "create a tunnel" you have to specify what type of socket it will be and where it should connect to.

How to communicate with a socket? 

When you have created a socket and it is connected to another (remote) socket, communicating with it is easy. Communication with a socket is done through Unix file descriptors (file handles) and their associated functions like write() and read() (for internet sockets, send() and recv() are preferred). If you have never worked with files in Linux that way, don't worry. You just need to remember that creating a socket returns a socket descriptor. This socket descriptor is just an integer. You can see it as the "name" of, or reference to, the socket. You need this socket descriptor whenever you want to interact with the socket. For example, the system calls send() and recv() require a socket descriptor so they know which socket to talk to.
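To see that a socket descriptor really is just an integer that works with both the file functions and the socket functions, here is a small sketch. To keep it runnable without a network, it uses socketpair(), which creates two sockets that are already connected to each other inside one process. Note that socketpair() creates local (AF_UNIX) sockets rather than internet sockets, and descriptor_roundtrip() is a hypothetical helper name made up for this example.

```cpp
#include <string>
#include <sys/socket.h> // socketpair(), recv()
#include <unistd.h>     // write(), close()

// Creates two connected sockets with socketpair(), writes a message into one end
// with write() and reads it from the other end with recv(). Both calls only need
// the integer socket descriptor. Returns the received text, or "" on error.
// (descriptor_roundtrip is a hypothetical helper name.)
std::string descriptor_roundtrip(const std::string &message)
{
  // An empty message would leave recv() waiting forever, and a message longer
  // than our buffer would not fit in a single recv() call.
  if (message.empty() || message.size() > 256) return "";

  int fds[2]; // socketpair() stores two connected socket descriptors (plain ints) here
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1) return "";

  // write() is the generic file function; send() with flags 0 would do the same.
  ssize_t sent = write(fds[0], message.data(), message.size());

  char buffer[256];
  ssize_t received = -1;
  if (sent == static_cast<ssize_t>(message.size()))
    received = recv(fds[1], buffer, sizeof buffer, 0);

  close(fds[0]);
  close(fds[1]);
  if (received <= 0) return "";
  return std::string(buffer, static_cast<size_t>(received));
}
```

For example, descriptor_roundtrip("hello") gives back "hello": the text went in through one integer descriptor and came out through the other.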


III. Before we create a socket.

Before we start creating a socket, we should know what we want to do with it (will it be a client or a server?). In this tutorial I have decided we are going to build a TCP/IP client that connects to google.com and retrieves its homepage. Therefore we need to provide our socket with the address (google.com), the address type (for example IPv4), the port (web servers usually run on port 80) and the connection type (TCP).

For this purpose, a long time ago, some people invented a couple of data structures, such as struct sockaddr_in (for IPv4) and struct sockaddr_in6 (for IPv6). The programmer had to fill these data structures with the appropriate data so the sockets could work with them. Unfortunately these structs are a bit complicated, and putting data into them isn't as easy as you would expect.

After a while some smart people came along and invented a new function called getaddrinfo(). This function deals with all the complicated structs. It also uses a new struct called addrinfo. Luckily, this function and struct addrinfo are all we need to create and connect a socket.
There is one other nice thing about this function: it's IP agnostic. This means you don't have to specify whether you want to connect to an IPv4 or an IPv6 host (unlike with the older structs).
This does NOT mean the other structs are useless. The other structs are still used and can be very useful if you want to do more (complicated) things with your socket code.


To keep things simple just remember that we need getaddrinfo() to fill the struct addrinfo with data we need for creating and connecting our socket.
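As documented in the getaddrinfo(3) man page, struct addrinfo has the fields ai_flags, ai_family, ai_socktype, ai_protocol, ai_addrlen, ai_addr, ai_canonname and ai_next. The sketch below fills in a zeroed hints struct, calls getaddrinfo() and copies a few fields out of the first result. It uses the AI_NUMERICHOST flag with a numeric address such as "127.0.0.1" so no DNS lookup is needed; AddrinfoSummary and summarize_first_result() are hypothetical names for this example.

```cpp
#include <cstring>      // memset()
#include <netdb.h>      // getaddrinfo(), freeaddrinfo(), struct addrinfo
#include <sys/socket.h> // AF_UNSPEC, SOCK_STREAM

// A few fields copied out of the first struct addrinfo in the result list.
// (AddrinfoSummary is a hypothetical type for this example.)
struct AddrinfoSummary
{
  int family = -1;       // ai_family: AF_INET (IPv4) or AF_INET6 (IPv6); -1 = lookup failed
  int socktype = -1;     // ai_socktype: SOCK_STREAM (TCP) or SOCK_DGRAM (UDP)
  int protocol = -1;     // ai_protocol: the protocol number, e.g. IPPROTO_TCP
  socklen_t addrlen = 0; // ai_addrlen: size of the struct that ai_addr points to
};

// Resolves a numeric IP address (no DNS lookup) and summarizes the first result.
AddrinfoSummary summarize_first_result(const char *ip, const char *port)
{
  AddrinfoSummary summary;
  struct addrinfo hints;
  memset(&hints, 0, sizeof hints); // All unused fields must be 0 or a null pointer.
  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_flags = AI_NUMERICHOST; // Only accept a numeric address, never ask DNS.

  struct addrinfo *list = nullptr;
  if (getaddrinfo(ip, port, &hints, &list) != 0) return summary;

  summary.family = list->ai_family;
  summary.socktype = list->ai_socktype;
  summary.protocol = list->ai_protocol;
  summary.addrlen = list->ai_addrlen;
  freeaddrinfo(list); // We copied what we needed, so release the list.
  return summary;
}
```

For "127.0.0.1" the first result is an AF_INET entry whose ai_addrlen equals sizeof(struct sockaddr_in); for a non-numeric name the AI_NUMERICHOST flag makes the call fail instead of asking DNS.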

But... there is one thing you need to remember. Sometimes a host (node) translates to multiple IP addresses. Go to your terminal and type host google.com. You will see that google.com has multiple IP addresses. Some hosts have both an IPv4 and an IPv6 address. For this reason getaddrinfo() does not just fill one addrinfo struct; it creates a struct for every address found. These structs are chained together through their ai_next field into a linked list, so you can walk through them easily.

Don't worry if that sounded complicated. In our example we don't bother with the other IP addresses found; we just use the first struct (and this usually does the trick).
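If you do want to look at all the addresses, you follow the ai_next pointers until you reach a null pointer. A minimal sketch, assuming you want the addresses as text: it converts each one with inet_ntop(), which needs the sin_addr (IPv4) or sin6_addr (IPv6) field of the address, so the code first checks ai_family. list_addresses() is a hypothetical helper name.

```cpp
#include <arpa/inet.h>  // inet_ntop()
#include <cstring>      // memset()
#include <netdb.h>      // getaddrinfo(), freeaddrinfo()
#include <netinet/in.h> // struct sockaddr_in, struct sockaddr_in6, INET6_ADDRSTRLEN
#include <string>
#include <sys/socket.h>
#include <vector>

// Returns every address getaddrinfo() finds for host, as text, by following ai_next.
// (list_addresses is a hypothetical helper name.)
std::vector<std::string> list_addresses(const char *host, const char *port)
{
  std::vector<std::string> result;
  struct addrinfo hints;
  memset(&hints, 0, sizeof hints);
  hints.ai_family = AF_UNSPEC;     // Both IPv4 and IPv6 results are welcome.
  hints.ai_socktype = SOCK_STREAM; // One entry per address instead of one per socket type.

  struct addrinfo *list = nullptr;
  if (getaddrinfo(host, port, &hints, &list) != 0) return result;

  for (struct addrinfo *p = list; p != nullptr; p = p->ai_next)
  {
    char text[INET6_ADDRSTRLEN]; // Large enough for both IPv4 and IPv6 text.
    const void *raw = nullptr;
    if (p->ai_family == AF_INET)
      raw = &reinterpret_cast<struct sockaddr_in *>(p->ai_addr)->sin_addr;
    else if (p->ai_family == AF_INET6)
      raw = &reinterpret_cast<struct sockaddr_in6 *>(p->ai_addr)->sin6_addr;
    if (raw != nullptr && inet_ntop(p->ai_family, raw, text, sizeof text) != nullptr)
      result.push_back(text);
  }
  freeaddrinfo(list); // Frees the whole linked list, not just the first struct.
  return result;
}
```

Calling list_addresses("www.google.com", "80") should return one string per address found (this needs a working DNS setup); with a numeric address like "127.0.0.1" you simply get that address back.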

The function prototype looks like this:

int getaddrinfo(const char *node, const char *service,
const struct addrinfo *host_info, struct addrinfo **res); 


The parameters:

node : The host you want to connect to. This can be a hostname or IP address.
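service : The port you want to connect to, given as a string. Usually a port number like "80", but it can also be a known service name like "http".
host_info : Points to an addrinfo struct that you fill in yourself (the man page calls this parameter hints). It tells getaddrinfo() what kind of addresses you want, for example the address family and the socket type. getaddrinfo() only reads it.
res : After a successful call, *res points to the first struct of the linked list of filled addrinfo structs.
return value : The function returns 0 if all succeeded, or a nonzero error code in case of an error. The gai_strerror() function turns this code into a readable message.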


In our code we want to connect to "www.google.com" at port 80. Therefore our getaddrinfo() call looks like this:

 
#include <iostream>
#include <cstring>      // Needed for memset
#include <sys/socket.h> // Needed for the socket functions
#include <netdb.h>      // Needed for getaddrinfo() and struct addrinfo

int main()
{

  int status;
  struct addrinfo host_info;       // The "hints": tells getaddrinfo() what kind of addresses we want.
  struct addrinfo *host_info_list; // Pointer to the linked list of results getaddrinfo() creates.

  // The man page of getaddrinfo() states "All the other fields in the structure pointed
  // to by hints must contain either 0 or a null pointer, as appropriate." When a local
  // struct is created in C++, it gets a block of memory that is not necessarily empty.
  // Therefore we use the memset function to set all fields to zero.
  memset(&host_info, 0, sizeof host_info);

  std::cout << "Setting up the structs..." << std::endl;

  host_info.ai_family = AF_UNSPEC;     // IP version not specified. Can be both.
  host_info.ai_socktype = SOCK_STREAM; // Use SOCK_STREAM for TCP or SOCK_DGRAM for UDP.

  // Now fill the linked list of addrinfo structs with Google's address information.
  status = getaddrinfo("www.google.com", "80", &host_info, &host_info_list);
  // getaddrinfo() returns 0 on success, or an error code when an error occurred
  // (translated into human-readable text by the gai_strerror() function).
  if (status != 0)
  {
    std::cout << "getaddrinfo error: " << gai_strerror(status) << std::endl;
    return 1;
  }

}

After calling getaddrinfo(), probably several structs were filled with interesting information. To keep it simple we will not try to get all of it out of there. All we are going to do is take the necessary information out of the first struct in the linked list (the one host_info_list points to, i.e. the first address getaddrinfo() found).

Enough of this, now let's do some connecting!

IV. Connecting to a server.

Now we are ready to create a socket. To create a socket we use the socket() system call:

int socket(int domain, int type, int protocol);

The parameters:

domain : The domain argument specifies a communication domain. In our case this value is AF_INET or AF_INET6 (the internet using IPv4 or IPv6).
type : The type of socket. In our case it is SOCK_STREAM (TCP).
protocol : The protocol to be used with the socket type. Passing 0 lets the system choose the default protocol for that type (TCP for SOCK_STREAM); in our case we simply pass the ai_protocol value that getaddrinfo() filled in.
return value : The socket system call returns a socket descriptor. If the socket call fails, it returns -1.

If you want to translate this error number into an error description string, #include <cerrno> (for errno) and <cstring> (for strerror()) and use the strerror(errno) function. The same is true for all the error numbers in the system calls below.
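A small sketch of that error reporting. It provokes an error on purpose by calling send() on the invalid descriptor -1, which fails with errno set to EBADF, and prints the strerror() text (on glibc something like "Bad file descriptor"). describe_errno() and provoke_send_error() are hypothetical helper names. Copy errno right after the failing call, because later library calls may change it.

```cpp
#include <cerrno>       // errno
#include <cstring>      // strerror()
#include <iostream>
#include <string>
#include <sys/socket.h> // send()

// Turns the errno value left behind by a failed call into readable text.
// (describe_errno is a hypothetical helper name.)
std::string describe_errno(const char *call_name)
{
  return std::string(call_name) + " error: " + strerror(errno);
}

// Calls send() on the invalid descriptor -1 on purpose, prints the error text
// and returns the errno value that send() set (0 if it unexpectedly succeeded).
int provoke_send_error()
{
  ssize_t result = send(-1, "x", 1, 0);
  if (result != -1) return 0;
  int saved_errno = errno; // Save it right away: later calls may overwrite errno.
  std::cout << describe_errno("send") << std::endl;
  return saved_errno;
}
```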

Our socket call:

std::cout << "Creating a socket..." << std::endl;
int socketfd; // The socket descriptor
socketfd = socket(host_info_list->ai_family, host_info_list->ai_socktype,
                  host_info_list->ai_protocol);
if (socketfd == -1) std::cout << "socket error" << std::endl;

As you can see above, we use the values stored in the first addrinfo struct of the linked list that getaddrinfo() created. Note: there might be more addresses in the linked list, but we keep it simple and use the first one.

We are now ready to connect to google.com. For that we use the connect() system call:

int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

The parameters:

sockfd : the socket descriptor the socket() call returns.
addr : The address we need to connect to. In our case stored in 'host_info_list->ai_addr'.
addrlen : The addrlen argument specifies the size of addr. In our case stored in 'host_info_list->ai_addrlen'.
return value : If the connection succeeds, zero is returned. On error, -1 is returned, and errno is set appropriately.

Our connect() call looks like this:

std::cout << "Connect()ing..." << std::endl;
status = connect(socketfd, host_info_list->ai_addr, host_info_list->ai_addrlen);
if (status == -1) std::cout << "connect error" << std::endl;

If we didn't get an error by now we should be connected to google.com at port 80. The next thing we would like to do is send and receive some data!
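Our client only tries the first address in the list. If that address happens to be unreachable (for example an IPv6 address on a machine without IPv6 connectivity), connect() fails even though another address might work. A common pattern, sketched below under the assumption that you want the first address that works, is to try each entry of the linked list in turn. connect_to_host() is a hypothetical helper name.

```cpp
#include <cstring>      // memset()
#include <netdb.h>      // getaddrinfo(), freeaddrinfo()
#include <sys/socket.h> // socket(), connect()
#include <unistd.h>     // close()

// Tries every address getaddrinfo() returns, in order, until one accepts the
// connection. Returns a connected socket descriptor, or -1 if all attempts failed.
// (connect_to_host is a hypothetical helper name.)
int connect_to_host(const char *host, const char *port)
{
  struct addrinfo hints;
  memset(&hints, 0, sizeof hints);
  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;

  struct addrinfo *list = nullptr;
  if (getaddrinfo(host, port, &hints, &list) != 0) return -1;

  int socketfd = -1;
  for (struct addrinfo *p = list; p != nullptr; p = p->ai_next)
  {
    socketfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
    if (socketfd == -1) continue; // This address family may be unsupported; try the next.

    if (connect(socketfd, p->ai_addr, p->ai_addrlen) == 0) break; // Connected!

    close(socketfd); // This address failed: clean up and try the next one.
    socketfd = -1;
  }
  freeaddrinfo(list);
  return socketfd;
}
```

In the client you could then write int socketfd = connect_to_host("www.google.com", "80"); and check the result for -1.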

Sending and receiving data.

For sending and receiving data we use the send() and recv() system calls. The send() call:

ssize_t send(int sockfd, const void *buf, size_t len, int flags);

The parameters:

sockfd : The socket descriptor the socket() call returns.
buf : The message we want to send.
len : The length of this message in bytes. For a text message this equals strlen(buf).
flags : With flags set to 0, send() behaves exactly like write(). This parameter gives you some extra options. Read the send() man page for more information.
return value : On success, this call returns the number of bytes sent. On error, -1 is returned, and errno is set appropriately. send() may send fewer bytes than you asked for, so to know whether your whole message was sent, check that this value equals len (strlen(buf)); if it is smaller, call send() again for the rest.
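Because send() may send only part of the buffer, a loop that keeps sending the remainder is a common way to make sure the whole message goes out. A minimal sketch, assuming a blocking socket; send_all() is a hypothetical helper, not part of the sockets API. It also retries when send() was interrupted by a signal (errno EINTR).

```cpp
#include <cerrno>       // errno, EINTR
#include <cstddef>      // size_t
#include <sys/socket.h> // send()

// Keeps calling send() until all len bytes of buf are sent.
// Returns true on success, false on an error.
// (send_all is a hypothetical helper name, not part of the sockets API.)
bool send_all(int socketfd, const char *buf, size_t len)
{
  size_t total_sent = 0;
  while (total_sent < len)
  {
    // Send whatever is still left, starting right after the bytes already sent.
    ssize_t n = send(socketfd, buf + total_sent, len - total_sent, 0);
    if (n == -1)
    {
      if (errno == EINTR) continue; // Interrupted by a signal before sending; retry.
      return false;
    }
    total_sent += static_cast<size_t>(n);
  }
  return true;
}
```

In the client, send_all(socketfd, msg, strlen(msg)) would replace the single send() call.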

In our case we want to get the homepage of google.com, so we will pretend to be a browser (in fact we are one, just a very simple one): we comply with the HTTP protocol and request the homepage. We add the following code:

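std::cout << "send()ing message..." << std::endl;
// HTTP lines end with "\r\n"; an empty line ends the request headers.
// A string literal is const in C++, so the pointer must be const char *.
const char *msg = "GET / HTTP/1.1\r\nHost: www.google.com\r\n\r\n";
size_t len;
ssize_t bytes_sent;
len = strlen(msg);
bytes_sent = send(socketfd, msg, len, 0);
if (bytes_sent == -1) std::cout << "send error" << std::endl;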

The last system call we have to use here is recv(). When the other socket (in this case Google's server) sends data back to us, our socket stores it in a buffer. This buffer can be read using the recv() call.

The recv() system call:

ssize_t recv(int sockfd, void *buf, size_t len, int flags); 

The parameters:

sockfd : The socket descriptor the socket() call returns.
buf : The variable we want to store the data in (input buffer).
len : The maximum number of bytes to read into buf (normally the size of buf).
flags : Read the recv() man page for more information.
return value : This call returns the number of bytes received, or -1 if an error occurred. The return value will be 0 when the peer has performed an orderly shutdown.
If no messages are available at the socket, the recv() call halts execution of your code and waits for a message to arrive. This behaviour is called "blocking". By default a socket is in blocking mode. A socket can also be in "non-blocking" mode so it will just continue your code if there is no data in the buffer.
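A sketch of non-blocking mode, assuming you switch an existing socket with fcntl() and the O_NONBLOCK flag. On an empty non-blocking socket, recv() returns -1 immediately with errno set to EAGAIN or EWOULDBLOCK instead of waiting. The demonstration uses socketpair() only so it runs without a network; set_nonblocking() and empty_nonblocking_recv_does_not_wait() are hypothetical names. On Linux, passing the MSG_DONTWAIT flag to recv() gives the same behavior for a single call.

```cpp
#include <cerrno>       // errno, EAGAIN, EWOULDBLOCK
#include <fcntl.h>      // fcntl(), F_GETFL, F_SETFL, O_NONBLOCK
#include <sys/socket.h> // socketpair(), recv()
#include <unistd.h>     // close()

// Switches an existing socket into non-blocking mode. Returns false on error.
// (set_nonblocking is a hypothetical helper name.)
bool set_nonblocking(int socketfd)
{
  int flags = fcntl(socketfd, F_GETFL, 0); // Read the current file status flags...
  if (flags == -1) return false;
  return fcntl(socketfd, F_SETFL, flags | O_NONBLOCK) != -1; // ...and add O_NONBLOCK.
}

// Demonstration: recv() on an empty non-blocking socket returns -1 right away,
// with errno set to EAGAIN or EWOULDBLOCK, instead of waiting for data.
bool empty_nonblocking_recv_does_not_wait()
{
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1) return false;

  bool returned_immediately = false;
  if (set_nonblocking(fds[1]))
  {
    char buffer[100];
    ssize_t n = recv(fds[1], buffer, sizeof buffer, 0); // Nothing was sent yet.
    returned_immediately = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));
  }
  close(fds[0]);
  close(fds[1]);
  return returned_immediately;
}
```

With a blocking socket the same recv() call would simply wait until the other end sends something.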

One of the last pieces of code we need is:


std::cout << "Waiting to receive data..." << std::endl;
ssize_t bytes_received;
char incoming_data_buffer[1000];
// Read at most 999 bytes, leaving room for the '\0' we add below.
// If no data arrives, the program will just wait here until some data arrives.
bytes_received = recv(socketfd, incoming_data_buffer, sizeof incoming_data_buffer - 1, 0);
if (bytes_received == 0) std::cout << "host shut down." << std::endl;
if (bytes_received == -1) std::cout << "receive error!" << std::endl;
if (bytes_received > 0)
{
  incoming_data_buffer[bytes_received] = '\0'; // recv() does not null-terminate the data.
  std::cout << bytes_received << " bytes received:" << std::endl;
  std::cout << incoming_data_buffer << std::endl;
}

recv() does NOT know when Google is done sending data: TCP delivers a stream of bytes without message boundaries, so we have to create our own method (for example by using information from the protocol, like HTTP's Content-Length header). This example only reads the first chunk (at most 1000 bytes) sent by Google. To read the rest, we could use a loop. The problem with this solution is that at some point there is no more data in the buffer, and as long as the server keeps the connection open, recv() keeps waiting forever. The solution: non-blocking sockets or multithreading.
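Here is a sketch of the reading loop mentioned above. It keeps calling recv() until it returns 0 (the peer closed the connection) or -1 (an error), so it only terminates if the other side really closes the connection. For our HTTP client that could be arranged by adding a "Connection: close" header to the request, which asks the server to close the connection after the response; that is an assumption about how you might adapt the request, not part of the original code. receive_until_closed() and shutdown_roundtrip() are hypothetical names; the second one uses socketpair() and shutdown() to simulate a peer that sends data and then closes.

```cpp
#include <string>
#include <sys/socket.h> // recv(), send(), socketpair(), shutdown()
#include <unistd.h>     // close()

// Reads from socketfd until the peer closes the connection (recv() returns 0)
// and returns everything received. Stops early, keeping what it has, on an error.
// (receive_until_closed is a hypothetical helper name.)
std::string receive_until_closed(int socketfd)
{
  std::string data;
  char buffer[1000];
  while (true)
  {
    ssize_t n = recv(socketfd, buffer, sizeof buffer, 0);
    if (n <= 0) break; // 0: orderly shutdown by the peer, -1: error.
    data.append(buffer, static_cast<size_t>(n));
  }
  return data;
}

// Demonstration: one end sends message and then shuts down its sending side;
// the other end reads everything with receive_until_closed().
std::string shutdown_roundtrip(const std::string &message)
{
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1) return "";
  if (!message.empty()) send(fds[0], message.data(), message.size(), 0);
  shutdown(fds[0], SHUT_WR); // "Done sending": the reader's recv() will then return 0.
  std::string data = receive_until_closed(fds[1]);
  close(fds[0]);
  close(fds[1]);
  return data;
}
```

A 3000-byte message comes back complete even though each recv() call reads at most 1000 bytes, because the loop keeps going until the peer's shutdown.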

Other system calls you should know about. 

A function associated with getaddrinfo() is freeaddrinfo(). You should call this function when you no longer need the linked list of addrinfo structs; it frees the memory used by the linked list. When you're done using your socket, you can close it using the close() system call. Finally we add to our code:

std::cout << "Receiving complete. Closing socket..." << std::endl;
freeaddrinfo(host_info_list);
close(socketfd); // close() needs #include <unistd.h>
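Since this tutorial uses C++ where possible, you could also let destructors do this cleanup, so the socket is closed and the list is freed even when the code returns early after an error. A minimal RAII sketch; SocketHandle, AddrinfoDeleter, AddrinfoList and handle_closes_descriptor() are hypothetical names, not standard classes or functions.

```cpp
#include <cerrno>       // errno, EBADF
#include <fcntl.h>      // fcntl()
#include <memory>       // std::unique_ptr
#include <netdb.h>      // struct addrinfo, freeaddrinfo()
#include <sys/socket.h> // socket()
#include <unistd.h>     // close()

// Minimal RAII wrapper: the destructor closes the socket, so every return path,
// including early error returns, releases the descriptor.
// (SocketHandle is a hypothetical class, not part of any library.)
class SocketHandle
{
public:
  explicit SocketHandle(int fd) : fd_(fd) {}
  ~SocketHandle()
  {
    if (fd_ != -1) close(fd_);
  }
  SocketHandle(const SocketHandle &) = delete;            // A copy would close the
  SocketHandle &operator=(const SocketHandle &) = delete; // descriptor twice.
  int get() const { return fd_; }

private:
  int fd_;
};

// A std::unique_ptr that calls freeaddrinfo() instead of delete when it goes away.
struct AddrinfoDeleter
{
  void operator()(struct addrinfo *list) const { freeaddrinfo(list); }
};
using AddrinfoList = std::unique_ptr<struct addrinfo, AddrinfoDeleter>;

// Demonstration: once the SocketHandle is destroyed, the descriptor is closed,
// so fcntl() on it fails with EBADF.
bool handle_closes_descriptor()
{
  int fd = socket(AF_UNIX, SOCK_STREAM, 0);
  if (fd == -1) return false;
  {
    SocketHandle handle(fd);
  } // handle's destructor calls close(fd) here.
  return fcntl(fd, F_GETFD) == -1 && errno == EBADF;
}
```

After a successful getaddrinfo() call you could write AddrinfoList list(host_info_list); and after socket() SocketHandle sock(socketfd); and then drop the explicit freeaddrinfo() and close() calls.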


Download code

The result is a program that connects to Google and downloads (the first 1000 bytes of) the Google homepage. You can download it HERE.
Compile it on Linux with g++: g++ tcpclient.cpp -o client

V. Creating your own server.

Creating a server is very similar to creating a client. The difference is that we replace the connect() call with three other calls: bind(), listen() and accept().

The getaddrinfo() function also needs different parameters, because now we want our socket to be a server on our own host using our own port.
We simply change "www.google.com" to NULL and "80" to "5555" (the port number we want to listen on).
In addition we add one extra line that sets the AI_PASSIVE flag. Combined with NULL, it makes getaddrinfo() return the wildcard address, so we accept connections on any of the addresses of the local host.
Note: Unless you are running your code as root, do NOT use a port number lower than 1024, because ports below 1024 are privileged and binding to them requires root rights.


host_info.ai_flags = AI_PASSIVE; 
status = getaddrinfo(NULL, "5555", &host_info, &host_info_list);
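To check what NULL combined with AI_PASSIVE actually gives you, this sketch asks getaddrinfo() for an IPv4 address with node set to NULL, with or without AI_PASSIVE. Per the getaddrinfo(3) man page, the passive result is the wildcard address 0.0.0.0 (suitable for bind()), while without AI_PASSIVE you get the loopback address 127.0.0.1. It restricts itself to AF_INET only to keep the output predictable; null_node_address() is a hypothetical helper name.

```cpp
#include <arpa/inet.h>  // inet_ntop(), ntohs()
#include <cstring>      // memset()
#include <netdb.h>      // getaddrinfo(), freeaddrinfo()
#include <netinet/in.h> // struct sockaddr_in, INET_ADDRSTRLEN
#include <string>
#include <sys/socket.h>

// Calls getaddrinfo() with node == NULL for IPv4 and returns the first result as
// "ip:port". passive selects whether AI_PASSIVE is set. Returns "" on error.
// (null_node_address is a hypothetical helper name.)
std::string null_node_address(const char *port, bool passive)
{
  struct addrinfo hints;
  memset(&hints, 0, sizeof hints);
  hints.ai_family = AF_INET; // IPv4 only, to keep the result predictable.
  hints.ai_socktype = SOCK_STREAM;
  if (passive) hints.ai_flags = AI_PASSIVE; // An address to bind() to, not to connect() to.

  struct addrinfo *list = nullptr;
  if (getaddrinfo(nullptr, port, &hints, &list) != 0) return "";

  const struct sockaddr_in *addr = reinterpret_cast<const struct sockaddr_in *>(list->ai_addr);
  char text[INET_ADDRSTRLEN];
  std::string result;
  if (inet_ntop(AF_INET, &addr->sin_addr, text, sizeof text) != nullptr)
    result = std::string(text) + ":" + std::to_string(ntohs(addr->sin_port)); // port is big-endian
  freeaddrinfo(list);
  return result;
}
```

So null_node_address("5555", true) yields "0.0.0.0:5555", which is what our server binds to.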

Now we create a socket just like we did in our client code...

Instead of connect() we use bind() to bind the socket to the local port we specified. We use this call so Linux knows that it has to forward incoming connections on that port to your program's socket descriptor.

The bind() call:

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen); 

The parameters:

sockfd : The socket descriptor the socket() call returns.
addr : The address we want to listen on (here the wildcard address, i.e. all addresses of the local host).
addrlen : The length of this address.
Return value : Like all the other calls, it returns an integer. If it's 0 the call succeeded; if it's -1 we got an error, which is stored in errno as usual.
Our bind() call ends up looking like this:

std::cout << "Binding socket..." << std::endl;
// We use setsockopt() with SO_REUSEADDR so that bind() does not fail with
// "Address already in use" while connections from a previous run of our
// program are still in the TIME_WAIT state. (See the man page for more information.)
int yes = 1;
status = setsockopt(socketfd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int));
if (status == -1) std::cout << "setsockopt error" << std::endl;
status = bind(socketfd, host_info_list->ai_addr, host_info_list->ai_addrlen);
if (status == -1) std::cout << "bind error" << std::endl;


The bind() call doesn't make your server listen for incoming connections yet. Why not? Because we can also use the bind() call in client code. Say, for example, we wanted our client program to use local port 9999: we would place a bind() call before the connect() call. If you leave it out, your OS will choose any available port.

What next?

What we need to do next is tell Linux we are actually listening for incoming connections on that port. We do this using the listen() call:

The listen() call:

int listen(int sockfd, int backlog); 

The parameters:

sockfd : The socket descriptor the socket() call returns.
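backlog : Our server can only handle one client at a time. What if more clients want to connect to your server at the same time? With backlog you specify how many pending connections (connections that have not been accept()ed yet) may wait in the queue. For example, if you set it to 5, roughly 5 clients can be put "on hold" while your server is busy. When the queue is full, further connection attempts are not accepted right away: on Linux they are usually ignored until there is room (so the client retries and may eventually time out), while other systems may refuse them.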
Return value : As usual it returns -1 on error and 0 on success.

Our listen call:

std::cout << "Listen()ing for connections..."  << std::endl;
status =  listen(socketfd, 5);
if (status == -1)  std::cout << "listen error" << std::endl ;

Now some client decides to connect to your server, what do we do?
We accept() the client. But this call does something special. It returns a new socket descriptor. Why?
Because we are a server and we want to serve as many people as possible, right?
Therefore accept() creates a new socket for each client that connects. This way we can talk to our client on a "private" socket and keep our old socket listening for new visitors.

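The accept() call:

int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

In our code it is used like this:

new_fd = accept(sockfd, (struct sockaddr *)&their_addr, &addr_size);

Parameters:

sockfd : The socket descriptor the socket() call returns (our listening socket).
their_addr : Usually a local struct sockaddr_storage, passed by address (cast to struct sockaddr *). This is where the information about the incoming connection will be stored (like the client's IP address and port).
addr_size : A socklen_t variable, passed by address. Before the call it must hold the size of their_addr; accept() updates it with the actual size of the stored address.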

return value : A brand new socket descriptor on success! Or -1 on error.
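After accept() returns, their_addr holds the client's address. Here is a sketch that turns it into readable text, assuming the client uses IPv4 or IPv6: it checks ss_family, casts to struct sockaddr_in or struct sockaddr_in6, converts the address with inet_ntop() and the port with ntohs() (ports are stored in network byte order). describe_peer() and make_ipv4_storage() are hypothetical helpers; the second one only exists so you can try the first without a live connection.

```cpp
#include <arpa/inet.h>  // inet_ntop(), inet_pton(), ntohs(), htons()
#include <cstring>      // memset()
#include <netinet/in.h> // struct sockaddr_in, struct sockaddr_in6
#include <string>
#include <sys/socket.h> // struct sockaddr_storage

// Formats the client address that accept() stored in their_addr as "ip:port".
// Returns "" for address families other than IPv4 and IPv6.
// (describe_peer is a hypothetical helper name.)
std::string describe_peer(const struct sockaddr_storage &their_addr)
{
  char text[INET6_ADDRSTRLEN];
  if (their_addr.ss_family == AF_INET)
  {
    const auto *a = reinterpret_cast<const struct sockaddr_in *>(&their_addr);
    if (inet_ntop(AF_INET, &a->sin_addr, text, sizeof text) == nullptr) return "";
    return std::string(text) + ":" + std::to_string(ntohs(a->sin_port));
  }
  if (their_addr.ss_family == AF_INET6)
  {
    const auto *a = reinterpret_cast<const struct sockaddr_in6 *>(&their_addr);
    if (inet_ntop(AF_INET6, &a->sin6_addr, text, sizeof text) == nullptr) return "";
    return "[" + std::string(text) + "]:" + std::to_string(ntohs(a->sin6_port));
  }
  return "";
}

// Builds a sockaddr_storage holding an IPv4 address, to try describe_peer()
// without a real client. Returns a zeroed struct (family AF_UNSPEC) on bad input.
struct sockaddr_storage make_ipv4_storage(const char *ip, unsigned short port)
{
  struct sockaddr_storage storage;
  memset(&storage, 0, sizeof storage);
  auto *a = reinterpret_cast<struct sockaddr_in *>(&storage);
  if (inet_pton(AF_INET, ip, &a->sin_addr) == 1)
  {
    a->sin_family = AF_INET;
    a->sin_port = htons(port); // Store the port in network byte order.
  }
  return storage;
}
```

In the server you would call describe_peer(their_addr) right after a successful accept() to log who connected.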

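Our accept call:

std::cout << "Accept()ing a connection..." << std::endl;
int new_sd;                         // The new socket descriptor for this client
struct sockaddr_storage their_addr; // Filled with the client's address information
socklen_t addr_size = sizeof(their_addr);
new_sd = accept(socketfd, (struct sockaddr *)&their_addr, &addr_size);
if (new_sd == -1) std::cout << "accept error" << std::endl;
else std::cout << "Connection accepted. Using new socket descriptor: " << new_sd << std::endl;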


At this point the connection should be established and we are ready to talk using our new socket descriptor. As in the client, we do this with the send() and recv() calls. But DO NOT FORGET to pass the new socket descriptor returned by accept() to send() and recv(), not the listening socket!
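To see the whole server flow (bind(), listen(), accept(), then recv() on the new descriptor) in action without a second program, here is a sketch that plays both server and client inside one process over the loopback address. It binds to port 0, which asks the kernel for any free port, and reads the chosen port back with getsockname(). It fills a struct sockaddr_in directly instead of using getaddrinfo() to stay short, and loopback_server_demo() is a hypothetical name. The message must be non-empty and at most 1000 bytes, because a single recv() call reads it.

```cpp
#include <arpa/inet.h>  // htonl()
#include <cstring>      // memset()
#include <netinet/in.h> // struct sockaddr_in, INADDR_LOOPBACK
#include <string>
#include <sys/socket.h> // socket(), bind(), listen(), accept(), connect(), send(), recv()
#include <unistd.h>     // close()

// Plays server and client in one process over 127.0.0.1. The client sends message;
// the server reads it on the NEW descriptor returned by accept() and returns it.
// Returns "" on error. (loopback_server_demo is a hypothetical name.)
std::string loopback_server_demo(const std::string &message)
{
  // One recv() call into a 1000-byte buffer: the message must fit, and an empty
  // message would leave recv() waiting forever.
  if (message.empty() || message.size() > 1000) return "";

  std::string received;
  int server_fd = socket(AF_INET, SOCK_STREAM, 0);
  int client_fd = socket(AF_INET, SOCK_STREAM, 0);

  struct sockaddr_in addr;
  memset(&addr, 0, sizeof addr);
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); // 127.0.0.1
  addr.sin_port = 0;                             // 0 = let the kernel pick a free port
  socklen_t addr_len = sizeof addr;

  // If socket() failed above, these calls fail on descriptor -1 and we skip to cleanup.
  if (bind(server_fd, (struct sockaddr *)&addr, addr_len) == 0 &&
      getsockname(server_fd, (struct sockaddr *)&addr, &addr_len) == 0 && // learn the port
      listen(server_fd, 5) == 0 &&
      connect(client_fd, (struct sockaddr *)&addr, addr_len) == 0)
  {
    struct sockaddr_storage their_addr;
    socklen_t addr_size = sizeof their_addr;
    int new_fd = accept(server_fd, (struct sockaddr *)&their_addr, &addr_size);
    if (new_fd != -1)
    {
      ssize_t sent = send(client_fd, message.data(), message.size(), 0);
      if (sent == static_cast<ssize_t>(message.size()))
      {
        char buffer[1000];
        ssize_t n = recv(new_fd, buffer, sizeof buffer, 0); // new_fd, NOT server_fd!
        if (n > 0) received.assign(buffer, static_cast<size_t>(n));
      }
      close(new_fd);
    }
  }
  // close(-1) just fails with EBADF, so this is safe even if socket() failed.
  close(client_fd);
  close(server_fd);
  return received;
}
```

Note that connect() succeeds before accept() is called: the kernel completes the connection and keeps it in the backlog queue until the server accepts it.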

Finally we use the freeaddrinfo() and close() functions, because we like clean and proper code. Remember to close both the new socket descriptor and the listening socket.

Download 

Download the complete server example HERE.
Testing: To test your server you can telnet into it (telnet 127.0.0.1 5555) or use the client we created before (just change "www.google.com" to "127.0.0.1" and "80" to "5555").

What about serving multiple clients at the same time like webservers do? 

You can. However, this is a little bit more complicated. There are multiple methods for this; one of the most popular is multithreading. You can read about multithreading in my Multithreading tutorial, or you can take a look at my Multithreaded chess/chat server.

ref: http://codebase.eu/tutorial/linux-socket-programming-c/
