Understanding Linux Internals for Data Transfer – Part 3

In this post, we try to better understand how downloads currently work with the AWS S3 client and HttpClient by looking at the strace output of each client, and see how we can improve on them.

From the first post in this series, we already know the essential system calls needed for a download to happen.

Now let's look at the system calls involved in downloads issued with each of the clients:

AWS S3 Client strace for a 100 KB Payload Size

% time   seconds    usecs/call   calls   syscall
 55.32   0.028953            7    4023   poll
 26.85   0.014050           59     239   futex
 11.41   0.005970            0   16018   gettimeofday
  3.37   0.001765            0    3927   recvfrom

[Figure: 100k_1_s3_all-cpu]

Here we can clearly see the system calls the AWS S3 client uses to download data.

Conclusions:

  • Most of the system time is spent in non-blocking I/O (a combination of the poll and recvfrom system calls).
  • gettimeofday is called heavily (16018 calls), and it is reported to be slow on AWS platforms.
  • Some system time is also consumed by futex system calls, which mostly means time spent waiting for a child thread to complete (we are mostly tracing a single thread, or a process in Linux terms).
  • Using S3 Client
    • CPU User = ~7%
    • CPU System = ~2.5%
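
To get a feel for how expensive wall-clock lookups are on a given machine, a small timing loop can help. This is a minimal sketch and not part of the original experiments: the ClockCost class is a hypothetical helper, and it assumes that System.currentTimeMillis() is one of the JVM calls that ends up asking the kernel for the time. Whether each lookup shows up as a real gettimeofday system call in strace depends on the JVM build and the kernel clock source.

```java
// Hypothetical helper, not part of the original experiments.
final class ClockCost {
    // Written once per run so the JIT cannot discard the timed calls.
    static volatile long sink;

    // Returns the average wall-clock cost, in nanoseconds, of one
    // System.currentTimeMillis() call over the given number of iterations.
    static double nanosPerCall(int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        long acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            acc += System.currentTimeMillis();
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        nanosPerCall(1_000_000); // warm-up so the loop gets JIT-compiled
        System.out.printf("%.1f ns per call%n", nanosPerCall(10_000_000));
    }
}
```

Running this on the instance type under test, with and without strace attached, gives a rough idea of whether clock lookups are cheap there or not.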

HttpClient strace for a 100 KB Payload Size

% time   seconds    usecs/call    calls   syscall
 76.12   0.162679            1   239703   read
  9.19   0.019631            5     4089   recvfrom
  8.35   0.017847           45      399   futex
  4.54   0.009708           98       99   connect
  1.40   0.002996            0    12232   gettimeofday

[Figure: 100k_1_http_all-cpu]

Here we can see why HttpClient is less performant than the AWS S3 client: the read() system call dominates, accounting for about 76% of the system time of this download.

Conclusions:

  • Most of the system time is spent in read system calls.
  • The other system calls involved are the ones we typically expect during a download (such as recvfrom and connect).
  • The multiple connect system calls are there because we made many download calls through the HttpClient.
  • Using Http Client
    • CPU User = ~7%
    • CPU System = ~3%

Note: We can definitely remove the need for these read system calls by tweaking the HttpClient settings. The AWS S3 client itself uses HttpClient to communicate with the S3 service.

Simple Client strace for a 100 KB Payload Size

Now that we understand the basics required to download data, let's write our own client to download data from S3.

We will use this simple piece of code to download data from S3 via a URI.

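import java.io.InputStream;
import java.io.PrintWriter;
import java.net.Socket;

// readHeaders(...) is a helper that consumes the response headers and returns
// the payload size; size is the read buffer size in bytes.
// try-with-resources closes the socket on every exit path.
try (Socket s = new Socket(host, 80)) {
    PrintWriter wtr = new PrintWriter(s.getOutputStream());
    // End the request with an explicit blank line (CRLF CRLF); println() would
    // append the platform line separator instead.
    wtr.print("GET " + url + " HTTP/1.1\r\nHost: " + host + "\r\n\r\n");
    wtr.flush();

    InputStream inputStream = s.getInputStream();
    int payloadSize = readHeaders(inputStream);
    int read = 0;
    byte[] buffer = new byte[size];
    // Use < rather than != so the loop cannot overshoot and spin forever.
    while (read < payloadSize) {
        int ret = inputStream.read(buffer, 0, buffer.length);
        if (ret == -1) return; // server closed the connection early
        read += ret;
    }
}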
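
The snippet above relies on a readHeaders helper that the post does not show. A minimal sketch of what such a helper could look like is given here, assuming the response carries a Content-Length header; the HeaderReader class and its readLine method are hypothetical names, not the code used in the experiments.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the post's readHeaders helper.
final class HeaderReader {
    // Consumes the status line and headers up to the blank line, leaving the
    // stream positioned at the first body byte. Returns the Content-Length
    // value, or -1 if that header is absent.
    static int readHeaders(InputStream in) throws IOException {
        int contentLength = -1;
        String line;
        while (!(line = readLine(in)).isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0
                    && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                contentLength = Integer.parseInt(line.substring(colon + 1).trim());
            }
        }
        return contentLength;
    }

    // Reads one line byte by byte, stopping at LF (or end of stream) and dropping CR.
    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') sb.append((char) b);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String response = "HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello";
        InputStream in = new ByteArrayInputStream(response.getBytes(StandardCharsets.US_ASCII));
        System.out.println(readHeaders(in));
    }
}
```

Reading byte by byte keeps the sketch simple and never consumes body bytes, but on a raw socket stream each single-byte read typically becomes its own receive call, so a real client may prefer to parse the headers out of a larger buffer.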

Let's dig into the strace for this code while downloading data.

% time   seconds    usecs/call   calls   syscall
 95.06   0.026071            1   37609   recvfrom
  1.36   0.000372            4      98   connect
  1.11   0.000305            1     431   gettimeofday
  0.57   0.000157            2      98   dup2
  0.35   0.000095            1      97   sendto

[Figure: 100k_3_simple_all-cpu]

Conclusions:

  • In this client implementation we can clearly see that most of the system time is spent in recvfrom system calls, which means it is spent either copying data from kernel space to user space or waiting for data to arrive in the kernel buffers.
  • CPU consumption is close to that of the AWS S3 client.
  • Using Simple Client
    • CPU User = ~7%
    • CPU System = ~2.5%
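
The raw call counts are easier to compare once they are normalised per download. The sketch below does that arithmetic with the counts from the strace summaries above, under the assumption that each connect call corresponds to one download. The S3 client is left out because its summary shows no connect row. SyscallsPerDownload is a hypothetical helper name.

```java
// Hypothetical helper for normalising strace call counts.
final class SyscallsPerDownload {
    // Average number of calls to one syscall per download, using the number
    // of connect calls as a stand-in for the number of downloads (assumption).
    static double perDownload(long syscallCalls, long connectCalls) {
        return (double) syscallCalls / connectCalls;
    }

    public static void main(String[] args) {
        // Call counts copied from the strace summaries in this post.
        System.out.printf("HttpClient read:        %.1f%n", perDownload(239703, 99));
        System.out.printf("HttpClient recvfrom:    %.1f%n", perDownload(4089, 99));
        System.out.printf("Simple Client recvfrom: %.1f%n", perDownload(37609, 98));
    }
}
```

Under that assumption, HttpClient issues roughly 2421 read calls and 41 recvfrom calls per download, while the simple client issues roughly 384 recvfrom calls per download.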


Now let's compare the download times of our new simple client with those of the AWS S3 client and HttpClient for a 100 KB payload size (lower is better).

Percentile   S3 Client   HttpClient   Simple Client
10th               538          631             390
20th               561          693             399
30th               591          711             417
40th               598          763             435
50th               605          772             446
60th               610          778             457
70th               643          786             461
80th               671          791             486
90th               702          805             498

Conclusions:

  • Simple Client outperforms S3 Client by roughly 25% to 29% at every percentile.
  • Simple Client outperforms HttpClient by roughly 38% to 43% at every percentile.
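
The percentages can be recomputed directly from the table. The sketch below assumes the table values are download times, so "outperforms by X%" is read as "takes X% less time than the baseline". PercentileComparison is a hypothetical helper name.

```java
// Hypothetical helper for recomputing the comparison percentages.
final class PercentileComparison {
    // Percentage by which candidate is lower than baseline.
    static double reductionPercent(double baseline, double candidate) {
        return 100.0 * (baseline - candidate) / baseline;
    }

    public static void main(String[] args) {
        // Values copied from the 100 KB table above.
        int[] percentile = {10, 20, 30, 40, 50, 60, 70, 80, 90};
        int[] s3 = {538, 561, 591, 598, 605, 610, 643, 671, 702};
        int[] http = {631, 693, 711, 763, 772, 778, 786, 791, 805};
        int[] simple = {390, 399, 417, 435, 446, 457, 461, 486, 498};

        for (int i = 0; i < percentile.length; i++) {
            System.out.printf("%dth: vs S3 Client %.1f%%, vs HttpClient %.1f%%%n",
                    percentile[i],
                    reductionPercent(s3[i], simple[i]),
                    reductionPercent(http[i], simple[i]));
        }
    }
}
```

The same method can be pointed at any other set of percentile numbers by swapping in the corresponding arrays.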


Now let's compare these numbers for every client with a large payload, i.e. 50 MB, and also compare the CPU consumption of each client.

Percentile   S3 Client   HttpClient   Simple Client
10th              4081         3742            3463
20th              4207         4035            3845
30th              4405         4107            4105
40th              4699         4352            4412
50th              6098         4634            4748
60th              6457         6288            5407
70th              7458         7074            5647
80th              7911         9079            5847
90th             11491        11531            6022

[Figure: clients_50m]

Conclusions:

  • Simple Client outperforms S3 Client by roughly 6% to 15% up to the 40th percentile, by roughly 16% to 26% from the 50th to the 80th percentile, and by about 48% at the 90th percentile.
  • Against HttpClient, Simple Client is faster by about 8% at the 10th percentile and roughly level from the 30th to the 50th (HttpClient is marginally faster at the 40th and 50th). From there it pulls ahead, by roughly 14% at the 60th percentile, rising to about 48% at the 90th.
  • CPU consumption is also lower for the Simple Client, staying around 7.5% to 8%, whereas S3 Client and HttpClient stay around 12.5% to 15%. So it is a win-win situation.


Note:
The simple client performs better while consuming fewer resources, but in the current implementation and experiments we use it over a plain HTTP connection (port 80, no TLS), so we need to be really careful about when we choose to use this client.
