In this blog post, we try to better understand how downloads currently work with the AWS S3 Client and HttpClient by looking at the strace output of each downloading client, and see how we can improve on them.
From the first blog post, we already know the essential system calls needed for a download to happen.
Now let's look at the system calls involved in the download calls issued by each of the clients:
AWS S3 Client STrace for a 100 KB Payload Size
% time | seconds | usecs/call | calls | syscall |
55.32 | 0.028953 | 7 | 4023 | poll |
26.85 | 0.01405 | 59 | 239 | futex |
11.41 | 0.00597 | 0 | 16018 | gettimeofday |
3.37 | 0.001765 | 0 | 3927 | recvfrom |
Here we can clearly see the system calls the AWS S3 Client uses to download data.
Conclusions:
- Most of the time is spent in non-blocking IO (a combination of the poll and recvfrom system calls).
- There is heavy use of gettimeofday, which is reportedly slow on AWS platforms.
- Some system time is also consumed by futex system calls, which mostly means time spent waiting for a child thread to complete (we are mostly tracing a single thread, or a process in Linux terms).
- Using S3 Client
  - CPU User = ~7%
  - CPU System = ~2.5%
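To get a feel for how expensive the clock calls are on a given machine, a rough micro-benchmark can help. This is only a sketch: the class and method names are made up for illustration, and it assumes that System.currentTimeMillis() is the source of the gettimeofday calls, which depends on the JVM version and on the clock source of the instance.

```java
// Rough micro-benchmark: average cost of one System.currentTimeMillis() call.
// Assumption: on the traced JVM this call is backed by gettimeofday; whether it
// appears in strace as a real system call depends on the clock source and vDSO.
class ClockCost {
    // Returns the average number of nanoseconds spent per currentTimeMillis() call.
    static double nanosPerCall(int calls) {
        long sink = 0;                          // accumulates results so the loop is not optimized away
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            sink += System.currentTimeMillis();
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42) System.out.println();   // uses sink so it is not dead code
        return (double) elapsed / calls;
    }

    public static void main(String[] args) {
        nanosPerCall(1_000_000);                // warm-up so the JIT compiles the loop
        System.out.printf("%.1f ns per call%n", nanosPerCall(10_000_000));
    }
}
```

Running the same class under strace -c on the target instance shows whether the calls actually reach the kernel or are served from user space.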
HttpClient STrace for a 100 KB Payload Size
% time | seconds | usecs/call | calls | syscall |
76.12 | 0.162679 | 1 | 239703 | read |
9.19 | 0.019631 | 5 | 4089 | recvfrom |
8.35 | 0.017847 | 45 | 399 | futex |
4.54 | 0.009708 | 98 | 99 | connect |
1.4 | 0.002996 | 0 | 12232 | gettimeofday |
Here we can see why HttpClient is less performant than the AWS S3 Client: the read() system call clearly dominates the system time of this download.
Conclusions:
- Most of the system time is spent in read system calls.
- The other system calls involved are the typical ones we expect during a download (such as recvfrom and connect).
- The multiple connect system calls are there because we made many download calls through HttpClient.
- Using HttpClient
  - CPU User = ~7%
  - CPU System = ~3%
Note: We can definitely remove the need for these read system calls by tweaking the HttpClient settings. The AWS S3 Client itself uses HttpClient to communicate with the S3 service.
Simple Client STrace for a 100 KB Payload Size
Now that we understand the basics required for downloading data, let's write our own client to download data from S3.
We will use this simple piece of code to download data from S3 via a URI.
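    Socket s = new Socket(host, 80);
    PrintWriter wtr = new PrintWriter(s.getOutputStream());
    // End the request with a blank line (CRLF CRLF); println would append a platform-specific line ending
    wtr.print("GET " + url + " HTTP/1.1\r\nHost: " + host + "\r\n\r\n");
    wtr.flush();
    InputStream inputStream = s.getInputStream();
    // readHeaders consumes the response headers and returns the payload size
    int payloadSize = readHeaders(inputStream);
    int read = 0;
    int ret = 0;
    byte[] buffer = new byte[size];  // size: the read buffer size in bytes
    // Keep reading until the whole payload has arrived
    while (read < payloadSize) {
        ret = inputStream.read(buffer, 0, buffer.length);
        if (ret == -1) return;  // connection closed before the full payload arrived
        read += ret;
    }
    wtr.close();  // closing the writer also closes the underlying socket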
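The readHeaders helper is not shown in the post. A minimal sketch of what such a helper could look like, assuming the response carries a Content-Length header (the class name and the line-reading helper are hypothetical, and this is not necessarily how the original was written):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class HeaderReader {
    // Hypothetical helper: consumes the status line and headers up to the blank
    // line and returns the Content-Length value, or -1 if the header is absent.
    static int readHeaders(InputStream in) throws IOException {
        int contentLength = -1;
        String line;
        while (!(line = readLine(in)).isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                contentLength = Integer.parseInt(line.substring(colon + 1).trim());
            }
        }
        return contentLength;
    }

    // Reads one line byte by byte (dropping CR, stopping at LF or end of stream)
    // so that no bytes of the body are consumed.
    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') sb.append((char) b);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String response = "HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello";
        InputStream in = new ByteArrayInputStream(response.getBytes(StandardCharsets.US_ASCII));
        System.out.println(readHeaders(in));   // Content-Length parsed from the headers
        System.out.println(in.available());    // bytes left in the stream for the body
    }
}
```

One thing to keep in mind with this sketch: on an unbuffered socket stream, each single-byte read is likely to turn into its own receive system call, so the way the headers are parsed can noticeably affect the recvfrom count in a trace.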
Let's dig into the strace of this simple client while it downloads data.
% time | seconds | usecs/call | calls | syscall |
95.06 | 0.026071 | 1 | 37609 | recvfrom |
1.36 | 0.000372 | 4 | 98 | connect |
1.11 | 0.000305 | 1 | 431 | gettimeofday |
0.57 | 0.000157 | 2 | 98 | dup2 |
0.35 | 0.000095 | 1 | 97 | sendto |
Conclusions:
- In this client implementation, most of the system time is clearly spent in recvfrom system calls, which we know means copying data from kernel space to user space or waiting for data to arrive in the kernel buffers.
- CPU consumption is close to that of the AWS S3 Client.
- Using Simple Client
  - CPU User = ~7%
  - CPU System = ~2.5%
Now let's compare the download speed of our new simple client with the AWS S3 Client and HttpClient for a 100 KB payload size.
Percentile | S3 Client | HttpClient | Simple Client
10th Pct | 538 | 631 | 390 |
20th Pct | 561 | 693 | 399 |
30th Pct | 591 | 711 | 417 |
40th Pct | 598 | 763 | 435 |
50th Pct | 605 | 772 | 446 |
60th Pct | 610 | 778 | 457 |
70th Pct | 643 | 786 | 461 |
80th Pct | 671 | 791 | 486 |
90th Pct | 702 | 805 | 498 |
Conclusions:
- Simple Client outperforms S3 Client by roughly 25% to 30%.
- Simple Client outperforms HttpClient by roughly 40%.
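The percentages above can be checked directly against the table. A small sketch, assuming the table values are download times where lower is better (the post does not state the unit); the class and method names are made up for illustration:

```java
class Speedup {
    // Percentage by which `candidate` is lower than `baseline`.
    static double reductionPct(double baseline, double candidate) {
        return (baseline - candidate) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // 10th, 50th and 90th percentile values from the 100 KB table
        int[] pct    = {10, 50, 90};
        int[] s3     = {538, 605, 702};
        int[] http   = {631, 772, 805};
        int[] simple = {390, 446, 498};
        for (int i = 0; i < pct.length; i++) {
            System.out.printf("%dth pct: vs S3 Client %.1f%%, vs HttpClient %.1f%%%n",
                    pct[i], reductionPct(s3[i], simple[i]), reductionPct(http[i], simple[i]));
        }
    }
}
```

The same helper can be pointed at any other row of the tables to see how the gap changes across percentiles.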
Now let's compare these numbers for a big payload size, i.e. 50 MB, and also compare the CPU consumption of each client.
Percentile | S3 Client | HttpClient | Simple Client
10th Pct | 4081 | 3742 | 3463 |
20th Pct | 4207 | 4035 | 3845 |
30th Pct | 4405 | 4107 | 4105 |
40th Pct | 4699 | 4352 | 4412 |
50th Pct | 6098 | 4634 | 4748 |
60th Pct | 6457 | 6288 | 5407 |
70th Pct | 7458 | 7074 | 5647 |
80th Pct | 7911 | 9079 | 5847 |
90th Pct | 11491 | 11531 | 6022 |
Conclusions:
- Going by the table, Simple Client outperforms S3 Client by roughly 6% to 15% at the lower percentiles (10th to 40th), and the gap widens at the higher percentiles, from about 16% at the 60th to nearly 48% at the 90th.
- Against HttpClient the picture is mixed at the low end: Simple Client is up to about 7% faster at the 10th to 30th percentiles and marginally slower at the 40th and 50th, but from the 60th percentile upward it is about 14% to 48% faster.
- CPU consumption is also lower for the Simple Client, staying around 7.5% to 8%, compared with the S3 Client and HttpClient, whose CPU consumption stays around 12.5% to 15%. So that's a win-win situation.
Note: The simple client performs better while consuming fewer resources, but in the current implementation and experiments it talks to S3 over a plain HTTP connection, which means we need to be really cognisant of when we want to use this client.