
ffwd will block when there are multiple client threads #2

Open
a9QrX3Lu opened this issue Sep 11, 2022 · 10 comments

a9QrX3Lu commented Sep 11, 2022

I'm trying to run ffwd on several of my machines, but I found that two or more client threads cause blocking in FFWD_EXEC. After some debugging, I found that when there are concurrent FFWD_EXEC calls, all client threads block waiting for the server's response, while the server never receives any client's request.

$ ./ffwd_sample -t 1 -s 2 -d 100 # this will run to completion
1 0.100 0.013
$ ./ffwd_sample -t 2 -s 2 -d 100 # this will block
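For context, each client thread issues delegated calls through the FFWD_EXEC macro and then spins until the server answers. A rough sketch of the client side (GET_CONTEXT and FFWD_EXEC are the macros from ffwd.h; the delegated function counter_inc and its argument are placeholders of mine, not from the sample):

/* Sketch of one delegated call as issued by a client thread. */
GET_CONTEXT();                       /* fetch this thread's ffwd client context      */
uint64_t ret;
/* Ask delegation server 0 to run counter_inc(1) on our behalf; FFWD_EXEC
 * publishes the request and spins until the server's response arrives.   */
FFWD_EXEC(0, counter_inc, ret, 1);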

jeriksson commented Sep 12, 2022 via email


a9QrX3Lu commented Sep 14, 2022

Hi Jakob,

This is the environment of my machine

  • Intel(R) Xeon(R) Gold 6238R CPU
  • 56 cores
  • 377G DRAM

I don't think I'm oversubscribing the cores: in the following test I only assign 2 servers and 2 clients, and it still blocks forever. The blocking persists when I increase the number of servers or clients.

./ffwd_sample -t 2 -s 2 -d 100 # Here, `-t` sets the number of client threads and `-s` the number of polling servers.

What I've tried so far:

  • I've tried several different Xeon machines, and all of them block.
  • ffwd-memcached and ffwd-hashtable show the same behavior as the ffwd_sample test above.
  • Tests only run to completion when I limit the number of client threads (-t) to one.


jeriksson commented Sep 14, 2022 via email


a9QrX3Lu commented Sep 14, 2022

Have a look at htop when the program is running. The program should be using 4 cores 100%, all green (user space). Is that what you see?

Yes. The following is the output of htop. Every CPU below is at 100%, all green. However, CPU 1 holds two ffwd_sample threads, each getting 50% of the CPU.

CPU     PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
105 3995483 wangzl     20   0  435M  2408  1956 R 100.  0.0  0:53.43 ./ffwd_sample -t 2 -s 2 -d 100
 78 3995484 wangzl     20   0  435M  2408  1956 R 100.  0.0  0:53.44 ./ffwd_sample -t 2 -s 2 -d 100
 57 3995482 wangzl     20   0  435M  2408  1956 R 100.  0.0  0:53.44 ./ffwd_sample -t 2 -s 2 -d 100
  1 3995481 wangzl     20   0  435M  2408  1956 R 50.0  0.0  0:26.75 ./ffwd_sample -t 2 -s 2 -d 100
  1 3995486 wangzl     20   0  435M  2408  1956 R 50.0  0.0  0:26.68 ./ffwd_sample -t 2 -s 2 -d 100
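Side note: to rule out a pinning problem, the placement of each thread can be dumped with a small diagnostic like the one below (my own sketch, not part of ffwd; print_affinity is a hypothetical helper):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Hypothetical diagnostic: print the CPU the calling thread is currently
 * running on, plus the set of CPUs it is allowed to run on. */
static void print_affinity(const char *tag)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_getaffinity");
        return;
    }
    printf("%s: on CPU %d, allowed:", tag, sched_getcpu());
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            printf(" %d", cpu);
    printf("\n");
}

Calling this at the top of each client and server thread would show whether two threads really ended up pinned to the same core.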

a9QrX3Lu commented Sep 14, 2022

I've added two printf calls to FFWD_EXEC to gather logs on concurrent execution. I hope this provides some hints.

The two added printf calls (marked with +):

#define FFWD_EXEC(server_no, function, ret, ...) \
+   printf("context=%p server_no=%d\n", context, server_no); \
    context->request[server_no]->fptr = function; \
    prepare_request(context->request[server_no], __VA_ARGS__); \
    context->local_client_flag[server_no] ^= context->mask; \
    context->request[server_no]->flag = context->local_client_flag[server_no]; \
    while(((context->server_response[server_no]->flags ^ context->local_client_flag[server_no]) & context->mask)){ \
      __asm__ __volatile__("rep;nop": : :"memory"); \
    } \
+   printf("get_value\n"); \
    ret = context->server_response[server_no]->return_values[((context->id_in_chip)) % NCLIENTS]; \

#define GET_CONTEXT() \
  struct ffwd_context *context = ffwd_get_context();

Runtime log:

context=0x565401c0f860 server_no=1
get_value
...
context=0x565401c0f860 server_no=1
get_value
context=0x565401c0f860 server_no=0
get_value
context=0x565401c0f860 server_no=1
get_value
context=0x565401c0f7d0 server_no=0
context=0x565401c0f860 server_no=0
get_value
# Blocking starts here

It seems that when the second context (i.e., the second client thread) starts to send messages, both client threads block.
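To make it easier to see what the spin is waiting for: as I read the macro, each client toggles its own bit in the request flag and then spins until the same bit flips back in the server's response word. A minimal standalone model of that handshake (my own simplification to illustrate the protocol; the struct and names are mine, not ffwd's actual layout):

#include <stdint.h>

/* One client/server flag pair: the client toggles its bit in req_flag,
 * and the server echoes the toggle back in resp_flags once the call is
 * done. */
struct slot {
    volatile uint64_t req_flag;
    volatile uint64_t resp_flags;
};

static void client_call(struct slot *s, uint64_t mask, uint64_t *local_flag)
{
    *local_flag ^= mask;          /* toggle our bit                      */
    s->req_flag = *local_flag;    /* publish the request                 */
    /* Spin until the server's echoed bit matches what we last wrote.    */
    while ((s->resp_flags ^ *local_flag) & mask)
        __asm__ __volatile__("rep;nop" : : : "memory");   /* PAUSE       */
}

If several clients share one response word, the server has to echo each client's bit in the right position, so a mixed-up client id could leave a client spinning on a bit the server never flips.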


jeriksson commented Sep 14, 2022 via email

a9QrX3Lu commented Sep 14, 2022

I've added "mfence" to the spin loop:

#define FFWD_EXEC(server_no, function, ret, ...) \
+   printf("context=%p server_no=%d\n", context, server_no); \
    context->request[server_no]->fptr = function; \
    prepare_request(context->request[server_no], __VA_ARGS__); \
    context->local_client_flag[server_no] ^= context->mask; \
    context->request[server_no]->flag = context->local_client_flag[server_no]; \
    while(((context->server_response[server_no]->flags ^ context->local_client_flag[server_no]) & context->mask)){ \
~     __asm__ __volatile__("rep;nop;mfence": : :"memory"); \
    } \
+   printf("get_value\n"); \
    ret = context->server_response[server_no]->return_values[((context->id_in_chip)) % NCLIENTS]; \

After recompiling and re-running, the behavior seems to be the same:

context=0x5588004c7860 server_no=1
get_value
context=0x5588004c7860 server_no=0
get_value
context=0x5588004c7860 server_no=1
get_value
context=0x5588004c7860 server_no=0
get_value
context=0x5588004c7860 server_no=1
get_value
context=0x5588004c7860 server_no=0
context=0x5588004c77d0 server_no=0
get_value
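In case it matters, an explicit acquire load would be another way to make the intended ordering explicit without a full fence (my own variant, not from the ffwd sources; the macro's line continuations are omitted here):

/* Hypothetical variant of the spin: read the response word with a GCC
 * __atomic acquire load instead of adding mfence inside the loop body. */
while ((__atomic_load_n(&context->server_response[server_no]->flags,
                        __ATOMIC_ACQUIRE)
        ^ context->local_client_flag[server_no]) & context->mask) {
    __asm__ __volatile__("rep;nop" : : : "memory");  /* PAUSE hint */
}

Given that the mfence version behaves identically, though, this looks less like a memory-ordering problem and more like the request or response landing in the wrong slot.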


jeriksson commented Sep 14, 2022 via email


a9QrX3Lu commented Sep 14, 2022

context=0x56229556b860 context->id=2 context->id_in_chip=2 server_no=0
get_value
context=0x56229556b860 context->id=2 context->id_in_chip=2 server_no=1
get_value
context=0x56229556b860 context->id=2 context->id_in_chip=2 server_no=0
get_value
context=0x56229556b860 context->id=2 context->id_in_chip=2 server_no=1
get_value
context=0x56229556b860 context->id=2 context->id_in_chip=2 server_no=0
context=0x56229556b7d0 context->id=0 context->id_in_chip=-2 server_no=0
get_value

The second context's id_in_chip is a negative number. Maybe this is the cause? I'll look into the CPU-core configuration part of the ffwd code.

There are two NUMA nodes on my machine and hyperthreading is enabled, so there are 28 physical cores (56 logical cores) on each socket.
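To illustrate what I suspect (the arithmetic below is purely my guess at the kind of mapping involved, not the actual ffwd code): if the per-chip client id is derived as an offset from an assumed socket base core, an interleaved NUMA numbering breaks that assumption:

/* Hypothetical illustration: suppose ids were computed as
 *     id_in_chip = core_id - socket_base
 * under the assumption that each socket owns a contiguous core range.
 * With interleaved numbering (node0: 0,2,4,...; node1: 1,3,5,...),
 * a thread on core 0 with an assumed base of 2 gets id_in_chip = -2,
 * matching the value in the log above. */
int socket_base = 2;   /* assumed first core of the socket (wrong here) */
int core_id    = 0;    /* core the client thread actually landed on     */
int id_in_chip = core_id - socket_base;   /* = -2                       */

A negative id_in_chip would then index return_values out of range via ((context->id_in_chip)) % NCLIENTS, since C's % keeps the sign of the dividend.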


jeriksson commented Sep 14, 2022 via email
