Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: Too small buffer size for data transmission and rdb read-write #896

Closed
wants to merge 2 commits into from

Conversation

cyningsun
Copy link
Contributor

@cyningsun cyningsun commented Dec 9, 2024

Bugs

  1. func (r *syncStandaloneReader) receiveRDB() string pass a writer instead of a BufferWriter when call receiveRDBWithoutDiskless
  2. func (r *syncStandaloneReader) receiveRDBWithoutDiskless(marker string, wt io.Writer) use 32MB buf to buffer tcp data before every write to file, and hoping to call Write function once every 32M cycles.
  3. But the default buffer size of bufio.Reader is 4096, so each call of r.rd.Read(buf[:readOnce]) can only return up to 4k data, and then will call Write function and write to File

Fix

  1. use BufferWriter when calling func (r *syncStandaloneReader) receiveRDBWithoutDiskless(marker string, wt io.Writer)
  2. Add reader/writer buffer size options for both tcp for scenarios of data transmission and rdb read-write

@CLAassistant
Copy link

CLAassistant commented Dec 9, 2024

CLA assistant check
All committers have signed the CLA.

@EquentR
Copy link
Collaborator

EquentR commented Dec 10, 2024

I have a few questions:

The RDB reception uses a 4096-byte bufReader. After reviewing the bufio source code, if the length of the []byte passed to the Read method exceeds the buffer size, the actual read operation will use the passed length directly. Therefore, the statement that "if the underlying network returns data exceeding 4096 bytes, it will be read in batches" is inaccurate. Additionally, saying that the length written to the file is only 4096 bytes is also incorrect. Furthermore, file descriptors at the operating system level have their own caching mechanisms. Unless File.Sync is manually called, forced disk writes will not occur, so disk usage is unlikely to be high. Is there really a performance gain with this change?

Adding these configurations to the configuration file will significantly increase the learning and understanding cost for users. Is there a specific need that requires such special configurations?

我有几个疑问:
1、接收RDB使用了4096个字节的bufReader。看了一下bufio的源码,如果传入Read方法的[]byte长度大于buffer的长度,那么会直接使用传入的长度读取,那么这里“如果底层网络返回数据超过4096将分批读取”的说法不准确。那么将其写入到文件的长度只有4096更是不对,而且文件fd在操作系统层面还有缓存,只要不手动执行File.Sync则不会触发强制落盘,对硬盘的占用也不会很高。这样更改真的有性能提升吗?

2、在配置文件中增加这些配置将大幅增加用户的学习理解成本,这里是否有什么一定需要特殊配置的吗?

@cyningsun
Copy link
Contributor Author

hi EquentR, you're correct. I failed to locate the optimization in bufio.Read. Sorry for taking your time

@cyningsun cyningsun closed this Dec 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants