Remove re-slicing for multipart #931
Conversation
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #931      +/-   ##
==========================================
- Coverage   27.97%   26.92%   -1.06%
==========================================
  Files          87       87
  Lines       13590    14129     +539
==========================================
+ Hits         3802     3804       +2
- Misses       9371     9903     +532
- Partials      417      422       +5

☔ View full report in Codecov by Sentry.
Force-pushed 5814442 to 431c312
Force-pushed 431c312 to 68451d2
Force-pushed 68451d2 to e0d1565

Rebased
Can the internal slicer option ever be turned off?
	id           oid.ID
	elements     []string
	creationTime = TimeNow(ctx)
	// User may upload part large maxObjectSize in NeoFS. From users point of view it is a single object.
Not sure I understand this comment properly. I've read something about ETag (the only thing this var is used for) but cannot correlate it with this comment. Could you please extend it or tell me what a reader should understand here?
Parts can be reuploaded and they're identified by etags.
Sure, I just don't get how this comment helps and what it is trying to tell me.
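For context on the ETag remark (not part of the diff): S3-style part ETags are conventionally the MD5 of the part payload, so re-uploading a part with different data yields a different ETag. A minimal sketch; the helper name is hypothetical, not the gateway's actual API:

```go
package layer

import (
	"crypto/md5"
	"encoding/hex"
)

// partETag is a hypothetical helper: an S3-style ETag for a single uploaded
// part, i.e. the hex-encoded MD5 of its payload. Re-uploading the same part
// number with different data therefore produces a different ETag.
func partETag(payload []byte) string {
	sum := md5.Sum(payload)
	return hex.EncodeToString(sum[:])
}
```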
@@ -266,13 +268,20 @@ func (f MsgHandlerFunc) HandleMessage(ctx context.Context, msg *nats.Msg) error
// NewLayer creates an instance of a layer. It checks credentials
// and establishes gRPC connection with the node.
func NewLayer(log *zap.Logger, neoFS NeoFS, config *Config) Client {
	buffers := sync.Pool{}
	buffers.New = func() any {
		b := make([]byte, neoFS.MaxObjectSize())
IMO, there is a potential OOM killer here, the same one we already fixed in the SDK.
Links to changes, please
Using sync.Pool showed a better memory usage profile after load tests. It already works here:
neofs-s3-gw/internal/neofs/neofs.go, line 331 in 4717bd8:
data := x.buffers.Get()
> Links to changes, please

> Using sync.Pool showed a better memory usage profile after load tests. It already works here

Yes, but putting 1K one-byte objects simultaneously leads to 64 GB of memory usage when only 1 KB is actually required. We already know we can be quite successful at running out of 1 TB of memory, not a cool story.

But OK, as I understand it, we will return to this later.
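For reference, a self-contained sketch of the pooled-buffer setup discussed in this thread; the NeoFS interface is trimmed to the single method used and the names are simplified for illustration:

```go
package layer

import "sync"

// neoFSSizer stands in for the part of the NeoFS interface used here.
type neoFSSizer interface {
	MaxObjectSize() int64
}

// newBufferPool mirrors the pool set up in NewLayer: every pooled buffer is
// MaxObjectSize bytes, so N concurrent uploads pin N full-size buffers even
// when the payloads are tiny, which is the memory-amplification concern
// raised above (1K one-byte parts can pin gigabytes of buffers).
func newBufferPool(neoFS neoFSSizer) *sync.Pool {
	return &sync.Pool{
		New: func() any {
			b := make([]byte, neoFS.MaxObjectSize())
			return &b
		},
	}
}
```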
What happens if we're to reupload a part?
	id           oid.ID
	elements     []string
	creationTime = TimeNow(ctx)
	// User may upload part large maxObjectSize in NeoFS. From users point of view it is a single object.
Parts can be reuploaded and they're identified by etags.
api/layer/multipart_upload.go
Outdated
}

	dt := n.buffers.Get()
	chunk := dt.(*[]byte)
Can we leverage p.Size here for buffer allocation?
Do you mean just an in-place buffer like b := make([]byte, p.size)?
Buffers already work here:
neofs-s3-gw/internal/neofs/neofs.go, line 331 in 4717bd8:
data := x.buffers.Get()
After many tests this approach shows better memory usage than an in-place buffer.
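A hedged sketch of how both ideas could be combined: keep the pooled buffer (which profiled better under load) but cap the working slice at the declared part size so small parts do not drive full-size reads. The parameter type and field names below are assumptions, not the gateway's actual API:

```go
package layer

import (
	"io"
	"sync"
)

// partParams is a stand-in for the real upload-part parameters; only the
// fields used in this sketch are shown.
type partParams struct {
	Size   int64
	Reader io.Reader
}

// copyPart borrows a buffer from the pool, trims it to the declared part
// size when that size is known and smaller than the buffer, and streams the
// part through it.
func copyPart(buffers *sync.Pool, dst io.Writer, p partParams) (int64, error) {
	dt := buffers.Get()
	chunk := dt.(*[]byte)
	defer buffers.Put(dt)

	buf := *chunk
	if p.Size > 0 && p.Size < int64(len(buf)) {
		buf = buf[:p.Size]
	}
	return io.CopyBuffer(dst, p.Reader, buf)
}
```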
api/layer/multipart_upload.go
Outdated
prm := PrmObjectCreate{
	Container:  bktInfo.CID,
	Creator:    bktInfo.Owner,
	Attributes: attributes,
Should we have an attribute identifying a particular part?
I don't see a situation where we would need it. Do you have a case?
It's just a suggestion; if it can't simplify anything now, then OK.
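If the suggestion were picked up later, it could look roughly like the sketch below; the attribute key and the trimmed-down parameter type are hypothetical, not what the gateway actually stores:

```go
package layer

import "strconv"

// prmObjectCreate mirrors only the field relevant to this sketch; the real
// PrmObjectCreate in the layer package may differ.
type prmObjectCreate struct {
	Attributes [][2]string
}

// withPartNumber tags a stored part with an attribute identifying it, as
// suggested in the review. "S3-Part-Number" is an illustrative key, not an
// attribute the gateway actually uses.
func withPartNumber(prm prmObjectCreate, partNumber int) prmObjectCreate {
	prm.Attributes = append(prm.Attributes, [2]string{"S3-Part-Number", strconv.Itoa(partNumber)})
	return prm
}
```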
api/layer/multipart_upload.go
Outdated
if len(lastPart.MultipartHash) > 0 {
	splitPreviousID = lastPart.OID

if binaryUnmarshaler, ok := multipartHash.(encoding.BinaryUnmarshaler); ok && len(lastPart.MultipartHash) > 0 {
This can be moved to some helper; it's duplicated.
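A sketch of the helper suggested above, assuming the duplicated code restores a hash's saved state via encoding.BinaryUnmarshaler (which standard-library SHA-2 digests implement):

```go
package layer

import (
	"encoding"
	"fmt"
	"hash"
)

// restoreHashState feeds a previously saved hash state (e.g. the
// MultipartHash bytes kept for the last part) back into a running hash.Hash.
// It is only a sketch of the helper proposed in the review.
func restoreHashState(h hash.Hash, state []byte) error {
	if len(state) == 0 {
		return nil // nothing to restore, keep the fresh hash
	}
	u, ok := h.(encoding.BinaryUnmarshaler)
	if !ok {
		return fmt.Errorf("hash %T does not support state restoration", h)
	}
	return u.UnmarshalBinary(state)
}
```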
The main commit with the re-slicing removal contains only an issue ref, which is good for GH but not enough for the git history overall. Let's write some text about the motivation and the approach used.
Signed-off-by: Evgenii Baidakov <evgenii@nspcc.io>
Signed-off-by: Evgenii Baidakov <evgenii@nspcc.io>
Signed-off-by: Evgenii Baidakov <evgenii@nspcc.io>
Force-pushed 99144d1 to 55ab89a
Everything will be broken. We collect the full object hash moving from part to part. If you re-upload any part in the middle, we will have to re-read all parts after it, recalculate the hashes, and re-upload those parts.
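To illustrate why a mid-stream re-upload is expensive: the whole-object hash is rolled forward part by part, and its serialized state is checkpointed after each part (per the commit message below, this state is kept in the part's tree-service metadata). Replacing part k invalidates every checkpoint after it. A simplified sketch:

```go
package layer

import (
	"crypto/sha256"
	"encoding"
	"io"
)

// rollHash streams parts into a single SHA-256 and returns the serialized
// hash state after each part. Standard-library SHA-256 implements
// encoding.BinaryMarshaler, which is what makes the checkpoints possible.
// If part k is replaced, checkpoints k..n become stale, so parts k+1..n must
// be re-read and re-hashed starting from checkpoint k-1.
func rollHash(parts []io.Reader) (checkpoints [][]byte, err error) {
	h := sha256.New()
	for _, part := range parts {
		if _, err = io.Copy(h, part); err != nil {
			return nil, err
		}
		state, err := h.(encoding.BinaryMarshaler).MarshalBinary()
		if err != nil {
			return nil, err
		}
		checkpoints = append(checkpoints, state)
	}
	return checkpoints, nil
}
```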
Force-pushed c7a2e4e to 4188fe1
Cool. We need to do this. And that's where per-part IDs can become handy because
And btw,
Added logic for re-uploading parts.
Force-pushed 477e53a to 0d215e4
Closes #843. Before: multipartComplete read all parts of a Big object into memory, combined them, and generated the final Big object. These steps consume time and memory; eventually any system will fail to load all parts into memory or will time out during the process. After: the object slicing process works from the first uploaded part, calculating each part's hash and the whole object's hash as it goes, and storing the object hash state in each part's metadata in the tree service. Signed-off-by: Evgenii Baidakov <evgenii@nspcc.io>
Signed-off-by: Evgenii Baidakov <evgenii@nspcc.io>
Signed-off-by: Evgenii Baidakov <evgenii@nspcc.io>
It is required because we have to re-evaluate the object payload hash. Signed-off-by: Evgenii Baidakov <evgenii@nspcc.io>
Force-pushed 0d215e4 to 127dd3e
Buffering is the only weakness left here, but we can try optimizing it in the future; for now I'd like to see the functional requirements met.
Closes #843