
Remove re-slicing for multipart #931

Merged
merged 7 commits into master from 843-complete-multipart-upload-fails-with-timeout
Mar 26, 2024

Conversation

smallhive
Contributor

@smallhive smallhive commented Feb 22, 2024

Closes #843


codecov bot commented Feb 22, 2024

Codecov Report

Attention: Patch coverage is 1.91388%, with 615 lines in your changes missing coverage. Please review.

Project coverage is 26.92%. Comparing base (aadef93) to head (127dd3e).
Report is 1 commit behind head on master.

Files Patch % Lines
api/layer/multipart_upload.go 0.00% 265 Missing ⚠️
api/layer/neofs_mock.go 8.54% 102 Missing and 5 partials ⚠️
internal/neofs/tree.go 0.00% 86 Missing ⚠️
api/layer/tree_mock.go 0.00% 55 Missing ⚠️
api/layer/object.go 0.00% 49 Missing and 1 partial ⚠️
internal/neofs/neofs.go 0.00% 47 Missing ⚠️
api/layer/layer.go 33.33% 3 Missing and 1 partial ⚠️
api/handler/acl.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #931      +/-   ##
==========================================
- Coverage   27.97%   26.92%   -1.06%     
==========================================
  Files          87       87              
  Lines       13590    14129     +539     
==========================================
+ Hits         3802     3804       +2     
- Misses       9371     9903     +532     
- Partials      417      422       +5     

☔ View full report in Codecov by Sentry.

@smallhive smallhive force-pushed the 843-complete-multipart-upload-fails-with-timeout branch 2 times, most recently from 5814442 to 431c312 Compare February 28, 2024 11:32
@smallhive smallhive marked this pull request as ready for review February 28, 2024 11:32
@smallhive smallhive force-pushed the 843-complete-multipart-upload-fails-with-timeout branch from 431c312 to 68451d2 Compare February 28, 2024 11:34
@smallhive smallhive force-pushed the 843-complete-multipart-upload-fails-with-timeout branch from 68451d2 to e0d1565 Compare February 29, 2024 10:57
@smallhive
Contributor Author

Rebased

Member

@carpawell carpawell left a comment

Can the internal slicer option ever be turned off?

internal/neofs/tree.go Show resolved Hide resolved
api/layer/multipart_upload.go Outdated Show resolved Hide resolved
api/layer/multipart_upload.go Outdated Show resolved Hide resolved
id oid.ID
elements []string
creationTime = TimeNow(ctx)
// A user may upload a part larger than maxObjectSize to NeoFS. From the user's point of view it is a single object.
Member

Not sure I understand this comment properly. I've read something about ETag (the only thing this var is used for) but can't correlate it with this comment. Could you extend it or tell me what a reader should take from it?

Member

Parts can be reuploaded and they're identified by etags.

Member

Sure, I just don't get how this comment helps or what it's trying to tell me.

@@ -266,13 +268,20 @@ func (f MsgHandlerFunc) HandleMessage(ctx context.Context, msg *nats.Msg) error
// NewLayer creates an instance of a layer. It checks credentials
// and establishes gRPC connection with the node.
func NewLayer(log *zap.Logger, neoFS NeoFS, config *Config) Client {
buffers := sync.Pool{}
buffers.New = func() any {
b := make([]byte, neoFS.MaxObjectSize())
Member

IMO, a potential OOM killer is here, the same way we already fixed this in SDK

Contributor Author

Links to changes, please

Contributor Author

Using sync.Pool showed a better memory-usage profile after load tests. It already works here:

data := x.buffers.Get()

Member

Links to changes, please

nspcc-dev/neofs-sdk-go#539

Using sync.Pool showed a better memory-usage profile after load tests. It already works here

Yes, but putting 1K 1-byte objects simultaneously leads to 64 GB of memory usage while only 1 KB is actually needed. We already know we can be quite successful at running out of 1 TB of memory, not a cool story.

But OK, as I understand we will return to this later.

api/layer/multipart_upload.go Outdated Show resolved Hide resolved
api/layer/multipart_upload.go Show resolved Hide resolved
api/layer/multipart_upload.go Outdated Show resolved Hide resolved
internal/neofs/neofs.go Outdated Show resolved Hide resolved
cmd/s3-gw/app_settings.go Outdated Show resolved Hide resolved
Member

@roman-khimov roman-khimov left a comment

What happens if we're to reupload a part?

api/layer/multipart_upload.go Outdated Show resolved Hide resolved
id oid.ID
elements []string
creationTime = TimeNow(ctx)
// A user may upload a part larger than maxObjectSize to NeoFS. From the user's point of view it is a single object.
Member

Parts can be reuploaded and they're identified by etags.

api/layer/multipart_upload.go Outdated Show resolved Hide resolved
api/layer/multipart_upload.go Show resolved Hide resolved
}

dt := n.buffers.Get()
chunk := dt.(*[]byte)
Member

Can we leverage p.Size here for buffer allocation?

Contributor Author

Do you mean just an in-place buffer like b := make([]byte, p.size)? Buffers already work here:

data := x.buffers.Get()

and after many tests this approach showed better memory usage than an in-place buffer.

prm := PrmObjectCreate{
Container: bktInfo.CID,
Creator: bktInfo.Owner,
Attributes: attributes,
Member

Should we have an attribute identifying particular part?

Contributor Author

I don't see the situation when we need it. Do you have a case?

Member

It's just a suggestion, if it can't simplify anything now then OK.

if len(lastPart.MultipartHash) > 0 {
splitPreviousID = lastPart.OID

if binaryUnmarshaler, ok := multipartHash.(encoding.BinaryUnmarshaler); ok && len(lastPart.MultipartHash) > 0 {
Member

Can be moved to some helper, duplicating.

api/layer/neofs_mock.go Outdated Show resolved Hide resolved
cmd/s3-gw/app_settings.go Outdated Show resolved Hide resolved
api/data/tree.go Show resolved Hide resolved
api/layer/neofs.go Outdated Show resolved Hide resolved
api/layer/multipart_upload.go Show resolved Hide resolved
api/layer/multipart_upload.go Outdated Show resolved Hide resolved
api/layer/multipart_upload.go Show resolved Hide resolved
internal/neofs/neofs.go Outdated Show resolved Hide resolved
internal/neofs/neofs.go Outdated Show resolved Hide resolved
internal/neofs/neofs.go Outdated Show resolved Hide resolved
internal/neofs/tree.go Outdated Show resolved Hide resolved
internal/neofs/neofs.go Outdated Show resolved Hide resolved
@cthulhu-rider
Contributor

cthulhu-rider commented Mar 7, 2024

The main commit with the re-slicing removal contains only the issue ref, which is good for GH but not enough for git history overall. Let's write some text about the motivation and the approach used.

Signed-off-by: Evgenii Baidakov <evgenii@nspcc.io>
Signed-off-by: Evgenii Baidakov <evgenii@nspcc.io>
Signed-off-by: Evgenii Baidakov <evgenii@nspcc.io>
@smallhive smallhive force-pushed the 843-complete-multipart-upload-fails-with-timeout branch 3 times, most recently from 99144d1 to 55ab89a Compare March 15, 2024 05:03
@smallhive
Contributor Author

What happens if we're to reupload a part?

Everything will be broken. We compute the full object hash moving from part to part. If you re-upload any part in the middle, we have to re-read all parts after it, recalculate hashes, and re-upload those parts.

@smallhive smallhive force-pushed the 843-complete-multipart-upload-fails-with-timeout branch 2 times, most recently from c7a2e4e to 4188fe1 Compare March 15, 2024 07:29
@roman-khimov
Member

Everything will be broken. We compute the full object hash moving from part to part. If you re-upload any part in the middle, we have to re-read all parts after it, recalculate hashes, and re-upload those parts.

Cool. We need to do this. And that's where per-part IDs can become handy because

Part numbers can be any number from 1 to 10,000, inclusive. A part number uniquely identifies a part
and also defines its position within the object being created. If you upload a new part using the same
part number that was used with a previous part, the previously uploaded part is overwritten.

And btw,

Each part must be at least 5 MB in size, except the last part.

@smallhive
Contributor Author

Everything will be broken. We compute the full object hash moving from part to part. If you re-upload any part in the middle, we have to re-read all parts after it, recalculate hashes, and re-upload those parts.

Cool. We need to do this. And that's where per-part IDs can become handy because

Part numbers can be any number from 1 to 10,000, inclusive. A part number uniquely identifies a part
and also defines its position within the object being created. If you upload a new part using the same
part number that was used with a previous part, the previously uploaded part is overwritten.

And btw,

Each part must be at least 5 MB in size, except the last part.

Added logic for re-uploading parts.

@smallhive smallhive force-pushed the 843-complete-multipart-upload-fails-with-timeout branch from 477e53a to 0d215e4 Compare March 19, 2024 06:41
Closes #843.

Before, multipartComplete read all parts of the big object into memory, combined them, and generated the final big object.
These steps consume time and memory; eventually any system will fail to load all parts into memory, or will time out during
the process.
Now, the object slicing process works from the first uploaded part, calculating each part's hash and the whole object's
hash as it goes, and storing the object hash state in each part's metadata in the tree service.

Signed-off-by: Evgenii Baidakov <evgenii@nspcc.io>
Signed-off-by: Evgenii Baidakov <evgenii@nspcc.io>
Signed-off-by: Evgenii Baidakov <evgenii@nspcc.io>
It is required because we have to re-evaluate the object payload hash.

Signed-off-by: Evgenii Baidakov <evgenii@nspcc.io>
@smallhive smallhive force-pushed the 843-complete-multipart-upload-fails-with-timeout branch from 0d215e4 to 127dd3e Compare March 19, 2024 06:47
Member

@roman-khimov roman-khimov left a comment

Buffering is the only weakness left here, but we can try optimizing it in the future; for now, I'd like to see the functional requirements met.

@roman-khimov roman-khimov merged commit 20f55c4 into master Mar 26, 2024
11 of 15 checks passed
@roman-khimov roman-khimov deleted the 843-complete-multipart-upload-fails-with-timeout branch March 26, 2024 06:17

Successfully merging this pull request may close these issues.

Complete multipart upload fails with timeout
4 participants