-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MM-62747] add NFS support #884
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very interested to see the results of this. Have you run any tests yet?
@@ -210,17 +251,17 @@ resource "aws_instance" "proxy_server" { | |||
|
|||
resource "aws_iam_user" "s3user" { | |||
name = "${var.cluster_name}-s3user" | |||
count = var.app_instance_count > 1 && var.s3_external_bucket_name == "" ? 1 : 0 | |||
count = var.app_instance_count > 1 && var.s3_external_bucket_name == "" && !var.create_efs ? 1 : 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh boy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh boy indeed 😂 Maybe we should make use of modules for this. Having an s3 module that can be automatically switched on/off. Something like https://stackoverflow.com/a/72867545/3248221. I haven't ever tried that, so if that's too difficult/cumbersome, we could at least refactor this whole var.app_instance_count > 1 && var.s3_external_bucket_name == "" && !var.create_efs ? 1 : 0
into a new create_s3_resources
variable. Thoughts?
@@ -17,6 +19,9 @@ do | |||
sudo sh -c 'echo "deb [signed-by=/usr/share/keyrings/postgres-archive-keyring.gpg] http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list' && \ | |||
sudo apt-get -y update && \ | |||
sudo apt-get install -y mysql-client-8.0 && \ | |||
sudo apt-get remove -y needrestart && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we removing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the only way to disable interactivity during install. I guess it will be required even if we are going to host the compiled binary.
@@ -17,6 +19,9 @@ do | |||
sudo sh -c 'echo "deb [signed-by=/usr/share/keyrings/postgres-archive-keyring.gpg] http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list' && \ | |||
sudo apt-get -y update && \ | |||
sudo apt-get install -y mysql-client-8.0 && \ | |||
sudo apt-get remove -y needrestart && \ | |||
sudo apt-get install -y -o Dpkg::Options::="--force-confold" binutils git cargo libssl-dev make pkg-config && \ | |||
git clone https://github.com/aws/efs-utils && cd efs-utils && make deb && sudo sudo apt-get install -y -o Dpkg::Options::="--force-confold" ./build/amazon-efs-utils-*.deb && cd - && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just host the debian file somewhere that we can download and install? This is adding more complications to an already complicated and error-prone provisioning step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do so, maybe simply fork and host on github?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that we should try to make this a bit more reliable. At least using a specific tag of the repo. But I'm on board with the fork idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this change as well to avoid making the debian. Just download from somewhere and do a dpkg -i
deployment/terraform/create.go
Outdated
@@ -508,7 +514,7 @@ func (t *Terraform) setupLoadtestAgents(extAgent *ssh.ExtAgent, initData bool) e | |||
return fmt.Errorf("error while setting up an agents: %w", err) | |||
} | |||
|
|||
if !t.output.HasAppServers() { | |||
if !t.output.HasAppServers() || !t.output.HasAgents() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
t.initLoadTest
already has this condition:
if len(t.output.Agents) == 0 {
return errors.New("there are no agents to initialize load-test")
}
Do we still need this OR condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, my addition was to avoid the error. I don't think having zero agents is a scenario we should fail. But I'm 0/5 as in the end this is a load test tool :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, would there be any use-case where you want to deploy app servers, but not agents?
But even if we did, then in that case, we can remove the if condition inside initLoadTest
.
But let's confirm whether an error is needed or not - WDYT @agarciamontoro ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been brought up before. I think it doesn't make a lot of sense right now, since we always want to run a load test, and for that we need agents. If we ended up splitting the tool in a deployer / loadtester pair, then that would make more sense. But I think we should always assume there's at least one agent. I'm happy to change my mind, but at least it should be in a separate PR, I feel it's a whole separate discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's change this @isacikgoz as per the above discussion.
deployment/terraform/create.go
Outdated
if t.output.HasProxy() && t.output.HasS3Key() && t.output.HasS3Bucket() { | ||
if t.output.HasEFS() { | ||
cfg.FileSettings.DriverName = model.NewPointer("local") | ||
cfg.FileSettings.Directory = model.NewPointer("/mnt/mattermost-data") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use the const efsDirectory
?
deployment/terraform/efs.go
Outdated
mlog.Info("Updating /etc/fstab to mount EFS", mlog.String("fsid", t.output.EFSAccessPoint.FileSystemId)) | ||
// setting uid and gid to ubuntu user | ||
fstabEntry := fmt.Sprintf("%s.efs.%s.amazonaws.com:/ %s nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,_netdev 0 0", t.output.EFSAccessPoint.FileSystemId, t.config.AWSRegion, efsDirectory) | ||
cmd = fmt.Sprintf("grep -q %q /etc/fstab || echo %q | sudo tee -a /etc/fstab > /dev/null", efsDirectory, fstabEntry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This command is too complicated to read. Why the "||" with echo
? Why are we doing tee
but then redirecting to /dev/null
? Why can't we do a >>
to append?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to add a check whether if it is already mounted. If not, we simply skip. For tee
and /dev/null
it's just syntactic taste :p
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about cmd = fmt.Sprintf("grep -q %q /etc/fstab || sudo sh -c 'echo %q >> /etc/fstab'", efsDirectory, fstabEntry)
?
deployment/terraform/efs.go
Outdated
return fmt.Errorf("error mounting the fstab entry: %w", err) | ||
} | ||
|
||
cmd = fmt.Sprintf("sudo chown -R ubuntu:ubuntu %s", efsDirectory) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of running the command, and then copying the same thing again in the svcScript, can we simply upload the script and then run the script as part of chmod
ding it - sudo chmod +x <file> && ./<file>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to change mod on every reboot otherwise ownership is set to root
. I couldn't find a better workaround for that. That's why I had to create the service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sure. All I am saying is instead of running the chmod command separately, we can simply execute the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But do we need the script at all? Could we write ExecStart=chown -R ubuntu:ubuntu /mnt/mattermost-data
directly in the unit file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah nice one!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we should do it in ExecStartPre
then.
data "aws_vpc" "selected" { | ||
tags = { | ||
Name = "Default VPC" | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should get default if there is no vpc defined in the config.
deployment/terraform/create.go
Outdated
@@ -508,7 +514,7 @@ func (t *Terraform) setupLoadtestAgents(extAgent *ssh.ExtAgent, initData bool) e | |||
return fmt.Errorf("error while setting up an agents: %w", err) | |||
} | |||
|
|||
if !t.output.HasAppServers() { | |||
if !t.output.HasAppServers() || !t.output.HasAgents() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was resulting with error if there were no agents. I think we should be able to deploy without agents. 0/5
@@ -17,6 +19,9 @@ do | |||
sudo sh -c 'echo "deb [signed-by=/usr/share/keyrings/postgres-archive-keyring.gpg] http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list' && \ | |||
sudo apt-get -y update && \ | |||
sudo apt-get install -y mysql-client-8.0 && \ | |||
sudo apt-get remove -y needrestart && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the only way to disable interactivity during install. I guess it will be required even if we are going to host the compiled binary.
@@ -17,6 +19,9 @@ do | |||
sudo sh -c 'echo "deb [signed-by=/usr/share/keyrings/postgres-archive-keyring.gpg] http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list' && \ | |||
sudo apt-get -y update && \ | |||
sudo apt-get install -y mysql-client-8.0 && \ | |||
sudo apt-get remove -y needrestart && \ | |||
sudo apt-get install -y -o Dpkg::Options::="--force-confold" binutils git cargo libssl-dev make pkg-config && \ | |||
git clone https://github.com/aws/efs-utils && cd efs-utils && make deb && sudo sudo apt-get install -y -o Dpkg::Options::="--force-confold" ./build/amazon-efs-utils-*.deb && cd - && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do so, maybe simply fork and host on github?
deployment/terraform/create.go
Outdated
@@ -508,7 +514,7 @@ func (t *Terraform) setupLoadtestAgents(extAgent *ssh.ExtAgent, initData bool) e | |||
return fmt.Errorf("error while setting up an agents: %w", err) | |||
} | |||
|
|||
if !t.output.HasAppServers() { | |||
if !t.output.HasAppServers() || !t.output.HasAgents() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, my addition was to avoid the error. I don't think having zero agents is a scenario we should fail. But I'm 0/5 as in the end this is a load test tool :)
deployment/terraform/efs.go
Outdated
mlog.Info("Updating /etc/fstab to mount EFS", mlog.String("fsid", t.output.EFSAccessPoint.FileSystemId)) | ||
// setting uid and gid to ubuntu user | ||
fstabEntry := fmt.Sprintf("%s.efs.%s.amazonaws.com:/ %s nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,_netdev 0 0", t.output.EFSAccessPoint.FileSystemId, t.config.AWSRegion, efsDirectory) | ||
cmd = fmt.Sprintf("grep -q %q /etc/fstab || echo %q | sudo tee -a /etc/fstab > /dev/null", efsDirectory, fstabEntry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to add a check whether if it is already mounted. If not, we simply skip. For tee
and /dev/null
it's just syntactic taste :p
deployment/terraform/efs.go
Outdated
return fmt.Errorf("error mounting the fstab entry: %w", err) | ||
} | ||
|
||
cmd = fmt.Sprintf("sudo chown -R ubuntu:ubuntu %s", efsDirectory) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to change mod on every reboot otherwise ownership is set to root
. I couldn't find a better workaround for that. That's why I had to create the service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, @isacikgoz, thank you for all the work here!
I left a bunch of comments below, mostly nits, a few to discuss more in depth. The only other ask I have here is to add docs on the new config setting and update the config sample files (both json and toml) to add the setting there as well.
Thanks!
data "aws_vpc" "selected" { | ||
tags = { | ||
Name = "Default VPC" | ||
} | ||
} | ||
|
||
data "aws_subnets" "loadtest_subnets" { | ||
filter { | ||
name = "vpc-id" | ||
values = [data.aws_vpc.selected.id] | ||
} | ||
|
||
tags = { | ||
Name = "loadtest*" | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make use of the ClusterSubnetIDs
here if they are defined? Also, we have the variable cluster_vpc_id
, which is configurable by the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah we should, let me add that.
count = var.create_efs ? 1 : 0 | ||
|
||
tags = { | ||
Name = "Shared-fs-${var.cluster_name}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following the convention for the Name
tag, this should be something like
Name = "Shared-fs-${var.cluster_name}" | |
Name = "${var.cluster_name}-shared-fs" |
resource "aws_efs_mount_target" "efs_mount" { | ||
for_each = (var.create_efs ? toset(data.aws_subnets.loadtest_subnets.ids) : toset({})) | ||
file_system_id = aws_efs_file_system.efs_shared.0.id | ||
subnet_id = each.value | ||
security_groups = [aws_security_group.app[0].id] | ||
} | ||
|
||
resource "aws_efs_access_point" "shared_dir" { | ||
count = var.create_efs ? 1 : 0 | ||
|
||
file_system_id = aws_efs_file_system.efs_shared.0.id | ||
|
||
root_directory { | ||
path = "/" | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also add Name
tags to these in the form of "${var.cluster_name}-whatever-whatever"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add it to aws_efs_access_point
but can't add it to aws_efs_mount_target
, I'll go ahead and add only to the access point.
resource "aws_efs_mount_target" "efs_mount" { | ||
for_each = (var.create_efs ? toset(data.aws_subnets.loadtest_subnets.ids) : toset({})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need for_each
here? Can't we use count = var.create_efs ? 1 : 0
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agarciamontoro We need to create mount target for each availability zone (by using the various subnets).
resource "aws_efs_mount_target" "efs_mount" { | ||
for_each = (var.create_efs ? toset(data.aws_subnets.loadtest_subnets.ids) : toset({})) | ||
file_system_id = aws_efs_file_system.efs_shared.0.id | ||
subnet_id = each.value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we should use a new entry in ClusterSubnetIDs
. There's many examples of that in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to use the existing subnets, because this requires access from the other resources hence we create the targets accordingly.
deployment/terraform/efs.go
Outdated
[Service] | ||
Type=simple | ||
ExecStart=/bin/bash -c "/opt/mount.sh" | ||
|
||
|
||
[Install] | ||
WantedBy=multi-user.target |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit levels over 9000 (double new line):
[Service] | |
Type=simple | |
ExecStart=/bin/bash -c "/opt/mount.sh" | |
[Install] | |
WantedBy=multi-user.target | |
[Service] | |
Type=simple | |
ExecStart=/bin/bash -c "/opt/mount.sh" | |
[Install] | |
WantedBy=multi-user.target |
|
||
func (t *Terraform) setupEFS(extAgent *ssh.ExtAgent) error { | ||
for _, val := range t.output.Instances { | ||
sshc, err := extAgent.NewClient(val.PublicIP) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's take #881 into account in case it's merged before this one. cc/@fmartingr
deployment/terraform/efs.go
Outdated
|
||
mlog.Info("Updating /etc/fstab to mount EFS", mlog.String("fsid", t.output.EFSAccessPoint.FileSystemId)) | ||
// setting uid and gid to ubuntu user | ||
fstabEntry := fmt.Sprintf("%s.efs.%s.amazonaws.com:/ %s nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,_netdev 0 0", t.output.EFSAccessPoint.FileSystemId, t.config.AWSRegion, efsDirectory) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: can we make this a multiline? Something like
fstabEntry := fmt.Sprintf("%s.efs.%s.amazonaws.com:/ %s nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,_netdev 0 0", t.output.EFSAccessPoint.FileSystemId, t.config.AWSRegion, efsDirectory) | |
fstabEntry := fmt.Sprintf("%s.efs.%s.amazonaws.com:/ %s nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,_netdev 0 0", | |
t.output.EFSAccessPoint.FileSystemId, | |
t.config.AWSRegion, | |
efsDirectory) | |
deployment/terraform/efs.go
Outdated
return fmt.Errorf("error mounting the fstab entry: %w", err) | ||
} | ||
|
||
cmd = fmt.Sprintf("sudo chown -R ubuntu:ubuntu %s", efsDirectory) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But do we need the script at all? Could we write ExecStart=chown -R ubuntu:ubuntu /mnt/mattermost-data
directly in the unit file?
rdr = strings.NewReader(efsService) | ||
if out, err := sshc.Upload(rdr, efcServiceDir, true); err != nil { | ||
return fmt.Errorf("error uploading file, dstPath: %s, output: %q: %w", efcServiceDir, out, err) | ||
} | ||
|
||
cmd = "sudo service efs-mount start" | ||
_, err = sshc.RunCommand(cmd) | ||
if err != nil { | ||
return fmt.Errorf("error starting the service for efs mount directory ownership: %w", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work directly? Don't we need something like sudo systemctl daemon-reload
after uploading the file and before starting the service?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it works directly, at least it seemed to work 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just 2 things to take care of. Looking good.
deployment/terraform/create.go
Outdated
@@ -508,7 +514,7 @@ func (t *Terraform) setupLoadtestAgents(extAgent *ssh.ExtAgent, initData bool) e | |||
return fmt.Errorf("error while setting up an agents: %w", err) | |||
} | |||
|
|||
if !t.output.HasAppServers() { | |||
if !t.output.HasAppServers() || !t.output.HasAgents() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's change this @isacikgoz as per the above discussion.
@@ -17,6 +19,9 @@ do | |||
sudo sh -c 'echo "deb [signed-by=/usr/share/keyrings/postgres-archive-keyring.gpg] http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list' && \ | |||
sudo apt-get -y update && \ | |||
sudo apt-get install -y mysql-client-8.0 && \ | |||
sudo apt-get remove -y needrestart && \ | |||
sudo apt-get install -y -o Dpkg::Options::="--force-confold" binutils git cargo libssl-dev make pkg-config && \ | |||
git clone https://github.com/aws/efs-utils && cd efs-utils && make deb && sudo sudo apt-get install -y -o Dpkg::Options::="--force-confold" ./build/amazon-efs-utils-*.deb && cd - && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this change as well to avoid making the debian. Just download from somewhere and do a dpkg -i
Summary
Adds EFS deployment capability
Ticket Link
https://mattermost.atlassian.net/browse/MM-62747