Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MM-62747] add NFS support #884

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

[MM-62747] add NFS support #884

wants to merge 5 commits into from

Conversation

isacikgoz
Copy link
Member

Summary

Adds EFS deployment capability

Ticket Link

https://mattermost.atlassian.net/browse/MM-62747

@isacikgoz isacikgoz added the 2: Dev Review Requires review by a core committer label Jan 28, 2025
Copy link
Member

@agnivade agnivade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very interested to see the results of this. Have you run any tests yet?

@@ -210,17 +251,17 @@ resource "aws_instance" "proxy_server" {

resource "aws_iam_user" "s3user" {
name = "${var.cluster_name}-s3user"
count = var.app_instance_count > 1 && var.s3_external_bucket_name == "" ? 1 : 0
count = var.app_instance_count > 1 && var.s3_external_bucket_name == "" && !var.create_efs ? 1 : 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh boy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh boy indeed 😂 Maybe we should make use of modules for this. Having an s3 module that can be automatically switched on/off. Something like https://stackoverflow.com/a/72867545/3248221. I haven't ever tried that, so if that's too difficult/cumbersome, we could at least refactor this whole var.app_instance_count > 1 && var.s3_external_bucket_name == "" && !var.create_efs ? 1 : 0 into a new create_s3_resources variable. Thoughts?

@@ -17,6 +19,9 @@ do
sudo sh -c 'echo "deb [signed-by=/usr/share/keyrings/postgres-archive-keyring.gpg] http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list' && \
sudo apt-get -y update && \
sudo apt-get install -y mysql-client-8.0 && \
sudo apt-get remove -y needrestart && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we removing this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the only way to disable interactivity during install. I guess it will be required even if we are going to host the compiled binary.

@@ -17,6 +19,9 @@ do
sudo sh -c 'echo "deb [signed-by=/usr/share/keyrings/postgres-archive-keyring.gpg] http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list' && \
sudo apt-get -y update && \
sudo apt-get install -y mysql-client-8.0 && \
sudo apt-get remove -y needrestart && \
sudo apt-get install -y -o Dpkg::Options::="--force-confold" binutils git cargo libssl-dev make pkg-config && \
git clone https://github.com/aws/efs-utils && cd efs-utils && make deb && sudo sudo apt-get install -y -o Dpkg::Options::="--force-confold" ./build/amazon-efs-utils-*.deb && cd - && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just host the debian file somewhere that we can download and install? This is adding more complications to an already complicated and error-prone provisioning step.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do so, maybe simply fork and host on github?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that we should try to make this a bit more reliable. At least using a specific tag of the repo. But I'm on board with the fork idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this change as well to avoid making the debian. Just download from somewhere and do a dpkg -i

@@ -508,7 +514,7 @@ func (t *Terraform) setupLoadtestAgents(extAgent *ssh.ExtAgent, initData bool) e
return fmt.Errorf("error while setting up an agents: %w", err)
}

if !t.output.HasAppServers() {
if !t.output.HasAppServers() || !t.output.HasAgents() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t.initLoadTest already has this condition:

if len(t.output.Agents) == 0 {
	return errors.New("there are no agents to initialize load-test")
}

Do we still need this OR condition?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, my addition was to avoid the error. I don't think having zero agents is a scenario we should fail. But I'm 0/5 as in the end this is a load test tool :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, would there be any use-case where you want to deploy app servers, but not agents?

But even if we did, then in that case, we can remove the if condition inside initLoadTest.

But let's confirm whether an error is needed or not - WDYT @agarciamontoro ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been brought up before. I think it doesn't make a lot of sense right now, since we always want to run a load test, and for that we need agents. If we ended up splitting the tool in a deployer / loadtester pair, then that would make more sense. But I think we should always assume there's at least one agent. I'm happy to change my mind, but at least it should be in a separate PR, I feel it's a whole separate discussion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change this @isacikgoz as per the above discussion.

if t.output.HasProxy() && t.output.HasS3Key() && t.output.HasS3Bucket() {
if t.output.HasEFS() {
cfg.FileSettings.DriverName = model.NewPointer("local")
cfg.FileSettings.Directory = model.NewPointer("/mnt/mattermost-data")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use the const efsDirectory?

mlog.Info("Updating /etc/fstab to mount EFS", mlog.String("fsid", t.output.EFSAccessPoint.FileSystemId))
// setting uid and gid to ubuntu user
fstabEntry := fmt.Sprintf("%s.efs.%s.amazonaws.com:/ %s nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,_netdev 0 0", t.output.EFSAccessPoint.FileSystemId, t.config.AWSRegion, efsDirectory)
cmd = fmt.Sprintf("grep -q %q /etc/fstab || echo %q | sudo tee -a /etc/fstab > /dev/null", efsDirectory, fstabEntry)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This command is too complicated to read. Why the "||" with echo? Why are we doing tee but then redirecting to /dev/null? Why can't we do a >> to append?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to add a check whether if it is already mounted. If not, we simply skip. For tee and /dev/null it's just syntactic taste :p

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about cmd = fmt.Sprintf("grep -q %q /etc/fstab || sudo sh -c 'echo %q >> /etc/fstab'", efsDirectory, fstabEntry)?

return fmt.Errorf("error mounting the fstab entry: %w", err)
}

cmd = fmt.Sprintf("sudo chown -R ubuntu:ubuntu %s", efsDirectory)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of running the command, and then copying the same thing again in the svcScript, can we simply upload the script and then run the script as part of chmodding it - sudo chmod +x <file> && ./<file>?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to change mod on every reboot otherwise ownership is set to root. I couldn't find a better workaround for that. That's why I had to create the service.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sure. All I am saying is instead of running the chmod command separately, we can simply execute the file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But do we need the script at all? Could we write ExecStart=chown -R ubuntu:ubuntu /mnt/mattermost-data directly in the unit file?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nice one!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we should do it in ExecStartPre then.

Comment on lines +85 to +89
data "aws_vpc" "selected" {
tags = {
Name = "Default VPC"
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should get default if there is no vpc defined in the config.

@@ -508,7 +514,7 @@ func (t *Terraform) setupLoadtestAgents(extAgent *ssh.ExtAgent, initData bool) e
return fmt.Errorf("error while setting up an agents: %w", err)
}

if !t.output.HasAppServers() {
if !t.output.HasAppServers() || !t.output.HasAgents() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was resulting with error if there were no agents. I think we should be able to deploy without agents. 0/5

@@ -17,6 +19,9 @@ do
sudo sh -c 'echo "deb [signed-by=/usr/share/keyrings/postgres-archive-keyring.gpg] http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list' && \
sudo apt-get -y update && \
sudo apt-get install -y mysql-client-8.0 && \
sudo apt-get remove -y needrestart && \
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the only way to disable interactivity during install. I guess it will be required even if we are going to host the compiled binary.

@@ -17,6 +19,9 @@ do
sudo sh -c 'echo "deb [signed-by=/usr/share/keyrings/postgres-archive-keyring.gpg] http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list' && \
sudo apt-get -y update && \
sudo apt-get install -y mysql-client-8.0 && \
sudo apt-get remove -y needrestart && \
sudo apt-get install -y -o Dpkg::Options::="--force-confold" binutils git cargo libssl-dev make pkg-config && \
git clone https://github.com/aws/efs-utils && cd efs-utils && make deb && sudo sudo apt-get install -y -o Dpkg::Options::="--force-confold" ./build/amazon-efs-utils-*.deb && cd - && \
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do so, maybe simply fork and host on github?

@@ -508,7 +514,7 @@ func (t *Terraform) setupLoadtestAgents(extAgent *ssh.ExtAgent, initData bool) e
return fmt.Errorf("error while setting up an agents: %w", err)
}

if !t.output.HasAppServers() {
if !t.output.HasAppServers() || !t.output.HasAgents() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, my addition was to avoid the error. I don't think having zero agents is a scenario we should fail. But I'm 0/5 as in the end this is a load test tool :)

mlog.Info("Updating /etc/fstab to mount EFS", mlog.String("fsid", t.output.EFSAccessPoint.FileSystemId))
// setting uid and gid to ubuntu user
fstabEntry := fmt.Sprintf("%s.efs.%s.amazonaws.com:/ %s nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,_netdev 0 0", t.output.EFSAccessPoint.FileSystemId, t.config.AWSRegion, efsDirectory)
cmd = fmt.Sprintf("grep -q %q /etc/fstab || echo %q | sudo tee -a /etc/fstab > /dev/null", efsDirectory, fstabEntry)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to add a check whether if it is already mounted. If not, we simply skip. For tee and /dev/null it's just syntactic taste :p

return fmt.Errorf("error mounting the fstab entry: %w", err)
}

cmd = fmt.Sprintf("sudo chown -R ubuntu:ubuntu %s", efsDirectory)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to change mod on every reboot otherwise ownership is set to root. I couldn't find a better workaround for that. That's why I had to create the service.

Copy link
Member

@agarciamontoro agarciamontoro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, @isacikgoz, thank you for all the work here!

I left a bunch of comments below, mostly nits, a few to discuss more in depth. The only other ask I have here is to add docs on the new config setting and update the config sample files (both json and toml) to add the setting there as well.

Thanks!

Comment on lines +85 to +100
data "aws_vpc" "selected" {
tags = {
Name = "Default VPC"
}
}

data "aws_subnets" "loadtest_subnets" {
filter {
name = "vpc-id"
values = [data.aws_vpc.selected.id]
}

tags = {
Name = "loadtest*"
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make use of the ClusterSubnetIDs here if they are defined? Also, we have the variable cluster_vpc_id, which is configurable by the user.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah we should, let me add that.

count = var.create_efs ? 1 : 0

tags = {
Name = "Shared-fs-${var.cluster_name}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the convention for the Name tag, this should be something like

Suggested change
Name = "Shared-fs-${var.cluster_name}"
Name = "${var.cluster_name}-shared-fs"

Comment on lines 110 to 125
resource "aws_efs_mount_target" "efs_mount" {
for_each = (var.create_efs ? toset(data.aws_subnets.loadtest_subnets.ids) : toset({}))
file_system_id = aws_efs_file_system.efs_shared.0.id
subnet_id = each.value
security_groups = [aws_security_group.app[0].id]
}

resource "aws_efs_access_point" "shared_dir" {
count = var.create_efs ? 1 : 0

file_system_id = aws_efs_file_system.efs_shared.0.id

root_directory {
path = "/"
}
}
Copy link
Member

@agarciamontoro agarciamontoro Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also add Name tags to these in the form of "${var.cluster_name}-whatever-whatever"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add it to aws_efs_access_point but can't add it to aws_efs_mount_target, I'll go ahead and add only to the access point.

Comment on lines +110 to +111
resource "aws_efs_mount_target" "efs_mount" {
for_each = (var.create_efs ? toset(data.aws_subnets.loadtest_subnets.ids) : toset({}))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need for_each here? Can't we use count = var.create_efs ? 1 : 0 as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agarciamontoro We need to create mount target for each availability zone (by using the various subnets).

resource "aws_efs_mount_target" "efs_mount" {
for_each = (var.create_efs ? toset(data.aws_subnets.loadtest_subnets.ids) : toset({}))
file_system_id = aws_efs_file_system.efs_shared.0.id
subnet_id = each.value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we should use a new entry in ClusterSubnetIDs. There's many examples of that in this file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to use the existing subnets, because this requires access from the other resources hence we create the targets accordingly.

Comment on lines 19 to 25
[Service]
Type=simple
ExecStart=/bin/bash -c "/opt/mount.sh"


[Install]
WantedBy=multi-user.target
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit levels over 9000 (double new line):

Suggested change
[Service]
Type=simple
ExecStart=/bin/bash -c "/opt/mount.sh"
[Install]
WantedBy=multi-user.target
[Service]
Type=simple
ExecStart=/bin/bash -c "/opt/mount.sh"
[Install]
WantedBy=multi-user.target


func (t *Terraform) setupEFS(extAgent *ssh.ExtAgent) error {
for _, val := range t.output.Instances {
sshc, err := extAgent.NewClient(val.PublicIP)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's take #881 into account in case it's merged before this one. cc/@fmartingr


mlog.Info("Updating /etc/fstab to mount EFS", mlog.String("fsid", t.output.EFSAccessPoint.FileSystemId))
// setting uid and gid to ubuntu user
fstabEntry := fmt.Sprintf("%s.efs.%s.amazonaws.com:/ %s nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,_netdev 0 0", t.output.EFSAccessPoint.FileSystemId, t.config.AWSRegion, efsDirectory)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can we make this a multiline? Something like

Suggested change
fstabEntry := fmt.Sprintf("%s.efs.%s.amazonaws.com:/ %s nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,_netdev 0 0", t.output.EFSAccessPoint.FileSystemId, t.config.AWSRegion, efsDirectory)
fstabEntry := fmt.Sprintf("%s.efs.%s.amazonaws.com:/ %s nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,_netdev 0 0",
t.output.EFSAccessPoint.FileSystemId,
t.config.AWSRegion,
efsDirectory)

return fmt.Errorf("error mounting the fstab entry: %w", err)
}

cmd = fmt.Sprintf("sudo chown -R ubuntu:ubuntu %s", efsDirectory)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But do we need the script at all? Could we write ExecStart=chown -R ubuntu:ubuntu /mnt/mattermost-data directly in the unit file?

Comment on lines +74 to +83
rdr = strings.NewReader(efsService)
if out, err := sshc.Upload(rdr, efcServiceDir, true); err != nil {
return fmt.Errorf("error uploading file, dstPath: %s, output: %q: %w", efcServiceDir, out, err)
}

cmd = "sudo service efs-mount start"
_, err = sshc.RunCommand(cmd)
if err != nil {
return fmt.Errorf("error starting the service for efs mount directory ownership: %w", err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work directly? Don't we need something like sudo systemctl daemon-reload after uploading the file and before starting the service?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it works directly, at least it seemed to work 😅

Copy link
Member

@agnivade agnivade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just 2 things to take care of. Looking good.

@@ -508,7 +514,7 @@ func (t *Terraform) setupLoadtestAgents(extAgent *ssh.ExtAgent, initData bool) e
return fmt.Errorf("error while setting up an agents: %w", err)
}

if !t.output.HasAppServers() {
if !t.output.HasAppServers() || !t.output.HasAgents() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change this @isacikgoz as per the above discussion.

@@ -17,6 +19,9 @@ do
sudo sh -c 'echo "deb [signed-by=/usr/share/keyrings/postgres-archive-keyring.gpg] http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list' && \
sudo apt-get -y update && \
sudo apt-get install -y mysql-client-8.0 && \
sudo apt-get remove -y needrestart && \
sudo apt-get install -y -o Dpkg::Options::="--force-confold" binutils git cargo libssl-dev make pkg-config && \
git clone https://github.com/aws/efs-utils && cd efs-utils && make deb && sudo sudo apt-get install -y -o Dpkg::Options::="--force-confold" ./build/amazon-efs-utils-*.deb && cd - && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this change as well to avoid making the debian. Just download from somewhere and do a dpkg -i

@isacikgoz isacikgoz requested a review from agnivade February 20, 2025 13:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2: Dev Review Requires review by a core committer
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants