docs: spelling and grammar in Architecture section (#594)
carlinmack authored Dec 8, 2023
1 parent 33c4019 commit 1444efc
Showing 9 changed files with 85 additions and 88 deletions.
14 changes: 7 additions & 7 deletions docs/develop/architecture/communities.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,11 +27,11 @@ community on the other hand may only contain restricted records.
An InvenioRDM instance defines a set of community roles that apply globally to all
communities to ensure a consistent user experience across all communities.

A community roles translates into a set of permissions. The roles are
A community role translates into a set of permissions. The roles are
configurable to ensure that they can be tailored to the needs of an instance,
as well as allow for customizations. For example, one instance might want to use
"curator" and another instance wants to use "editor" as a role because it suits
their use cases better.
as well as allow for customizations. For example, one instance may use
"curator" and another may use "editor" as a role because it suits
their use case better.

By default the following community roles are defined:

Expand Down Expand Up @@ -97,8 +97,8 @@ A community should be removed if there is only one owner and it wants to leave.

**No self role change**

A member cannot change its own role. This prevents owners and managers from loosing
their access, as they'll have to ask another manager/owner to perform the
A member cannot change their own role. This prevents owners and managers from losing
their access as they'll have to ask another manager/owner to perform the
change.

**Visibility can be changed by members themselves**
Expand All @@ -107,7 +107,7 @@ For privacy reasons only members themselves can set their visibility to public.

**Visibility cannot be changed to public by managers/owners**

To allow owners/managers to manage how the community looks, they can decide
To allow owners/managers to manage how the community looks they can decide
to hide certain members from a community.

**Users can leave a community**
14 changes: 6 additions & 8 deletions docs/develop/architecture/event_handling.md
Expand Up @@ -32,14 +32,13 @@ For every event there is one or more operations that need to be executed. In the
examples the logic is implemented in case-specific tasks that run periodically (celery
beat).

When processing events it important to _eventually_ arrive to the same state. For example,
if all events where to be reprocessed (e.g. _event sourcing_) several times the final
state should be the same. In addition, if one or more event were processed more than once
When processing events it is important to _eventually_ arrive at the same state. For example,
if all events were to be reprocessed (e.g. _event sourcing_) several times, the final
state should be the same. In addition, if one or more events were processed more than once,
the final state should also be the same; this means that event processing must be
idempotent.
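
As a minimal sketch of what idempotent processing can look like, the example below replays the same events twice and ends in the same state. The `Event` class, the `apply_events` function and the in-memory `seen` set are hypothetical names used only for illustration; they are not part of InvenioRDM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str  # hypothetical unique id of the event
    user_id: str
    username: str


def apply_events(state, seen, events):
    """Apply events to `state`, skipping ids already recorded in `seen`."""
    for event in events:
        if event.event_id in seen:
            continue  # already processed: applying it again must change nothing
        state[event.user_id] = event.username
        seen.add(event.event_id)
    return state


events = [
    Event("e1", "u1", "alice"),
    Event("e2", "u1", "alice-smith"),
]

state, seen = {}, set()
apply_events(state, seen, events)
first_pass = dict(state)

# Processing the same events again leaves the state untouched.
apply_events(state, seen, events)
assert state == first_pass
```

Tracking processed ids is only one way to get idempotence. It is used here because it also keeps an old event from overwriting a newer value when it is delivered a second time.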

Moreover, processing events can be a long running task. If this time to process/persist
the result would affect the user experience _eager read derivation_ could be used. However, this comes at a cost of having to handle rollbacks and potential data inconsistency.
Moreover, processing events can take a long time. If this affects the user experience, _eager read derivation_ could be used. However, this comes at the cost of having to handle rollbacks and potential data inconsistency.

## Events

Expand All @@ -59,7 +58,7 @@ information also need to be updated too (e.g. membership requests reflecting the
username).

To achieve this, the events or _change notifications_ are registered in the unit of work.
On an operation that will trigger an asynchronous celery task with a custom payload on
On an operation that will trigger an asynchronous Celery task with a custom payload on
`post_commit`. This task will enrich the payload and trigger all the handlers registered
for the record type that was changed. These handlers are service methods with custom
signatures, and are configurable in the `invenio.cfg` file. The records have a revision id
Expand All @@ -68,8 +67,7 @@ and use optimistic concurrency, therefore idempotence can be guaranteed.
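
The role of the revision id in optimistic concurrency can be sketched as follows. This is an in-memory stand-in under stated assumptions: `RecordStore`, `RevisionConflict` and their methods are hypothetical and do not reflect the actual Invenio record API.

```python
class RevisionConflict(Exception):
    """Raised when a write is based on an outdated revision."""


class RecordStore:
    """Hypothetical in-memory store keeping a revision id per record."""

    def __init__(self):
        self._rows = {}  # record_id -> (revision_id, data)

    def create(self, record_id, data):
        self._rows[record_id] = (0, dict(data))
        return 0

    def get(self, record_id):
        revision_id, data = self._rows[record_id]
        return revision_id, dict(data)

    def update(self, record_id, expected_revision, data):
        current_revision, _ = self._rows[record_id]
        if current_revision != expected_revision:
            # Someone else (or an earlier run of the same handler) already wrote.
            raise RevisionConflict(
                f"expected revision {expected_revision}, found {current_revision}"
            )
        new_revision = current_revision + 1
        self._rows[record_id] = (new_revision, dict(data))
        return new_revision


store = RecordStore()
store.create("req-1", {"username": "old-name"})

revision, data = store.get("req-1")
data["username"] = "new-name"
new_revision = store.update("req-1", revision, data)

# A second write based on the same, now stale, revision is rejected.
try:
    store.update("req-1", revision, {"username": "stale"})
except RevisionConflict:
    conflict = True
else:
    conflict = False
```

A handler that runs twice with the same payload therefore cannot silently apply its change twice: the second attempt fails the revision check and can be dropped or retried against the current state.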
### Statistics

Statistics such as record view or file downloads are calculated using events. These
events are emitted on the resources/views, by registering a message in a specific queue.
These messages have a custom payload. Then event indexers will read this messages and
events are emitted on the resources/views, by registering a message with a custom payload in a specific queue. Event indexers will then read these messages and
carry out the appropriate operations (e.g. aggregations). Event indexers are running as
celery beat tasks. To guarantee idempotence each event contains a unique identifier (e.g.
timestamp + User-agent + IP address + URL), which guarantees that the event is captured/persisted only once.
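
To illustrate how such an identifier makes persistence idempotent, here is a minimal sketch. The function `build_event_id`, the class `EventStore` and the choice of SHA-1 are assumptions made for this example, not the actual Invenio implementation.

```python
import hashlib
from datetime import datetime, timezone


def build_event_id(timestamp, user_agent, ip_address, url):
    """Derive a deterministic id from the fields that identify an event."""
    raw = "|".join([timestamp.isoformat(), user_agent, ip_address, url])
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


class EventStore:
    """Hypothetical store keyed by event id, so duplicates overwrite each other."""

    def __init__(self):
        self._events = {}

    def persist(self, event):
        key = build_event_id(
            event["timestamp"], event["user_agent"], event["ip"], event["url"]
        )
        self._events[key] = event  # the same event always maps to the same key

    def __len__(self):
        return len(self._events)


view = {
    "timestamp": datetime(2023, 12, 8, 10, 0, 0, tzinfo=timezone.utc),
    "user_agent": "ExampleBrowser/1.0",
    "ip": "192.0.2.1",
    "url": "/records/abc",
}
download = dict(view, url="/records/abc/files/data.csv")

store = EventStore()
store.persist(view)
store.persist(view)  # delivered twice, stored once
store.persist(download)
```

Because the key is derived only from the event's own fields, replaying a message from the queue overwrites the existing entry instead of creating a second one.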
24 changes: 12 additions & 12 deletions docs/develop/architecture/index.md
@@ -1,8 +1,8 @@
# Preface

The following chapters describes the architecture of InvenioRDM and Invenio
Framework from high-level point of view. It's meant to describe and expose
the higher-level structure, and help guide you how code should be organized.
The following chapters describe the architecture of InvenioRDM and Invenio
Framework from a high-level point of view. It's meant to describe and expose
the higher-level structure, and help guide how code should be organized.

### How did we arrive to the current architecture?

Expand All @@ -14,7 +14,7 @@ extendable, adaptable, resilient, and *...insert your favorite other buzz words*

**Reality**

Reality is there's a lot of methodologies and patterns on how to build and architect
The reality is there are a lot of methodologies and patterns on how to build and architect
software systems. However, in practice, while methodologies are useful
it's often as much about tradeoffs and finding the right balance rather than
strict application of a specific methodology. Most of the time you have to deal
Expand All @@ -24,22 +24,22 @@ projects, prior history and practices.
**Evolving**

InvenioRDM is no different. The architecture is largely a byproduct of our past
experiences and challenges we've faced. The architecture as described here, is
experiences and challenges we've faced. The architecture as described here is
not meant to be the final answer, but rather an evolving architecture that adapts
and improve over time. You also won't find the answer to all your question. As
and improves over time. You also won't find the answer to all your questions. As
we work with the architecture, we identify shortcomings, missing things and concepts
that could be better defined.

**Past experiences and challenges**

Some of the experiences and challenges we faced:

- **High developer turn-over and many juniors**: Because of the organisational framework and contract policies we've often had a high turn-over of developers and juniors. This means we must focus on good onboarding and great documentation to get developers to level where they can efficiently contribute and develop high quality software.
- **Spaghetti code**: We have had our share of "data massaging", type conversations and fluffy defined responsibilities which overall leads to a big ball of mud and interdependency hell.
- **High developer turn-over and many juniors**: Because of the organisational framework and contract policies we've often had a high turn-over of developers and juniors. This means we must focus on good onboarding and great documentation to get developers to a level where they can efficiently contribute and develop high quality software.
- **Spaghetti code**: We have had our share of "data massaging", type conversions and fluffy requirements, which all lead to a big ball of mud and interdependency hell.
- **Bad design choices**: We've obviously sometimes made bad design choices and learned from our mistakes.
- **Recovering from failures**: We've had to recover from some spectacular database crashes, file loss incidents on big distributed storage clusters, that helps you understand which features was helpful and which features you wish you would have had.
- **Recovering from failures**: We've had to recover from some spectacular database crashes and file loss incidents on big distributed storage clusters, which helped us understand which features were helpful and which features we wish we'd had.

By no means have we solved all of these, and any software project out there is likely facing.
By no means have we solved all of these, and any software project out there is likely facing the same issues.

### Why not X?

Expand All @@ -52,8 +52,8 @@ the people and organizations involved bring a long history.

**Microservices**

Microservices itself is not a substitute for an architecture, it's simply another way of tieing different systems together. Part of the complexity at the software level is moved to the infrastructure level, but independently of where components of the system reside they still have to communicate and have clear boundaries. You can do good and bad architectures with both monoliths and microservices. Google e.g. "microservices death star" for some examples.
Microservices themselves are not a substitute for an architecture; they're simply another way of tying different systems together. Part of the complexity at the software level is moved to the infrastructure level, but independently of where components of the system reside they still have to communicate and have clear boundaries. You can do good and bad architectures with both monoliths and microservices (Google "microservices death star" for some examples).

**NoSQL**

SQL database have been around for the past 40 years and are often highly reliable systems. Most NoSQL system on the other hand have been around for much shorter periods and does not provide the same reliability. On top of that, InvenioRDM uses a hybrid approach of performing mainly writes of the primary data the relation database, but keeping a secondary copy indexed in NoSQL system for faster reads.
SQL databases have been around for the past 40 years and are often highly reliable systems. Most NoSQL systems on the other hand have only been around for a much shorter period and do not provide the same reliability. On top of that, InvenioRDM uses a hybrid approach of performing mainly writes of the primary data to the relational database, but keeping a secondary copy indexed in a NoSQL system for faster reads.
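
The hybrid approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the relational database, a plain dictionary stands in for the search engine, and `save_record`, `reindex` and `search` are hypothetical helper names.

```python
import json
import sqlite3

# Primary copy: a transactional SQL database (in-memory SQLite for the sketch).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE records (id TEXT PRIMARY KEY, json TEXT NOT NULL)")

# Secondary copy: stand-in for the search index used for reads.
search_index = {}


def save_record(record_id, data):
    """Write to the database first, then update the secondary copy."""
    with db:  # commits on success, rolls back if an exception is raised
        db.execute(
            "INSERT OR REPLACE INTO records (id, json) VALUES (?, ?)",
            (record_id, json.dumps(data)),
        )
    search_index[record_id] = data


def reindex():
    """Rebuild the secondary copy from the primary data."""
    search_index.clear()
    for record_id, raw in db.execute("SELECT id, json FROM records"):
        search_index[record_id] = json.loads(raw)


def search(term):
    """Answer read queries from the index, not from the database."""
    return sorted(
        record_id
        for record_id, data in search_index.items()
        if term.lower() in data["title"].lower()
    )


save_record("r1", {"title": "Climate dataset"})
save_record("r2", {"title": "Survey poster"})
```

Since the database holds the primary copy, losing the index is recoverable: calling `reindex()` rebuilds it from the stored rows.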
26 changes: 13 additions & 13 deletions docs/develop/architecture/infrastructure.md
Expand Up @@ -35,14 +35,14 @@ traffic into three categories of requests:
- application requests: e.g. search queries
- record files requests: e.g. downloading very large files

This way you can dimension the number connection slots between different types
This way you can dimension the number of connection slots between different types
of requests according to available resources. For instance a static file
request can usually be served extremely efficiently, while an application request
usually takes longer and requires more memory.

Similar, downloading a very file depends on the client's available bandwidth
Similarly, downloading a very large file depends on the client's available bandwidth
and can thus take up a connection slot for a significant amount of time. If your
storage system supports it, it is possible with Invenio to completely offload
storage system supports it, Invenio allows you to completely offload
the serving of large files to your storage system (e.g. S3).

All in all, the primary job of the load balancer is to manage traffic to your
Expand All @@ -58,7 +58,7 @@ during major incidents.
## Web servers

The load balancer proxies traffic to one of several web servers. The web
server's primary job is to manage the connections into your application server.
server's primary job is to manage the connections to your application server.
A web server like Apache and Nginx is usually much better than an application
server to manage connections. Also, you can use the web server to configure
limits on specific parts of your application so that for instance you can
Expand All @@ -82,23 +82,23 @@ a binary format.

**Transactional databases**

The primary reason using an SQL database is that they provide transactions,
which is very important since data consistency for a repository is of utmost
The primary reason for using an SQL database is that they provide transactions
which are very important as data consistency is of utmost
importance. Also, database servers can handle very large amounts of data
as long as they are scaled and configured properly. Last but not least, they
are usually highly reliable as compared to some NoSQL solutions.

**Primary key lookups**

Most access from Invenio to the database is via primary key lookups, which
are usually very efficient in database. Search queries and the like are all
are usually very efficient in-database. Search queries and the like are all
sent to the search engine cluster which can provide much better performance
than a database.

## Search and indexing

Invenio uses OpenSearch as its underlying search engine since OpenSearch
is fully JSON-based, and thus fit well together with storing records internally
is fully JSON-based, and thus fits well together with storing records internally
in the database as JSON documents.

OpenSearch furthermore is highly scalable and provides very powerful search
Expand Down Expand Up @@ -129,24 +129,24 @@ it can offload some task to asynchronous jobs. It works by the application
sending a message to the message queue (e.g. RabbitMQ), which several Celery
worker nodes continuously consume tasks from.

An example of background tasks can for instance be sending an email or
For example a background task can be sending an email or
registering a DOI.

**Multiple queues**

The background processing supports multiple queues and advanced
workflows. You could for instance have a low priority queue that constantly
runs x number of file integrity checks per day, and another normal queue
runs X number of file integrity checks per day, and another normal queue
for other tasks like DOI registration.

**Cronjobs and retries**

Celery also supports running jobs at scheduled intervals as well as
retrying tasks in case the fail (e.g. if a remote service is temporarily down).
retrying tasks in case they fail (e.g. if a remote service is temporarily down).

## Caching and temporary storage
Invenio uses an in-memory cache like Redis or Memcache for fast temporary
storage. The cache is for instance used for:
storage. For example the cache is used for:

- User session storage
- Results from background jobs
Expand All @@ -164,5 +164,5 @@ access correctly on the external system.
**Multiple storage systems**

One strength of Invenio is that you can store files on multiple systems at the
same time. This is useful if you for instance need to use multiple systems or
same time. This is useful if you need to use multiple systems or
do live migration from one system to another.
2 changes: 1 addition & 1 deletion docs/develop/architecture/reading.md
Expand Up @@ -24,7 +24,7 @@ of your architecture.
- [Patterns of Enterprise Application Architecture](https://martinfowler.com/eaaCatalog/)
- List of design patterns often found in enterprise systems.
- [Azure application architecture fundamentals](https://docs.microsoft.com/en-us/azure/architecture/guide/)
- Overview over different architectual styles.
- Overview of different architectural styles.
- [Clean architecture](https://www.amazon.com/Clean-Architecture-Craftsmans-Software-Structure/dp/0134494164)
- Overview of fundamental design principles.

4 changes: 2 additions & 2 deletions docs/develop/architecture/records.md
Expand Up @@ -19,7 +19,7 @@ repositories.
## Data model

The record data model is used to describe **a resource**. Examples of resources
include e.g. journal articles, datasets, posters, videos, images, software and
include journal articles, datasets, posters, videos, images, software and
more. Some properties of resources:

- A resource may exist in one or more versions.
Expand Down Expand Up @@ -167,7 +167,7 @@ as help to curators of the community.
### Record versions

Record ownership is defined on the parent record and it's a parent record
that's member of a community, hence all versions of a record are owned by
that's a member of a community, hence all versions of a record are owned by
a community.

### Multiple communities
16 changes: 8 additions & 8 deletions docs/develop/architecture/requests.md
Expand Up @@ -6,12 +6,12 @@ This guide is intended for maintainers and developers of InvenioRDM itself.

**Scope**

The guide provides a high-level architectural overview of the requests module InvenioRDM.
The guide provides a high-level architectural overview of the requests module of InvenioRDM.
Requests are considered part of the service layer in the [software architecture](software.md).

## Purpose

Requests is a generic InvenioRDM feature to support handling and automation of
Requests is a generic InvenioRDM feature to support the handling and automation of
requests between entities such as users, communities, administrators and/or the
system.

Expand All @@ -20,7 +20,7 @@ administrative tasks as possible and thereby reduce human resources required
for handling them.

A repository often has to deal with many types of requests. Examples of these
could include e.g.:
could include:

- Approval of new submission
- File replacement
Expand All @@ -38,13 +38,13 @@ streamlines and centralizes the UX for how users deal with requests.

## Overview

With the above examples of request in mind we can think of requests as:
With the above examples in mind we can think of requests as something which:

- Requests are created by someone who can cancel the request.
- Requests are received by someone who can accept or decline the request.
- Requests may require clarifications (i.e. a conversation) between the one
- are created by someone who can cancel it.
- are received by someone who can accept or decline it.
- may require clarifications (i.e. a conversation) between the one
creating the request and the one(s) accepting/declining it.
- Requests may expire after a certain amount of time.
- may expire after a certain amount of time.

## Entities

9 changes: 4 additions & 5 deletions docs/develop/architecture/runtime.md
@@ -1,9 +1,8 @@
# Runtime architecture

Invenio is at the core an application built on-top of the Flask web
At its core, Invenio is an application built on top of the Flask web
development framework, and fully understanding Invenio's architectural design
requires you to understand core concepts from Flask which will briefly be
covered here.
requires you to understand core concepts of Flask which will be covered in brief here.

The Flask application is exposed via different *application interfaces*
depending on if the application is running in a web server, CLI or job queue.
Expand Down Expand Up @@ -88,7 +87,7 @@ make this thread safe.

## Interfaces: WSGI, CLI and Celery

Overall the Flask application is running via three different application
Overall the Flask application runs via three different application
interfaces:

- **WSGI:** The frontend web server interfaces with Flask via Flask's WSGI
Expand Down Expand Up @@ -150,7 +149,7 @@ The Invenio application factory assembles your application in five phases:
variables for which you don't want to use the default values, e.g. the
database host configuration.
3. **URL converter loading**: In this phase, the application will load any of
your URL converts. This phase is usually only needed for some few specific
your URL converters. This phase is usually only needed for a few specific
cases.
4. **Flask extensions loading**: In this phase all the Invenio modules which
provide Flask extensions will initialize the extension. Usually the