
Generating a Knowledgebase #645

Open
joshmoore opened this issue Feb 5, 2025 · 7 comments

@joshmoore
Member

At the AHM meeting, a KB was discussed that would guide users to the basic concepts they need to get started with RDM. (What is OME-Zarr? How do I get started with OMERO? etc.)

Would it be possible to generate an initial outline for a KB using the material in search insight and/or the search engine?

@haesleinhuepf
Member

Yes, a starting point might be resources/nfdi4bioimage.yml. We should summarize it into a 3-level hierarchy of important topics in Markdown format. For each topic, we select one relevant URL from the yml file.
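
For illustration, a minimal sketch of how such a summary could be generated with Python, assuming each entry in the yml file carries `name`, `tags`, and `url` fields (these field names are assumptions about the file's schema, not confirmed here):

```python
# Sketch: group entries from resources/nfdi4bioimage.yml by tag and emit a
# Markdown outline with one representative URL per topic.
# Assumes entries carry "name", "tags", and "url" keys; adjust to the real schema.
from collections import defaultdict

import yaml  # pip install pyyaml

with open("resources/nfdi4bioimage.yml") as f:
    data = yaml.safe_load(f)

# The file may be a plain list or a mapping with a top-level "resources" key.
entries = data.get("resources", data) if isinstance(data, dict) else data

by_tag = defaultdict(list)
for entry in entries:
    for tag in entry.get("tags", ["untagged"]):
        by_tag[tag].append(entry)

print("# NFDI4Bioimage Knowledge Base Outline\n")
for tag in sorted(by_tag):
    print(f"## {tag}\n")
    representative = by_tag[tag][0]
    url = representative.get("url")
    if isinstance(url, list):  # some entries may list several URLs
        url = url[0]
    print(f"* [{representative.get('name', 'untitled')}]({url})\n")
```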

git-bob ask gemini-2.0-pro-exp-02-05 to answer this

Contributor

github-actions bot commented Feb 5, 2025

This message was generated by git-bob (version: 0.17.0, model: gemini-2.0-pro-exp-02-05, log), an experimental AI-based assistant. It can make mistakes and has limitations. Check its messages carefully.

Okay, here's a response to the GitHub issue, focusing on generating the requested knowledge base outline:

As suggested by @haesleinhuepf, I will process resources/nfdi4bioimage.yml to generate a 3-level Markdown hierarchy. I will select representative URLs for each topic.

# NFDI4Bioimage Knowledge Base Outline

## Level 1: Core Concepts

### Research Data Management (RDM)

*   **Level 2: RDM Basics**
    *   *Level 3: Getting Started with RDM*
        *   URL: [Forschungsdaten.info](https://forschungsdaten.info/)
*   **Level 2: RDM Planning**
    *   *Level 3: Data Management Plans*
        *   URL: [Euro-BioImaging's Template for Research Data Management Plans](https://zenodo.org/records/11473803)
*   **Level 2: FAIR Principles**
    *   *Level 3: Introduction to FAIR*
        *   URL: [FAIR Principles](https://www.go-fair.org/fair-principles/)
*   **Level 2: Metadata**
    *   *Level 3: Metadata Standards*
        *   URL: [REMBI: Recommended Metadata for Biological Images—enabling reuse of microscopy data in biology](https://www.nature.com/articles/s41592-021-01166-8)
*   **Level 2: Sharing and Licensing**
    *   *Level 3: Sharing Research Data*
        *   URL: [Sharing research data with Zenodo](https://focalplane.biologists.com/2023/02/15/sharing-research-data-with-zenodo/)

### Bioimage Analysis

*   **Level 2: Bioimage Analysis Basics**
    *   *Level 3: Introduction to Bioimage Analysis*
        *   URL: [What is Bioimage Analysis? An Introduction](https://www.dropbox.com/s/5abw3cvxrhpobg4/20220923_DefragmentationTS.pdf?dl=0)
*   **Level 2: Common Tools**
    *   *Level 3: ImageJ/Fiji*
        *   URL: [Introduction to Image Analysis with Fiji](https://github.com/mpicbg-scicomp/CourseIntroToIA)
    *   *Level 3: Python*
        *   URL: [Bio-image Analysis with Python](https://github.com/BiAPoL/Bio-image_Analysis_with_Python)
    *   *Level 3: Napari*
        *   URL: [Bioimage analysis with Napari](https://www.fabriziomusacchio.com/teaching/teaching_bioimage_analysis/)
    *   *Level 3: CellProfiler*
        *   URL: [CellProfiler tutorials](https://github.com/CellProfiler/tutorials)
*   **Level 2: Specific Tasks**
    *   *Level 3: Segmentation*
        *   URL: [Quality assurance of segmentation results](https://focalplane.biologists.com/2023/04/13/quality-assurance-of-segmentation-results/)
    *   *Level 3: Tracking*
        *   URL: [Tracking in napari](https://focalplane.biologists.com/2023/06/01/tracking-in-napari/)
*   **Level 2: Artificial Intelligence**
    *   *Level 3: Deep Learning Basics*
        *   URL: [Deep Learning Based Segmentation For Biologists](https://github.com/tpecot/DeepLearningBasedSegmentationForBiologists/)
    *   *Level 3: Prompt Engineering*
        *   URL: [Prompt Engineering in Bio-image Analysis](https://focalplane.biologists.com/2024/07/18/prompt-engineering-in-bio-image-analysis/)

### Data and Infrastructure

*   **Level 2: Data Repositories**
    *   *Level 3: BioImage Archive*
        *   URL: [BioImage Archive](https://www.ebi.ac.uk/bioimage-archive/)
    *   *Level 3: Image Data Resource (IDR)*
        *   URL: [Image Data Resource (IDR)](https://idr.openmicroscopy.org/)
*   **Level 2: Data Management Platforms**
    *   *Level 3: OMERO*
        *   URL: [OME Documentation](https://www.openmicroscopy.org/docs/)
*   **Level 2: Workflow Engines**
    *   *Level 3: Galaxy*
        *   URL: [Galaxy Training Material](https://github.com/galaxyproject/training-material)
    *   *Level 3: Nextflow*
        *   URL: [Nextflow documentation](https://www.nextflow.io/)

@haesleinhuepf
Member

We need this in more detail (more entries in levels 2 and 3). Also, answer in Markdown without fences. Example:

Research Data Management (RDM)

RDM Basics

Getting Started with RDM

git-bob ask gemini-2.0-pro-exp-02-05 to answer this

Contributor

github-actions bot commented Feb 5, 2025

This message was generated by git-bob (version: 0.17.0, model: gemini-2.0-pro-exp-02-05, log), an experimental AI-based assistant. It can make mistakes and has limitations. Check its messages carefully.

Okay, I'll provide an updated version of the knowledge base outline, addressing the feedback from @haesleinhuepf. The goal is to expand the level 2 and 3 entries, drawing more comprehensively from the nfdi4bioimage.yml file, and format it as simple markdown.

Research Data Management (RDM)

RDM Basics

Getting Started with RDM
Data Management Plans
RDM Policies

FAIR Principles

Introduction to FAIR
FAIR in Bioimaging
Data Stewardship

Metadata

Metadata Standards
Metadata Annotation Tools

Sharing and Licensing

Sharing Research Data
Citing Research Data

Bioimage Analysis

Bioimage Analysis Basics

Introduction to Bioimage Analysis
Bioimage Analysis Workflows

Common Tools

ImageJ/Fiji
Python
Napari
CellProfiler
QuPath

Specific Tasks

Segmentation
Tracking
Registration

Artificial Intelligence

Deep Learning Basics
Prompt Engineering

Data and Infrastructure

Data Repositories

BioImage Archive
Image Data Resource (IDR)
Other Repositories

Data Management Platforms

OMERO

Workflow Engines

Galaxy
Nextflow

This expanded outline provides a more detailed starting point for the NFDI4Bioimage knowledge base, organized into three levels of topics drawn from the provided YAML file. It presents a structured view of essential concepts and tools for RDM and bioimage analysis.

@haesleinhuepf
Member

@joshmoore what do you think about the proposed outline? Feel free to ask git-bob for modifications; you should be able to instruct it directly.

@SaibotMagd

SaibotMagd commented Feb 6, 2025

I would like to add:

Metadata

Metadata Standards

Publishing research code FAIR - a roadmap
git-bob ask gemini-2.0-pro-exp-02-05 to answer this

@SaibotMagd

SaibotMagd commented Feb 6, 2025

> At the AHM meeting, a KB was discussed that would guide users to the basic concepts they need to get started with RDM. (What is OME-Zarr? How do I get started with OMERO? etc.)
>
> Would it be possible to generate an initial outline for a KB using the material in search insight and/or the search engine?

Since I'm working on an institute-wide onboarding process: a start guide should be structured more like a pipeline that you can follow step by step. It should be as simple as possible. You can build on your existing knowledge later to delve deeper into the individual parts.

Some central questions I always have in mind are: “Who should look for which source, for what reason, and when? What does this person want? How many resources are they willing to use?”
Possible answers:

Who?

  • Everyone, including staff workers.

For what reason?

  • People look for help when they need it, i.e. when they struggle; rarely beforehand.

What does this person want?

  • A very short answer to a somewhat precise question (even if they are unable to formulate the question the right way, or don't even know what they're looking for).

How many resources are they willing to use?

  • Usually no more than 1-3 people search for an answer.
  • It is rare that the answer to an urgent question is sought for more than 1-3 days. After that, the question is either given up (if that is possible), the easiest available workaround is used, or the problem is postponed until later.

That's why I designed a sample research process for us. The current design looks like this (very early version):

Enhanced Structure and Workflow for an Empirical Research Project

1. Application for Funding

2. Research of the Research Area

  • Description: Conduct a comprehensive literature review to understand the current state of research. Use Zotero to manage and organize your sources.
  • Tools:
  • Tutorials:
  • Workflow:
    1. Define keywords and search terms.
    2. Search for relevant articles and studies.
    3. Import the found sources into Zotero.
    4. Save the literature research in the /Research folder.

3. Development of a Hypothesis

  • Description: Formulate a clear and testable hypothesis based on your literature review. Document the hypothesis and the underlying assumptions.
  • Tools:
  • Tutorials:
  • Workflow:
    1. Analyze the collected sources.
    2. Identify research gaps.
    3. Formulate a hypothesis.
    4. Save the hypothesis in the /Hypothesis folder.

4. Operationalization of the Idea

  • Description: Document the operationalization of your hypothesis in an electronic lab notebook. Use Jupyter Notebooks to make the process transparent and traceable.
  • Tools:
  • Tutorials:
  • Workflow:
    1. Define the variables and measurement methods.
    2. Create a Jupyter Notebook for documentation.
    3. Describe the experimental setup and execution.
    4. Use Zotero for storing sources and notes.
    5. Use Markdown in LIN-nextcloud or locally with Obsidian for documentation.
    6. Use open documents in LIN-nextcloud or locally with LibreOffice.
    7. Use rspace ELN for daily documentation (includes Collabora).
    8. Use pyrats for planning and documenting animal husbandry, with integration into rspace ELN.
    9. Save the notebook in the /Operationalization folder.

5. Implementation and Documentation

  • Description: Implement the operationalization and document the entire process in an electronic lab notebook. Use Jupyter Notebooks for documentation.
  • Tools:
  • Tutorials:
  • Workflow:
    1. Conduct the experiments.
    2. Document each step in the Jupyter Notebook.
    3. Record all observations and results.
    4. Create project-specific documents in rspace ELN.
    5. Save the notebook in the /Implementation folder.

6. Data Analysis and Visualization

  • Description: Analyze the collected data and visualize the results using open-source tools such as Jupyter Hub, Python, R, or MATLAB (a minimal sketch follows this list).
  • Tools:
  • Tutorials:
  • Workflow:
    1. Clean and prepare the data.
    2. Choose appropriate analysis methods.
    3. Perform the data analysis.
    4. Visualize the results.
    5. Ensure that the programs record the work steps seamlessly.
    6. Use source code from MATLAB, Python, R, or any other language supported by JupyterHub.
    7. Use the Macro Recorder in Fiji/ImageJ for manual tasks.
    8. Document manual tasks with OBS Studio and later extract significant frames to save space.
    9. Document the program and version used for proprietary formats.
    10. Include a link to the program in the documentation.
    11. Convert raw files to non-proprietary formats if programs have access protection.
    12. Save the notebook in the /DataAnalysis folder.
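
A minimal sketch of what such a traceable analysis cell could look like in the /DataAnalysis notebook, assuming the measurements were exported to a hypothetical measurements.csv with `group` and `intensity` columns (the file name and column names are placeholders, not part of the workflow above):

```python
# Sketch of a reproducible analysis cell for the /DataAnalysis notebook.
# "measurements.csv", "group" and "intensity" are placeholder names.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("measurements.csv")

# 1. Clean and prepare the data: drop incomplete rows.
df = df.dropna(subset=["group", "intensity"])

# 2./3. Analyze: summary statistics per experimental group.
summary = df.groupby("group")["intensity"].describe()
print(summary)

# 4. Visualize the results and save the figure next to the notebook.
df.boxplot(column="intensity", by="group")
plt.ylabel("intensity (a.u.)")
plt.savefig("intensity_by_group.png", dpi=150)
```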

7. Creation of the Paper

8. Compiling Data According to BIDS Standard

  • Description: Structure the collected data according to the BIDS (Brain Imaging Data Structure) standard to ensure reusability and comparability (a minimal skeleton sketch follows this list).
  • Tools:
  • Tutorials:
  • Workflow:
    1. Study the BIDS specification.
    2. Organize the data according to the BIDS standard.
    3. Check the structure for completeness.
    4. Save the dataset in the /BIDS folder.
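
A minimal sketch of step 2, assuming a single subject with one anatomical scan (the subject label, modality folder, and BIDS version below are illustrative choices, not prescribed by the workflow above):

```python
# Sketch: create a minimal BIDS-like skeleton (illustrative labels only).
import json
from pathlib import Path

root = Path("ResearchProject/BIDS/dataset")

# dataset_description.json is required at the top level of a BIDS dataset.
root.mkdir(parents=True, exist_ok=True)
(root / "dataset_description.json").write_text(json.dumps({
    "Name": "Example dataset",
    "BIDSVersion": "1.9.0",
}, indent=2))

# One anatomical scan for one subject, following the sub-<label>/<datatype>/ pattern.
anat = root / "sub-01" / "anat"
anat.mkdir(parents=True, exist_ok=True)
# The imaging file itself (e.g. sub-01_T1w.nii.gz) would be copied here.
```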

9. Writing to Journals

  • Description: Identify suitable scientific journals and write cover letters to submit your paper.
  • Tools:
  • Tutorials:
  • Workflow:
    1. Identify appropriate journals.
    2. Write a cover letter for each journal.
    3. Review the cover letters for completeness.
    4. Save the cover letters in the /Journals folder.

10. Uploading the Dataset to an Archive

  • Description: Upload the structured dataset to a data archive like Zenodo to ensure long-term availability (an API-based upload sketch follows this list).
  • Tools:
  • Tutorials:
  • Workflow:
    1. Create an account on Zenodo.
    2. Upload the dataset.
    3. Fill in the metadata.
    4. Save the dataset in the /Archive folder.
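
As an alternative to the web upload, Zenodo also exposes a REST deposit API; the following is a sketch of steps 2 and 3, with the access token and file path as placeholders:

```python
# Sketch: create a Zenodo deposition and upload one file via the REST API.
# ZENODO_TOKEN and the file path are placeholders; check the Zenodo docs for details.
import os
import requests

token = os.environ["ZENODO_TOKEN"]
params = {"access_token": token}

# 1. Create an empty deposition.
r = requests.post("https://zenodo.org/api/deposit/depositions", params=params, json={})
r.raise_for_status()
deposition = r.json()

# 2. Upload the dataset file into the deposition's file bucket.
bucket_url = deposition["links"]["bucket"]
with open("ResearchProject/Archive/Dataset_Zenodo.zip", "rb") as fp:
    requests.put(f"{bucket_url}/Dataset_Zenodo.zip", data=fp, params=params).raise_for_status()

# 3. Metadata (title, authors, license, ...) can then be set via a PUT on the
#    deposition URL before publishing through the web interface or the API.
```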

11. Uploading the Source Code to GitHub

  • Description: Upload the source code of your analysis and documentation to GitHub to ensure transparency and reusability.
  • Tools:
  • Tutorials:
  • Workflow:
    1. Create a repository on GitHub.
    2. Upload the source code.
    3. Document the code and its usage.
    4. Save the source code in the /GitHub folder.

12. Uploading the Paper to a Pre-Print Server

  • Description: Upload the paper to a pre-print server like arXiv to receive early feedback and increase visibility.
  • Tools:
  • Tutorials:
  • Workflow:
    1. Create an account on arXiv.
    2. Upload the paper.
    3. Fill in the metadata.
    4. Save the paper in the /PrePrint folder.

13. Linking the Dataset with the Source Code and the Paper

  • Description: Link the dataset, the source code, and the paper together to ensure comprehensive documentation and traceability.
  • Tools:
  • Workflow:
    1. Create links between the platforms.
    2. Document the links in the paper.
    3. Verify the links for functionality.


Recommended Folder Structure on the Hard Drive

```
/ResearchProject
    /Application
        - Application_DFG.pdf
        - Application_EU.pdf
    /Research
        - LiteratureResearch.zotero
    /Hypothesis
        - Hypothesis.pdf
    /Operationalization
        - Operationalization.ipynb
    /Implementation
        - Implementation.ipynb
    /DataAnalysis
        - DataAnalysis.ipynb
    /Paper
        - Paper.tex
        - Paper.md
        - Paper.odt
    /BIDS
        - Dataset_BIDS.zip
    /Journals
        - JournalSubmission.pdf
    /Archive
        - Dataset_Zenodo.zip
    /GitHub
        - SourceCode.zip
    /PrePrint
        - Paper_arXiv.pdf
```
