
Commit

Merge pull request #12 from yuriok/dev0.3
update ui texts, translations and tutorials
Yuming Liu authored Jan 11, 2020
2 parents c688657 + f1d4835 commit 9ea2884
Showing 27 changed files with 1,898 additions and 210 deletions.
1 change: 0 additions & 1 deletion .gitignore
@@ -108,4 +108,3 @@ venv.bak/

# logs
/logs
/i18n
Binary file modified docs/figures/app_appearance.png
Binary file added docs/figures/app_appearance_with_data_loaded.png
4 changes: 4 additions & 0 deletions docs/index.md
@@ -18,6 +18,10 @@ If you have any idea, you can contact the authors below.

* [How to load grain size distribution data](./tutorials/load_data)

* [The algorithms of QGrain](./tutorials/algorithm)

* [How to fit the samples](./tutorials/fit)

## Authors

* Yuming Liu
74 changes: 74 additions & 0 deletions docs/tutorials/algorithm.md
@@ -0,0 +1,74 @@
# The algorithms of QGrain

## Background

Grain size distribution (GSD) data have been widely used in Earth sciences, especially Quaternary Geology, due to their convenience and reliability. However, the usage of GSD data is still oversimplified. The geological information contained in a GSD is abundant, but only a few simplified proxies (e.g. mean grain size) are widely used. The most important reason is that GSD data are hard to interpret and visualize directly.

To overcome this, some researchers have developed methods to unmix a multi-modal GSD into components, making interpretation and visualization easier. These methods can be divided into two routes. One is end-member analysis (EMA) (Weltje, 1997), which takes a batch of samples to calculate the end-members. The other is single-specimen unmixing (SSU) (Sun et al., 2002), which treats each sample individually.

The key difference between the two routes is whether the end-members of a batch of samples are consistent. EMA assumes that the end-members of different samples are consistent, so the variations of GSD are caused only by changes in the fractions of the end-members. In contrast, SSU makes no assumption on the end-members, i.e. it admits that the end-members may vary between samples.

Some mature tools (Paterson and Heslop, 2015; Dietze and Dietze, 2019) taking the EMA route have appeared, but there is no publicly available, easy-to-use tool for SSU. That is the reason for creating QGrain.

## Fundamentals

The mathematical principle of SSU has been described by Sun et al. (2002).

In short, the distribution of an n-component mixed sample can be expressed as:

y = f<sub>1</sub> * *d*<sub>1</sub>(x) + ... + f<sub>n</sub> * *d*<sub>n</sub>(x),

where y is the mixed distribution, f<sub>i</sub> is the fraction of component i, *d*<sub>i</sub> is the base distribution function (e.g. Normal or Weibull) of component i, and x denotes the grain size classes.

The task is to obtain the distribution parameters of each *d*<sub>i</sub>.

Therefore, the unmixing problem can be converted to an optimization problem:

minimize the error (e.g. the sum of squared errors) between y<sub>test</sub> and y<sub>guess</sub>.
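As an illustration (a sketch, not QGrain's actual code), the mixture model and a sum-of-squared-errors objective can be written as:

```python
import numpy as np

def mixed_distribution(x, components):
    """Mix n single-component distributions:
    y = f_1 * d_1(x) + ... + f_n * d_n(x).
    `components` is a list of (fraction, pdf, params) tuples."""
    y = np.zeros_like(x, dtype=float)
    for fraction, pdf, params in components:
        y += fraction * pdf(x, *params)
    return y

def squared_error(y_test, y_guess):
    # Sum of squared errors between the measured and guessed distributions.
    return float(np.sum((y_test - y_guess) ** 2))
```

Minimizing `squared_error(y_test, mixed_distribution(x, guess))` over the fractions and distribution parameters is exactly the optimization problem described above.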

## Data preprocessing

In fact, the input data of each sample are two arrays: one holds the grain size classes, the other the distribution. Usually there are many 0 values at the head and tail of the distribution array. These 0 values are caused by the limited test precision; in reality they should be close to 0 but not exactly 0. This difference introduces a constant error large enough to affect the fitting result, so QGrain excludes these 0 values to obtain better performance.
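A minimal sketch of this preprocessing step (the function name is hypothetical, not QGrain's API) could look like:

```python
import numpy as np

def trim_zero_tails(classes, distribution):
    """Drop the zero head and tail of a distribution array, which are
    artifacts of the instrument's detection limit, keeping the
    non-zero core (interior zeros, if any, are kept)."""
    nonzero = np.flatnonzero(distribution)
    start, stop = nonzero[0], nonzero[-1] + 1
    return classes[start:stop], distribution[start:stop]
```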

## Local optimization

Due to the complexity of the base distribution function, the error function is non-convex. At present there is no efficient method to find the global minimum of a non-convex function, so an alternative is local optimization. Local optimization converges to a minimum rapidly, but without any guarantee that the minimum is global. Optimization is also a core topic of machine learning, so there are many mature local optimization algorithms that meet our requirements. Here we use the Sequential Least SQuares Programming (SLSQP) algorithm (Kraft, 1988) to perform local optimization.
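SLSQP is available in SciPy. The following is a toy sketch (a synthetic single-component problem, not QGrain's real error function) showing how a local SLSQP fit recovers distribution parameters:

```python
import numpy as np
from scipy.optimize import minimize

x = np.linspace(0.0, 10.0, 101)

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# A synthetic single-component "sample" whose parameters we try to recover.
y_test = normal_pdf(x, 4.0, 1.5)

def error(params):
    mu, sigma = params
    return float(np.sum((y_test - normal_pdf(x, mu, sigma)) ** 2))

# Local optimization: fast, but only guaranteed to reach *a* minimum.
result = minimize(error, x0=[5.0, 1.0], method="SLSQP",
                  bounds=[(0.0, 10.0), (0.1, 5.0)])
```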

## Global optimization

With an increasing number of components, the error function becomes much more complex, and it is difficult to get a satisfactory result using local optimization alone.

QGrain uses a global optimization algorithm called basinhopping (Wales & Doye, 1997) to improve the robustness.

This global optimization algorithm does not search the whole parameter space. Instead, after one local optimization process finishes, it shifts to another initial point and starts a new local optimization. This gives it the ability to escape some local minima while keeping the efficiency of local optimization.
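Basin-hopping is also available in SciPy. A toy sketch on a deliberately non-convex one-dimensional function (not QGrain's error function) illustrates the hop-then-locally-minimize loop:

```python
import numpy as np
from scipy.optimize import basinhopping

# A one-dimensional non-convex function with many local minima;
# its global minimum is near x ≈ -0.51.
def error(p):
    x = p[0]
    return x ** 2 + 10.0 * np.sin(3.0 * x)

# Each basin-hopping step perturbs the parameters, then runs a local
# minimization (SLSQP here, matching the text) from the new start point,
# accepting or rejecting the result Metropolis-style.
result = basinhopping(error, x0=[4.0], niter=200, stepsize=2.0, seed=42,
                      minimizer_kwargs={"method": "SLSQP"})
```

Starting from x = 4.0, plain local optimization would get stuck in a nearby local minimum; the hops let the search escape toward the global one.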

## Base distribution function

At present, QGrain supports the following distribution types:

|Distribution Type|Parameter Number|Fitting Space|Skew|
|:-:|:-:|:-:|:-:|
|Normal<sup>1</sup>|2|Bin Numbers|No|
|Weibull|2|Bin Numbers|Yes|
|Gen. Weibull<sup>2</sup>|3|Bin Numbers|Yes|

1. A Normal distribution against bin numbers is equivalent to a Lognormal distribution against grain size (μm).
2. **Gen. Weibull** is the General Weibull distribution, which has an additional location parameter.
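As a sketch of the skewed options in the table (standard textbook parameterizations; QGrain's exact parameterization may differ), the two-parameter Weibull density and its location-shifted generalization are:

```python
import numpy as np

def weibull_pdf(x, shape, scale):
    """Two-parameter Weibull density (skewed, supported on x >= 0)."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    m = x >= 0
    y[m] = (shape / scale) * (x[m] / scale) ** (shape - 1.0) \
        * np.exp(-(x[m] / scale) ** shape)
    return y

def gen_weibull_pdf(x, shape, scale, location):
    """'Gen. Weibull': a Weibull density shifted by a location parameter,
    i.e. supported on x >= location."""
    return weibull_pdf(np.asarray(x, dtype=float) - location, shape, scale)
```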

## Steps of fitting

1. Load the data
2. Get the settings (e.g. distribution type and component number)
3. Generate the error function
4. Preprocess the data
5. Global optimization (basinhopping)
6. Final optimization (another local optimization, SLSQP)
7. Generate the fitting result from the parameters of the error function
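The steps above can be sketched end-to-end on synthetic data (a two-component Normal mixture; this is an illustration under those assumptions, not QGrain's implementation):

```python
import numpy as np
from scipy.optimize import basinhopping, minimize

# Steps 1-2: "load" data and settings (synthesized here: a 2-component
# Normal mixture on hypothetical grain size classes).
x = np.linspace(0.0, 10.0, 101)
pdf = lambda x, mu, s: np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
y_test = 0.4 * pdf(x, 3.0, 1.0) + 0.6 * pdf(x, 7.0, 1.0)

# Step 3: generate the error function; params = (f1, mu1, s1, mu2, s2),
# with f2 = 1 - f1 so the fractions sum to 1.
def error(p):
    f1, mu1, s1, mu2, s2 = p
    y = f1 * pdf(x, mu1, s1) + (1.0 - f1) * pdf(x, mu2, s2)
    return float(np.sum((y_test - y) ** 2))

# Step 4 (trimming zero head/tail) is skipped: this synthetic data has none.

# Step 5: global optimization by basin-hopping.
bounds = [(0.0, 1.0), (0.0, 10.0), (0.1, 5.0), (0.0, 10.0), (0.1, 5.0)]
kwargs = {"method": "SLSQP", "bounds": bounds}
coarse = basinhopping(error, x0=[0.5, 2.5, 1.2, 6.5, 1.2],
                      niter=30, minimizer_kwargs=kwargs)

# Step 6: a final SLSQP polish from the best point found.
final = minimize(error, coarse.x, method="SLSQP", bounds=bounds)

# Step 7: read the fitted components back from the parameters.
f1, mu1, s1, mu2, s2 = final.x
```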

## References

* Weltje, G. J. End-member modeling of compositional data: Numerical-statistical algorithms for solving the explicit mixing problem. Math Geol 29, 503–549 (1997). [doi:10.1007/BF02775085](https://doi.org/10.1007/BF02775085)

* Kraft, D. A software package for sequential quadratic programming. Tech. Rep. DFVLR-FB 88-28, DLR German Aerospace Center – Institute for Flight Mechanics, Köln, Germany (1988).

* Wales, D. J., and Doye, J. P. K. Global Optimization by Basin-Hopping and the Lowest Energy Structures of Lennard-Jones Clusters Containing up to 110 Atoms. Journal of Physical Chemistry A 101, 5111 (1997).
54 changes: 54 additions & 0 deletions docs/tutorials/fit.md
@@ -0,0 +1,54 @@
# How to fit the samples

Before this, please make sure you have loaded the grain size distribution (GSD) data correctly.

If everything goes well, you will see an interface like this. By default, QGrain fits the first sample automatically.

![App Appearance With Data Loaded](../figures/app_appearance_with_data_loaded.png)

## The layout of app

QGrain consists of several docks which are movable, scalable, floatable, and closable. You can adjust them as you please. If you want to display a dock that was closed before, reopen it from the **Docks** menu.

### Docks

* Canvas: The dock that displays the raw data and the fitting result of the selected sample.
* Control Panel: The dock that controls the fitting behaviours.
* Raw Data Table: The dock that shows the GSD data of the samples.
* Recorded Data Table: The dock that shows the recorded fitting results.

## Tips

If you are confused by a widget, hover over it to see a tooltip.

* Click the radio buttons of **Distribution Type** to switch the distribution function.
* Click the **+**/**-** button to increase/decrease the guessed component number.
* **Observe Iteration**: Whether to display the iteration procedure.
* **Inherit Parameters**: Whether to inherit the parameters of the last fitting. This improves accuracy and efficiency when the samples are continuous.
* **Auto Fit**: Whether to automatically fit after the sample data have changed.
* **Auto Record**: Whether to automatically record the fitting result after fitting has finished.
* Click the **Previous** button to go back to the previous sample.
* Click the **Next** button to jump to the next sample.
* Click the **Auto Run Orderly** button to run the program automatically. The samples from the current one to the end will be processed one by one.
* Click the **Cancel** button to cancel the fitting progress.
* Click the **Try Fit** button to fit the current sample.
* Click the **Record** button to record the current fitting result. Note: it records the LAST SUCCESSFUL fitting result, NOT the CURRENT SAMPLE.
* Click the **Multi Cores Fitting** button to fit all samples. It will utilize all CPU cores to accelerate the calculation.
* Move the lines in the **Canvas** dock to set the expected mean values of each component, when fitting cannot return a proper result and you are sure the component number is correct.

## Workflow

The workflow of fitting samples is:

1. Try to fit one typical sample until you are satisfied.

You can adjust the component number and watch the chart of the fitting result to find a proper value.

If it cannot return a correct result, check the **Observe Iteration** option to find the reason. You can also move the lines to test whether the component number is proper.

If it can return a proper result when given the expected mean values, you can adjust the algorithm settings to refine the performance until it finds the proper result automatically.

2. Test other samples with this component number.
3. If the component number is suitable for all samples, use auto fit to process them all.
4. If some results are incorrect, cancel the fitting and return to step 1. If there are only a few incorrect results, you can fit and record them manually.
5. Save the fitting results to a file.
15 changes: 4 additions & 11 deletions docs/tutorials/load_data.md
@@ -2,16 +2,16 @@

In order to analyze the grain size distribution (GSD), the first necessary step is data loading. Click the **File** menu and select **Load** option to load GSD data from local files.

QGrain provides some built-in sample files. Look in the **samples** folder of QGrain to get them.

At present, QGrain supports three file formats, `.csv`, `.xls` (Excel 97-2003) and `.xlsx`.

Note: For `.xls` and `.xlsx`, please put the data table in the **FIRST** sheet. Otherwise, the data cannot be loaded.

By default, QGrain assumes that the data layout follows:

* The first row should be the headers (i.e. the classes of grain size).

* The following rows should be the distributions of samples under the grain size classes.

* The first column should be the name (i.e. id) of the samples.

If the layout of your data file is not the same as this, exceptions will be raised.
@@ -23,13 +23,6 @@ If you do not want to modify your data file, you can click the **Settings** menu
There are 4 parameters that control the data loader. They are all row/column indexes starting from 0 (i.e. 0 means the first row/column).

* Class Row: The row index of grain size classes.

* Sample Name Column: The column index of sample names.

* Distribution Start Row: The start row index (starts with 0) of distribution data. It should be greater than the row index of classes.
* Distribution Start Column: The start column index (starts with 0) of distribution data. It should be greater than the column index of sample name.
