From 6dc4dfc5b9aac1aa46517ef142d64f954748d4c1 Mon Sep 17 00:00:00 2001
From: r-brink
Date: Thu, 11 Jan 2024 13:09:34 +0100
Subject: [PATCH 01/15] add getting-started section and remove basics chapter
---
docs/index.md | 5 -
docs/user-guide/basics/index.md | 18 ----
docs/user-guide/basics/joins.md | 26 -----
docs/user-guide/basics/reading-writing.md | 45 --------
.../expressions.md => getting-started.md} | 101 +++++++++++++++++-
docs/user-guide/index.md | 39 -------
docs/user-guide/overview.md | 53 +++++++++
mkdocs.yml | 13 +--
8 files changed, 158 insertions(+), 142 deletions(-)
delete mode 100644 docs/user-guide/basics/index.md
delete mode 100644 docs/user-guide/basics/joins.md
delete mode 100644 docs/user-guide/basics/reading-writing.md
rename docs/user-guide/{basics/expressions.md => getting-started.md} (52%)
delete mode 100644 docs/user-guide/index.md
create mode 100644 docs/user-guide/overview.md
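
Note on the relative links touched in this patch: getting-started.md now sits one directory higher than the old basics/expressions.md, so its links drop a leading `../`, while overview.md sits one level below docs/index.md and gains one. The sketch below checks that the old and new links point at the same files. It assumes links are resolved against the directory of the page that contains them; `resolve` is a hypothetical helper written for this note, not part of MkDocs.

```python
import posixpath


def resolve(page: str, link: str) -> str:
    """Resolve a relative Markdown link against the page that contains it."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(page), link))


# Before the rename the page lived in basics/, so the link needed "../".
old_target = resolve("docs/user-guide/basics/expressions.md", "../concepts/contexts.md")
# After the rename the page sits directly in docs/user-guide/.
new_target = resolve("docs/user-guide/getting-started.md", "concepts/contexts.md")

# overview.md is one level deeper than docs/index.md, so its link gains "../".
old_contrib = resolve("docs/index.md", "development/contributing/index.md")
new_contrib = resolve("docs/user-guide/overview.md", "../development/contributing/index.md")

print(old_target == new_target, old_contrib == new_contrib)
```

The same check can be repeated for any other link that moves with a renamed page.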
diff --git a/docs/index.md b/docs/index.md
index 2c72f776edbb..cad8fc9b0322 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,8 +1,3 @@
----
-hide:
- - navigation
----
-
# Polars

diff --git a/docs/user-guide/basics/index.md b/docs/user-guide/basics/index.md
deleted file mode 100644
index af73c7967574..000000000000
--- a/docs/user-guide/basics/index.md
+++ /dev/null
@@ -1,18 +0,0 @@
-# Introduction
-
-This chapter is intended for new Polars users.
-The goal is to provide a quick overview of the most common functionality.
-Feel free to skip ahead to the [next chapter](../concepts/data-types/overview.md) to dive into the details.
-
-!!! rust "Rust Users Only"
-
- Due to historical reasons, the eager API in Rust is outdated. In the future, we would like to redesign it as a small wrapper around the lazy API (as is the design in Python / NodeJS). In the examples, we will use the lazy API instead with `.lazy()` and `.collect()`. For now you can ignore these two functions. If you want to know more about the lazy and eager API, go [here](../concepts/lazy-vs-eager.md).
-
- To enable the Lazy API ensure you have the feature flag `lazy` configured when installing Polars
- ```
- # Cargo.toml
- [dependencies]
- polars = { version = "x", features = ["lazy", ...]}
- ```
-
- Because of the ownership ruling in Rust, we can not reuse the same `DataFrame` multiple times in the examples. For simplicity reasons we call `clone()` to overcome this issue. Note that this does not duplicate the data but just increments a pointer (`Arc`).
diff --git a/docs/user-guide/basics/joins.md b/docs/user-guide/basics/joins.md
deleted file mode 100644
index 21cb927164a9..000000000000
--- a/docs/user-guide/basics/joins.md
+++ /dev/null
@@ -1,26 +0,0 @@
-# Combining DataFrames
-
-There are two ways `DataFrame`s can be combined depending on the use case: join and concat.
-
-## Join
-
-Polars supports all types of join (e.g. left, right, inner, outer). Let's have a closer look on how to `join` two `DataFrames` into a single `DataFrame`. Our two `DataFrames` both have an 'id'-like column: `a` and `x`. We can use those columns to `join` the `DataFrames` in this example.
-
-{{code_block('user-guide/basics/joins','join',['join'])}}
-
-```python exec="on" result="text" session="getting-started/joins"
---8<-- "python/user-guide/basics/joins.py:setup"
---8<-- "python/user-guide/basics/joins.py:join"
-```
-
-To see more examples with other types of joins, go the [User Guide](../transformations/joins.md).
-
-## Concat
-
-We can also `concatenate` two `DataFrames`. Vertical concatenation will make the `DataFrame` longer. Horizontal concatenation will make the `DataFrame` wider. Below you can see the result of an horizontal concatenation of our two `DataFrames`.
-
-{{code_block('user-guide/basics/joins','hstack',['hstack'])}}
-
-```python exec="on" result="text" session="getting-started/joins"
---8<-- "python/user-guide/basics/joins.py:hstack"
-```
diff --git a/docs/user-guide/basics/reading-writing.md b/docs/user-guide/basics/reading-writing.md
deleted file mode 100644
index 8999f601e823..000000000000
--- a/docs/user-guide/basics/reading-writing.md
+++ /dev/null
@@ -1,45 +0,0 @@
-# Reading & writing
-
-Polars supports reading and writing to all common files (e.g. csv, json, parquet), cloud storage (S3, Azure Blob, BigQuery) and databases (e.g. postgres, mysql). In the following examples we will show how to operate on most common file formats. For the following dataframe
-
-{{code_block('user-guide/basics/reading-writing','dataframe',['DataFrame'])}}
-
-```python exec="on" result="text" session="getting-started/reading"
---8<-- "python/user-guide/basics/reading-writing.py:dataframe"
-```
-
-#### CSV
-
-Polars has its own fast implementation for csv reading with many flexible configuration options.
-
-{{code_block('user-guide/basics/reading-writing','csv',['read_csv','write_csv'])}}
-
-```python exec="on" result="text" session="getting-started/reading"
---8<-- "python/user-guide/basics/reading-writing.py:csv"
-```
-
-As we can see above, Polars made the datetimes a `string`. We can tell Polars to parse dates, when reading the csv, to ensure the date becomes a datetime. The example can be found below:
-
-{{code_block('user-guide/basics/reading-writing','csv2',['read_csv'])}}
-
-```python exec="on" result="text" session="getting-started/reading"
---8<-- "python/user-guide/basics/reading-writing.py:csv2"
-```
-
-#### JSON
-
-{{code_block('user-guide/basics/reading-writing','json',['read_json','write_json'])}}
-
-```python exec="on" result="text" session="getting-started/reading"
---8<-- "python/user-guide/basics/reading-writing.py:json"
-```
-
-#### Parquet
-
-{{code_block('user-guide/basics/reading-writing','parquet',['read_parquet','write_parquet'])}}
-
-```python exec="on" result="text" session="getting-started/reading"
---8<-- "python/user-guide/basics/reading-writing.py:parquet"
-```
-
-To see more examples and other data formats go to the [User Guide](../io/csv.md), section IO.
diff --git a/docs/user-guide/basics/expressions.md b/docs/user-guide/getting-started.md
similarity index 52%
rename from docs/user-guide/basics/expressions.md
rename to docs/user-guide/getting-started.md
index 0277d3da72f6..52e9b078b3b7 100644
--- a/docs/user-guide/basics/expressions.md
+++ b/docs/user-guide/getting-started.md
@@ -1,13 +1,81 @@
-# Expressions
+# Getting started
+This chapter helps you get started with Polars. It covers the fundamental features of the library, from installation and setup to core functionality, so that new users can quickly familiarise themselves with the basics. If you're already an advanced user or familiar with DataFrames, feel free to skip ahead to the [next chapter about installation options](installation.md).
-`Expressions` are the core strength of Polars. The `expressions` offer a versatile structure that both solves easy queries and is easily extended to complex ones. Below we will cover the basic components that serve as building block (or in Polars terminology contexts) for all your queries:
+## Installing Polars
+
+=== ":fontawesome-brands-python: Python"
+
+ ``` bash
+ pip install polars
+ ```
+
+=== ":fontawesome-brands-rust: Rust"
+
+ ``` shell
+ cargo add polars -F lazy
+
+ # Or Cargo.toml
+ [dependencies]
+ polars = { version = "x", features = ["lazy", ...]}
+ ```
+
+## Reading & writing
+
+Polars supports reading and writing to all common file formats (e.g. csv, json, parquet), cloud storage (S3, Azure Blob, BigQuery) and databases (e.g. postgres, mysql). The following examples show how to work with the most common file formats, using this DataFrame:
+
+{{code_block('user-guide/basics/reading-writing','dataframe',['DataFrame'])}}
+
+```python exec="on" result="text" session="getting-started/reading"
+--8<-- "python/user-guide/basics/reading-writing.py:dataframe"
+```
+
+### CSV
+
+Polars has its own fast implementation for reading CSV files, with many flexible configuration options.
+
+{{code_block('user-guide/basics/reading-writing','csv',['read_csv','write_csv'])}}
+
+```python exec="on" result="text" session="getting-started/reading"
+--8<-- "python/user-guide/basics/reading-writing.py:csv"
+```
+
+As the output above shows, Polars read the datetimes as a `string`. We can tell Polars to parse dates when reading the CSV file, so that the column becomes a datetime:
+
+{{code_block('user-guide/basics/reading-writing','csv2',['read_csv'])}}
+
+```python exec="on" result="text" session="getting-started/reading"
+--8<-- "python/user-guide/basics/reading-writing.py:csv2"
+```
+
+### JSON
+
+{{code_block('user-guide/basics/reading-writing','json',['read_json','write_json'])}}
+
+```python exec="on" result="text" session="getting-started/reading"
+--8<-- "python/user-guide/basics/reading-writing.py:json"
+```
+
+### Parquet
+
+{{code_block('user-guide/basics/reading-writing','parquet',['read_parquet','write_parquet'])}}
+
+```python exec="on" result="text" session="getting-started/reading"
+--8<-- "python/user-guide/basics/reading-writing.py:parquet"
+```
+
+For more examples and other data formats, see the [IO section](io/csv.md) of the user guide.
+
+
+## Expressions
+
+`Expressions` are the core strength of Polars. They offer a versatile structure that both solves easy queries and is easily extended to complex ones. Below we cover the basic components that serve as building blocks (or, in Polars terminology, contexts) for all your queries:
- `select`
- `filter`
- `with_columns`
- `group_by`
-To learn more about expressions and the context in which they operate, see the User Guide sections: [Contexts](../concepts/contexts.md) and [Expressions](../concepts/expressions.md).
+To learn more about expressions and the context in which they operate, see the User Guide sections: [Contexts](concepts/contexts.md) and [Expressions](concepts/expressions.md).
### Select statement
@@ -128,3 +196,30 @@ Below are some examples on how to combine operations to create the `DataFrame` y
```python exec="on" result="text" session="getting-started/expressions"
--8<-- "python/user-guide/basics/expressions.py:combine2"
```
+
+## Combining DataFrames
+
+There are two ways `DataFrame`s can be combined depending on the use case: join and concat.
+
+### Join
+
+Polars supports all types of join (e.g. left, right, inner, outer). Let's have a closer look at how to `join` two `DataFrames` into a single `DataFrame`. Our two `DataFrames` both have an 'id'-like column: `a` and `x`. We can use those columns to `join` the `DataFrames` in this example.
+
+{{code_block('user-guide/basics/joins','join',['join'])}}
+
+```python exec="on" result="text" session="getting-started/joins"
+--8<-- "python/user-guide/basics/joins.py:setup"
+--8<-- "python/user-guide/basics/joins.py:join"
+```
+
+For more examples with other types of joins, see the [Transformations section](transformations/joins.md) in the user guide.
+
+### Concat
+
+We can also `concatenate` two `DataFrames`. Vertical concatenation will make the `DataFrame` longer. Horizontal concatenation will make the `DataFrame` wider. Below you can see the result of a horizontal concatenation of our two `DataFrames`.
+
+{{code_block('user-guide/basics/joins','hstack',['hstack'])}}
+
+```python exec="on" result="text" session="getting-started/joins"
+--8<-- "python/user-guide/basics/joins.py:hstack"
+```
diff --git a/docs/user-guide/index.md b/docs/user-guide/index.md
deleted file mode 100644
index 442029472d80..000000000000
--- a/docs/user-guide/index.md
+++ /dev/null
@@ -1,39 +0,0 @@
-# Introduction
-
-This user guide is an introduction to the [Polars DataFrame library](https://github.com/pola-rs/polars).
-Its goal is to introduce you to Polars by going through examples and comparing it to other solutions.
-Some design choices are introduced here. The guide will also introduce you to optimal usage of Polars.
-
-The Polars user guide is intended to live alongside the API documentation ([Python](https://docs.pola.rs/py-polars/html/reference/index.html) / [Rust](https://docs.rs/polars/latest/polars/)), which offers detailed descriptions of specific objects and functions.
-
-Even though Polars is completely written in [Rust](https://www.rust-lang.org/) (no runtime overhead!) and uses [Arrow](https://arrow.apache.org/) -- the [native arrow2 Rust implementation](https://github.com/jorgecarleitao/arrow2) -- as its foundation, the examples presented in this guide will be mostly using its higher-level language bindings.
-Higher-level bindings only serve as a thin wrapper for functionality implemented in the core library.
-
-For [pandas](https://pandas.pydata.org/) users, our [Python package](https://pypi.org/project/polars/) will offer the easiest way to get started with Polars.
-
-### Philosophy
-
-The goal of Polars is to provide a lightning fast `DataFrame` library that:
-
-- Utilizes all available cores on your machine.
-- Optimizes queries to reduce unneeded work/memory allocations.
-- Handles datasets much larger than your available RAM.
-- Has an API that is consistent and predictable.
-- Has a strict schema (data-types should be known before running the query).
-
-Polars is written in Rust which gives it C/C++ performance and allows it to fully control performance critical parts
-in a query engine.
-
-As such Polars goes to great lengths to:
-
-- Reduce redundant copies.
-- Traverse memory cache efficiently.
-- Minimize contention in parallelism.
-- Process data in chunks.
-- Reuse memory allocations.
-
-!!! rust "Note"
-
- The Rust examples in this guide are synchronized with the main branch of the Polars repository, rather than the latest Rust release.
- You may not be able to copy-paste code examples and use them with the latest release.
- We aim to solve this in the future.
diff --git a/docs/user-guide/overview.md b/docs/user-guide/overview.md
new file mode 100644
index 000000000000..3b480fcc8225
--- /dev/null
+++ b/docs/user-guide/overview.md
@@ -0,0 +1,53 @@
+# Overview
+
+
+
+
+
+Polars is a highly performant DataFrame library for manipulating structured data. The core is written in Rust, but the library is also available in Python. Its key features are:
+
+- **Fast**: Polars is written from the ground up, designed close to the machine and without external dependencies.
+- **I/O**: First class support for all common data storage layers: local, cloud storage & databases.
+- **Easy to use**: Write your queries the way they were intended. Polars, internally, will determine the most efficient way to execute using its query optimizer.
+- **Out of Core**: Polars supports out of core data transformation with its streaming API. Allowing you to process your results without requiring all your data to be in memory at the same time
+- **Parallel**: Polars fully utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
+- **Vectorized Query Engine**: Polars uses [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner. It uses [SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) to optimize CPU usage.
+
+## Performance :rocket: :rocket:
+
+Polars is very fast, and in fact is one of the best performing solutions available.
+See the results in h2oai's [db-benchmark](https://duckdblabs.github.io/db-benchmark/), revived by the DuckDB project.
+
+Polars [TPC-H Benchmark results](https://www.pola.rs/benchmarks.html) are now available on the official website.
+
+## Example
+
+{{code_block('home/example','example',['scan_csv','filter','group_by','collect'])}}
+
+## Community
+
+Polars has a very active community with frequent releases (approximately weekly). Below are some of the top contributors to the project:
+
+--8<-- "docs/people.md"
+
+## Contributing
+
+We appreciate all contributions, from reporting bugs to implementing new features. Read our [contributing guide](../development/contributing/index.md) to learn more.
+
+## License
+
+This project is licensed under the terms of the [MIT license](https://github.com/pola-rs/polars/blob/main/LICENSE).
diff --git a/mkdocs.yml b/mkdocs.yml
index 9918d5c2e8f3..bb3be85cb230 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -11,13 +11,14 @@ nav:
- Home: index.md
- User guide:
- - user-guide/index.md
+ - user-guide/overview.md
+ - user-guide/getting-started.md
- user-guide/installation.md
- - Basics:
- - user-guide/basics/index.md
- - user-guide/basics/reading-writing.md
- - user-guide/basics/expressions.md
- - user-guide/basics/joins.md
+ # - Basics:
+ # - user-guide/basics/index.md
+ # - user-guide/basics/reading-writing.md
+ # - user-guide/basics/expressions.md
+ # - user-guide/basics/joins.md
- Concepts:
- Data types:
- user-guide/concepts/data-types/overview.md
From 0eab18cb498ae5f66f145c8a6f25c3eb8cf5fd4c Mon Sep 17 00:00:00 2001
From: r-brink
Date: Thu, 11 Jan 2024 14:12:19 +0100
Subject: [PATCH 02/15] rewrite from home to overview - keeping the basics in
place for now
---
docs/_build/overrides/404.html | 2 +-
docs/index.md | 53 ----------------------------------
docs/user-guide/overview.md | 33 ++++++++++++++-------
mkdocs.yml | 9 +-----
4 files changed, 25 insertions(+), 72 deletions(-)
delete mode 100644 docs/index.md
diff --git a/docs/_build/overrides/404.html b/docs/_build/overrides/404.html
index ee9b8faa2aba..a216b32dfc5f 100644
--- a/docs/_build/overrides/404.html
+++ b/docs/_build/overrides/404.html
@@ -217,6 +217,6 @@
404 - You're lost.
How you got here is a mystery. But you can click the button below
to go back to the homepage or use the search bar in the navigation menu to find what you are looking for.
-
-Polars is a highly performant DataFrame library for manipulating structured data. The core is written in Rust, but the library is also available in Python. Its key features are:
-
-- **Fast**: Polars is written from the ground up, designed close to the machine and without external dependencies.
-- **I/O**: First class support for all common data storage layers: local, cloud storage & databases.
-- **Easy to use**: Write your queries the way they were intended. Polars, internally, will determine the most efficient way to execute using its query optimizer.
-- **Out of Core**: Polars supports out of core data transformation with its streaming API. Allowing you to process your results without requiring all your data to be in memory at the same time
-- **Parallel**: Polars fully utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
-- **Vectorized Query Engine**: Polars uses [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner. It uses [SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) to optimize CPU usage.
-
-## Performance :rocket: :rocket:
-
-Polars is very fast, and in fact is one of the best performing solutions available.
-See the results in h2oai's [db-benchmark](https://duckdblabs.github.io/db-benchmark/), revived by the DuckDB project.
-
-Polars [TPC-H Benchmark results](https://www.pola.rs/benchmarks.html) are now available on the official website.
-
-## Example
-
-{{code_block('home/example','example',['scan_csv','filter','group_by','collect'])}}
-
-## Community
-
-Polars has a very active community with frequent releases (approximately weekly). Below are some of the top contributors to the project:
-
---8<-- "docs/people.md"
-
-## Contributing
-
-We appreciate all contributions, from reporting bugs to implementing new features. Read our [contributing guide](development/contributing/index.md) to learn more.
-
-## License
-
-This project is licensed under the terms of the [MIT license](https://github.com/pola-rs/polars/blob/main/LICENSE).
diff --git a/docs/user-guide/overview.md b/docs/user-guide/overview.md
index 3b480fcc8225..77340e013e89 100644
--- a/docs/user-guide/overview.md
+++ b/docs/user-guide/overview.md
@@ -18,26 +18,39 @@
-Polars is a highly performant DataFrame library for manipulating structured data. The core is written in Rust, but the library is also available in Python. Its key features are:
+Polars is a blazingly fast DataFrame library for manipulating structured data. The core is written in Rust, and the library is available for Python, R and NodeJS.
-- **Fast**: Polars is written from the ground up, designed close to the machine and without external dependencies.
+## Key features
+- **Fast**: Written from scratch in Rust, designed close to the machine and without external dependencies.
- **I/O**: First class support for all common data storage layers: local, cloud storage & databases.
-- **Easy to use**: Write your queries the way they were intended. Polars, internally, will determine the most efficient way to execute using its query optimizer.
-- **Out of Core**: Polars supports out of core data transformation with its streaming API. Allowing you to process your results without requiring all your data to be in memory at the same time
-- **Parallel**: Polars fully utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
-- **Vectorized Query Engine**: Polars uses [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner. It uses [SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) to optimize CPU usage.
+- **Intuitive API**: Write your queries the way they were intended. Polars, internally, will determine the most efficient way to execute using its query optimizer.
+- **Out of Core**: The streaming API allows you to process your results without requiring all your data to be in memory at the same time.
+- **Parallel**: Utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
+- **Vectorized Query Engine**: Uses [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner, and SIMD to optimize CPU usage.
-## Performance :rocket: :rocket:
-Polars is very fast, and in fact is one of the best performing solutions available.
-See the results in h2oai's [db-benchmark](https://duckdblabs.github.io/db-benchmark/), revived by the DuckDB project.
+!!! info "Users new to DataFrames"
+ A DataFrame is a 2-dimensional data structure that is useful for data manipulation and analysis. With labeled axes for rows and columns, each column can contain different data types, making complex data operations such as merging and aggregation much easier. Due to their flexibility and intuitive way of storing and working with data, DataFrames have become increasingly popular in modern data analytics and engineering.
-Polars [TPC-H Benchmark results](https://www.pola.rs/benchmarks.html) are now available on the official website.
+
+## Philosophy
+
+The goal of Polars is to provide a lightning fast DataFrame library that:
+
+- Utilizes all available cores on your machine.
+- Optimizes queries to reduce unneeded work/memory allocations.
+- Handles datasets much larger than your available RAM.
+- Has a consistent and predictable API.
+- Has a strict schema (data-types should be known before running the query).
+
+Polars is written in Rust, which gives it C/C++ performance and allows it to fully control the performance-critical parts of a query engine.
## Example
{{code_block('home/example','example',['scan_csv','filter','group_by','collect'])}}
+A more extensive introduction can be found in the [next chapter](/user-guide/getting-started).
+
## Community
Polars has a very active community with frequent releases (approximately weekly). Below are some of the top contributors to the project:
diff --git a/mkdocs.yml b/mkdocs.yml
index bb3be85cb230..1eadf554f927 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -2,23 +2,16 @@
# Project information
site_name: Polars
-site_url: https://docs.pola.rs
+site_url: https://docs.pola.rs/
repo_url: https://github.com/pola-rs/polars
repo_name: pola-rs/polars
# Documentation layout
nav:
- - Home: index.md
-
- User guide:
- user-guide/overview.md
- user-guide/getting-started.md
- user-guide/installation.md
- # - Basics:
- # - user-guide/basics/index.md
- # - user-guide/basics/reading-writing.md
- # - user-guide/basics/expressions.md
- # - user-guide/basics/joins.md
- Concepts:
- Data types:
- user-guide/concepts/data-types/overview.md
From 80a9c06e9be4985608edde362afd8e2d419cfd23 Mon Sep 17 00:00:00 2001
From: r-brink
Date: Thu, 11 Jan 2024 14:15:58 +0100
Subject: [PATCH 03/15] improve accessibility of page
---
docs/user-guide/overview.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/docs/user-guide/overview.md b/docs/user-guide/overview.md
index 77340e013e89..bcc243cbf3f5 100644
--- a/docs/user-guide/overview.md
+++ b/docs/user-guide/overview.md
@@ -5,10 +5,10 @@