From 5b45cfc39296302ee73a0d9f59af6cc8d9f3cd84 Mon Sep 17 00:00:00 2001
From: r-brink
Date: Thu, 11 Jan 2024 13:09:34 +0100
Subject: [PATCH 01/12] add getting-started section and remove basics chapter
---
docs/index.md | 5 -
docs/user-guide/basics/index.md | 18 ----
docs/user-guide/basics/joins.md | 26 -----
docs/user-guide/basics/reading-writing.md | 45 --------
.../expressions.md => getting-started.md} | 101 +++++++++++++++++-
docs/user-guide/index.md | 39 -------
docs/user-guide/overview.md | 53 +++++++++
mkdocs.yml | 13 +--
8 files changed, 158 insertions(+), 142 deletions(-)
delete mode 100644 docs/user-guide/basics/index.md
delete mode 100644 docs/user-guide/basics/joins.md
delete mode 100644 docs/user-guide/basics/reading-writing.md
rename docs/user-guide/{basics/expressions.md => getting-started.md} (52%)
delete mode 100644 docs/user-guide/index.md
create mode 100644 docs/user-guide/overview.md
diff --git a/docs/index.md b/docs/index.md
index 2c72f776edbb..cad8fc9b0322 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,8 +1,3 @@
----
-hide:
- - navigation
----
-
# Polars

diff --git a/docs/user-guide/basics/index.md b/docs/user-guide/basics/index.md
deleted file mode 100644
index af73c7967574..000000000000
--- a/docs/user-guide/basics/index.md
+++ /dev/null
@@ -1,18 +0,0 @@
-# Introduction
-
-This chapter is intended for new Polars users.
-The goal is to provide a quick overview of the most common functionality.
-Feel free to skip ahead to the [next chapter](../concepts/data-types/overview.md) to dive into the details.
-
-!!! rust "Rust Users Only"
-
- Due to historical reasons, the eager API in Rust is outdated. In the future, we would like to redesign it as a small wrapper around the lazy API (as is the design in Python / NodeJS). In the examples, we will use the lazy API instead with `.lazy()` and `.collect()`. For now you can ignore these two functions. If you want to know more about the lazy and eager API, go [here](../concepts/lazy-vs-eager.md).
-
- To enable the Lazy API ensure you have the feature flag `lazy` configured when installing Polars
- ```
- # Cargo.toml
- [dependencies]
- polars = { version = "x", features = ["lazy", ...]}
- ```
-
- Because of the ownership ruling in Rust, we can not reuse the same `DataFrame` multiple times in the examples. For simplicity reasons we call `clone()` to overcome this issue. Note that this does not duplicate the data but just increments a pointer (`Arc`).
diff --git a/docs/user-guide/basics/joins.md b/docs/user-guide/basics/joins.md
deleted file mode 100644
index 21cb927164a9..000000000000
--- a/docs/user-guide/basics/joins.md
+++ /dev/null
@@ -1,26 +0,0 @@
-# Combining DataFrames
-
-There are two ways `DataFrame`s can be combined depending on the use case: join and concat.
-
-## Join
-
-Polars supports all types of join (e.g. left, right, inner, outer). Let's have a closer look on how to `join` two `DataFrames` into a single `DataFrame`. Our two `DataFrames` both have an 'id'-like column: `a` and `x`. We can use those columns to `join` the `DataFrames` in this example.
-
-{{code_block('user-guide/basics/joins','join',['join'])}}
-
-```python exec="on" result="text" session="getting-started/joins"
---8<-- "python/user-guide/basics/joins.py:setup"
---8<-- "python/user-guide/basics/joins.py:join"
-```
-
-To see more examples with other types of joins, go the [User Guide](../transformations/joins.md).
-
-## Concat
-
-We can also `concatenate` two `DataFrames`. Vertical concatenation will make the `DataFrame` longer. Horizontal concatenation will make the `DataFrame` wider. Below you can see the result of an horizontal concatenation of our two `DataFrames`.
-
-{{code_block('user-guide/basics/joins','hstack',['hstack'])}}
-
-```python exec="on" result="text" session="getting-started/joins"
---8<-- "python/user-guide/basics/joins.py:hstack"
-```
diff --git a/docs/user-guide/basics/reading-writing.md b/docs/user-guide/basics/reading-writing.md
deleted file mode 100644
index 8999f601e823..000000000000
--- a/docs/user-guide/basics/reading-writing.md
+++ /dev/null
@@ -1,45 +0,0 @@
-# Reading & writing
-
-Polars supports reading and writing to all common files (e.g. csv, json, parquet), cloud storage (S3, Azure Blob, BigQuery) and databases (e.g. postgres, mysql). In the following examples we will show how to operate on most common file formats. For the following dataframe
-
-{{code_block('user-guide/basics/reading-writing','dataframe',['DataFrame'])}}
-
-```python exec="on" result="text" session="getting-started/reading"
---8<-- "python/user-guide/basics/reading-writing.py:dataframe"
-```
-
-#### CSV
-
-Polars has its own fast implementation for csv reading with many flexible configuration options.
-
-{{code_block('user-guide/basics/reading-writing','csv',['read_csv','write_csv'])}}
-
-```python exec="on" result="text" session="getting-started/reading"
---8<-- "python/user-guide/basics/reading-writing.py:csv"
-```
-
-As we can see above, Polars made the datetimes a `string`. We can tell Polars to parse dates, when reading the csv, to ensure the date becomes a datetime. The example can be found below:
-
-{{code_block('user-guide/basics/reading-writing','csv2',['read_csv'])}}
-
-```python exec="on" result="text" session="getting-started/reading"
---8<-- "python/user-guide/basics/reading-writing.py:csv2"
-```
-
-#### JSON
-
-{{code_block('user-guide/basics/reading-writing','json',['read_json','write_json'])}}
-
-```python exec="on" result="text" session="getting-started/reading"
---8<-- "python/user-guide/basics/reading-writing.py:json"
-```
-
-#### Parquet
-
-{{code_block('user-guide/basics/reading-writing','parquet',['read_parquet','write_parquet'])}}
-
-```python exec="on" result="text" session="getting-started/reading"
---8<-- "python/user-guide/basics/reading-writing.py:parquet"
-```
-
-To see more examples and other data formats go to the [User Guide](../io/csv.md), section IO.
diff --git a/docs/user-guide/basics/expressions.md b/docs/user-guide/getting-started.md
similarity index 52%
rename from docs/user-guide/basics/expressions.md
rename to docs/user-guide/getting-started.md
index 0277d3da72f6..52e9b078b3b7 100644
--- a/docs/user-guide/basics/expressions.md
+++ b/docs/user-guide/getting-started.md
@@ -1,13 +1,81 @@
-# Expressions
+# Getting started
+This chapter is here to help you get started with Polars. It covers all the fundamental features and functionalities of the library, making it easy for new users to familiarise themselves with the basics from initial installation and setup to core functionalities. If you're already an advanced user or familiar with Dataframes, feel free to skip ahead to the [next chapter about installation options](installation.md).
-`Expressions` are the core strength of Polars. The `expressions` offer a versatile structure that both solves easy queries and is easily extended to complex ones. Below we will cover the basic components that serve as building block (or in Polars terminology contexts) for all your queries:
+## Installing Polars
+
+=== ":fontawesome-brands-python: Python"
+
+ ``` bash
+ pip install polars
+ ```
+
+=== ":fontawesome-brands-rust: Rust"
+
+ ``` shell
+ cargo add polars -F lazy
+
+ # Or Cargo.toml
+ [dependencies]
+ polars = { version = "x", features = ["lazy", ...]}
+ ```
+
+## Reading & writing
+
+Polars supports reading and writing to all common files (e.g. csv, json, parquet), cloud storage (S3, Azure Blob, BigQuery) and databases (e.g. postgres, mysql). The following examples show how to read and write the most common file formats, using this dataframe:
+
+{{code_block('user-guide/basics/reading-writing','dataframe',['DataFrame'])}}
+
+```python exec="on" result="text" session="getting-started/reading"
+--8<-- "python/user-guide/basics/reading-writing.py:dataframe"
+```
+
+### CSV
+
+Polars has its own fast implementation for csv reading with many flexible configuration options.
+
+{{code_block('user-guide/basics/reading-writing','csv',['read_csv','write_csv'])}}
+
+```python exec="on" result="text" session="getting-started/reading"
+--8<-- "python/user-guide/basics/reading-writing.py:csv"
+```
+
+As we can see above, Polars made the datetimes a `string`. We can tell Polars to parse dates, when reading the csv, to ensure the date becomes a datetime. The example can be found below:
+
+{{code_block('user-guide/basics/reading-writing','csv2',['read_csv'])}}
+
+```python exec="on" result="text" session="getting-started/reading"
+--8<-- "python/user-guide/basics/reading-writing.py:csv2"
+```
+
+### JSON
+
+{{code_block('user-guide/basics/reading-writing','json',['read_json','write_json'])}}
+
+```python exec="on" result="text" session="getting-started/reading"
+--8<-- "python/user-guide/basics/reading-writing.py:json"
+```
+
+### Parquet
+
+{{code_block('user-guide/basics/reading-writing','parquet',['read_parquet','write_parquet'])}}
+
+```python exec="on" result="text" session="getting-started/reading"
+--8<-- "python/user-guide/basics/reading-writing.py:parquet"
+```
+
+To see more examples and other data formats go to the [User Guide](io/csv.md), section IO.
+
+
+## Expressions
+
+`Expressions` are the core strength of Polars. The `expressions` offer a versatile structure that both solves easy queries and is easily extended to complex ones. Below we cover the basic components that serve as building block (or in Polars terminology contexts) for all your queries:
- `select`
- `filter`
- `with_columns`
- `group_by`
-To learn more about expressions and the context in which they operate, see the User Guide sections: [Contexts](../concepts/contexts.md) and [Expressions](../concepts/expressions.md).
+To learn more about expressions and the context in which they operate, see the User Guide sections: [Contexts](concepts/contexts.md) and [Expressions](concepts/expressions.md).
### Select statement
@@ -128,3 +196,30 @@ Below are some examples on how to combine operations to create the `DataFrame` y
```python exec="on" result="text" session="getting-started/expressions"
--8<-- "python/user-guide/basics/expressions.py:combine2"
```
+
+## Combining DataFrames
+
+There are two ways `DataFrame`s can be combined depending on the use case: join and concat.
+
+### Join
+
+Polars supports all types of join (e.g. left, right, inner, outer). Let's have a closer look at how to `join` two `DataFrames` into a single `DataFrame`. Our two `DataFrames` both have an 'id'-like column: `a` and `x`. We can use those columns to `join` the `DataFrames` in this example.
+
+{{code_block('user-guide/basics/joins','join',['join'])}}
+
+```python exec="on" result="text" session="getting-started/joins"
+--8<-- "python/user-guide/basics/joins.py:setup"
+--8<-- "python/user-guide/basics/joins.py:join"
+```
+
+For more examples with other types of joins, see the [Transformations section](transformations/joins.md) in the user guide.
+
+### Concat
+
+We can also `concatenate` two `DataFrames`. Vertical concatenation will make the `DataFrame` longer. Horizontal concatenation will make the `DataFrame` wider. Below you can see the result of a horizontal concatenation of our two `DataFrames`.
+
+{{code_block('user-guide/basics/joins','hstack',['hstack'])}}
+
+```python exec="on" result="text" session="getting-started/joins"
+--8<-- "python/user-guide/basics/joins.py:hstack"
+```
diff --git a/docs/user-guide/index.md b/docs/user-guide/index.md
deleted file mode 100644
index 442029472d80..000000000000
--- a/docs/user-guide/index.md
+++ /dev/null
@@ -1,39 +0,0 @@
-# Introduction
-
-This user guide is an introduction to the [Polars DataFrame library](https://github.com/pola-rs/polars).
-Its goal is to introduce you to Polars by going through examples and comparing it to other solutions.
-Some design choices are introduced here. The guide will also introduce you to optimal usage of Polars.
-
-The Polars user guide is intended to live alongside the API documentation ([Python](https://docs.pola.rs/py-polars/html/reference/index.html) / [Rust](https://docs.rs/polars/latest/polars/)), which offers detailed descriptions of specific objects and functions.
-
-Even though Polars is completely written in [Rust](https://www.rust-lang.org/) (no runtime overhead!) and uses [Arrow](https://arrow.apache.org/) -- the [native arrow2 Rust implementation](https://github.com/jorgecarleitao/arrow2) -- as its foundation, the examples presented in this guide will be mostly using its higher-level language bindings.
-Higher-level bindings only serve as a thin wrapper for functionality implemented in the core library.
-
-For [pandas](https://pandas.pydata.org/) users, our [Python package](https://pypi.org/project/polars/) will offer the easiest way to get started with Polars.
-
-### Philosophy
-
-The goal of Polars is to provide a lightning fast `DataFrame` library that:
-
-- Utilizes all available cores on your machine.
-- Optimizes queries to reduce unneeded work/memory allocations.
-- Handles datasets much larger than your available RAM.
-- Has an API that is consistent and predictable.
-- Has a strict schema (data-types should be known before running the query).
-
-Polars is written in Rust which gives it C/C++ performance and allows it to fully control performance critical parts
-in a query engine.
-
-As such Polars goes to great lengths to:
-
-- Reduce redundant copies.
-- Traverse memory cache efficiently.
-- Minimize contention in parallelism.
-- Process data in chunks.
-- Reuse memory allocations.
-
-!!! rust "Note"
-
- The Rust examples in this guide are synchronized with the main branch of the Polars repository, rather than the latest Rust release.
- You may not be able to copy-paste code examples and use them with the latest release.
- We aim to solve this in the future.
diff --git a/docs/user-guide/overview.md b/docs/user-guide/overview.md
new file mode 100644
index 000000000000..3b480fcc8225
--- /dev/null
+++ b/docs/user-guide/overview.md
@@ -0,0 +1,53 @@
+# Overview
+
+
+
+Blazingly Fast DataFrame Library
+
+
+Polars is a highly performant DataFrame library for manipulating structured data. The core is written in Rust, but the library is also available in Python. Its key features are:
+
+- **Fast**: Polars is written from the ground up, designed close to the machine and without external dependencies.
+- **I/O**: First class support for all common data storage layers: local, cloud storage & databases.
+- **Easy to use**: Write your queries the way they were intended. Polars, internally, will determine the most efficient way to execute using its query optimizer.
+- **Out of Core**: Polars supports out of core data transformation with its streaming API. Allowing you to process your results without requiring all your data to be in memory at the same time
+- **Parallel**: Polars fully utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
+- **Vectorized Query Engine**: Polars uses [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner. It uses [SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) to optimize CPU usage.
+
+## Performance :rocket: :rocket:
+
+Polars is very fast, and in fact is one of the best performing solutions available.
+See the results in h2oai's [db-benchmark](https://duckdblabs.github.io/db-benchmark/), revived by the DuckDB project.
+
+Polars [TPC-H Benchmark results](https://www.pola.rs/benchmarks.html) are now available on the official website.
+
+## Example
+
+{{code_block('home/example','example',['scan_csv','filter','group_by','collect'])}}
+
+## Community
+
+Polars has a very active community with frequent releases (approximately weekly). Below are some of the top contributors to the project:
+
+--8<-- "docs/people.md"
+
+## Contributing
+
+We appreciate all contributions, from reporting bugs to implementing new features. Read our [contributing guide](../development/contributing/index.md) to learn more.
+
+## License
+
+This project is licensed under the terms of the [MIT license](https://github.com/pola-rs/polars/blob/main/LICENSE).
diff --git a/mkdocs.yml b/mkdocs.yml
index c4b11d02a371..1468afdf8a2d 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -11,13 +11,14 @@ nav:
- Home: index.md
- User guide:
- - user-guide/index.md
+ - user-guide/overview.md
+ - user-guide/getting-started.md
- user-guide/installation.md
- - Basics:
- - user-guide/basics/index.md
- - user-guide/basics/reading-writing.md
- - user-guide/basics/expressions.md
- - user-guide/basics/joins.md
+ # - Basics:
+ # - user-guide/basics/index.md
+ # - user-guide/basics/reading-writing.md
+ # - user-guide/basics/expressions.md
+ # - user-guide/basics/joins.md
- Concepts:
- Data types:
- user-guide/concepts/data-types/overview.md
From 88e452d657e29f28b96c8f1e90459824ae8d9b53 Mon Sep 17 00:00:00 2001
From: r-brink
Date: Thu, 11 Jan 2024 14:12:19 +0100
Subject: [PATCH 02/12] rewrite from home to overview - keeping the basics in
place for now
---
docs/_build/overrides/404.html | 2 +-
docs/index.md | 53 ----------------------------------
docs/user-guide/overview.md | 33 ++++++++++++++-------
mkdocs.yml | 9 +-----
4 files changed, 25 insertions(+), 72 deletions(-)
delete mode 100644 docs/index.md
diff --git a/docs/_build/overrides/404.html b/docs/_build/overrides/404.html
index ee9b8faa2aba..a216b32dfc5f 100644
--- a/docs/_build/overrides/404.html
+++ b/docs/_build/overrides/404.html
@@ -217,6 +217,6 @@ 404 - You're lost.
How you got here is a mystery. But you can click the button below
to go back to the homepage or use the search bar in the navigation menu to find what you are looking for.
- Home
+ Home
{% endblock %}
diff --git a/docs/index.md b/docs/index.md
deleted file mode 100644
index cad8fc9b0322..000000000000
--- a/docs/index.md
+++ /dev/null
@@ -1,53 +0,0 @@
-# Polars
-
-
-
-Blazingly Fast DataFrame Library
-
-
-Polars is a highly performant DataFrame library for manipulating structured data. The core is written in Rust, but the library is also available in Python. Its key features are:
-
-- **Fast**: Polars is written from the ground up, designed close to the machine and without external dependencies.
-- **I/O**: First class support for all common data storage layers: local, cloud storage & databases.
-- **Easy to use**: Write your queries the way they were intended. Polars, internally, will determine the most efficient way to execute using its query optimizer.
-- **Out of Core**: Polars supports out of core data transformation with its streaming API. Allowing you to process your results without requiring all your data to be in memory at the same time
-- **Parallel**: Polars fully utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
-- **Vectorized Query Engine**: Polars uses [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner. It uses [SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) to optimize CPU usage.
-
-## Performance :rocket: :rocket:
-
-Polars is very fast, and in fact is one of the best performing solutions available.
-See the results in h2oai's [db-benchmark](https://duckdblabs.github.io/db-benchmark/), revived by the DuckDB project.
-
-Polars [TPC-H Benchmark results](https://www.pola.rs/benchmarks.html) are now available on the official website.
-
-## Example
-
-{{code_block('home/example','example',['scan_csv','filter','group_by','collect'])}}
-
-## Community
-
-Polars has a very active community with frequent releases (approximately weekly). Below are some of the top contributors to the project:
-
---8<-- "docs/people.md"
-
-## Contributing
-
-We appreciate all contributions, from reporting bugs to implementing new features. Read our [contributing guide](development/contributing/index.md) to learn more.
-
-## License
-
-This project is licensed under the terms of the [MIT license](https://github.com/pola-rs/polars/blob/main/LICENSE).
diff --git a/docs/user-guide/overview.md b/docs/user-guide/overview.md
index 3b480fcc8225..77340e013e89 100644
--- a/docs/user-guide/overview.md
+++ b/docs/user-guide/overview.md
@@ -18,26 +18,39 @@
-Polars is a highly performant DataFrame library for manipulating structured data. The core is written in Rust, but the library is also available in Python. Its key features are:
+Polars is a blazingly fast DataFrame library for manipulating structured data. The core is written in Rust, and available for Python, R and NodeJS.
-- **Fast**: Polars is written from the ground up, designed close to the machine and without external dependencies.
+## Key features
+- **Fast**: Written from scratch in Rust, designed close to the machine and without external dependencies.
- **I/O**: First class support for all common data storage layers: local, cloud storage & databases.
-- **Easy to use**: Write your queries the way they were intended. Polars, internally, will determine the most efficient way to execute using its query optimizer.
-- **Out of Core**: Polars supports out of core data transformation with its streaming API. Allowing you to process your results without requiring all your data to be in memory at the same time
-- **Parallel**: Polars fully utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
-- **Vectorized Query Engine**: Polars uses [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner. It uses [SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) to optimize CPU usage.
+- **Intuitive API**: Write your queries the way they were intended. Polars, internally, will determine the most efficient way to execute using its query optimizer.
+- **Out of Core**: The streaming API allows you to process your results without requiring all your data to be in memory at the same time
+- **Parallel**: Utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
+- **Vectorized Query Engine**: Using [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner and SIMD to optimize CPU usage.
-## Performance :rocket: :rocket:
-Polars is very fast, and in fact is one of the best performing solutions available.
-See the results in h2oai's [db-benchmark](https://duckdblabs.github.io/db-benchmark/), revived by the DuckDB project.
+!!! info "Users new to Dataframes"
+ A DataFrame is a 2-dimensional data structure that is useful for data manipulation and analysis. With labeled axes for rows and columns, each column can contain different data types, making complex data operations such as merging and aggregation much easier. Due to their flexibility and intuitive way of storing and working with data, DataFrames have become increasingly popular in modern data analytics and engineering.
-Polars [TPC-H Benchmark results](https://www.pola.rs/benchmarks.html) are now available on the official website.
+
+## Philosophy
+
+The goal of Polars is to provide a lightning fast DataFrame library that:
+
+- Utilizes all available cores on your machine.
+- Optimizes queries to reduce unneeded work/memory allocations.
+- Handles datasets much larger than your available RAM.
+- A consistent and predictable API.
+- Strict schema (data-types should be known before running the query).
+
+Polars is written in Rust which gives it C/C++ performance and allows it to fully control performance critical parts in a query engine.
## Example
{{code_block('home/example','example',['scan_csv','filter','group_by','collect'])}}
+A more extensive introduction can be found in the [next chapter](/user-guide/getting-started).
+
## Community
Polars has a very active community with frequent releases (approximately weekly). Below are some of the top contributors to the project:
diff --git a/mkdocs.yml b/mkdocs.yml
index 1468afdf8a2d..61cec2cd8a66 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -2,23 +2,16 @@
# Project information
site_name: Polars
-site_url: https://docs.pola.rs
+site_url: https://docs.pola.rs/
repo_url: https://github.com/pola-rs/polars
repo_name: pola-rs/polars
# Documentation layout
nav:
- - Home: index.md
-
- User guide:
- user-guide/overview.md
- user-guide/getting-started.md
- user-guide/installation.md
- # - Basics:
- # - user-guide/basics/index.md
- # - user-guide/basics/reading-writing.md
- # - user-guide/basics/expressions.md
- # - user-guide/basics/joins.md
- Concepts:
- Data types:
- user-guide/concepts/data-types/overview.md
From c9f0b54c73f17cc65eca8094ee6333fb1a04b6d1 Mon Sep 17 00:00:00 2001
From: r-brink
Date: Thu, 11 Jan 2024 14:15:58 +0100
Subject: [PATCH 03/12] improve accessibility of page
---
docs/user-guide/overview.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/docs/user-guide/overview.md b/docs/user-guide/overview.md
index 77340e013e89..bcc243cbf3f5 100644
--- a/docs/user-guide/overview.md
+++ b/docs/user-guide/overview.md
@@ -5,10 +5,10 @@
Blazingly Fast DataFrame Library
-
+
-
+
From ae7c022686f39b8175ca0747fa4dc0deb0303086 Mon Sep 17 00:00:00 2001
From: r-brink
Date: Thu, 11 Jan 2024 14:20:13 +0100
Subject: [PATCH 04/12] minor textual changes
---
docs/api/index.md | 2 +-
docs/user-guide/overview.md | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/docs/api/index.md b/docs/api/index.md
index 004799cae1b4..485b59923ad1 100644
--- a/docs/api/index.md
+++ b/docs/api/index.md
@@ -11,7 +11,7 @@ It's the best place to look if you need information on a specific function.
## Python
The Python API reference is built using Sphinx.
-It's available on [GitHub Pages](https://docs.pola.rs/py-polars/html/reference/index.html).
+It's available in [our docs](https://docs.pola.rs/py-polars/html/reference/index.html).
## Rust
diff --git a/docs/user-guide/overview.md b/docs/user-guide/overview.md
index bcc243cbf3f5..bd69a7a9c75b 100644
--- a/docs/user-guide/overview.md
+++ b/docs/user-guide/overview.md
@@ -41,7 +41,7 @@ The goal of Polars is to provide a lightning fast DataFrame library that:
- Optimizes queries to reduce unneeded work/memory allocations.
- Handles datasets much larger than your available RAM.
- A consistent and predictable API.
-- Strict schema (data-types should be known before running the query).
+- Adheres to a strict schema (data-types should be known before running the query).
Polars is written in Rust which gives it C/C++ performance and allows it to fully control performance critical parts in a query engine.
From 863d711bffad20d33734bc9a353ee51ffae6aa70 Mon Sep 17 00:00:00 2001
From: r-brink
Date: Thu, 11 Jan 2024 14:23:45 +0100
Subject: [PATCH 05/12] formatting
---
docs/user-guide/getting-started.md | 2 +-
docs/user-guide/overview.md | 5 ++---
2 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/docs/user-guide/getting-started.md b/docs/user-guide/getting-started.md
index 52e9b078b3b7..fca8a951089e 100644
--- a/docs/user-guide/getting-started.md
+++ b/docs/user-guide/getting-started.md
@@ -1,4 +1,5 @@
# Getting started
+
This chapter is here to help you get started with Polars. It covers all the fundamental features and functionalities of the library, making it easy for new users to familiarise themselves with the basics from initial installation and setup to core functionalities. If you're already an advanced user or familiar with Dataframes, feel free to skip ahead to the [next chapter about installation options](installation.md).
## Installing Polars
@@ -65,7 +66,6 @@ As we can see above, Polars made the datetimes a `string`. We can tell Polars to
To see more examples and other data formats go to the [User Guide](io/csv.md), section IO.
-
## Expressions
`Expressions` are the core strength of Polars. The `expressions` offer a versatile structure that both solves easy queries and is easily extended to complex ones. Below we cover the basic components that serve as building block (or in Polars terminology contexts) for all your queries:
diff --git a/docs/user-guide/overview.md b/docs/user-guide/overview.md
index bd69a7a9c75b..41bb32795209 100644
--- a/docs/user-guide/overview.md
+++ b/docs/user-guide/overview.md
@@ -21,6 +21,7 @@
Polars is a blazingly fast DataFrame library for manipulating structured data. The core is written in Rust, and available for Python, R and NodeJS.
## Key features
+
- **Fast**: Written from scratch in Rust, designed close to the machine and without external dependencies.
- **I/O**: First class support for all common data storage layers: local, cloud storage & databases.
- **Intuitive API**: Write your queries the way they were intended. Polars, internally, will determine the most efficient way to execute using its query optimizer.
@@ -28,10 +29,8 @@ Polars is a blazingly fast DataFrame library for manipulating structured data. T
- **Parallel**: Utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
- **Vectorized Query Engine**: Using [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner and SIMD to optimize CPU usage.
-
!!! info "Users new to Dataframes"
- A DataFrame is a 2-dimensional data structure that is useful for data manipulation and analysis. With labeled axes for rows and columns, each column can contain different data types, making complex data operations such as merging and aggregation much easier. Due to their flexibility and intuitive way of storing and working with data, DataFrames have become increasingly popular in modern data analytics and engineering.
-
+A DataFrame is a 2-dimensional data structure that is useful for data manipulation and analysis. With labeled axes for rows and columns, each column can contain different data types, making complex data operations such as merging and aggregation much easier. Due to their flexibility and intuitive way of storing and working with data, DataFrames have become increasingly popular in modern data analytics and engineering.
## Philosophy
From cf68c9bc477c57f34fda842e83496dd98b75a039 Mon Sep 17 00:00:00 2001
From: r-brink
Date: Thu, 11 Jan 2024 14:28:34 +0100
Subject: [PATCH 06/12] logo click goes to user guide overview now
---
mkdocs.yml | 1 +
1 file changed, 1 insertion(+)
diff --git a/mkdocs.yml b/mkdocs.yml
index 61cec2cd8a66..7edb55beb98e 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -137,6 +137,7 @@ extra:
analytics:
provider: plausible
domain: guide.pola.rs,combined.pola.rs
+ homepage: https://docs.pola.rs/user-guide/overview/
# Preview controls
strict: true
From 93d0bd24cb2233054a07befbcef318a2024e50db Mon Sep 17 00:00:00 2001
From: r-brink
Date: Thu, 11 Jan 2024 14:31:26 +0100
Subject: [PATCH 07/12] dprint fmt breaks info box
---
docs/user-guide/overview.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/docs/user-guide/overview.md b/docs/user-guide/overview.md
index 41bb32795209..50dda6be1e77 100644
--- a/docs/user-guide/overview.md
+++ b/docs/user-guide/overview.md
@@ -29,8 +29,8 @@ Polars is a blazingly fast DataFrame library for manipulating structured data. T
- **Parallel**: Utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
- **Vectorized Query Engine**: Using [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner and SIMD to optimize CPU usage.
-!!! info "Users new to Dataframes"
-A DataFrame is a 2-dimensional data structure that is useful for data manipulation and analysis. With labeled axes for rows and columns, each column can contain different data types, making complex data operations such as merging and aggregation much easier. Due to their flexibility and intuitive way of storing and working with data, DataFrames have become increasingly popular in modern data analytics and engineering.
+!!! info "Users new to DataFrames"
+ A DataFrame is a 2-dimensional data structure that is useful for data manipulation and analysis. With labeled axes for rows and columns, each column can contain different data types, making complex data operations such as merging and aggregation much easier. Due to their flexibility and intuitive way of storing and working with data, DataFrames have become increasingly popular in modern data analytics and engineering.
## Philosophy
From 06547fea31731bcdb405f769a06f64f594c490d6 Mon Sep 17 00:00:00 2001
From: r-brink
Date: Thu, 11 Jan 2024 15:26:09 +0100
Subject: [PATCH 08/12] fix link
---
docs/user-guide/overview.md | 6 +++++-
mkdocs.yml | 2 +-
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/docs/user-guide/overview.md b/docs/user-guide/overview.md
index 50dda6be1e77..f76eb0e1a1d3 100644
--- a/docs/user-guide/overview.md
+++ b/docs/user-guide/overview.md
@@ -29,9 +29,13 @@ Polars is a blazingly fast DataFrame library for manipulating structured data. T
- **Parallel**: Utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
- **Vectorized Query Engine**: Using [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner and SIMD to optimize CPU usage.
+
+
!!! info "Users new to DataFrames"
A DataFrame is a 2-dimensional data structure that is useful for data manipulation and analysis. With labeled axes for rows and columns, each column can contain different data types, making complex data operations such as merging and aggregation much easier. Due to their flexibility and intuitive way of storing and working with data, DataFrames have become increasingly popular in modern data analytics and engineering.
+
+
## Philosophy
The goal of Polars is to provide a lightning fast DataFrame library that:
@@ -48,7 +52,7 @@ Polars is written in Rust which gives it C/C++ performance and allows it to full
{{code_block('home/example','example',['scan_csv','filter','group_by','collect'])}}
-A more extensive introduction can be found in the [next chapter](/user-guide/getting-started).
+A more extensive introduction can be found in the [next chapter](getting-started.md).
## Community
diff --git a/mkdocs.yml b/mkdocs.yml
index 7edb55beb98e..1d1ed4a769d8 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -137,7 +137,7 @@ extra:
analytics:
provider: plausible
domain: guide.pola.rs,combined.pola.rs
- homepage: https://docs.pola.rs/user-guide/overview/
+ homepage: /user-guide/overview/
# Preview controls
strict: true
From 478da66c77d2add59b180e82150acd8364f86d17 Mon Sep 17 00:00:00 2001
From: r-brink
Date: Tue, 16 Jan 2024 12:52:12 +0100
Subject: [PATCH 09/12] getting started more to the point and index pages to
support
---
.../python/user-guide/basics/expressions.py | 23 +++----
.../user-guide/basics/reading-writing.py | 7 ++-
.../src/rust/user-guide/basics/expressions.rs | 25 ++++----
.../rust/user-guide/basics/reading-writing.rs | 6 +-
docs/user-guide/concepts/index.md | 12 ++++
docs/user-guide/expressions/index.md | 18 ++++++
docs/user-guide/getting-started.md | 61 ++++---------------
docs/user-guide/io/index.md | 12 ++++
docs/user-guide/lazy/index.md | 10 +++
docs/user-guide/transformations/index.md | 8 +++
mkdocs.yml | 6 ++
11 files changed, 106 insertions(+), 82 deletions(-)
create mode 100644 docs/user-guide/concepts/index.md
create mode 100644 docs/user-guide/expressions/index.md
create mode 100644 docs/user-guide/io/index.md
create mode 100644 docs/user-guide/lazy/index.md
create mode 100644 docs/user-guide/transformations/index.md
diff --git a/docs/src/python/user-guide/basics/expressions.py b/docs/src/python/user-guide/basics/expressions.py
index 451cf83441f0..590c8db1688d 100644
--- a/docs/src/python/user-guide/basics/expressions.py
+++ b/docs/src/python/user-guide/basics/expressions.py
@@ -6,19 +6,16 @@
df = pl.DataFrame(
{
- "a": range(8),
- "b": np.random.rand(8),
+ "a": range(5),
+ "b": np.random.rand(5),
"c": [
- datetime(2022, 12, 1),
- datetime(2022, 12, 2),
- datetime(2022, 12, 3),
- datetime(2022, 12, 4),
- datetime(2022, 12, 5),
- datetime(2022, 12, 6),
- datetime(2022, 12, 7),
- datetime(2022, 12, 8),
+ datetime(2025, 12, 1),
+ datetime(2025, 12, 2),
+ datetime(2025, 12, 3),
+ datetime(2025, 12, 4),
+ datetime(2025, 12, 5),
],
- "d": [1, 2.0, float("nan"), float("nan"), 0, -5, -42, None],
+ "d": [1, 2.0, float("nan"), -42, None],
}
)
# --8<-- [end:setup]
@@ -36,12 +33,12 @@
# --8<-- [end:select3]
# --8<-- [start:exclude]
-df.select(pl.exclude("a"))
+df.select(pl.exclude(["a", "c"]))
# --8<-- [end:exclude]
# --8<-- [start:filter]
df.filter(
- pl.col("c").is_between(datetime(2022, 12, 2), datetime(2022, 12, 8)),
+ pl.col("c").is_between(datetime(2025, 12, 2), datetime(2025, 12, 3)),
)
# --8<-- [end:filter]
diff --git a/docs/src/python/user-guide/basics/reading-writing.py b/docs/src/python/user-guide/basics/reading-writing.py
index dc8a54ebd18f..f01fbba3fb30 100644
--- a/docs/src/python/user-guide/basics/reading-writing.py
+++ b/docs/src/python/user-guide/basics/reading-writing.py
@@ -6,11 +6,12 @@
{
"integer": [1, 2, 3],
"date": [
- datetime(2022, 1, 1),
- datetime(2022, 1, 2),
- datetime(2022, 1, 3),
+ datetime(2025, 1, 1),
+ datetime(2025, 1, 2),
+ datetime(2025, 1, 3),
],
"float": [4.0, 5.0, 6.0],
+ "string": ["a", "b", "c"]
}
)
diff --git a/docs/src/rust/user-guide/basics/expressions.rs b/docs/src/rust/user-guide/basics/expressions.rs
index ac36b45f459a..59e5c9338add 100644
--- a/docs/src/rust/user-guide/basics/expressions.rs
+++ b/docs/src/rust/user-guide/basics/expressions.rs
@@ -6,19 +6,16 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut rng = rand::thread_rng();
let df: DataFrame = df!(
- "a" => 0..8,
- "b"=> (0..8).map(|_| rng.gen::<f64>()).collect::<Vec<f64>>(),
+ "a" => 0..5,
+ "b"=> (0..5).map(|_| rng.gen::<f64>()).collect::<Vec<f64>>(),
"c"=> [
- NaiveDate::from_ymd_opt(2022, 12, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
- NaiveDate::from_ymd_opt(2022, 12, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(),
- NaiveDate::from_ymd_opt(2022, 12, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
- NaiveDate::from_ymd_opt(2022, 12, 4).unwrap().and_hms_opt(0, 0, 0).unwrap(),
- NaiveDate::from_ymd_opt(2022, 12, 5).unwrap().and_hms_opt(0, 0, 0).unwrap(),
- NaiveDate::from_ymd_opt(2022, 12, 6).unwrap().and_hms_opt(0, 0, 0).unwrap(),
- NaiveDate::from_ymd_opt(2022, 12, 7).unwrap().and_hms_opt(0, 0, 0).unwrap(),
- NaiveDate::from_ymd_opt(2022, 12, 8).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+ NaiveDate::from_ymd_opt(2025, 12, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+ NaiveDate::from_ymd_opt(2025, 12, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+ NaiveDate::from_ymd_opt(2025, 12, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+ NaiveDate::from_ymd_opt(2025, 12, 4).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+ NaiveDate::from_ymd_opt(2025, 12, 5).unwrap().and_hms_opt(0, 0, 0).unwrap(),
],
- "d"=> [Some(1.0), Some(2.0), None, None, Some(0.0), Some(-5.0), Some(-42.), None]
+ "d"=> [Some(1.0), Some(2.0), None, Some(-42.), None]
)
.unwrap();
@@ -46,17 +43,17 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
let out = df
.clone()
.lazy()
- .select([col("*").exclude(["a"])])
+ .select([col("*").exclude(["a", "c"])])
.collect()?;
println!("{}", out);
// --8<-- [end:exclude]
// --8<-- [start:filter]
- let start_date = NaiveDate::from_ymd_opt(2022, 12, 2)
+ let start_date = NaiveDate::from_ymd_opt(2025, 12, 2)
.unwrap()
.and_hms_opt(0, 0, 0)
.unwrap();
- let end_date = NaiveDate::from_ymd_opt(2022, 12, 8)
+ let end_date = NaiveDate::from_ymd_opt(2025, 12, 3)
.unwrap()
.and_hms_opt(0, 0, 0)
.unwrap();
diff --git a/docs/src/rust/user-guide/basics/reading-writing.rs b/docs/src/rust/user-guide/basics/reading-writing.rs
index 44c1a335428d..dad5e8713d24 100644
--- a/docs/src/rust/user-guide/basics/reading-writing.rs
+++ b/docs/src/rust/user-guide/basics/reading-writing.rs
@@ -9,9 +9,9 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut df: DataFrame = df!(
"integer" => &[1, 2, 3],
"date" => &[
- NaiveDate::from_ymd_opt(2022, 1, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
- NaiveDate::from_ymd_opt(2022, 1, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(),
- NaiveDate::from_ymd_opt(2022, 1, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+ NaiveDate::from_ymd_opt(2025, 1, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+ NaiveDate::from_ymd_opt(2025, 1, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+ NaiveDate::from_ymd_opt(2025, 1, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
],
"float" => &[4.0, 5.0, 6.0]
)
diff --git a/docs/user-guide/concepts/index.md b/docs/user-guide/concepts/index.md
new file mode 100644
index 000000000000..62dbd19fce2f
--- /dev/null
+++ b/docs/user-guide/concepts/index.md
@@ -0,0 +1,12 @@
+# Concepts
+
+The `Concepts` chapter describes the core concepts of the Polars API. Understanding these will help you optimise your queries on a daily basis. We will cover the following topics:
+
+- Data types:
+ - [Overview](/data-types/overview.md)
+ - [Categoricals](data-types/categoricals.md)
+- [Data structures](data-structures.md)
+- [Contexts](contexts.md)
+- [Expressions](expressions.md)
+- [Lazy vs eager](lazy-vs-eager.md)
+- [Streaming](streaming.md)
\ No newline at end of file
diff --git a/docs/user-guide/expressions/index.md b/docs/user-guide/expressions/index.md
new file mode 100644
index 000000000000..c7e06cbe1863
--- /dev/null
+++ b/docs/user-guide/expressions/index.md
@@ -0,0 +1,18 @@
+# Expressions
+
+In the `Contexts` sections we outlined what `Expressions` are and how they are invaluable. In this section we will focus on the `Expressions` themselves. Each section gives an overview of what they do and provides additional examples.
+
+- [Operators](operators.md)
+- [Column selections](column-selections.md)
+- [Functions](functions.md)
+- [Casting](casting.md)
+- [Strings](strings.md)
+- [Aggregation](aggregation.md)
+- [Null](null.md)
+- [Window](window.md)
+- [Folds](folds.md)
+- [Lists](lists.md)
+- [Plugins](plugins.md)
+- [User-defined functions](user-defined-functions.md)
+- [Structs](structs.md)
+- [Numpy](numpy.md)
\ No newline at end of file
diff --git a/docs/user-guide/getting-started.md b/docs/user-guide/getting-started.md
index fca8a951089e..0d5cee89fccd 100644
--- a/docs/user-guide/getting-started.md
+++ b/docs/user-guide/getting-started.md
@@ -22,7 +22,7 @@ This chapter is here to help you get started with Polars. It covers all the fund
## Reading & writing
-Polars supports reading and writing to all common files (e.g. csv, json, parquet), cloud storage (S3, Azure Blob, BigQuery) and databases (e.g. postgres, mysql). In the following examples we will show how to operate on most common file formats. For the following dataframe
+Polars supports reading and writing to all common files (e.g. csv, json, parquet), cloud storage (S3, Azure Blob, BigQuery) and databases (e.g. postgres, mysql). Below we use csv as example to demonstrate foundational read/write operations.
{{code_block('user-guide/basics/reading-writing','dataframe',['DataFrame'])}}
@@ -30,9 +30,9 @@ Polars supports reading and writing to all common files (e.g. csv, json, parquet
--8<-- "python/user-guide/basics/reading-writing.py:dataframe"
```
-### CSV
+### CSV example
-Polars has its own fast implementation for csv reading with many flexible configuration options.
+In this example we write the DataFrame to `output.csv`. After that we can read it back with `read_csv` and `print` the result for inspection.
{{code_block('user-guide/basics/reading-writing','csv',['read_csv','write_csv'])}}
@@ -40,31 +40,7 @@ Polars has its own fast implementation for csv reading with many flexible config
--8<-- "python/user-guide/basics/reading-writing.py:csv"
```
-As we can see above, Polars made the datetimes a `string`. We can tell Polars to parse dates, when reading the csv, to ensure the date becomes a datetime. The example can be found below:
-
-{{code_block('user-guide/basics/reading-writing','csv2',['read_csv'])}}
-
-```python exec="on" result="text" session="getting-started/reading"
---8<-- "python/user-guide/basics/reading-writing.py:csv2"
-```
-
-### JSON
-
-{{code_block('user-guide/basics/reading-writing','json',['read_json','write_json'])}}
-
-```python exec="on" result="text" session="getting-started/reading"
---8<-- "python/user-guide/basics/reading-writing.py:json"
-```
-
-### Parquet
-
-{{code_block('user-guide/basics/reading-writing','parquet',['read_parquet','write_parquet'])}}
-
-```python exec="on" result="text" session="getting-started/reading"
---8<-- "python/user-guide/basics/reading-writing.py:parquet"
-```
-
-To see more examples and other data formats go to the [User Guide](io/csv.md), section IO.
+For more examples on the CSV file format and other data formats, start here [IO section on CSV](io/csv.md) of the User Guide.
## Expressions
@@ -79,7 +55,12 @@ To learn more about expressions and the context in which they operate, see the U
### Select statement
-To select a column we need to do two things. Define the `DataFrame` we want the data from. And second, select the data that we need. In the example below you see that we select `col('*')`. The asterisk stands for all columns.
+To select a column we need to do two things:
+
+1. Define the `DataFrame` we want the data from.
+2. Select the data that we need.
+
+In the example below you see that we select `col('*')`. The asterisk stands for all columns.
{{code_block('user-guide/basics/expressions','select',['select'])}}
@@ -100,25 +81,7 @@ print(
)
```
-The second option is to specify each column using `pl.col`. This option is shown below.
-
-{{code_block('user-guide/basics/expressions','select3',['select'])}}
-
-```python exec="on" result="text" session="getting-started/expressions"
-print(
- --8<-- "python/user-guide/basics/expressions.py:select3"
-)
-```
-
-If you want to exclude an entire column from your view, you can simply use `exclude` in your `select` statement.
-
-{{code_block('user-guide/basics/expressions','exclude',['select'])}}
-
-```python exec="on" result="text" session="getting-started/expressions"
-print(
- --8<-- "python/user-guide/basics/expressions.py:exclude"
-)
-```
+Follow these links to other parts of the User guide to learn more about [basic operations](expressions/operators.md) or [column selections](expressions/column-selections.md).
### Filter
@@ -154,7 +117,7 @@ print(
)
```
-### Group by
+### Group_by
We will create a new `DataFrame` for the Group by functionality. This new `DataFrame` will include several 'groups' that we want to group by.
diff --git a/docs/user-guide/io/index.md b/docs/user-guide/io/index.md
new file mode 100644
index 000000000000..5b0f35ff676f
--- /dev/null
+++ b/docs/user-guide/io/index.md
@@ -0,0 +1,12 @@
+# IO
+
+Reading and writing your data is crucial for a DataFrame library. In this chapter you will learn more on how to read and write to different file formats that are supported by Polars.
+
+- [CSV](csv.md)
+- [Excel](excel.md)
+- [Parquet](parquet.md)
+- [Json](json.md)
+- [Multiple](multiple.md)
+- [Database](database.md)
+- [Cloud storage](cloud-storage.md)
+- [Google Big Query](bigquery.md)
\ No newline at end of file
diff --git a/docs/user-guide/lazy/index.md b/docs/user-guide/lazy/index.md
new file mode 100644
index 000000000000..8efc3b0fbb50
--- /dev/null
+++ b/docs/user-guide/lazy/index.md
@@ -0,0 +1,10 @@
+# Lazy
+
+The Lazy chapter is a guide for working with `LazyFrames`. It covers the functionalities like how to use it and how to optimise it. You can also find more information about the query plan or gain more insight in the streaming capabilities.
+
+- [Using lazy API](using.md)
+- [Optimisations](optimizations.md)
+- [Schemas](schemas.md)
+- [Query plan](query-plan.md)
+- [Execution](execution.md)
+- [Streaming](streaming.md)
\ No newline at end of file
diff --git a/docs/user-guide/transformations/index.md b/docs/user-guide/transformations/index.md
new file mode 100644
index 000000000000..eaf5cb9ca5f6
--- /dev/null
+++ b/docs/user-guide/transformations/index.md
@@ -0,0 +1,8 @@
+# Transformations
+
+The focus of this section is to describe different types of data transformations and provide some examples on how to use them.
+
+- [Joins](joins.md)
+- [Concatenation](concatenation.md)
+- [Pivot](pivot.md)
+- [Melt](melt.md)
\ No newline at end of file
diff --git a/mkdocs.yml b/mkdocs.yml
index 1d1ed4a769d8..5d456cab1c72 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -13,6 +13,7 @@ nav:
- user-guide/getting-started.md
- user-guide/installation.md
- Concepts:
+ - user-guide/concepts/index.md
- Data types:
- user-guide/concepts/data-types/overview.md
- user-guide/concepts/data-types/categoricals.md
@@ -22,6 +23,7 @@ nav:
- user-guide/concepts/lazy-vs-eager.md
- user-guide/concepts/streaming.md
- Expressions:
+ - user-guide/expressions/index.md
- user-guide/expressions/operators.md
- user-guide/expressions/column-selections.md
- user-guide/expressions/functions.md
@@ -37,6 +39,7 @@ nav:
- user-guide/expressions/structs.md
- user-guide/expressions/numpy.md
- Transformations:
+ - user-guide/transformations/index.md
- user-guide/transformations/joins.md
- user-guide/transformations/concatenation.md
- user-guide/transformations/pivot.md
@@ -48,6 +51,7 @@ nav:
- user-guide/transformations/time-series/resampling.md
- user-guide/transformations/time-series/timezones.md
- Lazy API:
+ - user-guide/lazy/index.md
- user-guide/lazy/using.md
- user-guide/lazy/optimizations.md
- user-guide/lazy/schemas.md
@@ -55,6 +59,7 @@ nav:
- user-guide/lazy/execution.md
- user-guide/lazy/streaming.md
- IO:
+ - user-guide/io/index.md
- user-guide/io/csv.md
- user-guide/io/excel.md
- user-guide/io/parquet.md
@@ -127,6 +132,7 @@ theme:
- navigation.tabs
- navigation.tabs.sticky
- navigation.footer
+ - navigation.indexes
- content.tabs.link
icon:
repo: fontawesome/brands/github
From dd359dc081108a374847fc50d9781bb9b9c209ac Mon Sep 17 00:00:00 2001
From: r-brink
Date: Tue, 16 Jan 2024 13:00:09 +0100
Subject: [PATCH 10/12] formatting
---
docs/user-guide/concepts/index.md | 6 +++---
docs/user-guide/expressions/index.md | 2 +-
docs/user-guide/getting-started.md | 8 ++++----
docs/user-guide/io/index.md | 2 +-
docs/user-guide/lazy/index.md | 2 +-
docs/user-guide/transformations/index.md | 2 +-
6 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/docs/user-guide/concepts/index.md b/docs/user-guide/concepts/index.md
index 62dbd19fce2f..e3eac9b2c70f 100644
--- a/docs/user-guide/concepts/index.md
+++ b/docs/user-guide/concepts/index.md
@@ -3,10 +3,10 @@
The `Concepts` chapter describes the core concepts of the Polars API. Understanding these will help you optimise your queries on a daily basis. We will cover the following topics:
- Data types:
- - [Overview](/data-types/overview.md)
- - [Categoricals](data-types/categoricals.md)
+ - [Overview](data-types/overview.md)
+ - [Categoricals](data-types/categoricals.md)
- [Data structures](data-structures.md)
- [Contexts](contexts.md)
- [Expressions](expressions.md)
- [Lazy vs eager](lazy-vs-eager.md)
-- [Streaming](streaming.md)
\ No newline at end of file
+- [Streaming](streaming.md)
diff --git a/docs/user-guide/expressions/index.md b/docs/user-guide/expressions/index.md
index c7e06cbe1863..3724e09ce15e 100644
--- a/docs/user-guide/expressions/index.md
+++ b/docs/user-guide/expressions/index.md
@@ -15,4 +15,4 @@ In the `Contexts` sections we outlined what `Expressions` are and how they are i
- [Plugins](plugins.md)
- [User-defined functions](user-defined-functions.md)
- [Structs](structs.md)
-- [Numpy](numpy.md)
\ No newline at end of file
+- [Numpy](numpy.md)
diff --git a/docs/user-guide/getting-started.md b/docs/user-guide/getting-started.md
index 0d5cee89fccd..07fa84e3a479 100644
--- a/docs/user-guide/getting-started.md
+++ b/docs/user-guide/getting-started.md
@@ -40,7 +40,7 @@ In this example we write the DataFrame to `output.csv`. After that we can read i
--8<-- "python/user-guide/basics/reading-writing.py:csv"
```
-For more examples on the CSV file format and other data formats, start here [IO section on CSV](io/csv.md) of the User Guide.
+For more examples on the CSV file format and other data formats, start here [IO section on CSV](io/csv.md) of the User Guide.
## Expressions
@@ -55,10 +55,10 @@ To learn more about expressions and the context in which they operate, see the U
### Select statement
-To select a column we need to do two things:
+To select a column we need to do two things:
-1. Define the `DataFrame` we want the data from.
-2. Select the data that we need.
+1. Define the `DataFrame` we want the data from.
+2. Select the data that we need.
In the example below you see that we select `col('*')`. The asterisk stands for all columns.
diff --git a/docs/user-guide/io/index.md b/docs/user-guide/io/index.md
index 5b0f35ff676f..5a3548871e8a 100644
--- a/docs/user-guide/io/index.md
+++ b/docs/user-guide/io/index.md
@@ -9,4 +9,4 @@ Reading and writing your data is crucial for a DataFrame library. In this chapte
- [Multiple](multiple.md)
- [Database](database.md)
- [Cloud storage](cloud-storage.md)
-- [Google Big Query](bigquery.md)
\ No newline at end of file
+- [Google Big Query](bigquery.md)
diff --git a/docs/user-guide/lazy/index.md b/docs/user-guide/lazy/index.md
index 8efc3b0fbb50..be731390f09c 100644
--- a/docs/user-guide/lazy/index.md
+++ b/docs/user-guide/lazy/index.md
@@ -7,4 +7,4 @@ The Lazy chapter is a guide for working with `LazyFrames`. It covers the functio
- [Schemas](schemas.md)
- [Query plan](query-plan.md)
- [Execution](execution.md)
-- [Streaming](streaming.md)
\ No newline at end of file
+- [Streaming](streaming.md)
diff --git a/docs/user-guide/transformations/index.md b/docs/user-guide/transformations/index.md
index eaf5cb9ca5f6..cd673786643c 100644
--- a/docs/user-guide/transformations/index.md
+++ b/docs/user-guide/transformations/index.md
@@ -5,4 +5,4 @@ The focus of this section is to describe different types of data transformations
- [Joins](joins.md)
- [Concatenation](concatenation.md)
- [Pivot](pivot.md)
-- [Melt](melt.md)
\ No newline at end of file
+- [Melt](melt.md)
From ac2ba43677b75e29467849d6a3ab257cf20e9762 Mon Sep 17 00:00:00 2001
From: r-brink
Date: Tue, 23 Jan 2024 10:47:01 +0100
Subject: [PATCH 11/12] resolved feedback and formatting
---
.../python/user-guide/basics/reading-writing.py | 2 +-
docs/user-guide/getting-started.md | 16 +++++++---------
2 files changed, 8 insertions(+), 10 deletions(-)
diff --git a/docs/src/python/user-guide/basics/reading-writing.py b/docs/src/python/user-guide/basics/reading-writing.py
index f01fbba3fb30..68c0ab235fd1 100644
--- a/docs/src/python/user-guide/basics/reading-writing.py
+++ b/docs/src/python/user-guide/basics/reading-writing.py
@@ -11,7 +11,7 @@
datetime(2025, 1, 3),
],
"float": [4.0, 5.0, 6.0],
- "string": ["a", "b", "c"]
+ "string": ["a", "b", "c"],
}
)
diff --git a/docs/user-guide/getting-started.md b/docs/user-guide/getting-started.md
index 07fa84e3a479..3ae743114cf8 100644
--- a/docs/user-guide/getting-started.md
+++ b/docs/user-guide/getting-started.md
@@ -22,7 +22,7 @@ This chapter is here to help you get started with Polars. It covers all the fund
## Reading & writing
-Polars supports reading and writing to all common files (e.g. csv, json, parquet), cloud storage (S3, Azure Blob, BigQuery) and databases (e.g. postgres, mysql). Below we use csv as example to demonstrate foundational read/write operations.
+Polars supports reading and writing for common file formats (e.g. csv, json, parquet), cloud storage (S3, Azure Blob, BigQuery) and databases (e.g. postgres, mysql). Below we show the concept of reading and writing to disk.
{{code_block('user-guide/basics/reading-writing','dataframe',['DataFrame'])}}
@@ -30,9 +30,7 @@ Polars supports reading and writing to all common files (e.g. csv, json, parquet
--8<-- "python/user-guide/basics/reading-writing.py:dataframe"
```
-### CSV example
-
-In this example we write the DataFrame to `output.csv`. After that we can read it back with `read_csv` and `print` the result for inspection.
+In the example below we write the DataFrame to a csv file called `output.csv`. After that, we read it back with `read_csv` and `print` the result for inspection.
{{code_block('user-guide/basics/reading-writing','csv',['read_csv','write_csv'])}}
@@ -40,11 +38,11 @@ In this example we write the DataFrame to `output.csv`. After that we can read i
--8<-- "python/user-guide/basics/reading-writing.py:csv"
```
-For more examples on the CSV file format and other data formats, start here [IO section on CSV](io/csv.md) of the User Guide.
+For more examples on the CSV file format and other data formats, start with the [IO section](io/index.md) of the User Guide.
## Expressions
-`Expressions` are the core strength of Polars. The `expressions` offer a versatile structure that both solves easy queries and is easily extended to complex ones. Below we cover the basic components that serve as building block (or in Polars terminology contexts) for all your queries:
+`Expressions` are the core strength of Polars. The `expressions` offer a modular structure that allows you to combine simple concepts into complex queries. Below we cover the basic components that serve as building blocks (or, in Polars terminology, contexts) for all your queries:
- `select`
- `filter`
@@ -53,7 +51,7 @@ For more examples on the CSV file format and other data formats, start here [IO
To learn more about expressions and the context in which they operate, see the User Guide sections: [Contexts](concepts/contexts.md) and [Expressions](concepts/expressions.md).
-### Select statement
+### Select
To select a column we need to do two things:
@@ -105,7 +103,7 @@ print(
)
```
-### With_columns
+### Add columns
`with_columns` allows you to create new columns for your analyses. We create two new columns `e` and `b+42`. First we sum all values from column `b` and store the results in column `e`. After that we add `42` to the values of `b`. Creating a new column `b+42` to store these results.
@@ -144,7 +142,7 @@ print(
)
```
-### Combining operations
+### Combination
Below are some examples on how to combine operations to create the `DataFrame` you require.
From 063de4c067d173dce61fafb0d7e9156ec56e1992 Mon Sep 17 00:00:00 2001
From: chielP
Date: Tue, 23 Jan 2024 14:43:03 +0100
Subject: [PATCH 12/12] nested list
---
docs/user-guide/concepts/index.md | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/docs/user-guide/concepts/index.md b/docs/user-guide/concepts/index.md
index e3eac9b2c70f..63a2ebeabe44 100644
--- a/docs/user-guide/concepts/index.md
+++ b/docs/user-guide/concepts/index.md
@@ -2,9 +2,8 @@
The `Concepts` chapter describes the core concepts of the Polars API. Understanding these will help you optimise your queries on a daily basis. We will cover the following topics:
-- Data types:
- - [Overview](data-types/overview.md)
- - [Categoricals](data-types/categoricals.md)
+- [Data Types: Overview](data-types/overview.md)
+- [Data Types: Categoricals](data-types/categoricals.md)
- [Data structures](data-structures.md)
- [Contexts](contexts.md)
- [Expressions](expressions.md)