docs: Change dprint config #19747

Merged: 2 commits, Nov 16, 2024
66 changes: 29 additions & 37 deletions .github/CODE_OF_CONDUCT.md
@@ -2,17 +2,15 @@

## Our Pledge

In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to make participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
In the interest of fostering an open and welcoming environment, we as contributors and maintainers
pledge to make participation in our project and our community a harassment-free experience for
everyone, regardless of age, body size, disability, ethnicity, sex characteristics, gender identity
and expression, level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment
include:
Examples of behavior that contributes to creating a positive environment include:

- Using welcoming and inclusive language
- Being respectful of differing viewpoints and experiences
@@ -22,53 +20,47 @@ include:

Examples of unacceptable behavior by participants include:

- The use of sexualized language or imagery and unwelcome sexual attention or
advances
- The use of sexualized language or imagery and unwelcome sexual attention or advances
- Trolling, insulting/derogatory comments, and personal or political attacks
- Public or private harassment
- Publishing others' private information, such as a physical or electronic
address, without explicit permission
- Other conduct which could reasonably be considered inappropriate in a
professional setting
- Publishing others' private information, such as a physical or electronic address, without explicit
permission
- Other conduct which could reasonably be considered inappropriate in a professional setting

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.
Project maintainers are responsible for clarifying the standards of acceptable behavior and are
expected to take appropriate and fair corrective action in response to any instances of unacceptable
behavior.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
Project maintainers have the right and responsibility to remove, edit, or reject comments, commits,
code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or
to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.

## Scope

This Code of Conduct applies within all project spaces, and it also applies when
an individual is representing the project or its community in public spaces.
Examples of representing a project or community include using an official
project e-mail address, posting via an official social media account, or acting
as an appointed representative at an online or offline event. Representation of
a project may be further defined and clarified by project maintainers.
This Code of Conduct applies within all project spaces, and it also applies when an individual is
representing the project or its community in public spaces. Examples of representing a project or
community include using an official project e-mail address, posting via an official social media
account, or acting as an appointed representative at an online or offline event. Representation of a
project may be further defined and clarified by project maintainers.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at ritchie46@gmail.com. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting
the project team at ritchie46@gmail.com. All complaints will be reviewed and investigated and will
result in a response that is deemed necessary and appropriate to the circumstances. The project team
is obligated to maintain confidentiality with regard to the reporter of an incident. Further details
of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.
Project maintainers who do not follow or enforce the Code of Conduct in good faith may face
temporary or permanent repercussions as determined by other members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at
https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

[homepage]: https://www.contributor-covenant.org

6 changes: 4 additions & 2 deletions CONTRIBUTING.md
@@ -1,7 +1,9 @@
# Contributing to Polars

Thanks for taking the time to contribute! We appreciate all contributions, from reporting bugs to implementing new features.
Thanks for taking the time to contribute! We appreciate all contributions, from reporting bugs to
implementing new features.

Please refer to the [contributing section](https://docs.pola.rs/development/contributing/) of our documentation to get started.
Please refer to the [contributing section](https://docs.pola.rs/development/contributing/) of our
documentation to get started.

We look forward to your contributions!
64 changes: 37 additions & 27 deletions README.md
@@ -49,7 +49,8 @@
## Polars: Blazingly fast DataFrames in Rust, Python, Node.js, R, and SQL

Polars is a DataFrame interface on top of an OLAP Query Engine implemented in Rust using
[Apache Arrow Columnar Format](https://arrow.apache.org/docs/format/Columnar.html) as the memory model.
[Apache Arrow Columnar Format](https://arrow.apache.org/docs/format/Columnar.html) as the memory
model.

- Lazy | eager execution
- Multi-threaded
@@ -158,22 +159,25 @@ Refer to the [Polars CLI repository](https://github.com/pola-rs/polars-cli) for

### Blazingly fast

Polars is very fast. In fact, it is one of the best performing solutions available. See the [PDS-H benchmarks](https://www.pola.rs/benchmarks.html) results.
Polars is very fast. In fact, it is one of the best performing solutions available. See the
[PDS-H benchmarks](https://www.pola.rs/benchmarks.html) results.

### Lightweight

Polars is also very lightweight. It comes with zero required dependencies, and this shows in the import times:
Polars is also very lightweight. It comes with zero required dependencies, and this shows in the
import times:

- polars: 70ms
- numpy: 104ms
- pandas: 520ms

### Handles larger-than-RAM data

If you have data that does not fit into memory, Polars' query engine is able to process your query (or parts of your query) in a streaming fashion.
This drastically reduces memory requirements, so you might be able to process your 250GB dataset on your laptop.
Collect with `collect(streaming=True)` to run the query streaming.
(This might be a little slower, but it is still very fast!)
If you have data that does not fit into memory, Polars' query engine is able to process your query
(or parts of your query) in a streaming fashion. This drastically reduces memory requirements, so
you might be able to process your 250GB dataset on your laptop. Collect with
`collect(streaming=True)` to run the query streaming. (This might be a little slower, but it is
still very fast!)

## Setup

@@ -185,7 +189,8 @@ Install the latest Polars version with:
pip install polars
```

We also have a conda package (`conda install -c conda-forge polars`), however pip is the preferred way to install Polars.
We also have a conda package (`conda install -c conda-forge polars`), however pip is the preferred
way to install Polars.

Install Polars with all optional dependencies.

@@ -199,20 +204,22 @@ You can also install a subset of all optional dependencies.
pip install 'polars[numpy,pandas,pyarrow]'
```

See the [User Guide](https://docs.pola.rs/user-guide/installation/#feature-flags) for more details on optional dependencies.
See the [User Guide](https://docs.pola.rs/user-guide/installation/#feature-flags) for more details
on optional dependencies.

To see the current Polars version and a full list of its optional dependencies, run:

```python
import polars as pl
pl.show_versions()
```

Releases happen quite often (weekly / every few days) at the moment, so updating Polars regularly to get the latest bugfixes / features might not be a bad idea.
Releases happen quite often (weekly / every few days) at the moment, so updating Polars regularly to
get the latest bugfixes / features might not be a bad idea.

### Rust

You can take the latest release from `crates.io`, or if you want to use the latest features / performance
improvements, point to the `main` branch of this repo.
You can take the latest release from `crates.io`, or if you want to use the latest features /
performance improvements, point to the `main` branch of this repo.

```toml
polars = { git = "https://github.com/pola-rs/polars", rev = "<optional git tag>" }
@@ -234,36 +241,39 @@ This can be done by going through the following steps in sequence:
2. Install [maturin](https://maturin.rs/): `pip install maturin`
3. `cd py-polars` and choose one of the following:
- `make build`, slow binary with debug assertions and symbols, fast compile times
- `make build-release`, fast binary without debug assertions, minimal debug symbols, long compile times
- `make build-nodebug-release`, same as build-release but without any debug symbols, slightly faster to compile
- `make build-debug-release`, same as build-release but with full debug symbols, slightly slower to compile
- `make build-release`, fast binary without debug assertions, minimal debug symbols, long compile
times
- `make build-nodebug-release`, same as build-release but without any debug symbols, slightly
faster to compile
- `make build-debug-release`, same as build-release but with full debug symbols, slightly slower
to compile
- `make build-dist-release`, fastest binary, extreme compile times

By default the binary is compiled with optimizations turned on for a modern CPU. Specify `LTS_CPU=1`
with the command if your CPU is older and does not support e.g. AVX2.

Note that the Rust crate implementing the Python bindings is called `py-polars` to distinguish from the wrapped
Rust crate `polars` itself. However, both the Python package and the Python module are named `polars`, so you
can `pip install polars` and `import polars`.
Note that the Rust crate implementing the Python bindings is called `py-polars` to distinguish from
the wrapped Rust crate `polars` itself. However, both the Python package and the Python module are
named `polars`, so you can `pip install polars` and `import polars`.

## Using custom Rust functions in Python

Extending Polars with UDFs compiled in Rust is easy. We expose PyO3 extensions for `DataFrame` and `Series`
data structures. See more in https://github.com/pola-rs/pyo3-polars.
Extending Polars with UDFs compiled in Rust is easy. We expose PyO3 extensions for `DataFrame` and
`Series` data structures. See more in https://github.com/pola-rs/pyo3-polars.

## Going big...

Do you expect more than 2^32 (~4.2 billion) rows? Compile Polars with the `bigidx` feature
flag or, for Python users, install `pip install polars-u64-idx`.
Do you expect more than 2^32 (~4.2 billion) rows? Compile Polars with the `bigidx` feature flag or,
for Python users, install `pip install polars-u64-idx`.

Don't use this unless you hit the row boundary as the default build of Polars is faster and consumes less memory.
Don't use this unless you hit the row boundary as the default build of Polars is faster and consumes
less memory.
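
As a rough illustration of the row boundary mentioned above, the check below assumes that row positions are stored in a 32-bit unsigned index by default. The helper `needs_big_index` is hypothetical and not part of Polars, and the exact boundary (whether the limit is `2^32` or `2^32 - 1` rows) is an assumption of this sketch.

```rust
/// Hypothetical helper (not a Polars API): reports whether a row count
/// no longer fits in a 32-bit unsigned index.
fn needs_big_index(rows: u64) -> bool {
    rows > u64::from(u32::MAX)
}

fn main() {
    // 2^32 is the first value that a u32 can no longer represent.
    assert_eq!(u64::from(u32::MAX) + 1, 4_294_967_296);

    // One million rows fit comfortably in the default index.
    assert!(!needs_big_index(1_000_000));

    // Five billion rows exceed it.
    assert!(needs_big_index(5_000_000_000));
    println!("5 billion rows need a 64-bit index: {}", needs_big_index(5_000_000_000));
}
```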

## Legacy

Do you want Polars to run on an old CPU (e.g. dating from before 2011), or on an `x86-64` build
of Python on Apple Silicon under Rosetta? Install `pip install polars-lts-cpu`. This version of
Polars is compiled without [AVX](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) target
features.
Do you want Polars to run on an old CPU (e.g. dating from before 2011), or on an `x86-64` build of
Python on Apple Silicon under Rosetta? Install `pip install polars-lts-cpu`. This version of Polars
is compiled without [AVX](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) target features.

## Sponsors

23 changes: 14 additions & 9 deletions crates/polars-arrow/src/README.md
@@ -1,23 +1,27 @@
# Crate's design

This document describes the design of this module, and thus the overall crate.
Each module MAY have its own design document, that concerns specifics of that module, and if yes,
it MUST be on each module's `README.md`.
This document describes the design of this module, and thus the overall crate. Each module MAY have
its own design document, that concerns specifics of that module, and if yes, it MUST be on each
module's `README.md`.

## Equality

Array equality is not defined in the Arrow specification. This crate follows the intent of the specification, but there is no guarantee or verification that this equals e.g. C++'s definition.
Array equality is not defined in the Arrow specification. This crate follows the intent of the
specification, but there is no guarantee or verification that this equals e.g. C++'s
definition.

There is a single source of truth about whether two arrays are equal, and that is via their
equality operators, defined on the module [`array/equal`](array/equal/mod.rs).
There is a single source of truth about whether two arrays are equal, and that is via their equality
operators, defined on the module [`array/equal`](array/equal/mod.rs).

Implementation MUST use these operators for asserting equality, so that all testing follows the same definition of array equality.
Implementation MUST use these operators for asserting equality, so that all testing follows the same
definition of array equality.

## Error handling

- Errors from an external dependency MUST be encapsulated on `External`.
- Errors from IO MUST be encapsulated on `Io`.
- This crate MAY return `NotYetImplemented` when the functionality does not exist, or it MAY panic with `unimplemented!`.
- This crate MAY return `NotYetImplemented` when the functionality does not exist, or it MAY panic
with `unimplemented!`.
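
A minimal sketch of how these three rules might map onto an error enum. The variant names `External`, `Io` and `NotYetImplemented` come from the list above; the payload types, the `From` impl, and the helpers `parse_len` and `unsupported` are assumptions made for illustration and are not necessarily the crate's actual definitions.

```rust
use std::fmt;

/// Hypothetical error type: variant names follow the rules above,
/// payload types are assumptions made for this sketch.
#[derive(Debug)]
enum Error {
    /// Encapsulates an error from an external dependency, with some context.
    External(String, Box<dyn std::error::Error + Send + Sync>),
    /// Encapsulates an error from IO.
    Io(std::io::Error),
    /// Returned when the functionality does not exist yet.
    NotYetImplemented(String),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::External(ctx, source) => write!(f, "external error ({ctx}): {source}"),
            Error::Io(source) => write!(f, "io error: {source}"),
            Error::NotYetImplemented(what) => write!(f, "not yet implemented: {what}"),
        }
    }
}

impl std::error::Error for Error {}

// Lets `?` convert an IO error into `Error::Io` automatically.
impl From<std::io::Error> for Error {
    fn from(source: std::io::Error) -> Self {
        Error::Io(source)
    }
}

/// Hypothetical helper: parses a length and encapsulates the parser's error.
fn parse_len(text: &str) -> Result<usize, Error> {
    text.trim()
        .parse::<usize>()
        .map_err(|e| Error::External(format!("parsing {text:?}"), Box::new(e)))
}

/// Hypothetical helper: reports missing functionality instead of panicking.
fn unsupported(what: &str) -> Result<(), Error> {
    Err(Error::NotYetImplemented(what.to_string()))
}

fn main() {
    assert_eq!(parse_len(" 3 ").unwrap(), 3);
    assert!(matches!(parse_len("x"), Err(Error::External(_, _))));
    assert!(matches!(unsupported("some kernel"), Err(Error::NotYetImplemented(_))));

    let io: Error = std::io::Error::new(std::io::ErrorKind::Other, "disk").into();
    println!("{io}");
}
```

Returning `NotYetImplemented` keeps the failure recoverable for the caller, whereas `unimplemented!` aborts the current thread; the list above permits either.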

## Logical and physical types

@@ -29,4 +33,5 @@ There is a strict separation between physical and logical types:

## Source of undefined behavior

There is one, and only one, acceptable source of undefined behavior: FFI. It is impossible to prove that data passed via pointers are safe for consumption (only a promise from the specification).
There is one, and only one, acceptable source of undefined behavior: FFI. It is impossible to prove
that data passed via pointers are safe for consumption (only a promise from the specification).
41 changes: 26 additions & 15 deletions crates/polars-arrow/src/array/README.md
@@ -10,7 +10,8 @@ This document describes the overall design of this module.

## Arrays:

- Every arrow array with a different physical representation MUST be implemented as a struct or generic struct.
- Every arrow array with a different physical representation MUST be implemented as a struct or
generic struct.

- An array MAY have its own module. E.g. `primitive/mod.rs`

@@ -22,16 +23,19 @@ This document describes the overall design of this module.

- Every child array on the struct MUST be `Box<dyn Array>`.

- An array MUST implement `try_new(...) -> Self`. This method MUST error iff
the data does not follow the arrow specification, including any sentinel types such as utf8.
- An array MUST implement `try_new(...) -> Self`. This method MUST error iff the data does not
follow the arrow specification, including any sentinel types such as utf8.

- An array MAY implement `unsafe try_new_unchecked` that skips validation steps that are `O(N)`.

- An array MUST implement either `new_empty()` or `new_empty(DataType)` that returns a zero-length `Self`.
- An array MUST implement either `new_empty()` or `new_empty(DataType)` that returns a zero-length
`Self`.

- An array MUST implement either `new_null(length: usize)` or `new_null(DataType, length: usize)` that returns a valid array of length `length` whose elements are all null.
- An array MUST implement either `new_null(length: usize)` or `new_null(DataType, length: usize)`
that returns a valid array of length `length` whose elements are all null.

- An array MAY implement `value(i: usize)` that returns the value at slot `i` ignoring the validity bitmap.
- An array MAY implement `value(i: usize)` that returns the value at slot `i` ignoring the validity
bitmap.

- functions to create new arrays from native Rust SHOULD be named as follows:
- `from`: from a slice of optional values (e.g. `AsRef<[Option<bool>]>` for `BooleanArray`)
@@ -42,20 +46,26 @@ This document describes the overall design of this module.

### Slot offsets

- An array MUST have an `offset: usize` measuring the number of slots that the array is currently offset by if the specification requires.
- An array MUST have an `offset: usize` measuring the number of slots that the array is currently
offset by if the specification requires.

- An array MUST implement `fn slice(&self, offset: usize, length: usize) -> Self` that returns an offset and/or truncated clone of the array. This function MUST increase the array's offset if it exists.
- An array MUST implement `fn slice(&self, offset: usize, length: usize) -> Self` that returns an
offset and/or truncated clone of the array. This function MUST increase the array's offset if
it exists.

- Conversely, `offset` MUST only be changed by `slice`.

The rationale for the above is that it enables us to be fully interoperable with the offset logic supported by the C data interface, while at the same time easily performing array slices
within Rust's type safety mechanism.
The rationale for the above is that it enables us to be fully interoperable with the offset logic
supported by the C data interface, while at the same time easily performing array slices within
Rust's type safety mechanism.
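
The slicing rules above can be sketched with a toy type. `ToyArray` and its fields are hypothetical and far simpler than the crate's real arrays (there is no validity bitmap, for instance); the sketch only shows `slice` increasing `offset` while sharing the underlying values instead of copying them.

```rust
use std::sync::Arc;

/// Toy array for illustration only (not the crate's `PrimitiveArray`):
/// shared values plus an `offset` and `length` measured in slots.
#[derive(Debug, Clone)]
struct ToyArray {
    values: Arc<[i32]>,
    offset: usize,
    length: usize,
}

impl ToyArray {
    fn new(values: Vec<i32>) -> Self {
        let length = values.len();
        Self { values: values.into(), offset: 0, length }
    }

    /// Returns an offset and/or truncated clone. No values are copied:
    /// only the `Arc` is cloned and the offset is increased.
    fn slice(&self, offset: usize, length: usize) -> Self {
        assert!(offset + length <= self.length, "slice out of bounds");
        Self {
            values: Arc::clone(&self.values),
            offset: self.offset + offset,
            length,
        }
    }

    /// The slots visible through this array.
    fn as_slice(&self) -> &[i32] {
        &self.values[self.offset..self.offset + self.length]
    }
}

fn main() {
    let array = ToyArray::new(vec![10, 20, 30, 40, 50]);
    let a = array.slice(1, 3);
    let b = a.slice(1, 2); // offsets accumulate: 1 + 1 = 2

    assert_eq!(a.as_slice(), &[20, 30, 40]);
    assert_eq!(b.as_slice(), &[30, 40]);
    assert_eq!(b.offset, 2);
    // Both slices still point at the original buffer.
    assert!(Arc::ptr_eq(&array.values, &b.values));
}
```

Because `offset` is only ever changed inside `slice`, a consumer that receives the buffer plus an offset (as the C data interface does) sees the same slots as `as_slice`.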

### Mutable Arrays

- An array MAY have a mutable counterpart. E.g. `MutablePrimitiveArray<T>` is the mutable counterpart of `PrimitiveArray<T>`.
- An array MAY have a mutable counterpart. E.g. `MutablePrimitiveArray<T>` is the mutable
counterpart of `PrimitiveArray<T>`.

- Arrays with mutable counterparts MUST have their own module, and have the mutable counterpart declared in `{module}/mutable.rs`.
- Arrays with mutable counterparts MUST have their own module, and have the mutable counterpart
declared in `{module}/mutable.rs`.

- The trait `MutableArray` MUST only be implemented by mutable arrays in this module.

@@ -67,7 +77,8 @@ within Rust's type safety mechanism.
- it must not allocate
- it must not cause `O(N)` data transformations

This is achieved by converting mutable versions to immutable counterparts (e.g. `MutableBitmap -> Bitmap`).
This is achieved by converting mutable versions to immutable counterparts (e.g.
`MutableBitmap -> Bitmap`).

The rationale is that `MutableArray`s can be used to perform in-place operations under
the arrow spec.
The rationale is that `MutableArray`s can be used to perform in-place operations under the arrow
spec.
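
A minimal sketch of the "convert mutable to immutable without allocating or transforming the data" idea. `ToyMutable` and `ToyImmutable` are hypothetical stand-ins; the crate's own types may wrap their buffers differently (for example behind reference counting), so this only illustrates that moving a buffer leaves the data where it is.

```rust
/// Toy mutable container (illustrative, not the crate's `MutablePrimitiveArray`).
#[derive(Debug, Default)]
struct ToyMutable {
    values: Vec<i32>,
}

/// Toy immutable counterpart: it owns the same buffer but exposes no mutation.
#[derive(Debug)]
struct ToyImmutable {
    values: Vec<i32>,
}

impl ToyMutable {
    fn push(&mut self, value: i32) {
        self.values.push(value);
    }
}

impl ToyImmutable {
    fn values(&self) -> &[i32] {
        &self.values
    }
}

impl From<ToyMutable> for ToyImmutable {
    /// Moves the buffer: no element is copied and no new buffer is allocated.
    fn from(mutable: ToyMutable) -> Self {
        Self { values: mutable.values }
    }
}

fn main() {
    let mut mutable = ToyMutable::default();
    for v in [1, 2, 3] {
        mutable.push(v);
    }

    // Remember where the data lives before the conversion.
    let ptr_before = mutable.values.as_ptr();
    let frozen: ToyImmutable = mutable.into();

    assert_eq!(frozen.values(), &[1, 2, 3]);
    // Same heap address: the conversion did not copy the values.
    assert_eq!(frozen.values().as_ptr(), ptr_before);
}
```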