
QueryGen - Duplicates #51

Open
ljukas opened this issue Aug 23, 2019 · 1 comment
Labels
wontfix at the moment (This will not be worked on at the moment)

Comments

ljukas (Collaborator) commented Aug 23, 2019

Hello.

When we met at the mid-term review, we talked about the test methodology. One thing we mentioned was that we wanted to run throughput tests with the different query templates.

For some query templates, only a few distinct queries can be generated. For example, query template 2 generates only 22 distinct queries when the database is created with the regular settings.

One way to address this was to let the querygen generate duplicates. I am ready (or will very soon be ready) to start running the real tests. Would it be possible to add an option to the querygen that makes it generate duplicates? Or should I just copy-paste the generated queries to get more of them?

hartig (Member) commented Aug 23, 2019

I think that in the particular case of your experiments it would be better to have the test driver reuse queries once it runs out of available queries (i.e., essentially, start again from the beginning). This approach is easier to control and allows for deterministic experiment runs, which is not the case if the query generator simply creates duplicates at random. Of course, instead of implementing this approach in your test driver, you may also simply copy-paste the generated queries to achieve the same result. In any case, you need to think about how you want to adapt this approach for multi-client experiments. Please outline a strategy in an email to us, and we can discuss it further.
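A minimal sketch of the wrap-around idea, assuming the test driver holds the generated queries for one template as an in-memory list. The helper names `cyclic_workload` and `client_workloads` are hypothetical and not part of the querygen or any existing driver, and the per-client offset is only one possible multi-client strategy, not one agreed on in this thread.

```python
from itertools import cycle, islice
from typing import List, Sequence


def cyclic_workload(queries: Sequence[str], n: int) -> List[str]:
    """Return n queries, starting again from the first query once the pool runs out."""
    if not queries:
        raise ValueError("query pool is empty")
    return list(islice(cycle(queries), n))


def client_workloads(
    queries: Sequence[str], n_per_client: int, n_clients: int
) -> List[List[str]]:
    """Hypothetical multi-client strategy (an assumption, not a decided design):
    client i starts at offset i * len(queries) // n_clients and then wraps around,
    so concurrent clients do not all issue the same query at the same time."""
    pool = list(queries)
    workloads = []
    for i in range(n_clients):
        offset = (i * len(pool)) // n_clients
        rotated = pool[offset:] + pool[:offset]  # same order, different start
        workloads.append(cyclic_workload(rotated, n_per_client))
    return workloads
```

For example, `cyclic_workload(pool, 50)` over the 22 queries of template 2 yields the 22 queries twice, followed by the first 6 again. Because no randomness is involved, every run produces the same sequence, which is the property that makes this approach easier to control.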

Having said that, there may be use cases in which we actually want workloads with duplicates. However, this requires a more systematic approach than just randomly generating queries without caring about duplicates; instead, it should be possible to control the fraction of duplicates within the generated workloads. While developing (and implementing) such an approach is not a priority at the moment, we can leave this issue open as a reminder that we may come back to it later if needed.
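One way such control could work is sketched below, under the assumption that the generator already has a pool of distinct queries: fix the number of duplicate positions up front and fill each with a copy of an earlier query, using a seeded `random.Random` so runs stay reproducible. The function `workload_with_duplicates` and its parameters are hypothetical, not an existing querygen option.

```python
import random
from typing import List, Sequence


def workload_with_duplicates(
    unique_queries: Sequence[str], n: int, dup_fraction: float, seed: int = 0
) -> List[str]:
    """Hypothetical helper: build a workload of n queries in which
    round(n * dup_fraction) entries repeat an earlier entry.
    Assumes the entries of unique_queries are pairwise distinct."""
    if not 0.0 <= dup_fraction <= 1.0:
        raise ValueError("dup_fraction must be in [0, 1]")
    n_dup = round(n * dup_fraction)
    n_fresh = n - n_dup
    if n > 0 and n_fresh < 1:
        raise ValueError("the first query of a workload cannot be a duplicate")
    if n_fresh > len(unique_queries):
        raise ValueError("not enough distinct queries for the requested fraction")

    rng = random.Random(seed)  # seeded so the same workload is produced every run
    # Choose duplicate positions among slots 1..n-1; slot 0 is always fresh,
    # so every duplicate slot has at least one earlier query to copy.
    dup_slots = set(rng.sample(range(1, n), n_dup))
    fresh = iter(unique_queries[:n_fresh])

    workload: List[str] = []
    for slot in range(n):
        if slot in dup_slots:
            workload.append(rng.choice(workload))  # repeat an earlier query
        else:
            workload.append(next(fresh))  # next distinct query from the pool
    return workload
```

Here the duplicate fraction is measured as the share of workload entries that repeat an earlier entry, so a workload of 50 queries with `dup_fraction=0.2` contains exactly 40 distinct queries. Other definitions (e.g., a Zipf-like skew over a fixed pool) would need a different construction.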

hartig added the "wontfix at the moment" label (This will not be worked on at the moment) on Aug 23, 2019