
Adding yelp_clog S3LogsReader #949

Merged: 10 commits merged into master from yaro/yelp_clog_s3reader, Apr 29, 2024
Conversation

@yaroliakh (Contributor) commented Apr 11, 2024:

Enabling S3LogsReader from yelp_clog as an alternative method for reading tron PaaSTA logs.
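For context, a minimal sketch of the new read path (the get_log_reader call mirrors the snippet reviewed below; the import path, ecosystem, stream name, and dates are assumptions for illustration):

from datetime import date

# import path is an assumption about yelp_clog's layout
from clog.readers import S3LogsReader

reader = S3LogsReader("norcal-devc").get_log_reader(
    log_name="stream_paasta_app_output",  # hypothetical stream name
    min_date=date(2024, 4, 1),
    max_date=date(2024, 4, 2),
)
for line in reader:
    print(line)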

@yaroliakh changed the title from "Yaro/yelp clog s3reader" to "Adding yelp_clog S3LogsReader" on Apr 12, 2024
tron/utils/scribereader.py (outdated review threads, resolved)
log.warning("Unable to read location mapping files from disk, not returning scribereader host/port")
return None

def get_scribereader_host_and_port(ecosystem, superregion, region: str) -> Optional[Tuple[str, int]]:
# NOTE: Passing in an ecosystem of prod is not supported by scribereader
Member:

just checking: is this comment still valid for the new system?

Contributor Author:

still valid: when the scribereader tailer is used, it needs to get the scribe service host and port, and that lookup raises a KeyError when called with the "prod" ecosystem argument.
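A hedged illustration of that failure mode (the mapping and names here are hypothetical, not tron's actual code):

# hypothetical host/port mapping with no "prod" entry, mirroring the behavior described above
HOST_PORT_BY_ECOSYSTEM = {"devc": ("scribe-devc.example.com", 1463)}

def lookup_host_and_port(ecosystem: str):
    # raises KeyError when called with ecosystem="prod"
    return HOST_PORT_BY_ECOSYSTEM[ecosystem]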

Comment on lines 229 to 231
# for logs that use Kafka topics with multiple partitions underneath or retrieved by S3LogsReader,
# data ordering is not guaranteed - so we'll sort based on the log timestamp set by the producer.
lines = paasta_logs.sort_log_lines()
Member:

one day we won't need to sort things :P

that said: does this mean that scribereader/paasta logs/etc will also need to sort?

Contributor Author:

yes, scribereader still doesn't provide sorting: both stream_reader and stream_tailer can return unordered logs when the underlying Kafka topic has more than one partition. paasta logs also does its own sorting, and it's actually even more similar now with the PaaSTALogs class that will be used to fetch and filter logs.
Thus sorting is still needed on the client side (tron, paasta, yelp_clog, etc.) until we figure out something on the backend.
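A minimal sketch of the client-side sort being discussed (the JSON field names are assumptions; tron's actual line format may differ):

import json

def sort_log_lines(raw_lines):
    # parse each JSON log line, then sort once by the producer-set timestamp
    parsed = [json.loads(line) for line in raw_lines]
    parsed.sort(key=lambda entry: entry["timestamp"])
    return [entry["message"] for entry in parsed]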

Member:

ah gotcha - i always assumed that paasta logs was just dumping what scribereader returned - TIL :p


log.debug("Using S3LogsReader to retrieve logs")
s3_reader = S3LogsReader(ecosystem).get_log_reader(log_name=stream_name, min_date=start_date, max_date=end_date)
paasta_logs.iterate_logs(s3_reader, max_lines)
Member:

i wonder if we should rename iterate_logs as we're really fetching them - not returning an iterator

Member:

also: thoughts on having this function return the log line? atm, it reads a bit weirdly since you'd expect a return value - but we're actually having the class store the log lines instead

Contributor Author (Apr 16, 2024):

agree, it's not ideal, but the primary reason for using the class store is to avoid repeated sorting of lines when the iterator is called multiple times for different readers, as in the case of the scribereader reader & tailer. With the class store we can sort once after all the logs are fetched. Even though this is not an issue at all with the current max_lines being pretty low, I was wondering if we should increase it to reduce the not-so-rare occurrences of "This output is truncated" for certain jobs?

Member:

increasing max_lines should be fine - it's more of a protection against jobs that spew gigabytes of logs and run multiple times :)

i'm hoping that at some point in the future we try to re-architect things a bit here so that we can store these logs in a more efficient way for tron to query them, but that's for future us!

Member:

we're also currently only sorting once in the pre-PR code though, right?

Contributor Author (Apr 16, 2024):

right, nothing changes in this regard. With PaastaLogs I'm just trying to avoid extra code duplication with the addition of the third reader. I think we can simplify it once again with the removal of the scribe readers.

Contributor Author:

so to rephrase:

  • one point for using the class store is to sort only once, when all the logs have been fetched
  • secondly, to mask the details of paasta logs sorting inside the class itself (as we should care only about lines), rather than returning output as a list of tuples and doing the sorting and extraction of lines outside it (see the sketch below)
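A hedged sketch of that "class store" shape (the method names follow the snippets above, but the bodies and the assumption that readers yield (timestamp, line) tuples are illustrative, not tron's actual implementation):

class PaastaLogs:
    def __init__(self):
        # the class store: (timestamp, line) tuples accumulated across readers
        self.lines = []

    def iterate_logs(self, reader, max_lines):
        # drain any reader (scribereader stream/tailer or S3LogsReader)
        for count, entry in enumerate(reader):
            if count >= max_lines:
                break
            self.lines.append(entry)

    def sort_log_lines(self):
        # sort once, after all readers have been drained, and return bare lines
        self.lines.sort(key=lambda pair: pair[0])
        return [line for _, line in self.lines]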

tron/utils/scribereader.py (outdated review thread, resolved)
s3_reader = S3LogsReader(ecosystem).get_log_reader(log_name=stream_name, min_date=start_date, max_date=end_date)
paasta_logs.iterate_logs(s3_reader, max_lines)
else:
end_date = max_date.date() if max_date else None # type: ignore
Member:

just curious: why are we telling mypy to ignore this line?

Contributor Author:

updated end_date; before the change, mypy was complaining about a type mismatch on this line.
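For illustration, one hedged way to keep mypy happy here without the ignore is to make the Optional types explicit (a sketch, not the actual fix in this PR):

from datetime import date, datetime
from typing import Optional

def to_end_date(max_date: Optional[datetime]) -> Optional[date]:
    # mypy narrows max_date to datetime inside the conditional, so no ignore is needed
    return max_date.date() if max_date is not None else None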

tron/utils/scribereader.py (outdated review threads, resolved)
requirements.txt (outdated)
Member:

just double-checking: does check-requirements pass locally?

because of our hacky requirements setup here, it might also be worth adding the extra yelp reqs to requirements/requirements-minimal to make sure everything is pinned :)

Contributor Author:

hm, check-requirements works locally.

"adding the extra yelp reqs to requirements/requirements-minimal"

wouldn't it break the GitHub CI, as yelp internal packages won't be available?

Member:

ah, sorry - i was thinking faster than i was typing: i meant "it might also be worth temporarily adding the extra yelp reqs to requirements/requirements-minimal to make sure everything is pinned :)" - temporarily adding them would let you run check-requirements and make sure everything is pinned correctly :)

Contributor Author:

I see now. yes, there are some errors, i'll need to update it

Contributor Author:

these have been updated in yelp_package/extra_requirements_yelp.txt

@nemacysts (Member):

(also tagging some folks working on tron stuff this quarter to make sure they're aware of the logging changes)

@nemacysts (Member) left a review comment:

lgtm - i can ship once you figure out the local test issues

(i'm not super enamoured with storing the logs in the class, but my objections here are mostly 'cause it doesn't read naturally to my eyes - which isn't really a reason to block :p)

@yaroliakh force-pushed the yaro/yelp_clog_s3reader branch from 63ab755 to e89f71b on April 18, 2024 12:08
@yaroliakh (Contributor Author):

@nemacysts I was struggling a bit with mypy import errors, so I decided to update mypy since newer versions seem to support smarter ignore conditions in code, but I eventually ended up just adding global ignore rules to mypy.ini.
Hopefully the new mypy version won't be an issue; otherwise I can revert it if needed.
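For reference, a global per-module ignore in mypy.ini looks roughly like this (the module pattern below is an assumption for illustration, not the exact rule added in this PR):

[mypy-clog.*]
ignore_missing_imports = True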

@yaroliakh requested a review from nemacysts on April 19, 2024 08:05
@@ -7,3 +7,6 @@ pylint
pytest
pytest-asyncio
requirements-tools
types-PyYAML
types-requests<2.31.0.7
Member:

just curious, why the upper-bound?

pytest.skip("yelp logs readers not available, skipping tests", allow_module_level=True)


def static_conf_patch(args={}):
Member:

generally, mutable default arguments are something to be avoided - i think maybe something like:

def static_conf_patch(args=None):
    args = args if args is not None else {}
    return lambda arg, namespace, default=None: args.get(arg)

Member:

(might also be worth a comment here since i'm not 100% sure why this is necessary atm)

Contributor Author:

Updated, thanks. I'm primarily adding this because, without explicit patching of specific staticconf values, using return_value= in unittest.mock.patch causes side effects when staticconf.read is called more than once, as happens with the addition of the new reader functionality.
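A hedged usage sketch of that helper (the patch target path and the config key are assumptions for illustration):

from unittest import mock

# each key resolves from an explicit dict, so repeated staticconf.read calls
# don't all collapse onto a single return_value
with mock.patch(
    "staticconf.read",
    side_effect=static_conf_patch({"logging.max_lines": 1000}),  # hypothetical key
):
    ...  # exercise the reader code under test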

Comment on lines 21 to 22
tron.utils.scribereader.scribereader_available = False
tron.utils.scribereader.s3reader_available = False
Member:

i think setting these with a mock might be a tad cleaner since that would take care of setting things back without needing a finally block
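A sketch of that suggestion (the attribute names come from the snippet above; the surrounding test body is assumed):

from unittest import mock

import tron.utils.scribereader

# the patches revert automatically when the with-block exits,
# so no try/finally cleanup is needed
with mock.patch.object(tron.utils.scribereader, "scribereader_available", False), \
     mock.patch.object(tron.utils.scribereader, "s3reader_available", False):
    ...  # run the code under test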

@yaroliakh merged commit 82419de into master on Apr 29, 2024
3 checks passed