Understanding¶
This page explores ZeroRQ concepts and architecture in more depth.
Definitions¶
The abstractions of a distributed task queue can be summarized in three main components.
Note that, unlike many task queues, a Job is distinguished from a Task in ZeroRQ.
- Task¶
- A Task is a convenient concept that represents a callable. It's the type returned when a function is decorated with @task, adding methods like Task.enqueue while preserving the awaitable signature.
>>> from zrq import task
>>> task(asyncio.sleep)
Task('sleep')
- Queue¶
- A Queue is a structure that contains Jobs. It's an ordered structure consumed in a FIFO (« First In, First Out ») fashion, providing strong guarantees like atomicity, consistency and isolation.
>>> from zrq import Queue
>>> Queue("default").size
1337
- Job¶
- A Job is a snapshot of a queued task, containing both arguments and options. It's a unit of work distributed to the workers consuming the corresponding queue. Each Job is unique and immutable, and can be tracked throughout its lifetime.
>>> from zrq import Job
>>> Job.from_id("0000000000-0000-0000-0000-000000000000").state
State.PENDING
Those components can be seen everywhere, from simple to complex implementations.
>>> import asyncio
>>> from zrq import task, Queue
>>> tsk = task(asyncio.sleep)
>>> job = await Queue("default").enqueue(tsk, 1)
>>> job.state
State.PENDING
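Since a Task keeps the awaitable signature and gains methods like Task.enqueue, the same job can also be enqueued straight from the task. The sketch below assumes that Task.enqueue takes the same arguments as the wrapped callable and targets the default queue; neither detail is documented here.
>>> import asyncio
>>> from zrq import task
>>> tsk = task(asyncio.sleep)
>>> await tsk(1)                # still awaitable: runs the callable directly
>>> job = await tsk.enqueue(1)  # assumed signature: defers the call to a worker
>>> job.state
State.PENDING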
Motivations¶
Why another task queue when the Python ecosystem already offers plenty of them?
Concurrency is not Parallelism¶
The most popular Python task queues, like Celery or RQ, are written in a blocking, sequential fashion. That's fine for covering common needs: they are well documented, tested and maintained.
When these were created, the Python language had no official support for concurrency (i.e. async/await). In order to scale a single worker to handle multiple jobs at once, the (logical) paradigms chosen were threading and multiprocessing.
Those are expensive, require complex thread-safe structures, and are limited by the Python Global Interpreter Lock (GIL) at runtime. Don't expect to run plenty of threads without losing a lot of CPU time waiting and switching between them.
At the time, it was already possible to run non-blocking, concurrent code using gevent, an external library that patches the Python standard library through monkey-patching. It was a great way to get an event loop without hassle, but it also comes with the tricky and opaque downsides inherent to monkey-patching, such as unexpected switches between greenlets.
ZRQ focuses on concurrency by leaning on native Python async/await, meaning that everything is explicit, with the downside of having to use compatible libraries (of which there are now plenty). It addresses jobs that rely on anything but the CPU, such as making HTTP requests or processing a huge file on the filesystem.
I/O-bound vs CPU-bound¶
To figure out which category a job falls into, ask yourself:
- Would my job go faster if the CPU were faster? It's CPU-bound.
- Does my job spend most of its time waiting for something? It's I/O-bound.
More about Concurrency is not Parallelism by Rob Pike (Heroku - 2012).
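As a rough illustration (both function bodies below are made up for this example and are not part of ZRQ), the first task is I/O-bound and cooperates well with a single async worker, while the second is CPU-bound and would monopolize it:
>>> import asyncio
>>> from zrq import task
>>>
>>> @task
... async def wait_for_api(seconds):
...     # I/O-bound: the event loop is free to run other jobs while this awaits.
...     await asyncio.sleep(seconds)
...
>>> @task
... async def crunch_numbers(n):
...     # CPU-bound: this loop never yields, so it blocks the whole worker.
...     return sum(i * i for i in range(n))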
Streams aren’t Queues¶
Unlike the vast majority of task queues, ZRQ uses stream structures in place of sets. In the Python ecosystem, we're the first to implement it, and the only ones to date. Yay!
There are two main reasons for this:
- Stream structures are fairly new (like this project), released in Redis v5.0.
- There's a common misconception that streams are all about the pub/sub pattern.
In fact, they are much more than that.
“A Redis stream is a data structure that acts like an append-only log but also implements several operations to overcome some of the limits of a typical append-only log. These include random access in O(1) time and complex consumption strategies, such as consumer groups.”
—From the Introduction to Redis streams page.
The consumer groups strategy is similar to the one implemented in Apache Kafka. It allows a stream to be consumed by multiple consumers, with the guarantee that each message is served to a different consumer. So far, it sounds a lot like a queue!
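To see how consumer groups map to actual Redis commands, here is a minimal sketch built on the redis-py client. It only demonstrates the raw stream operations a queue can be built on; the stream key, group and consumer names are illustrative and do not reflect ZRQ's internals.
import redis

r = redis.Redis()

# Append a job entry to the stream (the stream is created on first XADD).
r.xadd("zrq:default", {"job": "0000000000-0000-0000-0000-000000000000"})

# Create a consumer group that reads from the beginning of the stream.
r.xgroup_create("zrq:default", "workers", id="0", mkstream=True)

# Each worker reads under its own consumer name; a given entry is delivered
# to only one consumer of the group, which is what makes it behave like a queue.
entries = r.xreadgroup("workers", "worker-1", {"zrq:default": ">"}, count=1, block=1000)

# Acknowledge the entry once processed, removing it from the pending entries list.
for stream, messages in entries:
    for message_id, fields in messages:
        r.xack("zrq:default", "workers", message_id)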
Pending jobs¶
Deferred jobs¶
Developer experience¶
Advantages¶
When you should use ZRQ.
Drawbacks¶
When you should use something else.
Alternatives¶
ARQ¶
|  | ZRQ | ARQ |
|---|---|---|
| Backend | Redis (Streams) | Redis (Sorted sets) |
| Deferring | Yes | Yes |
| Results | No | Yes |
| Scheduling | No | Yes |
| Serializer | Partial (JSON) | Pluggable (Pickle) |
| Synchronous | Unsupported | Yes (ThreadPool and ProcessPool) |
| Connections | O | O(log(N)) |
ARQ, Celery, MRQ, RQ, Django background.