
RFC: add support for batched matmul / einsum / batched tensordot #727

Open
arogozhnikov opened this issue Jan 3, 2024 · 8 comments
Labels
Needs Discussion Needs further discussion. RFC Request for comments. Feature requests and proposed changes. topic: Linear Algebra Linear algebra.
Comments

@arogozhnikov

Currently I don't see a way to implement einsum or batched matmul using the standard. There is a way with an explicit loop, but that does not count.

Some ways this can be addressed:

  • a minimal implementation would include batched_matmul (signature bij, bjk -> bik). Other operations can be reduced to it with reshapes, though this may involve additional copies
  • a modification of tensordot to include batch dimensions, which should be more efficient; but AFAIK this method is not surfaced in any major framework
  • include einsum in the standard
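To illustrate the first bullet, here is a minimal sketch in NumPy (standing in for any conforming library; the helper name `batched_matmul` is invented here): a product with two batch axes is reduced to the `bij,bjk->bik` primitive by flattening the batch axes with reshapes, at the cost of possible copies.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 2, 3, 5))  # (b1, b2, i, j)
b = rng.standard_normal((4, 2, 5, 6))  # (b1, b2, j, k)

def batched_matmul(x, y):
    # the minimal proposed primitive: bij,bjk->bik
    return np.einsum('bij,bjk->bik', x, y)

# reduce to the primitive by flattening the two batch axes (may copy):
out = batched_matmul(a.reshape(8, 3, 5), b.reshape(8, 5, 6)).reshape(4, 2, 3, 6)

# matmul broadcasts over leading batch axes natively, so this must agree:
assert np.allclose(out, a @ b)
```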
@leofang
Contributor

leofang commented Jan 4, 2024

matmul already supports batching in the spec. In fact, batching is the number one requirement for linalg routines, so wherever it's applicable we allow it.

Batched tensordot is indeed, as you said, not implemented in any framework, most likely because the existing frameworks already have matmul and einsum. Semantically, it's not straightforward to allow batching while keeping the same API style (I think it likely needs an extra kw-only argument to specify the batch dimensions).
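For illustration, here is a hypothetical sketch of what such a kw-only argument could look like, implemented on top of einsum purely to pin down the semantics. The name `batched_tensordot` and the `batch_axes` parameter are invented, not part of any spec or library.

```python
import numpy as np
import string

def batched_tensordot(a, b, *, axes, batch_axes):
    """Hypothetical API: like tensordot(a, b, axes), but pairs listed in
    the kw-only `batch_axes` are matched elementwise instead of contracted.
    Output order: batch axes, then remaining axes of a, then of b."""
    names_a = list(string.ascii_lowercase[:a.ndim])
    names_b = list(string.ascii_lowercase[a.ndim:a.ndim + b.ndim])
    for ia, ib in list(axes) + list(batch_axes):
        names_b[ib] = names_a[ia]  # paired axes share a label
    contracted = {names_a[ia] for ia, _ in axes}
    batch = [names_a[ia] for ia, _ in batch_axes]
    keep_a = [n for n in names_a if n not in contracted and n not in batch]
    keep_b = [n for n in names_b if n not in contracted and n not in batch]
    spec = f"{''.join(names_a)},{''.join(names_b)}->{''.join(batch + keep_a + keep_b)}"
    return np.einsum(spec, a, b)

# batched matmul as a special case: contract axis 2 of x with axis 1 of y,
# batching over axis 0 of both
x = np.ones((4, 2, 3))
y = np.ones((4, 3, 5))
out = batched_tensordot(x, y, axes=[(2, 1)], batch_axes=[(0, 0)])
```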

I was tasked with proposing einsum, as I had strong opinions after being involved in implementing cuQuantum's counterpart, but unfortunately I lacked the time. I have a very radical version in mind (some NumPy behavior is simply bad IMHO) that has been implemented in only a few libraries (cuQuantum & opt_einsum). Feel free to share your thoughts or even draft a PR, and we'll get it discussed 🙂

@arogozhnikov
Author

arogozhnikov commented Jan 6, 2024

matmul already supports batching in the spec. In fact, batching is the number one requirement for linalg routines, so wherever it's applicable we allow it.

Great, I'd missed that.

(I think it likely needs an extra kw-only argument to specify the batch dimensions).

yup.

Feel free to share your thoughts

All I ever needed is the ellipsis and explicit notation (I also like being able to use spaces anywhere, e.g. 'b ... c, c d -> b ... d', but that's just taste).
Any standard that covers this minimum is already perfect for me.
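As it happens, NumPy's einsum already strips whitespace from the subscript string, so the spaced, ellipsis-based style sketched above works there today:

```python
import numpy as np

a = np.ones((4, 2, 3))
b = np.ones((3, 5))

# NumPy ignores whitespace inside the subscripts, so a spaced pattern
# with an ellipsis is accepted as-is:
out = np.einsum('b ... c, c d -> b ... d', a, b)
assert out.shape == (4, 2, 5)
```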

I don't use explicit path specification, as the framework should be in a better position to guess the contraction path (that's not just the number of operations, but also how memory-friendly the operations become and how much memory is available).
Should I need to specify a contraction path, I'd prefer a sequence of einsums instead.

Implicit notation is of questionable value to me: I want to see the result, not infer it from the pattern.
As examples: why are ij,jk and nl,lm two different operations? Why are a,a and ...,... not the same for 1-d tensors? Etc.
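Both surprises can be reproduced in NumPy's implicit mode, where output labels are the non-repeated ones in alphabetical order:

```python
import numpy as np

A = np.arange(6.).reshape(2, 3)
B = np.arange(12.).reshape(3, 4)

# Relabeling changes the operation: 'ij,jk' infers output 'ik' (matmul),
# while 'nl,lm' infers 'mn' (alphabetical!), i.e. the transposed matmul.
m1 = np.einsum('ij,jk', A, B)
m2 = np.einsum('nl,lm', A, B)
assert np.allclose(m2, m1.T)

x = np.array([1., 2., 3.])
y = np.array([4., 5., 6.])
# For 1-d inputs, 'a,a' sums the repeated index (a dot product), while
# '...,...' broadcasts elementwise -- same shapes, different results:
assert np.einsum('a,a', x, y) == 32.0
assert np.allclose(np.einsum('...,...', x, y), x * y)
```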

The index-based interface is useless: when searching GitHub for usages of it, I saw dozens of CI tests but found zero production uses.

@asmeurer
Member

asmeurer commented Jan 7, 2024

The index-based interface is useless: when searching GitHub for usages of it, I saw dozens of CI tests but found zero production uses.

The index interface is more useful for instances where you generate an einsum programmatically.
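For reference, NumPy already ships this interleaved "sublist" form, and programmatic generation is indeed where it helps, since no subscript string needs to be assembled:

```python
import numpy as np

a = np.arange(6.).reshape(2, 3)
b = np.arange(12.).reshape(3, 4)

# String form and index (sublist) form of the same contraction:
s = np.einsum('ij,jk->ik', a, b)
idx = np.einsum(a, [0, 1], b, [1, 2], [0, 2])
assert np.allclose(s, idx)

# Programmatic generation: a chain contraction over n operands,
# built by interleaving each operand with its integer index list.
mats = [np.eye(3) for _ in range(5)]
args = []
for k, m in enumerate(mats):
    args += [m, [k, k + 1]]
chain = np.einsum(*args, [0, len(mats)])
assert np.allclose(chain, np.eye(3))
```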

@arogozhnikov
Author

The index interface is more useful for instances where you generate an einsum programmatically.

Yes, I guess that was the thinking when it was designed, but this interleaving of indices and tensors is so awkward: it is not friendly to coding (and especially not to type hinting).

@asmeurer
Member

Perhaps the syntax could be improved. In principle an explicit form should be much better for type hinting than a string that is only parsed at runtime.

@arogozhnikov
Author

In principle an explicit form should be much better for type hinting than a string that is only parsed at runtime.

Why? For type hinting, the signature einsum(str, *tensor) -> tensor is good enough.

For actually checking an operation, neither form is good: einsum(misplaced_x, x, (1, 2, 4), y, (1, 2, 5, 6, -1), (0, 3, 100, ...)).
With strings it is apparent that something is off, but not with indices.

There is some expectation that having tuples makes code more statically checkable, but that is not the case.
Strings are more human-checkable.
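To make the type-hinting point concrete, here is a small sketch (the wrapper name `einsum_str` is invented for illustration):

```python
import numpy as np

# The string form admits a simple, precise static signature:
def einsum_str(subscripts: str, *operands: np.ndarray) -> np.ndarray:
    return np.einsum(subscripts, *operands)

# The interleaved index form alternates arrays and integer sequences,
#   einsum(op1, sub1, op2, sub2, ..., [out_sub])
# which plain *args annotations cannot express precisely: a checker sees
# only loosely typed varargs and cannot flag a misplaced operand.
out = einsum_str('ij,jk->ik', np.ones((2, 3)), np.ones((3, 4)))
```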

Should there be an index interface for einsum (should there?), I'd prefer them to be separated in the API (einops_str, einops_indices, or alike).

@leofang
Contributor

leofang commented Jan 12, 2024

I am very eager to respond at length, but unfortunately I am a bit swamped this week. Nevertheless, the discussion so far is very encouraging, as I was thinking

I have a very radical version in mind (some NumPy behavior is simply bad IMHO)

and I am very happy to know I am not alone.

The index interface is more useful for instances where you generate an einsum programmatically.

Implicit notation is of questionable value to me: I want to see the result, not infer it from the pattern.

I'd prefer them to be separated in the API (einops_str, einops_indices, or alike).

Let me turn on my radical side and explain why I dislike the einsum string form (and I promise I'll (try to) be more responsive next week). These are two expressions taken from our test suite:

'ÄivrÃEË,JÓIT,ÐOáÙà,Åx,oÒØÏmUÍ,OàW,SgÌCÈ,NgSoJPÌ,vcXz×,HÚGÞ,Æw,ÝD,GsÑi,QÛlÔqzL,IÓqhBdÜ,ÄV,KÎAÇ,uTÖDQÕ,ÆstFÝA,ÛtjÂK,pÔYMbe,FÂkÑÎÁ,rlCVÉÜwaP,ÐÖÀnxU,Þhym,nÏÙßØRÚ,HZÇßk,WÊMË,ÊfÀÍÕLZ,×ÃÈEuRp,ÅjÉáYNB,ÒÁXy->cbefad'

and

abc,def,ghi,jkl,mno,pqr,stu,vwx,yzA,BCD,EFG,HIJ,KLM,NOP,QRS,TUV,WXY,ZÀÁ,ÂÃÄ,ÅÆÇ,ÈÉÊ,ËÌÍ,ÎÏÐ,ÑÒÓ,ÔÕÖ,×ØÙ,ÚÛÜ,ÝÞß,àáâ,ãäå,æçè,éêë,cfìí,íîï,ilîð,ðñò,orñó,óôõ,uxôö,ö÷ø,AD÷ù,ùúû,GJúü,üýþ,MPýÿ,ÿĀā,SVĀĂ,ĂăĄ,YÁăą,ąĆć,ÄÇĆĈ,ĈĉĊ,ÊÍĉċ,ċČč,ÐÓČĎ,ĎďĐ,ÖÙďđ,đĒē,ÜßĒĔ,ĔĕĖ,âåĕė,ėĘę,èëĘĚ,Ěìě,ïòĜĝ,ĝĞğ,õøĞĠ,ĠġĢ,ûþġģ,ģĤĥ,āĄĤĦ,ĦħĨ,ćĊħĩ,ĩĪī,čĐĪĬ,ĬĭĮ,ēĖĭį,įİı,ęěİIJ,IJĜij,ğĢĴĵ,ĵĶķ,ĥĨĶĸ,ĸĹĺ,īĮĹĻ,ĻļĽ,ıijļľ,ľĴĿ,ķĺŀŁ,ŁłŃ,ĽĿłń,ŅņŇ,ňʼnŊ,ŋŌō,ŎŏŐ,őŒœ,ŔŕŖ,ŗŘř,ŚśŜ,ŝŞş,ŠšŢ,ţŤť,ŦŧŨ,ũŪū,ŬŭŮ,ůŰű,ŲųŴ,ŵŶŷ,ŸŹź,ŻżŽ,žſƀ,ƁƂƃ,ƄƅƆ,ƇƈƉ,ƊƋƌ,ƍƎƏ,ƐƑƒ,ƓƔƕ,ƖƗƘ,ƙƚƛ,ƜƝƞ,ƟƠơ,ƢƣƤ,ŇŊƥƦ,ƦƧƨ,ōŐƧƩ,Ʃƪƫ,œŖƪƬ,ƬƭƮ,řŜƭƯ,ƯưƱ,şŢưƲ,ƲƳƴ,ťŨƳƵ,ƵƶƷ,ūŮƶƸ,Ƹƹƺ,űŴƹƻ,ƻƼƽ,ŷźƼƾ,ƾƿǀ,Žƀƿǁ,ǁǂǃ,ƃƆǂDŽ,DŽDždž,ƉƌDžLJ,LJLjlj,ƏƒLjNJ,NJNjnj,ƕƘNjǍ,ǍǎǏ,ƛƞǎǐ,ǐǑǒ,ơƤǑǓ,Ǔƥǔ,ƨƫǕǖ,ǖǗǘ,ƮƱǗǙ,ǙǚǛ,ƴƷǚǜ,ǜǝǞ,ƺƽǝǟ,ǟǠǡ,ǀǃǠǢ,ǢǣǤ,džljǣǥ,ǥǦǧ,njǏǦǨ,ǨǩǪ,ǒǔǩǫ,ǫǕǬ,ǘǛǭǮ,Ǯǯǰ,ǞǡǯDZ,DZDzdz,ǤǧDzǴ,ǴǵǶ,ǪǬǵǷ,ǷǭǸ,ǰdzǹǺ,ǺǻǼ,ǶǸǻǽ,ǽǹǼ,ńŀŃ,çéƠƢ,äæƝƟ,áãƚƜ,ÞàƗƙ,ÛÝƔƖ,ØÚƑƓ,Õ×ƎƐ,ÒÔƋƍ,ÏÑƈƊ,ÌÎƅƇ,ÉËƂƄ,ÆÈſƁ,ÃÅżž,ÀÂŹŻ,XZŶŸ,UWųŵ,RTŰŲ,OQŭů,LNŪŬ,IKŧũ,FHŤŦ,CEšţ,zBŞŠ,wyśŝ,tvŘŚ,qsŕŗ,npŒŔ,kmŏő,hjŌŎ,egʼnŋ,bdņň,êaƣŅ->

For large tensor networks, it is a really unpleasant interface to work with; we've seen expressions that take up a whole slide. There aren't many Unicode characters that can be used, NumPy/CuPy do not support Unicode input, and parsing/processing such long expressions is very tedious. When interfacing with a C library like cuTensorNet (or even NumPy internals), the index form is a much better choice.

@arogozhnikov
Author

arogozhnikov commented Jan 12, 2024

happy to know I am not alone

😉

We're talking about very different scenarios. I'm trying to cover the common use case: 2 or at most 4 tensors, usually as a direct interface for an applied scientist.

For large tensor networks, it is a really unpleasant interface to work with

(Your argument isn't a good one: a long array of tuples isn't shorter or more readable, and an API should not target CI as its main scenario.)
Regardless, I agree that if you generate TNs from some other abstraction, strings are not an appropriate interface (unless there is a name-based interface for the user, in which case it makes sense to propagate names to simplify error-handling logic).

I assume you want to model quantum entanglement, so I see why you want many indices.
Side remark: einops allows multi-character names (and names can contain digits).

Also, a question for you: aren't you constrained by the "single ellipsis" rule? Do you use the ellipsis at all?

@leofang leofang added this to the v2024 milestone Feb 13, 2024
@kgryte kgryte changed the title Batched matmul / einsum / batched tensordot RFC: add support for batched matmul / einsum / batched tensordot Apr 4, 2024
@kgryte kgryte added RFC Request for comments. Feature requests and proposed changes. topic: Linear Algebra Linear algebra. Needs Discussion Needs further discussion. labels Apr 4, 2024
4 participants