
RFC: add support for batched matmul / einsum / batched tensordot #727

Open
arogozhnikov opened this issue Jan 3, 2024 · 8 comments
Labels
Needs Discussion Needs further discussion. RFC Request for comments. Feature requests and proposed changes. topic: Linear Algebra Linear algebra.
Comments

@arogozhnikov

Currently I don't see a way to implement einsum or batched matmul using the standard. There is a way with an explicit loop, but that does not count.

Some ways this can be addressed:

  • a minimal implementation would include batched_matmul (signature bij, bjk -> bik). Other operations can be reduced to it with reshapes, though this may involve additional copies
  • a modification of tensordot to include batch dimensions, which should be more efficient; but AFAIK this method is not surfaced in any major framework
  • include einsum in the standard
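To illustrate the first bullet, here is a minimal sketch in NumPy (standing in for any conforming library; the helper name `batched_matmul` is invented here): a product with two batch axes is reduced to the `bij,bjk->bik` primitive by flattening the batch axes with reshapes, at the cost of possible copies.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 2, 3, 5))  # (b1, b2, i, j)
b = rng.standard_normal((4, 2, 5, 6))  # (b1, b2, j, k)

def batched_matmul(x, y):
    # the minimal proposed primitive: bij,bjk->bik
    return np.einsum('bij,bjk->bik', x, y)

# reduce to the primitive by flattening the two batch axes (may copy):
out = batched_matmul(a.reshape(8, 3, 5), b.reshape(8, 5, 6)).reshape(4, 2, 3, 6)

# matmul broadcasts over leading batch axes natively, so this must agree:
assert np.allclose(out, a @ b)
```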
@leofang
Contributor

leofang commented Jan 4, 2024

matmul already supports batching in the spec. In fact, batching is the number one requirement for linalg routines, so wherever it's applicable we allow it.

Batched tensordot is indeed, as you said, not implemented in any framework, most likely because the existing frameworks already have matmul and einsum. Semantically, it's not straightforward to allow batching while keeping the same API style (I think it likely needs an extra kw-only argument to specify the batch dimensions).
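For illustration, here is a hypothetical sketch of what such a kw-only argument could look like, implemented on top of einsum purely to pin down the semantics. The name `batched_tensordot` and the `batch_axes` parameter are invented, not part of any spec or library.

```python
import numpy as np
import string

def batched_tensordot(a, b, *, axes, batch_axes):
    """Hypothetical API: like tensordot(a, b, axes), but pairs listed in
    the kw-only `batch_axes` are matched elementwise instead of contracted.
    Output order: batch axes, then remaining axes of a, then of b."""
    names_a = list(string.ascii_lowercase[:a.ndim])
    names_b = list(string.ascii_lowercase[a.ndim:a.ndim + b.ndim])
    for ia, ib in list(axes) + list(batch_axes):
        names_b[ib] = names_a[ia]  # paired axes share a label
    contracted = {names_a[ia] for ia, _ in axes}
    batch = [names_a[ia] for ia, _ in batch_axes]
    keep_a = [n for n in names_a if n not in contracted and n not in batch]
    keep_b = [n for n in names_b if n not in contracted and n not in batch]
    spec = f"{''.join(names_a)},{''.join(names_b)}->{''.join(batch + keep_a + keep_b)}"
    return np.einsum(spec, a, b)

# batched matmul as a special case: contract axis 2 of x with axis 1 of y,
# batching over axis 0 of both
x = np.ones((4, 2, 3))
y = np.ones((4, 3, 5))
out = batched_tensordot(x, y, axes=[(2, 1)], batch_axes=[(0, 0)])
```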

I was tasked with proposing einsum, as I had strong opinions after being involved in implementing cuQuantum's counterpart, but unfortunately I lacked the time. I have a very radical version in mind (some NumPy behavior is simply bad IMHO) that has been implemented in only a few libraries (cuQuantum & opt_einsum). Feel free to share your thoughts or even draft a PR, and we'll get it discussed 🙂

@arogozhnikov
Author

arogozhnikov commented Jan 6, 2024

matmul already supports batching in the spec. In fact, batching is the number one requirement for linalg routines, so wherever it's applicable we allow it.

Great, I'd missed that.

(I think it likely needs an extra kw-only argument to specify the batch dimensions).

yup.

Feel free to share your thoughts

All I ever needed is the ellipsis and explicit notation (I also like being able to use spaces anywhere, e.g. 'b ... c, c d -> b ... d', but that's just taste).
Any standard that covers this minimum is already perfect for me.
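As it happens, NumPy's einsum already strips whitespace from the subscript string, so the spaced, ellipsis-based style sketched above works there today:

```python
import numpy as np

a = np.ones((4, 2, 3))
b = np.ones((3, 5))

# NumPy ignores whitespace inside the subscripts, so a spaced pattern
# with an ellipsis is accepted as-is:
out = np.einsum('b ... c, c d -> b ... d', a, b)
assert out.shape == (4, 2, 5)
```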

I don't use explicit path specification, as the framework should be in a better position to guess the contraction path (that's not just the number of operations, but also how memory-friendly the operations become and how much memory is available).
Should I need to specify a contraction path, I'd prefer a sequence of einsums instead.

Implicit notation is of questionable value to me: I want to see the result, not infer it from the pattern.
As examples: why are ij,jk and nl,lm two different operations? Why are a,a and ...,... not the same for 1-d tensors? Etc.
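Both surprises can be reproduced in NumPy's implicit mode, where output labels are the non-repeated ones in alphabetical order:

```python
import numpy as np

A = np.arange(6.).reshape(2, 3)
B = np.arange(12.).reshape(3, 4)

# Relabeling changes the operation: 'ij,jk' infers output 'ik' (matmul),
# while 'nl,lm' infers 'mn' (alphabetical!), i.e. the transposed matmul.
m1 = np.einsum('ij,jk', A, B)
m2 = np.einsum('nl,lm', A, B)
assert np.allclose(m2, m1.T)

x = np.array([1., 2., 3.])
y = np.array([4., 5., 6.])
# For 1-d inputs, 'a,a' sums the repeated index (a dot product), while
# '...,...' broadcasts elementwise -- same shapes, different results:
assert np.einsum('a,a', x, y) == 32.0
assert np.allclose(np.einsum('...,...', x, y), x * y)
```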

The index-based interface is useless: when searching GitHub for usages of it, I saw dozens of CI tests but found zero production uses.

@asmeurer
Member

asmeurer commented Jan 7, 2024

The index-based interface is useless: when searching GitHub for usages of it, I saw dozens of CI tests but found zero production uses.

The index interface is more useful for instances where you generate an einsum programmatically.
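For reference, NumPy already ships this interleaved "sublist" form, and programmatic generation is indeed where it helps, since no subscript string needs to be assembled:

```python
import numpy as np

a = np.arange(6.).reshape(2, 3)
b = np.arange(12.).reshape(3, 4)

# String form and index (sublist) form of the same contraction:
s = np.einsum('ij,jk->ik', a, b)
idx = np.einsum(a, [0, 1], b, [1, 2], [0, 2])
assert np.allclose(s, idx)

# Programmatic generation: a chain contraction over n operands,
# built by interleaving each operand with its integer index list.
mats = [np.eye(3) for _ in range(5)]
args = []
for k, m in enumerate(mats):
    args += [m, [k, k + 1]]
chain = np.einsum(*args, [0, len(mats)])
assert np.allclose(chain, np.eye(3))
```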

@arogozhnikov
Author

The index interface is more useful for instances where you generate an einsum programmatically.

Yes, I guess that was the thinking when it was designed, but this interleaving of indices and tensors is so awkward: it is not friendly to coding (and especially not to type hinting).

@asmeurer
Member

Perhaps the syntax could be improved. In principle an explicit form should be much better for type hinting than a string that is only parsed at runtime.

@arogozhnikov
Author

In principle an explicit form should be much better for type hinting than a string that is only parsed at runtime.

Why? For type hinting, the signature einsum(str, *tensor) -> tensor is good enough.

For actually checking an operation, neither form is good: einsum(misplaced_x, x, (1, 2, 4), y, (1, 2, 5, 6, -1), (0, 3, 100, ...)).
With strings it is apparent that something is off, but not with indices.

There is some expectation that having tuples makes code more statically checkable, but that is not the case.
Strings are more human-checkable.
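To make the type-hinting point concrete, here is a small sketch (the wrapper name `einsum_str` is invented for illustration):

```python
import numpy as np

# The string form admits a simple, precise static signature:
def einsum_str(subscripts: str, *operands: np.ndarray) -> np.ndarray:
    return np.einsum(subscripts, *operands)

# The interleaved index form alternates arrays and integer sequences,
#   einsum(op1, sub1, op2, sub2, ..., [out_sub])
# which plain *args annotations cannot express precisely: a checker sees
# only loosely typed varargs and cannot flag a misplaced operand.
out = einsum_str('ij,jk->ik', np.ones((2, 3)), np.ones((3, 4)))
```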

Should there be an index interface for einsum (should there?), I'd prefer them to be separated in the API (einops_str, einops_indices, or alike).

@leofang
Contributor

leofang commented Jan 12, 2024

I am very eager to respond at length, but unfortunately I am a bit swamped this week. Nevertheless, the discussion so far is very encouraging, as I was thinking

I have a very radical version in mind (some NumPy behavior is simply bad IMHO)

and I am very happy to know I am not alone.

The index interface is more useful for instances where you generate an einsum programmatically.

Implicit notation is of questionable value to me: I want to see the result, not infer it from the pattern.

I'd prefer them to be separated in the API (einops_str, einops_indices, or alike).

Let me turn on my radical side and explain why I dislike the einsum string form (and I promise I'll (try to) be more responsive next week). These are two expressions taken from our test suite:

'ÄivrÃEË,JÓIT,ÐOáÙà,Åx,oÒØÏmUÍ,OàW,SgÌCÈ,NgSoJPÌ,vcXz×,HÚGÞ,Æw,ÝD,GsÑi,QÛlÔqzL,IÓqhBdÜ,ÄV,KÎAÇ,uTÖDQÕ,ÆstFÝA,ÛtjÂK,pÔYMbe,FÂkÑÎÁ,rlCVÉÜwaP,ÐÖÀnxU,Þhym,nÏÙßØRÚ,HZÇßk,WÊMË,ÊfÀÍÕLZ,×ÃÈEuRp,ÅjÉáYNB,ÒÁXy->cbefad'

and

abc,def,ghi,jkl,mno,pqr,stu,vwx,yzA,BCD,EFG,HIJ,KLM,NOP,QRS,TUV,WXY,ZÀÁ,ÂÃÄ,ÅÆÇ,ÈÉÊ,ËÌÍ,ÎÏÐ,ÑÒÓ,ÔÕÖ,×ØÙ,ÚÛÜ,ÝÞß,àáâ,ãäå,æçè,éêë,cfìí,íîï,ilîð,ðñò,orñó,óôõ,uxôö,ö÷ø,AD÷ù,ùúû,GJúü,üýþ,MPýÿ,ÿĀā,SVĀĂ,ĂăĄ,YÁăą,ąĆć,ÄÇĆĈ,ĈĉĊ,ÊÍĉċ,ċČč,ÐÓČĎ,ĎďĐ,ÖÙďđ,đĒē,ÜßĒĔ,ĔĕĖ,âåĕė,ėĘę,èëĘĚ,Ěìě,ïòĜĝ,ĝĞğ,õøĞĠ,ĠġĢ,ûþġģ,ģĤĥ,āĄĤĦ,ĦħĨ,ćĊħĩ,ĩĪī,čĐĪĬ,ĬĭĮ,ēĖĭį,įİı,ęěİIJ,IJĜij,ğĢĴĵ,ĵĶķ,ĥĨĶĸ,ĸĹĺ,īĮĹĻ,ĻļĽ,ıijļľ,ľĴĿ,ķĺŀŁ,ŁłŃ,ĽĿłń,ŅņŇ,ňʼnŊ,ŋŌō,ŎŏŐ,őŒœ,ŔŕŖ,ŗŘř,ŚśŜ,ŝŞş,ŠšŢ,ţŤť,ŦŧŨ,ũŪū,ŬŭŮ,ůŰű,ŲųŴ,ŵŶŷ,ŸŹź,ŻżŽ,žſƀ,ƁƂƃ,ƄƅƆ,ƇƈƉ,ƊƋƌ,ƍƎƏ,ƐƑƒ,ƓƔƕ,ƖƗƘ,ƙƚƛ,ƜƝƞ,ƟƠơ,ƢƣƤ,ŇŊƥƦ,ƦƧƨ,ōŐƧƩ,Ʃƪƫ,œŖƪƬ,ƬƭƮ,řŜƭƯ,ƯưƱ,şŢưƲ,ƲƳƴ,ťŨƳƵ,ƵƶƷ,ūŮƶƸ,Ƹƹƺ,űŴƹƻ,ƻƼƽ,ŷźƼƾ,ƾƿǀ,Žƀƿǁ,ǁǂǃ,ƃƆǂDŽ,DŽDždž,ƉƌDžLJ,LJLjlj,ƏƒLjNJ,NJNjnj,ƕƘNjǍ,ǍǎǏ,ƛƞǎǐ,ǐǑǒ,ơƤǑǓ,Ǔƥǔ,ƨƫǕǖ,ǖǗǘ,ƮƱǗǙ,ǙǚǛ,ƴƷǚǜ,ǜǝǞ,ƺƽǝǟ,ǟǠǡ,ǀǃǠǢ,ǢǣǤ,džljǣǥ,ǥǦǧ,njǏǦǨ,ǨǩǪ,ǒǔǩǫ,ǫǕǬ,ǘǛǭǮ,Ǯǯǰ,ǞǡǯDZ,DZDzdz,ǤǧDzǴ,ǴǵǶ,ǪǬǵǷ,ǷǭǸ,ǰdzǹǺ,ǺǻǼ,ǶǸǻǽ,ǽǹǼ,ńŀŃ,çéƠƢ,äæƝƟ,áãƚƜ,ÞàƗƙ,ÛÝƔƖ,ØÚƑƓ,Õ×ƎƐ,ÒÔƋƍ,ÏÑƈƊ,ÌÎƅƇ,ÉËƂƄ,ÆÈſƁ,ÃÅżž,ÀÂŹŻ,XZŶŸ,UWųŵ,RTŰŲ,OQŭů,LNŪŬ,IKŧũ,FHŤŦ,CEšţ,zBŞŠ,wyśŝ,tvŘŚ,qsŕŗ,npŒŔ,kmŏő,hjŌŎ,egʼnŋ,bdņň,êaƣŅ->

For large tensor networks, it is a really unpleasant interface to work with; we've seen expressions that take up a whole slide. There aren't many Unicode characters that can be used, NumPy/CuPy do not support Unicode input, and parsing/processing such long expressions is very tedious. When interfacing with a C library like cuTensorNet (or even NumPy internals), the index form is a much better choice.

@arogozhnikov
Author

arogozhnikov commented Jan 12, 2024

happy to know I am not alone

😉

We're talking about very different scenarios. I'm trying to cover the common use case: 2 or at most 4 tensors, usually as a direct interface for an applied scientist.

For large tensor networks, it is a really unpleasant interface to work with

(Your argument isn't a good one: a long array of tuples isn't shorter or more readable, and an API should not target CI as its main scenario.)
Regardless, I agree that if you generate TNs from some other abstraction, strings are not an appropriate interface (unless there is a name-based interface for the user, in which case it makes sense to propagate names to simplify error-handling logic).

I assume you want to model quantum entanglement, so I see why you want many indices.
Side remark: einops allows multi-character names (and names can contain digits).

Also, a question for you: aren't you constrained by the "single ellipsis" rule? Do you use the ellipsis at all?

@leofang leofang added this to the v2024 milestone Feb 13, 2024
@kgryte kgryte changed the title Batched matmul / einsum / batched tensordot RFC: add support for batched matmul / einsum / batched tensordot Apr 4, 2024
@kgryte kgryte added RFC Request for comments. Feature requests and proposed changes. topic: Linear Algebra Linear algebra. Needs Discussion Needs further discussion. labels Apr 4, 2024
4 participants