Allow configuring the max segmentation token length for RAG documents via an environment variable #4375

rainchen
Contributor

Description

Allow configuring the maximum segmentation token length for RAG documents via an environment variable.

Fixes # (issue)

Fixes #2438, #3290
Also #4071

Type of Change

Please delete options that are not relevant.

  • This change requires a documentation update, included: Dify Document
  • Improvement, including but not limited to code refactoring, performance optimization, and UI/UX improvement

How Has This Been Tested?

manually tested

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • I ran dev/reformat(backend) and cd web && npx lint-staged(frontend) to appease the lint gods
  • optional I have made corresponding changes to the documentation
  • optional I have added tests that prove my fix is effective or that my feature works
  • optional New and existing unit tests pass locally with my changes

@dosubot dosubot bot added the size:S (This PR changes 10-29 lines, ignoring generated files.) and 📚 documentation (Improvements or additions to documentation) labels May 14, 2024
@rainchen rainchen force-pushed the support-custom-segmentation-max-tokens-max-length branch from ec1c1c2 to b4b6186 Compare May 14, 2024 09:54
@JohnJyong
Contributor

Please add a default value for the new variable in config.py. Thanks, @rainchen
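
For readers following the request above, here is a minimal sketch of what a default for an environment-driven limit could look like. It is not the code from this PR. The variable name `MAX_SEGMENTATION_TOKENS_LENGTH`, the default of 1000, and both helper functions are hypothetical placeholders, since the thread does not state the real name or value.

```python
import os

# Hypothetical names: this thread does not state the real variable name or default.
ENV_VAR = "MAX_SEGMENTATION_TOKENS_LENGTH"
DEFAULT_MAX_TOKENS = 1000  # assumed default, not confirmed by this thread


def get_max_segmentation_tokens(environ=None):
    """Return the configured limit, or the default when the variable is unset or blank."""
    environ = os.environ if environ is None else environ
    raw = environ.get(ENV_VAR, "").strip()
    if not raw:
        return DEFAULT_MAX_TOKENS
    value = int(raw)  # raises ValueError for non-numeric input
    if value <= 0:
        raise ValueError(f"{ENV_VAR} must be a positive integer, got {value}")
    return value


def validate_segment_length(requested, environ=None):
    """Reject a requested segment length that exceeds the configured limit."""
    limit = get_max_segmentation_tokens(environ)
    if requested > limit:
        raise ValueError(f"segment length {requested} exceeds the limit of {limit}")
    return requested
```

With the placeholder variable unset, `get_max_segmentation_tokens()` returns the assumed default. Setting it to a larger value in the process environment raises the limit without a code change, which is the behaviour the description asks for.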

@JohnJyong JohnJyong self-requested a review May 14, 2024 11:48
@rainchen rainchen force-pushed the support-custom-segmentation-max-tokens-max-length branch 3 times, most recently from 3a39f81 to 9ac93da Compare May 15, 2024 02:44
@rainchen rainchen force-pushed the support-custom-segmentation-max-tokens-max-length branch from 9ac93da to 1cef018 Compare May 15, 2024 02:46
@rainchen
Contributor Author

rainchen commented May 15, 2024

> pls add the default value for the new variable in the file: config.py, thanks @rainchen

@JohnJyong refactored, please review again

@JohnJyong
Contributor

LGTM

@dosubot dosubot bot added the lgtm (This PR has been approved by a maintainer) label May 20, 2024
@JohnJyong JohnJyong merged commit c255a20 into langgenius:main May 20, 2024
7 checks passed
@rennokki
Contributor

@rainchen AMAZING PR! I was wondering whether something like this was possible, because I had some texts that were > 1k and they got broken up.

@daigo38

daigo38 commented May 27, 2024

I set the environment variable in docker-compose.yaml on my self-hosted instance and restarted the container, but the change is not applied.
I am using the "langgenius/dify-api:0.6.8" image. Is this feature already included in that version?

@takatost takatost mentioned this pull request May 27, 2024
Development

Successfully merging this pull request may close these issues.

Chunk Size
4 participants