
Changes to json_fields don't invalidate cached results #14

Open

joeyAghion opened this issue Mar 26, 2015 · 7 comments

@joeyAghion

When we deploy a new version of our codebase that includes changes to a json_fields declaration (e.g., adding a new field), any cached data for that model should be invalidated. Instead, the stale data continues to be returned and a full cache clear is required before the correct fields are returned.

This could take the form of an additional version option that's simply incremented by developers upon making changes that should invalidate cached results. (I'm not sure how to cleanly offer the option without potentially colliding with the model's actual field definitions, but it seems we already do something like this with hide_as_child_json_when). A more sophisticated approach might be to include a hash of the json_fields definition in the cache keys. This would change when the definition changes, automatically invalidating any previously cached data.
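A minimal sketch of the fingerprint idea in plain Ruby (illustrative names only, not the gem's API; it deliberately skips lambda-valued options, which have no stable string form — a limitation that comes up later in this thread):

```ruby
require 'digest/md5'

# Hypothetical sketch: derive a cache-key component from a json_fields
# declaration so that changing the declaration changes the key.
def json_fields_fingerprint(json_fields)
  # Digest only field names and non-callable options; lambdas can't be
  # serialized stably, so they are excluded from the fingerprint.
  stable = json_fields.map do |name, opts|
    [name.to_s, opts.reject { |_, v| v.respond_to?(:call) }
                    .sort_by { |k, _| k.to_s }.inspect]
  end
  Digest::MD5.hexdigest(stable.sort.inspect)
end

v1 = { name: {}, price: {} }
v2 = { name: {}, price: {}, sku: {} } # a newly added field

json_fields_fingerprint(v1) == json_fields_fingerprint(v2) # => false
```

Because the fingerprint changes when a field is added or removed, previously cached entries are simply never read again, with no explicit cache clear.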

@dblock
Collaborator

dblock commented Mar 26, 2015

👍

@dblock dblock added the feature label Mar 26, 2015
@wrgoldstein
Contributor

It seems to me that the current versioning system already allows for this. Do you just not want to increment the version of each item in the json_fields block?
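For context, version-based invalidation works by folding the version into the cache key, roughly like this (a plain-Ruby sketch; the key format is made up for illustration, not the gem's actual format):

```ruby
# Illustrative sketch: when the version is part of the cache key, bumping
# it makes old entries unreachable (effectively invalidated) without an
# explicit cache clear.
def cached_json_key(klass, id, version)
  "as_json/#{version}/#{klass}/#{id}"
end

old_key = cached_json_key('Widget', 42, :v1)
new_key = cached_json_key('Widget', 42, :v2)
old_key == new_key # => false: the :v1 entry is simply never read again
```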

@joeyAghion
Author

@wrgoldstein I think you're right that that would work, but it's impractical to declare a new version and update all the as_json calls each time the schema changes. I was hoping for something that would "just work" (hopefully without being too complex).

@joeyAghion
Author

@wrgoldstein and I worked on this a little. Automatically "fingerprinting" the json_fields declaration is impractical because of all the lambdas... I don't know of a way to consistently digest their definitions into a string representation. Alternatives include:

  1. Fingerprint just a subset of the definition (e.g., just the keys or the declaration excluding lambdas)
  2. Add an optional version to the whole json_fields declaration (requiring developers to add/increment the version when they intend to invalidate previously-cached values)
  3. Do nothing

Let me know if you think any of these are worthwhile. I'm not sure (1) or (2) are worth the complexity.

@dblock
Collaborator

dblock commented Apr 16, 2015

I vote to do nothing; however, this is an interesting problem. The original problem is that we have new code and are serving old data. There are two scenarios:

Clients Must Change

This is a schema change to JSON responses, so I think you want to treat it as a migration and invalidate the appropriate caches as part of the deployment. The more atomic alternative is (2) IMO; (1) seems like a lot of work, and cache invalidations on deployment would be hard to debug.

Clients Roll Change

Clients are OK with old data, but we don't want to wait out our cache invalidation lifetime (e.g., 24 hours). A possible "solution" is a strategy by which, upon deployment, the lifetime of cached objects is decreased, so caches invalidate over the next 15 minutes instead of 24 hours, but still not all at once.
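One way to approximate that strategy is on the write path: shortly after a deploy, write entries with a short, jittered TTL so stale data rolls off over the grace window rather than all at once. A hypothetical plain-Ruby sketch (names and the deploy-detection mechanism are assumptions, not anything this gem provides):

```ruby
NORMAL_TTL   = 24 * 3600 # normal cache lifetime, in seconds
DEPLOY_GRACE = 15 * 60   # spread invalidation over 15 minutes

# Hypothetical: choose a TTL for newly written cache entries. Shortly
# after a deploy, use a short random TTL so entries expire scattered
# across the grace window, avoiding a thundering herd on the backend.
def ttl_for_write(recently_deployed)
  recently_deployed ? rand(1..DEPLOY_GRACE) : NORMAL_TTL
end
```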

@mzikherman

This is a great issue.

What I've generally done (and of course always forgotten 😄) is to explicitly invalidate the cache for any affected object whose JSON representation I've changed. I agree that this is a lot of work: modifying the JSON for AdditionalImage means busting the cache for every image!

In terms of your second point, I'd be willing to rely on that; however, I regularly see objects persist in the cache for much longer than 24 hours.

For example, I deploy code changing the JSON schema for a class, yet several days after that deploy I'm still seeing stale JSON come through the cache. Shouldn't that be impossible if the cache were being expired?

I don't see where in this gem an expires_in or similar parameter is being used, and the Rails.cache default is to never expire entries unless told to explicitly, relying instead on normal eviction by your caching server.

@mzikherman

So I guess my question (I hit 'Enter' too early above) is: what's the proper way to set a cache expiration time here? The shorter a cache lifetime we can get away with, the better. I have a hunch that in production we're currently not expiring this cache at all (we are invalidating it, however).

I think we can fix our specific issue by modifying our config/memcached.yml to include an expires_in set from an ENV var, for example.
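A sketch of that fix (the ENV variable name and helper are illustrative assumptions; the exact wiring depends on your cache store and config layout):

```ruby
# Hypothetical helper: build cache-store options with a default TTL read
# from an ENV var, falling back to 24 hours. With Rails and a memcached
# client, this hash would be passed when configuring config.cache_store.
def cache_store_options(env = ENV)
  { expires_in: Integer(env.fetch('CACHE_EXPIRES_IN', 86_400)) }
end

# e.g. in config/environments/production.rb (illustrative):
#   config.cache_store = :mem_cache_store, 'localhost:11211',
#                        cache_store_options
```

With a store-level expires_in, entries expire on their own even when nothing explicitly invalidates them, which addresses the never-expiring default described above.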
