Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Privacy] Address privacy requirement around "network changes" #30

Open
mingyc opened this issue Sep 7, 2022 · 19 comments
Open

[Privacy] Address privacy requirement around "network changes" #30

mingyc opened this issue Sep 7, 2022 · 19 comments

Comments

@mingyc
Copy link
Collaborator

mingyc commented Sep 7, 2022

The discussion from #28 (comment) suggests that one of the privacy requirements around network change, as stated below, is too ambiguous:

Beacons are only sent over the same network that was active when the beacon was registered (e.g. > if the user goes offline and moves to a new network, discard pending beacons).

The foundamental privacy issue here is to prevent beacon from sending to fulfill the expectation that users expect a network cannot know about their previous navigation history when
(1) they have already left a page (either by closing a tab or by navigating away) AND
(2) they have already left the origin network

The following example is the same from the linked comment:

  1. User is on network1 (e.g. mobile network)
  2. User navigates to site1.com
  3. site1.com creates and queues beacon1 on the page.
  4. User navigates away from site1.com
    A. where site1.com is put into bfcache, and beacon1 is still pending.
    B. browser checks whether network has changed or not => No
  5. User switches to network2 (e.g. office wifi), and thinks that network2 wouldn't know about the navigation history of site1.com
  6. beacon1 is sent by browser via network2 (when either page evicted from bfcache or timeout/backgroundTimeout timer ends) => an unexpected behavior

Note that Step 5 can always happen after 4-B as the device's network change takes some time to propagate from OS to browser.

Proposed solution

The current proposal is to force the browser to send out all beacons when a page is no longer open, i.e.

  • the user has navigated away from the current page
    • => handles forced sending before pagehide event is dispatched)
  • the user has closed the current page
    • => already handled in the cases of tab closure/browser closure)

and replaces 4-B with sending out all beacons.

Cons

  1. There can still be leak to network2 if a network change happens very quickly in between B & D
    A. navigating away
    B. browser forces sending out all beacon1 from network1
    C. (network changes to network2)
    D. The browser's network API actually sends out request constructed from beacon1.
  2. This potentially makes backgroundTimeout unusable for "navigating away" use case, as the the time between this timers starts (i.e. page enters hidden state) and forced sending (i.e. page enters bfcache) can be very short. But it can still work when the page visibility change but navigation doesn't happen, e.g. switching to another tab.
@mingyc
Copy link
Collaborator Author

mingyc commented Sep 12, 2022

A followup question: whether to force sending out on different-page navigation or different-site navigation.

Note that a beacon is attached to a document (page).

mingyc added a commit that referenced this issue Sep 14, 2022
Update Privacy section to clarify why beacon needs to be stopped when network changes (#27 #28)

Also added links to related privacy discussions: #3, #27, #30, #34
Updated beacon TTL to 30min per suggestion in #16 (comment)
aarongable pushed a commit to chromium/chromium that referenced this issue Sep 15, 2022
…away to a different document.

This is to mitigate potential privacy issue that when network changes
after users think they have left a page, beacons queued in that page
still exist and get sent through the new network, which leaks navigation history to the new network. See [1] for more details.

This CL adds the following changes:

1) In browser: Add a call to `PendingBeaconHost::SendAllOnPagehide()` from `RenderFrameHostManager::UnloadOldFrame()` to let old RFH send out all its pending beacons before it is put into BFCache or being unloaded.

2) In renderer: Update `PendingBeaconDispatcher` to be called on to update all its pending beacons' states before a `pagehide` event is dispatched to event listeners.

This change makes `backgroundTimeout` property useless in the scenario that users navigate away to a different page. But it can still work in other cases like when tab is minimized or when switching tabs.

[1]: WICG/pending-beacon#30

Bug: 1293679
Change-Id: I5a7ddf4284717e1af482286dd6f56344d4ef4b26
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3881967
Reviewed-by: Rakina Zata Amni <rakina@chromium.org>
Reviewed-by: Daniel Cheng <dcheng@chromium.org>
Commit-Queue: Ming-Ying Chung <mych@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1047323}
@fergald
Copy link
Collaborator

fergald commented Sep 22, 2022

If the user still has another instance of site1.com open or opens one later on or there is a service worker for site1.com, then it should be fine to deliver the beacon, we're not doing anything the site itself couldn't do (e.g. if it wrote data to IDB and managed delivery itself).

I think this would leave us a small risk of dropping data due to network change.

We still need to decide if it's OK to wait while in BFCache or not (since there is still a race with the network change event).

mjfroman pushed a commit to mjfroman/moz-libwebrtc-third-party that referenced this issue Oct 14, 2022
…away to a different document.

This is to mitigate potential privacy issue that when network changes
after users think they have left a page, beacons queued in that page
still exist and get sent through the new network, which leaks navigation history to the new network. See [1] for more details.

This CL adds the following changes:

1) In browser: Add a call to `PendingBeaconHost::SendAllOnPagehide()` from `RenderFrameHostManager::UnloadOldFrame()` to let old RFH send out all its pending beacons before it is put into BFCache or being unloaded.

2) In renderer: Update `PendingBeaconDispatcher` to be called on to update all its pending beacons' states before a `pagehide` event is dispatched to event listeners.

This change makes `backgroundTimeout` property useless in the scenario that users navigate away to a different page. But it can still work in other cases like when tab is minimized or when switching tabs.

[1]: WICG/pending-beacon#30

Bug: 1293679
Change-Id: I5a7ddf4284717e1af482286dd6f56344d4ef4b26
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3881967
Reviewed-by: Rakina Zata Amni <rakina@chromium.org>
Reviewed-by: Daniel Cheng <dcheng@chromium.org>
Commit-Queue: Ming-Ying Chung <mych@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1047323}
NOKEYCHECK=True
GitOrigin-RevId: 2be095e89fa5007ea5bc1bfe4419165471be3454
@tungnh28
Copy link

tungnh28 commented Nov 7, 2022

Hmm, there're quite a few APIs having the same privacy concerns here (such as report API and background sync). However, I think this leak is quite limited by the fact that pending beacon requests must be HTTPS only, so the request contents are not revealed (only IP will be leaked), and only under a certain situation of racing between network change event and beacon request setup (because we would discard entirely pending requests after getting network change).

Ideally, beacons must only be sent persistently over only one network (tagged at the time the beacon was registered), and discard the future sending if we find the network is different than the old one. There was an immature exploration of Chrome regarding the idea (unfortunately I actually don't know the state or underneath tech difficulties, still asking around). I guess it's time to push it forward.

@mingyc
Copy link
Collaborator Author

mingyc commented Nov 8, 2022

... only under a certain situation of racing between network change event and beacon request setup (because we would discard entirely pending requests after getting network change).

@tungnh28 Do you suggest that it's ok for us to try with best-effort, i.e. dropping pending beacons on network changes, after the tab is closed, in the browser process?

Ideally, beacons must only be sent persistently over only one network

We would argue that a beacon should be sendable over different network as long as the tab that creates the beacon is still open when network changes, as developers can achieve this same behavior by other existing APIs.

@tungnh28
Copy link

tungnh28 commented Nov 8, 2022

Do you suggest that it's ok for us to try with best-effort, i.e. dropping pending beacons on network changes, after the tab is closed, in the browser process?

Given the current risks are pretty low, the best-effort approach sounds good to me (I think we are doing the same with report API).

We would argue that a beacon should be sendable over different network as long as the tab that creates the beacon is still open when network changes, as developers can achieve this same behavior by other existing APIs.

Yeah, indeed, and with extending in the comment, (with double keyed because we don't have to concern about user's friction).

@mingyc mingyc changed the title Address privacy requirement around "network changes" [Privacy] Address privacy requirement around "network changes" Nov 10, 2022
@mingyc
Copy link
Collaborator Author

mingyc commented Nov 15, 2022

@tungnh28 Could you please confirm if beacons can only be sent over the same network can be bypassed if background sync is enabled given this comment?

(I also asked similar question here)

mingyc added a commit to mingyc/pending-beacon that referenced this issue Nov 15, 2022
mingyc added a commit to mingyc/pending-beacon that referenced this issue Nov 15, 2022
mingyc added a commit to mingyc/pending-beacon that referenced this issue Nov 15, 2022
@tungnh28
Copy link

I don't think so, we had the same privacy concern/limitation in background sync as well (that's one of the motivation of background-sync preference setting, for users who knew the risks and want to disable background sync).

mingyc added a commit to mingyc/pending-beacon that referenced this issue Nov 15, 2022
@mingyc
Copy link
Collaborator Author

mingyc commented Nov 15, 2022

OK. Then background-sync permission just grant use to bypass checking other live site instance?

I've updated this change. @tungnh28 Could you also please take a look? Thanks!

@fergald
Copy link
Collaborator

fergald commented Nov 16, 2022

@tungnh28 can you explain why this sending wouldn't just be governed by the background-sync permission? As you say, it's the same concerns, so it seems like it should be the same permission.

In particular, a site that wants to send data in the background can do so from a service worker if it has that permission. This seems like another case where we could be pushing devs to hand-roll something instead of using this API.

The workaround for this are either to

  1. always send on pagehide and lose some batching opportunities
  2. pay attention to network changes and sometimes lose data

1 is probably OK but I feel inconsistent for background sync not to apply to this network activity.

@tungnh28
Copy link

tungnh28 commented Nov 17, 2022

can you explain why this sending wouldn't just be governed by the background-sync permission?

As the primary privacy concern here, we are trying to prevent beacon from revealing the previous navigation history (host) in some certain situations, apparently gating behind background-sync does not help us resolving it. However, it might be the right thing to do since we are expecting to provide an alternative API which should not be less capable than the existing one.

The privacy risks seem to be quite reduced and I don't think that concern should block us moving forward. I'd like to friendly tag someone who knows the old privacy story of background-sync for better ideas. @beverloo, maybe?

@mingyc mingyc removed the api Issue with API specs label Nov 25, 2022
@tungnh28
Copy link

tungnh28 commented Nov 30, 2022

Friendly tag @Sauski and I am looking forward to the feedback of privacy team.

@Sauski
Copy link

Sauski commented Dec 1, 2022

I realise that, from a privacy perspective, we don't have a consistent stance on APIs which want to communicate after the page has been unloaded. After chatting with the Chrome privacy folks, we've come up with a stance that applies to any API which wants to send data at some point after the page has been "unloaded" (with the exclusion of periodic background sync, which is limited to PWAs and is a different beast).

Starting at time T0, defined as when the eTLD+1 for the page which wants to communicate (e.g. send a beacon / report / sync message) is _no longer _ "fully active" in any tab - e.g. considered as "non-existent” from the user’s perspective. Note that entering the BFCache does explicitly not extend this.

At time T0, the user agent could send any beacons / reports / sync messages, regardless of the background sync user permission. Acknowledging that unload handlers + fetch get polyfill this functionality % reliability.

Past time T0 (assuming the eTLD+1 remains "non-existent"), any outbound communication requires the background sync user permission.

Additionally, past time T0, any network change must result in any pending communications being dropped (regardless of why they are pending, e.g. including because they are in BFCache). The time between T0 and a network change is not relevant - a user may have closed tabs just before swapping networks, precisely because they don't want those sites communicating on the new network.

We also consider some "timeout" time past T0, where regardless of network change, it would be unreasonable to send the communication. Elsewhere it was discussed that Firefox has a 40 minute timeout, and this would definitely be a hard upper bound. Below that, we don't have strong feelings - obviously shorter is more private, but below this hard limit, it seems OK for the API to decide.

Acknowledge that this stance isn't respected by some APIs (e.g. background sync doesn't consider network change) - from a privacy perspective, we'd consider this a privacy deficiency, and will be working to rectify that. Going forward, from a privacy perspective, we'd be looking to uphold the bar defined above for all APIs.

@mingyc
Copy link
Collaborator Author

mingyc commented Dec 2, 2022

Thanks for the updates!

@Sauski I have the following questions

defined as when the eTLD+1 for the page which wants to communicate (e.g. send a beacon / report / sync message) is _no longer _ "fully active" in any tab

  1. Will it still be eTLD+1 after 3P Storage Partitioning Key is launched everywhere, which changes the key to (Root Frame eTLD+1, current frame Origin)?

Past time T0 (assuming the eTLD+1 remains "non-existent"), any outbound communication requires the background sync user permission.

  1. Does this imply that opening another eTLD+1 tab after previous T0 will lead to another T0?

@Sauski
Copy link

Sauski commented Dec 5, 2022

Will it still be eTLD+1 after 3P Storage Partitioning Key is launched everywhere, which changes the key to (Root Frame eTLD+1, current frame Origin)?

I could see this going either way, from a raw privacy perspective, the embedded origin could post message up to the top frame and have it act on its behalf, recovering the raw eTLD+1 scope. On the other hand, if we don't have a good reason to break away from the strong model, I don't see why we should.

Does this imply that opening another eTLD+1 tab after previous T0 will lead to another T0?

It's my intuition is that it would (@tungnh28 for thoughts). E.g. if you had something pending send, you could refresh the pending time when the user re-opened the site (or the site came out of bfcache etc).

I worry however, that this would lead to some potentially perverse interpretations (e.g. holding onto some data beyond the lifetime described above on the hope that another "T0" occurs and everything becomes valid again)

@tungnh28
Copy link

tungnh28 commented Dec 6, 2022

Thank you very much for the insight.

It's my intuition is that it would (@tungnh28 for thoughts). E.g. if you had something pending send, you could refresh the pending time when the user re-opened the site (or the site came out of bfcache etc).

It sounds reasonable to me, although I am still a bit concerned about breaking the gap between embedded iframe and 1P frame. Given the case the pending beacon request is coming from iframe (please correct me if it's not the case), and there's exactly an active instance of the same top frame + different subframe (or we are reopening the top level page), it seems not right to me.

Edit: I was saying the case of reopening the same tld, but different subframes.

@mingyc
Copy link
Collaborator Author

mingyc commented Dec 8, 2022

@Sauski

the embedded origin could post message up to the top frame and have it act on its behalf, recovering the raw eTLD+1 scope.

But if we only focuses on behaviors after T0 (defined as when the eTLD+1 for the page which wants to communicate is no longer "fully active" in any tab), the page should not be able to make any kinds of communication to recover the scope?

--
Some follow up questions:

  1. Do you think crash recovry from disk storage will be banned by this privacy decision?
    • The timing when a browser crashes can be considered as T0
    • Background sync permission for a site may or may not have changed after reopening the browser.
  2. If a dedicated worker is running for a page, it should also follow this rule that pending beacons (created by the worker) should not be sent after T0?

@tungnh28 Could you please explain more with examples? I don't quite get your concern.

Pending beacons can bet sent by any type of documents (may not be iframe).

@tungnh28
Copy link

tungnh28 commented Dec 8, 2022

Is it the case of reopening top level frame, causing different subframes?
I was saying due to some web apps (for example google app script) or user interaction, the subframe A would be injected (or result of navigation) and trigger pending beacon requests. However reopening the top level page is going back to the original state (where A is not available)

@Sauski
Copy link

Sauski commented Dec 19, 2022

Some more thoughts:

Do you think #34 will be banned by this privacy decision?
The timing when a browser crashes can be considered as T0
Background sync permission for a site may or may not have changed after reopening the browser.

I don't think banned would be the interpretation - difficult to have meet the proposed privacy requirements might be better. Can you help me better understand why browser level crashes are important to include provisions for? Crashes in the renderer, sure, but then we can continue to track network changes at the browser level.

From a security perspective, I'm not sure we'd even want to reveal that a browser level crash occurred.

If a dedicated worker is running for a page, it should also follow this rule that pending beacons (created by the worker) should not be sent after T0?

It's not that they shouldn't be sent, but I can't see any reason that they shouldn't follow the same requirements as above (e.g. they'd at least need the background sync capability).

@mingyc
Copy link
Collaborator Author

mingyc commented Jan 12, 2024

Re-surfacing this discussion.

Per latest discussion from Chrome Privacy reviewers, it is expected that any new API implemented in Chrome should not perform any background communication if none of the original document that initiates such communication is alive, even if such behavior can already be achieved by using service worker. This is because a stricter requirement is enforced on new APIs. (Also see previous conclusion for PendingBeacon API in #30 (comment))

Hence, a simplified version is currently implemented in Chrome for upcoming Origin Trial, that any pending fetchLater requests for a document will be flushed if that document enters BFCache, without checking if any of other same-origin documents is still open. In the future, a less stricter version should be considere by checking for the existence of other same-origin documents before flushing.

@noamr PTAL. We probably have to update the spec to include this behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants