Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vchan communication a bit crashy #65

Open
pqwy opened this issue Feb 5, 2015 · 2 comments
Open

Vchan communication a bit crashy #65

pqwy opened this issue Feb 5, 2015 · 2 comments
Labels

Comments

@pqwy
Copy link

pqwy commented Feb 5, 2015

I was trying out Vchans to see how much of a fit were they for a high-performance inter-domain logging server. To start off, I created a client-server pair using the Unix backend.

Server:

open Lwt

let port =
  match Vchan.Port.of_string "port" with
  | `Error e -> invalid_arg e
  | `Ok p    -> p

let main () =
  lwt (ic, oc) =
    Vchan_lwt_unix.open_server ~domid:0 ~port () in
  Lwt_io.read_lines ic |> Lwt_stream.iter_s Lwt_io.printl

let () = Lwt_main.run (main ())

Client:

open Lwt

let port =
  match Vchan.Port.of_string "port" with
  | `Error e -> invalid_arg e
  | `Ok p    -> p

let big_string =
  let n = 1024 * 200 in
  let s = String.create n in
  for i = 0 to n - 1 do s.[i] <- '.' done ;
  s.[n - 1] <- 'x' ;
  s

let rec replicate n x = if n < 1 then [] else x :: replicate (pred n) x

let main () =
  lwt (ic, oc) =
    Vchan_lwt_unix.open_client ~domid:0 ~port () in
  Lwt_io.write_line oc "blah"
  >>
  Lwt_list.iter_s (Lwt_io.write_line oc)
    (replicate 100 big_string)

let () = Lwt_main.run (main ())

To check robustness, I tried interrupting the client mid-write and it goes well. When I interrupted the server, howerver, the client just hung in the kernel (status D according to ps) and couldn't be killed.

Then with that hung client still round, I started three separate instances of the server and one more client. And then the dom0 hung.

This is Xen-4.4, everything was in dom0. Will try to attach logs if there were some after I recover access to the machine.

@pqwy
Copy link
Author

pqwy commented Feb 6, 2015

Logs:

Feb  5 16:37:53 mirage kernel: [213263.180601] BUG: Bad page map in process server  pte:80000005d226d167 pmd:03092067
Feb  5 16:37:53 mirage kernel: [213263.180626] page:ffffea0017489b40 count:1 mapcount:-1 mapping:          (null) index:0xffffffffffffffff
Feb  5 16:37:53 mirage kernel: [213263.180648] page flags: 0x2ffff0000000814(referenced|dirty|private)
Feb  5 16:37:53 mirage kernel: [213263.180677] page dumped because: bad pte
Feb  5 16:37:53 mirage kernel: [213263.180688] addr:00007ff7951da000 vm_flags:1c0400fb anon_vma:          (null) mapping:ffff8805d37656a8 index:0
Feb  5 16:37:53 mirage kernel: [213263.180710] vma->vm_ops->fault:           (null)
Feb  5 16:37:53 mirage kernel: [213263.180740] vma->vm_file->f_op->mmap: gntalloc_mmap+0x0/0x1d0 [xen_gntalloc]
Feb  5 16:37:53 mirage kernel: [213263.180774] CPU: 3 PID: 3163 Comm: server Not tainted 3.16.0-30-generic #40-Ubuntu
Feb  5 16:37:53 mirage kernel: [213263.180775] Hardware name: Supermicro X8STi/X8STi, BIOS 2.0        09/17/10  
Feb  5 16:37:53 mirage kernel: [213263.180776]  ffff8805d60686c0 ffff8800cc447ac0 ffffffff8178236b 00007ff7951da000
Feb  5 16:37:53 mirage kernel: [213263.180778]  ffff8800cc447b08 ffffffff81194fa1 ffffea0017489b40 ffff8805d37656a8
Feb  5 16:37:53 mirage kernel: [213263.180780]  ffff880003092ed0 ffffea0017489b40 00007ff7951db000 ffff8800cc447c78
Feb  5 16:37:53 mirage kernel: [213263.180782] Call Trace:
Feb  5 16:37:53 mirage kernel: [213263.180790]  [<ffffffff8178236b>] dump_stack+0x45/0x56
Feb  5 16:37:53 mirage kernel: [213263.180794]  [<ffffffff81194fa1>] print_bad_pte+0x1c1/0x290
Feb  5 16:37:53 mirage kernel: [213263.180796]  [<ffffffff81196603>] unmap_page_range+0x743/0x800
Feb  5 16:37:53 mirage kernel: [213263.180798]  [<ffffffff8119673d>] unmap_single_vma+0x7d/0xf0
Feb  5 16:37:53 mirage kernel: [213263.180800]  [<ffffffff81197424>] unmap_vmas+0x54/0xb0
Feb  5 16:37:53 mirage kernel: [213263.180803]  [<ffffffff811a06fc>] exit_mmap+0x9c/0x180
Feb  5 16:37:53 mirage kernel: [213263.180806]  [<ffffffff811253e3>] ? __delayacct_add_tsk+0x153/0x170
Feb  5 16:37:53 mirage kernel: [213263.180809]  [<ffffffff8106cedc>] mmput+0x5c/0x120
Feb  5 16:37:53 mirage kernel: [213263.180812]  [<ffffffff810726dd>] do_exit+0x32d/0xab0
Feb  5 16:37:53 mirage kernel: [213263.180816]  [<ffffffff810a548f>] ? try_to_wake_up+0x1ff/0x370
Feb  5 16:37:53 mirage kernel: [213263.180818]  [<ffffffff81072ee3>] do_group_exit+0x43/0xc0
Feb  5 16:37:53 mirage kernel: [213263.180821]  [<ffffffff8108328f>] get_signal_to_deliver+0x29f/0x8d0
Feb  5 16:37:53 mirage kernel: [213263.180825]  [<ffffffff810125a8>] do_signal+0x48/0xbf0
Feb  5 16:37:53 mirage kernel: [213263.180827]  [<ffffffff81080077>] ? recalc_sigpending+0x17/0x60
Feb  5 16:37:53 mirage kernel: [213263.180829]  [<ffffffff810131b9>] do_notify_resume+0x69/0xb0
Feb  5 16:37:53 mirage kernel: [213263.180832]  [<ffffffff8178a66a>] int_signal+0x12/0x17
Feb  5 16:37:53 mirage kernel: [213263.180834] Disabling lock debugging due to kernel taint
Feb  5 16:37:53 mirage kernel: [213263.180979] BUG: Bad page state in process server  pfn:5d226d
Feb  5 16:37:53 mirage kernel: [213263.181010] page:ffffea0017489b40 count:0 mapcount:-1 mapping:          (null) index:0xffffffffffffffff
Feb  5 16:37:53 mirage kernel: [213263.181061] page flags: 0x2ffff0000000814(referenced|dirty|private)
Feb  5 16:37:53 mirage kernel: [213263.181104] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
Feb  5 16:37:53 mirage kernel: [213263.181134] bad because of flags:
Feb  5 16:37:53 mirage kernel: [213263.181159] page flags: 0x800(private)
Feb  5 16:37:53 mirage kernel: [213263.181190] Modules linked in: xen_gntalloc iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_physdev xen_netback bridge stp llc iptable_filter ip_tables x_tables xen_gntdev xen_evtchn xenfs xen_privcmd gpio_ich ppdev coretemp joydev serio_raw lpc_ich ioatdma i7core_edac parport_pc tpm_infineon parport shpchp mac_hid edac_core hid_generic usbhid hid psmouse igb pata_acpi e1000e i2c_algo_bit dca ptp pps_core
Feb  5 16:37:53 mirage kernel: [213263.181210] CPU: 3 PID: 3163 Comm: server Tainted: G    B         3.16.0-30-generic #40-Ubuntu
Feb  5 16:37:53 mirage kernel: [213263.181211] Hardware name: Supermicro X8STi/X8STi, BIOS 2.0        09/17/10  
Feb  5 16:37:53 mirage kernel: [213263.181212]  ffffffff81a90f08 ffff8800cc447958 ffffffff8178236b ffffea0017489b40
Feb  5 16:37:53 mirage kernel: [213263.181214]  ffff8800cc447980 ffffffff811719c0 ffffea0017489b40 0000000000000000
Feb  5 16:37:53 mirage kernel: [213263.181216]  ffffea0017489b80 ffff8800cc4479c8 ffffffff81171b8c ff0000ffffffffff
Feb  5 16:37:53 mirage kernel: [213263.181218] Call Trace:
Feb  5 16:37:53 mirage kernel: [213263.181221]  [<ffffffff8178236b>] dump_stack+0x45/0x56
Feb  5 16:37:53 mirage kernel: [213263.181224]  [<ffffffff811719c0>] bad_page.part.68+0xb0/0x100
Feb  5 16:37:53 mirage kernel: [213263.181225]  [<ffffffff81171b8c>] free_pages_prepare+0x17c/0x1d0
Feb  5 16:37:53 mirage kernel: [213263.181227]  [<ffffffff81172745>] free_hot_cold_page+0x35/0x1d0
Feb  5 16:37:53 mirage kernel: [213263.181229]  [<ffffffff81172926>] free_hot_cold_page_list+0x46/0xa0
Feb  5 16:37:53 mirage kernel: [213263.181231]  [<ffffffff8117905a>] release_pages+0xaa/0x250
Feb  5 16:37:53 mirage kernel: [213263.181234]  [<ffffffff811aaec9>] free_pages_and_swap_cache+0xb9/0xe0
Feb  5 16:37:53 mirage kernel: [213263.181236]  [<ffffffff8119498c>] tlb_flush_mmu_free+0x2c/0x50
Feb  5 16:37:53 mirage kernel: [213263.181238]  [<ffffffff81196564>] unmap_page_range+0x6a4/0x800
Feb  5 16:37:53 mirage kernel: [213263.181240]  [<ffffffff8119673d>] unmap_single_vma+0x7d/0xf0
Feb  5 16:37:53 mirage kernel: [213263.181242]  [<ffffffff81197424>] unmap_vmas+0x54/0xb0
Feb  5 16:37:53 mirage kernel: [213263.181244]  [<ffffffff811a06fc>] exit_mmap+0x9c/0x180
Feb  5 16:37:53 mirage kernel: [213263.181246]  [<ffffffff811253e2>] ? __delayacct_add_tsk+0x152/0x170
Feb  5 16:37:53 mirage kernel: [213263.181248]  [<ffffffff8106cedc>] mmput+0x5c/0x120
Feb  5 16:37:53 mirage kernel: [213263.181250]  [<ffffffff810726dd>] do_exit+0x32d/0xab0
Feb  5 16:37:53 mirage kernel: [213263.181252]  [<ffffffff810a548f>] ? try_to_wake_up+0x1ff/0x370
Feb  5 16:37:53 mirage kernel: [213263.181254]  [<ffffffff81072ee3>] do_group_exit+0x43/0xc0
Feb  5 16:37:53 mirage kernel: [213263.181256]  [<ffffffff8108328f>] get_signal_to_deliver+0x29f/0x8d0
Feb  5 16:37:53 mirage kernel: [213263.181259]  [<ffffffff810125a8>] do_signal+0x48/0xbf0
Feb  5 16:37:53 mirage kernel: [213263.181261]  [<ffffffff81080077>] ? recalc_sigpending+0x17/0x60
Feb  5 16:37:53 mirage kernel: [213263.181263]  [<ffffffff810131b9>] do_notify_resume+0x69/0xb0
Feb  5 16:37:53 mirage kernel: [213263.181265]  [<ffffffff8178a66a>] int_signal+0x12/0x17
Feb  5 16:38:06 mirage kernel: [213275.653927] general protection fault: 0000 [#1] SMP 
Feb  5 16:38:06 mirage kernel: [213275.653981] Modules linked in: xen_gntalloc iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_physdev xen_netback bridge stp llc iptable_filter ip_tables x_tables xen_gntdev xen_evtchn xenfs xen_privcmd gpio_ich ppdev coretemp joydev serio_raw lpc_ich ioatdma i7core_edac parport_pc tpm_infineon parport shpchp mac_hid edac_core hid_generic usbhid hid psmouse igb pata_acpi e1000e i2c_algo_bit dca ptp pps_core
Feb  5 16:38:06 mirage kernel: [213275.654401] CPU: 0 PID: 3166 Comm: client Tainted: G    B         3.16.0-30-generic #40-Ubuntu
Feb  5 16:38:06 mirage kernel: [213275.654459] Hardware name: Supermicro X8STi/X8STi, BIOS 2.0        09/17/10  
Feb  5 16:38:06 mirage kernel: [213275.654514] task: ffff880002b58000 ti: ffff8805b5d34000 task.ti: ffff8805b5d34000
Feb  5 16:38:06 mirage kernel: [213275.654570] RIP: e030:[<ffffffff8100bbad>]  [<ffffffff8100bbad>] m2p_find_override+0x8d/0xc0
Feb  5 16:38:06 mirage kernel: [213275.654637] RSP: e02b:ffff8805b5d37ae0  EFLAGS: 00010006
Feb  5 16:38:06 mirage kernel: [213275.654670] RAX: 8000000000000200 RBX: dead0000001000e0 RCX: 000000000000bbc4
Feb  5 16:38:06 mirage kernel: [213275.654724] RDX: dead000000100100 RSI: 0000000000000017 RDI: ffffffff81e95e38
Feb  5 16:38:06 mirage kernel: [213275.654778] RBP: ffff8805b5d37af8 R08: ffff880000000488 R09: 8000000000000200
Feb  5 16:38:06 mirage kernel: [213275.654832] R10: ffff880000000000 R11: ffff8805d5027180 R12: ffffffff81ff0a00
Feb  5 16:38:06 mirage kernel: [213275.654886] R13: 000000000001c4e5 R14: ffff8805d5027180 R15: 000000000001c4e5
Feb  5 16:38:06 mirage kernel: [213275.654944] FS:  00007fa3490c1740(0000) GS:ffff880600a00000(0000) knlGS:0000000000000000
Feb  5 16:38:06 mirage kernel: [213275.655001] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb  5 16:38:06 mirage kernel: [213275.655035] CR2: 00007fa34905ffe0 CR3: 00000005d5162000 CR4: 0000000000002660
Feb  5 16:38:06 mirage kernel: [213275.655089] Stack:
Feb  5 16:38:06 mirage kernel: [213275.655116]  00000000005d2310 ffff880600a0a250 ffff8805d749b000 ffff8805b5d37b48
Feb  5 16:38:06 mirage kernel: [213275.655188]  ffffffff8100bf18 ffffea00175d02c0 800000000001c4e5 0000000100000003
Feb  5 16:38:06 mirage kernel: [213275.655260]  0000000000000000 0000000000000000 00000000005d226d 800000000001c4e5
Feb  5 16:38:06 mirage kernel: [213275.655333] Call Trace:
Feb  5 16:38:06 mirage kernel: [213275.655364]  [<ffffffff8100bf18>] m2p_remove_override+0x338/0x390
Feb  5 16:38:06 mirage kernel: [213275.655400]  [<ffffffff8100c002>] clear_foreign_p2m_mapping+0x92/0x1a0
Feb  5 16:38:06 mirage kernel: [213275.655439]  [<ffffffff814690d8>] gnttab_unmap_refs+0x38/0x40
Feb  5 16:38:06 mirage kernel: [213275.655477]  [<ffffffffc0170163>] unmap_grant_pages+0x113/0x2b0 [xen_gntdev]
Feb  5 16:38:06 mirage kernel: [213275.655517]  [<ffffffffc01704f3>] mn_release+0x53/0x130 [xen_gntdev]
Feb  5 16:38:06 mirage kernel: [213275.655555]  [<ffffffff811bb274>] __mmu_notifier_release+0x44/0xd0
Feb  5 16:38:06 mirage kernel: [213275.655593]  [<ffffffff811a07bd>] exit_mmap+0x15d/0x180
Feb  5 16:38:06 mirage kernel: [213275.655630]  [<ffffffff81789a60>] ? _raw_spin_unlock_irqrestore+0x20/0x70
Feb  5 16:38:06 mirage kernel: [213275.655669]  [<ffffffff811253e7>] ? __delayacct_add_tsk+0x157/0x170
Feb  5 16:38:06 mirage kernel: [213275.655708]  [<ffffffff8122cb7c>] ? exit_aio+0x3c/0xf0
Feb  5 16:38:06 mirage kernel: [213275.655744]  [<ffffffff8106cedc>] mmput+0x5c/0x120
Feb  5 16:38:06 mirage kernel: [213275.655778]  [<ffffffff810726dd>] do_exit+0x32d/0xab0
Feb  5 16:38:06 mirage kernel: [213275.655814]  [<ffffffff81173f0c>] ? __alloc_pages_nodemask+0x19c/0xbc0
Feb  5 16:38:06 mirage kernel: [213275.655852]  [<ffffffff81072ee3>] do_group_exit+0x43/0xc0
Feb  5 16:38:06 mirage kernel: [213275.655888]  [<ffffffff8108328f>] get_signal_to_deliver+0x29f/0x8d0
Feb  5 16:38:06 mirage kernel: [213275.655926]  [<ffffffff810125a8>] do_signal+0x48/0xbf0
Feb  5 16:38:06 mirage kernel: [213275.655961]  [<ffffffff8100140a>] ? xen_hypercall_event_channel_op+0xa/0x20
Feb  5 16:38:06 mirage kernel: [213275.655999]  [<ffffffff8100140a>] ? xen_hypercall_event_channel_op+0xa/0x20
Feb  5 16:38:06 mirage kernel: [213275.656039]  [<ffffffffc016b8a8>] ? evtchn_ioctl+0x108/0x3dc [xen_evtchn]
Feb  5 16:38:06 mirage kernel: [213275.656078]  [<ffffffff811f51f8>] ? do_vfs_ioctl+0x2c8/0x4a0
Feb  5 16:38:06 mirage kernel: [213275.656114]  [<ffffffff810131b9>] do_notify_resume+0x69/0xb0
Feb  5 16:38:06 mirage kernel: [213275.656150]  [<ffffffff8178a66a>] int_signal+0x12/0x17
Feb  5 16:38:06 mirage kernel: [213275.656183] Code: ba a2 e8 00 e8 f5 df 77 00 49 8b 14 24 49 39 d4 48 8d 5a e0 75 15 eb 36 66 0f 1f 44 00 00 48 8b 53 20 49 39 d4 48 8d 5a e0 74 23 <4c> 39 6a 10 75 ed 48 89 c6 48 c7 c7 38 5e e9 81 e8 7e de 77 00 
Feb  5 16:38:06 mirage kernel: [213275.656569] RIP  [<ffffffff8100bbad>] m2p_find_override+0x8d/0xc0
Feb  5 16:38:06 mirage kernel: [213275.656609]  RSP <ffff8805b5d37ae0>
Feb  5 16:38:06 mirage kernel: [213275.656901] ---[ end trace 0463bb40a30377cb ]---
Feb  5 16:38:06 mirage kernel: [213275.656967] Fixing recursive fault but reboot is needed!
Feb  5 16:47:01 mirage kernel: [213811.162416] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 4, t=15002 jiffies, g=872352, c=872351, q=0)
Feb  5 16:47:01 mirage kernel: [213811.162646] INFO: Stall ended before state dump start
Feb  5 16:50:01 mirage kernel: [213991.210411] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 4, t=60014 jiffies, g=872352, c=872351, q=0)
Feb  5 16:50:01 mirage kernel: [213991.210494] INFO: Stall ended before state dump start
Feb  5 16:53:02 mirage kernel: [214171.258415] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 4, t=105026 jiffies, g=872352, c=872351, q=0)
Feb  5 16:53:02 mirage kernel: [214171.258499] INFO: Stall ended before state dump start
Feb  5 16:56:02 mirage kernel: [214351.306416] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 4, t=150038 jiffies, g=872352, c=872351, q=0)
Feb  5 16:56:02 mirage kernel: [214351.306499] INFO: Stall ended before state dump start
Feb  5 16:59:02 mirage kernel: [214531.354417] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 4, t=195050 jiffies, g=872352, c=872351, q=0)
Feb  5 16:59:02 mirage kernel: [214531.354501] INFO: Stall ended before state dump start
Feb  5 17:02:02 mirage kernel: [214711.402416] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 4, t=240062 jiffies, g=872352, c=872351, q=0)
Feb  5 17:02:02 mirage kernel: [214711.402499] INFO: Stall ended before state dump start
Feb  5 17:05:02 mirage kernel: [214891.442412] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 4, t=285072 jiffies, g=872352, c=872351, q=0)
Feb  5 17:05:02 mirage kernel: [214891.442496] INFO: Stall ended before state dump start
Feb  5 17:08:02 mirage kernel: [215071.498411] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 4, t=330086 jiffies, g=872352, c=872351, q=0)
Feb  5 17:08:02 mirage kernel: [215071.498495] INFO: Stall ended before state dump start

...

Feb  5 20:59:05 mirage kernel: [228935.194416] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 4, t=3796010 jiffies, g=872352, c=872351, q=0)
Feb  5 20:59:05 mirage kernel: [228935.194501] INFO: Stall ended before state dump start

It was rebooted at this point.

@djs55
Copy link
Member

djs55 commented Mar 14, 2016

I think we have to wait for upstream Xen to implement "revokable grants" to fix this properly. Without this we end up with pages in funny states which then provokes ... interesting ... error behaviour in Linux.

@djs55 djs55 added the bug label Mar 14, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants