2008-10-02 03:07 -!- kbingham(~kbingham@193.132.141.186) has joined #tux3
2008-10-02 04:19 -!- bh(~billh@ip68-107-26-122.sd.sd.cox.net) has joined #tux3
2008-10-02 07:08 -!- orgthingy(~orgthingy@62.150.55.188) has joined #tux3
2008-10-02 07:08 hello!
2008-10-02 08:55 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3
2008-10-02 09:35 -!- Kirantpatil(~kiran@122.166.93.80) has joined #tux3
2008-10-02 09:35 -!- Kirantpatil(~kiran@122.166.93.80) has left #tux3
2008-10-02 11:23 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-02 14:05 -!- kd(~kd@121.246.35.242) has joined #tux3
2008-10-02 14:09 -!- orgthingy_(~orgthingy@62.150.55.188) has joined #tux3
2008-10-02 15:47 well, extent read is a little harder than extent write in one respect
2008-10-02 15:47 for the write, we know to form up extents by searching for adjoining dirty regions
2008-10-02 15:47 dirty buffers I mean
2008-10-02 15:48 for read we don't
2008-10-02 15:48 I suppose I could implement readahead here
2008-10-02 15:48 and just go read a whole extent every time somebody asks for a buffer
2008-10-02 15:48 for now
2008-10-02 15:48 the problem is, the buffer at a time high level interface is lame
2008-10-02 15:49 but accurately models what we will get in kernel
2008-10-02 15:49 we need an extent at a time interface that comes from the sys_write level
2008-10-02 15:50 which there is a hook for
2008-10-02 15:50 but it means bypassing the whole generic_read/write mess
2008-10-02 15:50 which might be ok in that it means bypassing a big mess
2008-10-02 15:50 but it also means we will have to maintain essentially a forked version of the read/write library
2008-10-02 15:51 volunteers?
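The write-side idea mentioned above (form up extents by searching for adjoining dirty buffers) can be sketched in a few lines of C. This is only an illustration under stated assumptions: `struct run` and `dirty_runs` are hypothetical names, and the dirty state is modelled as a plain bool array rather than real buffer heads, so it is not the tux3 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical extent: `count` consecutive logical blocks from `start`. */
struct run { unsigned start, count; };

/*
 * Scan dirty[0..nblocks) and emit one run per maximal group of adjoining
 * dirty blocks. Returns the number of runs stored in out[] (at most max).
 */
static size_t dirty_runs(const bool *dirty, unsigned nblocks,
                         struct run *out, size_t max)
{
	size_t n = 0;
	unsigned i = 0;

	assert(dirty && out);
	while (i < nblocks && n < max) {
		if (!dirty[i]) {
			i++;		/* clean block: nothing to write */
			continue;
		}
		unsigned start = i;
		while (i < nblocks && dirty[i])
			i++;		/* extend the run over adjoining dirty blocks */
		out[n++] = (struct run){ start, i - start };
	}
	return n;
}
```

On read there is no dirty state to scan, which is the asymmetry the messages above point at: the caller has to decide how far to extend a read, for example by reading a whole extent per request.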
2008-10-02 16:13 -!- ChanServ changed mode/#tux3 -> +o flips
2008-10-02 16:15 -!- flips changed topic to "http://tux3.org ~ Tux3 U, right here Tuesdays and Thursdays at 8 pm Pacific Time ~ Next session: friends of grab_cache_page ~ Postponed till 9 pm tonight, thursday Oct 2"
2008-10-02 16:15 -!- flips changed mode/#tux3 -> -o flips
2008-10-02 16:15 nice
2008-10-02 16:15 flips : you should ask in other channels
2008-10-02 16:15 here all people try to help
2008-10-02 16:16 flips : ask in Linux and opensource channels
2008-10-02 16:16 or, in EFnet
2008-10-02 16:16 ?
2008-10-02 16:16 some may-be interested
2008-10-02 16:16 about what?
2008-10-02 16:16 in helping you
2008-10-02 16:16 oh
2008-10-02 16:16 feel free to ask
2008-10-02 16:16 well, i really dont know much about it?
2008-10-02 16:16 so, i dont really know if i can ask
2008-10-02 16:16 that's ok
2008-10-02 16:16 :P
2008-10-02 16:17 ACTION invites flips to #ubuntu-offtopic @ irc.freenode.net
2008-10-02 16:17 just tell people a filesystem project needs devs, is willing to help them learn
2008-10-02 16:17 sure
2008-10-02 16:18 :)
2008-10-02 16:20 doesn't sound too professional. :|
2008-10-02 16:20 * snuxoll knows nothing about filesystem design flips
2008-10-02 16:20 see? looks matter :P
2008-10-02 16:20 but, still, ask
2008-10-02 16:20 and explain
2008-10-02 16:20 there are lots of people there
2008-10-02 16:24 sure, but C coders?
2008-10-02 16:24 flips : common
2008-10-02 16:24 its linux channel
2008-10-02 16:24 its full of C coders :P
2008-10-02 16:26 you'd be surprised
2008-10-02 16:26 these days linux coding seems to be more about php than anything
2008-10-02 16:27 flips : php?
2008-10-02 16:27 nay
2008-10-02 16:27 Python and C
2008-10-02 16:27 C++ a bit
2008-10-02 16:27 << python
2008-10-02 16:28 orgthingy, we could use more people playing with the fuse stuff
2008-10-02 16:28 and just trying it and complaining about broken things
2008-10-02 16:28 that's one way to get shapor to code ;)
2008-10-02 16:28 flips : FreeNode is like a UNIX and Linux network
2008-10-02 16:28 lots of programmers there
2008-10-02 16:28 he's awesome when he does
2008-10-02 16:28 you *should* find someone there
2008-10-02 16:28 heh :)
2008-10-02 16:29 flips : and sourceforge would be good idea, so is stumble-upon and digg
2008-10-02 16:30 flips : if it stays small, it'd be "just another free software project" but if you "market" it (not business term) it'd be "just another great opensource project"
2008-10-02 16:30 ok, there's my troll
2008-10-02 16:30 anything else has to come from the grassroots
2008-10-02 16:30 that means you, orgthingy
2008-10-02 16:31 "troll" ?
2008-10-02 16:31 ACTION didnt understand what flips meant
2008-10-02 16:32 wikipedia that
2008-10-02 16:33 you mean troll as is in "annoying useless dude in IRC" ?
2008-10-02 16:33 orgthingy, my time is better spent making code happen, it's up to people who want to help to go spread the word
2008-10-02 16:33 flips : well, i think time is worth looking for people to *code* with you
2008-10-02 16:33 troll as in "saying something controversial in order to get a response"
2008-10-02 16:33 get what i mean?
2008-10-02 16:34 oh, common :(
2008-10-02 16:34 orgthingy, my time is also better spent encouraging people to go out and find coders than going out and hunting myself
2008-10-02 16:34 ok
2008-10-02 16:34 ACTION hides
2008-10-02 16:35 time being at a premium here
2008-10-02 16:35 sorry :|
2008-10-02 16:35 got to get extent reading working today according to me
2008-10-02 16:35 see the resounding lack of response on ubuntu channel
2008-10-02 16:35 prevailing attitude seems to be "work is somebody else's problem, we're here to hang and feel leet"
2008-10-02 16:36 maybe that's not accurate
2008-10-02 16:36 flips : maybe because it's offtopic channel
2008-10-02 16:36 and maybe because ubuntu users dont program
2008-10-02 16:36 I think the latter is the reason
2008-10-02 16:36 one of the reasons
2008-10-02 16:37 willingness to contribute could certainly be better
2008-10-02 16:37 being willing to always lose the early adopter race to gentoo is not a healthy attitude
2008-10-02 16:37 ok, im asking while you're coding
2008-10-02 16:37 :D
2008-10-02 16:38 seems fair
2008-10-02 16:38 best strategy is just for somebody like you to say there, "there's cool stuff going down on oftc #tux3, why not drop by for a visit"
2008-10-02 16:39 flips : well, thats called trolling in freenode :P
2008-10-02 16:40 or, spamming
2008-10-02 16:40 Id rather ask if someone is interested
2008-10-02 16:40 if they are, ill tell them to come by
2008-10-02 16:40 drop*
2008-10-02 16:40 orgthing, not in #offtopic
2008-10-02 16:40 ok
2008-10-02 16:40 im asking in many channels like #C
2008-10-02 16:40 if it worries you, then give a url instead of a channel
2008-10-02 16:40 #c would be good
2008-10-02 16:41 any c coder can become a kernel coder, or if they don't like kernel, fuse is entirely userspace
2008-10-02 16:41 flips : sorry, but i said "we" but i think using "we" is better than "he" :P
2008-10-02 16:41 we is correct
2008-10-02 16:42 "we" as in "everybody who thinks fat baby penguins are cute"
2008-10-02 16:42 haha
2008-10-02 16:44 flips : i think all programmers are asleep now
2008-10-02 16:44 usually, all of them go like "i want to join!"
2008-10-02 16:44 meh, maybe they're asleep :P
2008-10-02 16:44 programmers are usually asleep
2008-10-02 16:44 just as shapor
2008-10-02 16:45 just ask shapor
2008-10-02 16:45 some are drunk :P (yes really xD)
2008-10-02 16:45 or drunk, right, when they're awake
2008-10-02 16:45 that's why tux3 project has a requirement for beer to be sent
2008-10-02 16:46 to keep our programmers "in the zone"
2008-10-02 16:47 hmm, some say they're already programming in other projects :P
2008-10-02 16:47 haha
2008-10-02 16:47 haha
2008-10-02 16:48 don't take that for an answer ;)
2008-10-02 16:48 flips : ill ask Eloxoph people (i was staff there once)
2008-10-02 16:48 got to have a better excuse than that for being lame
2008-10-02 16:48 its full of C programmers
2008-10-02 16:48 but they're already working on 2 projects
2008-10-02 16:48 sounds good
2008-10-02 16:48 but ill ask them anyway :P
2008-10-02 16:48 2 is not enough
2008-10-02 16:48 should be 3
2008-10-02 16:54 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3
2008-10-02 16:55 I'll be hanging on #freenode if any ubuntus want to chat, flipz, not on any channel
2008-10-02 16:55 heh
2008-10-02 16:55 FelipeS : hello
2008-10-02 16:55 orgthingy, hey
2008-10-02 16:55 so, finally got interested?
2008-10-02 16:56 friend of yours orgthingy?
2008-10-02 16:56 flips : from ##not-physics over there
2008-10-02 16:56 any dev with physics background tends to be interesting to me
2008-10-02 16:56 likewise music and math
2008-10-02 16:57 seem to be generally more aware design wise
2008-10-02 16:57 I'm just a young student. You got me there with the music background. I'm not into music at all.
2008-10-02 16:57 ok, let's see if the read extent generator works
2008-10-02 17:02 felipes, physics?
2008-10-02 17:02 or does #not-physics really mean not physics?
2008-10-02 17:02 flips : no, not-physics is offtopic channel of ##physics that i founded
2008-10-02 17:03 right, so it means "physics student" essentially?
2008-10-02 17:03 folks
2008-10-02 17:03 :P
2008-10-02 17:03 ok, physics prof then
2008-10-02 17:04 ACTION has to find his hotel receipts and stuff for reimbursement
2008-10-02 17:04 mad scientist?
2008-10-02 17:04 flips : uuuumm?
2008-10-02 17:05 what is #physics about?
2008-10-02 17:05 flips : PHYSICS :P ?
2008-10-02 17:05 flips : how about going on topic with felipes?
2008-10-02 17:06 felipes, got c skillz?
2008-10-02 17:08 eh
2008-10-02 17:08 Honestly I'm in no position for being a dev or working on serious stuff
2008-10-02 17:08 at least I don't think so
2008-10-02 17:09 I just came here to maybe see some conversations on real work being done.
2008-10-02 17:09 sure, good place for that
2008-10-02 17:09 flips : he knows C++ most
2008-10-02 17:09 are you like Linus T. ? not allowing c++ code at all :P
2008-10-02 17:09 c++ has a certain influence over what we do
2008-10-02 17:10 I have nothing against c++, I would be ok with allowing some files in kernel to be compiled that way, with appropriate care
2008-10-02 17:10 I'm a first year computer eng major at tech, programming is a hobby that got ahold of me about 4 years ago. I've never done anything impressive with it however. just learning here and there.
2008-10-02 17:10 but linus will not allow it so that ends that
2008-10-02 17:10 and it's all been with c++
2008-10-02 17:11 computer eng is a good place to get a perspective on software efficiency
2008-10-02 17:11 c++ includes c, most of it
2008-10-02 17:11 yeah I know
2008-10-02 17:11 c++ lacks designated initializers, which we use extensively, without them I'd be unwilling to do a kernel project in c++ even if it was ok with linus
2008-10-02 17:11 is it not 100% backwards compatible?
'backwards' :P
2008-10-02 17:11 not 100%
2008-10-02 17:11 stupidly so
2008-10-02 17:15 "sure, good place for that"; was that sarcasm? flips
2008-10-02 17:16 not at all
2008-10-02 17:16 sarcasm always comes with a :p
2008-10-02 17:16 some of the best devs in the known universe hang here
2008-10-02 17:16 just have to catch them talking ;)
2008-10-02 17:18 oh well that's great. I'll be sure to add it to my favs then.
2008-10-02 17:18 filemap_extent_read: logical block 0x5 of inode 0x0
2008-10-02 17:18 ---- extent 0x5/1 ----
2008-10-02 17:18 prior extents:
2008-10-02 17:18 ---- rewind to 0x0 => 0/1 ----
2008-10-02 17:18 filemap_extent_read: index 5, limit 6
2008-10-02 17:19 filemap_extent_read: offset = 0, gap = 0
2008-10-02 17:19 filemap_extent_read: fill gap at 5/1
2008-10-02 17:19 balloc extent -> [2/1]
2008-10-02 17:19 segs (offset = 0): 5 => 2/1; (1)
2008-10-02 17:19 well things are starting to happen with extent read
2008-10-02 17:20 course it should not be allocating blocks on read
2008-10-02 17:20 that's because it started as a cut n paste of extent write
2008-10-02 17:20 needs to fill those buffers with zero instead
2008-10-02 17:20 hmm
2008-10-02 17:21 no, just the one buffer
2008-10-02 17:21 ...maybe
2008-10-02 17:21 I doubt "fill ahead" is a win
2008-10-02 17:29 getting closer...
2008-10-02 17:29 filemap_extent_read: logical block 0x5 of inode 0x0
2008-10-02 17:29 ---- extent 0x5/1 ----
2008-10-02 17:29 prior extents:
2008-10-02 17:29 ---- rewind to 0x0 => 0/1 ----
2008-10-02 17:29 filemap_extent_read: index 5, limit 6
2008-10-02 17:29 filemap_extent_read: offset = 0, next = 6, gap = 1
2008-10-02 17:29 filemap_extent_read: fill gap at 5/1
2008-10-02 17:29 balloc extent -> [2/1]
2008-10-02 17:29 segs (offset = 0): 5 => 2/1; (1)
2008-10-02 17:29 filemap_extent_read: extent 0x5/1 => 2
2008-10-02 17:29 filemap_extent_read: read block 0x5 => 2
2008-10-02 17:29 now need to get rid of that balloc/read and replace with fill for unmapped buffer
2008-10-02 18:08 seg[segs++] = *(struct extent *)(u64[]){ -1LL }; <- some nasty c
2008-10-02 18:08 thought I'd show that before throwing it away
2008-10-02 18:10 now does the right thing for unmapped buffers:
2008-10-02 18:10 filemap_extent_read: logical block 0x5 of inode 0x0
2008-10-02 18:10 ---- extent 0x5/1 ----
2008-10-02 18:10 prior extents:
2008-10-02 18:10 ---- rewind to 0x0 => 0/1 ----
2008-10-02 18:10 filemap_extent_read: index 5, limit 6
2008-10-02 18:10 filemap_extent_read: offset = 0, next = 6, gap = 1
2008-10-02 18:10 filemap_extent_read: fill gap at 5/1
2008-10-02 18:10 segs (offset = 0): 5 => ffffffffffff/1; (1)
2008-10-02 18:10 filemap_extent_read: extent 0x5/1 => ffffffffffff
2008-10-02 18:10 filemap_extent_read: read block 0x5 => ffffffffffff
2008-10-02 18:10 filemap_extent_read: zero fill buffer
2008-10-02 18:10 which brings us to sk8 oclock
2008-10-02 19:53 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3
2008-10-02 19:55 oh... no class today?
2008-10-02 20:01 -!- RalucaM(~ral@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3
2008-10-02 20:02 aaa... 9pm tonight :P
2008-10-02 20:17 really? 9?
2008-10-02 20:20 I have to be up in about 6h... :(
2008-10-02 20:28 I'm falling asleep already...
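The read-side behaviour traced in the 18:10 log above (an unmapped block shows up as ffffffffffff and the buffer is zero filled instead of allocating) might look roughly like this. It is a sketch under assumptions, not the tux3 code: `struct seg`, `HOLE`, `read_block`, the 512-byte block size and the in-memory `dev` array are all made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 512u
/* Assumed sentinel: 48 bits of ones, matching the ffffffffffff in the log. */
#define HOLE ((uint64_t)0xffffffffffffULL)

/* Hypothetical segment: `count` blocks starting at physical `block`. */
struct seg { uint64_t block; unsigned count; };

/*
 * Fill buf with logical block `index` of the segment. `dev` stands in for
 * the block device as a flat byte array. Returns 1 if data was copied from
 * the device, 0 if the block was a hole and buf was zero filled.
 */
static int read_block(const struct seg *seg, unsigned index,
                      const unsigned char *dev, unsigned char *buf)
{
	assert(index < seg->count);
	if (seg->block == HOLE) {
		memset(buf, 0, BLOCK_SIZE);	/* unmapped: zeros, no allocation */
		return 0;
	}
	memcpy(buf, dev + (seg->block + index) * BLOCK_SIZE, BLOCK_SIZE);
	return 1;
}
```

The point of the sentinel is that a read never has to call the allocator: a hole is just another segment whose physical address says "nothing here".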
2008-10-02 20:28 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3
2008-10-02 20:28 me too
2008-10-02 20:28 http://lxr.linux.no/linux+v2.6.26.5/fs/inode.c#L124
2008-10-02 20:29 about the first Q from the homework...
2008-10-02 20:29 what was the second one?
2008-10-02 20:31 aaa... the locks when a file is closed
2008-10-02 20:34 hmm
2008-10-02 20:36 I've already answered both.
2008-10-02 20:36 http://lxr.linux.no/linux+v2.6.26.5/fs/open.c#L1175 files->file_lock cannot be the lock we were talking about...
2008-10-02 20:36 let me check my logs
2008-10-02 20:37 ok, so let's pick it up again next thursday
2008-10-02 20:37 ok, first home work was why both ptr and struct address_space (*i_mapping and i_data) in struct inode
2008-10-02 20:37 http://lxr.linux.no/linux+v2.6.26.5/fs/locks.c#L1567 so the locks are tied to filp
2008-10-02 20:37 I will be offline for a few days
2008-10-02 20:37 ah
2008-10-02 20:37 it goes on with out me ;)
2008-10-02 20:37 (the ideal situation)
2008-10-02 20:38 :-)
2008-10-02 20:38 and the above L124 is not quite the answer
2008-10-02 20:38 ralucam, you following ok?
2008-10-02 20:38 MaZe: true, it's the point where are made the same...
2008-10-02 20:38 that's where it gets set to the default value, but why do you need a pointer, couldn't u always use &i_data
2008-10-02 20:38 the i_data doesn't seem to be used much
2008-10-02 20:39 inode->mapping is defined to be
2008-10-02 20:39 always valid
2008-10-02 20:39 re 1175 - not sure what you mean
2008-10-02 20:39 sorry... I was wrong about L124
2008-10-02 20:40 the locks in L1175 and L1567 are not quite the ones we were talking about re fget_light/fput_light
2008-10-02 20:40 so the thing from L124 is put in the structure here: http://lxr.linux.no/linux+v2.6.26.5/fs/inode.c#L184
2008-10-02 20:40 MaZe: aaaa...
2008-10-02 20:40 right
2008-10-02 20:40 but that's the default path
2008-10-02 20:41 -!- RalucaM(~ral@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3
2008-10-02 20:41 now I'm searching for the places where i_mapping is changed :P
2008-10-02 20:41 the one which could have sed / "->i_mapping->" "->i_data." /
2008-10-02 20:41 killing lxr in the process ;-)
2008-10-02 20:42 maze, not quite
2008-10-02 20:42 hmm?
2008-10-02 20:42 i_mapping is a pointer, i_data is an object
2008-10-02 20:42 hence -> to .
2008-10-02 20:42 I'm searching for i_mapping only ;-)
2008-10-02 20:42 right
2008-10-02 20:43 I don't expect the name to be used for anything else
2008-10-02 20:43 sorry, my eyes got crossed with the sed ;)
2008-10-02 20:43 ok, searching logs for the second homework question
2008-10-02 20:43 there was a bonus about why fget_light is demented
2008-10-02 20:45 a: breaks refcounting purely to reduce cacheline pinging
2008-10-02 20:45 ah, ok, so that wasn't the bonus, that was just the second homework
2008-10-02 20:45 accurately characterized by akpm as "foul"
2008-10-02 20:45 http://lxr.linux.no/linux+v2.6.26.5/fs/block_dev.c#L290
2008-10-02 20:45 looks closer
2008-10-02 20:46 but no cigar
2008-10-02 20:46 that's the default value as well
2008-10-02 20:46 I still am not sure why we need i_mapping -> -i_data
2008-10-02 20:46 i_mapping = &i_data
2008-10-02 20:46 and http://lxr.linux.no/linux+v2.6.26.5/fs/block_dev.c#L425
2008-10-02 20:46 http://lxr.linux.no/linux+v2.6.26.5/fs/block_dev.c#L450
2008-10-02 20:46 to be more precise
2008-10-02 20:47 I suspect it's bogus, but a pretty extensive survey of usage would be required to say one way or the other
2008-10-02 20:47 that is the _main_ use case in the kernel
2008-10-02 20:47 (there are two others)
2008-10-02 20:47 right, coda and ?
2008-10-02 20:47 raw char devs
2008-10-02 20:47 char devs?
2008-10-02 20:48 why only char devs?
2008-10-02 20:48 the primary use case seems to be block devices, the secondary is raw char devices (ie. for direct io to block devs, ancient pre-O_DIRECT interface), and third/last is the coda fs
2008-10-02 20:48 well the blockdev usage smells really bogus
2008-10-02 20:48 -!- ajonat(~ajonat@190.48.123.108) has joined #tux3
2008-10-02 20:48 inode->i_mapping = &inode->i_data;
2008-10-02 20:48 from coda ;-)
2008-10-02 20:48 wrong line
2008-10-02 20:49 since that's restoring the default
2008-10-02 20:49 I pick it because of that ;-)
2008-10-02 20:49 the rest are in file.c
2008-10-02 20:50 jeez, that blockdev code is twisted
2008-10-02 20:50 I think, product of fuzzy thinking and not necessity
2008-10-02 20:50 but more analysis is needed to be sure
2008-10-02 20:51 there should be some system to mark as 'looks wrong to me' some stuff in the kernel :P
2008-10-02 20:51 heh
2008-10-02 20:51 we carve our comments in the internet, right hew
2008-10-02 20:51 right here
2008-10-02 20:51 ./fs/coda/file.c-106- host_inode = host_file->f_path.dentry->d_inode;
2008-10-02 20:51 ./fs/coda/file.c-107- coda_file->f_mapping = host_file->f_mapping;
2008-10-02 20:51 ./fs/coda/file.c:108: if (coda_inode->i_mapping == &coda_inode->i_data)
2008-10-02 20:51 ./fs/coda/file.c:109: coda_inode->i_mapping = host_inode->i_mapping;
2008-10-02 20:51 ./fs/coda/file.c-110-
2008-10-02 20:51 ./fs/coda/file.c-111- /* only allow additional mmaps as long as userspace isn't changing
2008-10-02 20:52 where they will be unearthed by archaeologists millenia later
2008-10-02 20:52 that's the relevant part of coda
2008-10-02 20:52 ./drivers/char/raw.c-76- filp->f_mapping = bdev->bd_inode->i_mapping;
2008-10-02 20:52 ./drivers/char/raw.c-77- if (++raw_devices[minor].inuse == 1)
2008-10-02 20:52 ./drivers/char/raw.c:78: filp->f_path.dentry->d_inode->i_mapping =
2008-10-02 20:52 ./drivers/char/raw.c-79- bdev->bd_inode->i_mapping;
2008-10-02 20:52 ./drivers/char/raw.c-80- filp->private_data = bdev;
2008-10-02 20:52
and that's for raw char dev
2008-10-02 20:53 bdev again
2008-10-02 20:53 while I can see/understand the need for the raw char dev and coda use cases
2008-10-02 20:53 I don't yet get what the normal bdev case is for
2008-10-02 20:53 actually... they looks similar
2008-10-02 20:53 coda and raw
2008-10-02 20:53 not surprising
2008-10-02 20:54 coda is a networked file system with local caching and offline operation
2008-10-02 20:54 the raw.c stuff looks bogus too
2008-10-02 20:54 at least party
2008-10-02 20:54 why dow d_inode need to have a mapping?
2008-10-02 20:54 it needs a way to tell the kernel that the page cache for a file in codafs is actually the page cache for another file in a local filesystem (ie. in the cache fs store)
2008-10-02 20:54 sorrry
2008-10-02 20:54 while the raw char dev case needs to remap the raw char dev to the block dev
2008-10-02 20:55 bd_acquire is only called in 3 places...
2008-10-02 20:55 lying underneath it
2008-10-02 20:55 whey does filp need a mapping, I meant to say
2008-10-02 20:55 maze, but that raw char dev case sounds like it could be done some other way
2008-10-02 20:56 hmm... how does cache-ing for char devices works? :P
2008-10-02 20:56 a maybe
2008-10-02 20:56 it's if we open the same block dev from different inodes?
2008-10-02 20:56 from different entries and/or filesystems?
2008-10-02 20:56 something wierd
2008-10-02 20:56 that offends my sense of form and balance
2008-10-02 20:56 since they're all actually the same block device, but they're not the same inode
2008-10-02 20:56 that makes sense
2008-10-02 20:56 want to have them backed by the same page cache
2008-10-02 20:57 make sense! :D
2008-10-02 20:57 how about coda?
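The aliasing idea worked out above (several inodes for the same block device wanting one page cache) can be modelled in a few lines of userspace C. This is a toy sketch only: the structs are cut-down stand-ins for the kernel's `struct inode` and `struct address_space`, and `inode_init`, `inode_alias` and `cache_page` are hypothetical helpers, loosely following the raw.c and coda lines quoted earlier.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a page cache: just counts the pages it holds. */
struct address_space { unsigned long nrpages; };

struct inode {
	struct address_space *i_mapping;	/* where I/O looks for pages */
	struct address_space i_data;		/* this inode's own page cache */
};

/* Default setup: the pointer refers to the inode's embedded object. */
static void inode_init(struct inode *inode)
{
	inode->i_data.nrpages = 0;
	inode->i_mapping = &inode->i_data;
}

/* Redirect node's page cache to whatever cache `backing` uses. */
static void inode_alias(struct inode *node, struct inode *backing)
{
	assert(node && backing);
	node->i_mapping = backing->i_mapping;
}

/* Pretend to cache one page: always goes through the pointer. */
static void cache_page(struct inode *inode)
{
	inode->i_mapping->nrpages++;
}
```

With two device-node inodes aliased to one backing inode, pages cached through either node land in the backing inode's `i_data`, and each node's own `i_data` stays empty. That extra level of indirection is what a plain `&inode->i_data` could not express.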
2008-10-02 20:57 we don't open block devices "from inodes"
2008-10-02 20:57 we open them from names
2008-10-02 20:57 sure we do ;-)
2008-10-02 20:57 we're talking about i_node->i_mapping remember ;-)
2008-10-02 20:57 right name -> dentry -> inode -> bla bla -> mapping
2008-10-02 20:57 $ stat hda1
2008-10-02 20:57 File: `hda1'
2008-10-02 20:57 Size: 0 Blocks: 0 IO Block: 4096 block special file
2008-10-02 20:57 Device: dh/13d Inode: 2343 Links: 1 Device type: 3,1
2008-10-02 20:57 Access: (0660/brw-rw----) Uid: ( 0/ root) Gid: ( 6/ disk)
2008-10-02 20:57 Access: 2008-10-01 15:41:08.151619201 -0400
2008-10-02 20:57 Modify: 2008-10-01 15:40:54.344130563 -0400
2008-10-02 20:58 Change: 2008-10-01 15:40:54.344130563 -0400
2008-10-02 20:59 stat hdaX
2008-10-02 20:59 File: `hdaX'
2008-10-02 20:59 Size: 0 Blocks: 0 IO Block: 4096 block special file
2008-10-02 20:59 Device: dh/13d Inode: 911348 Links: 1 Device type: 3,1
2008-10-02 20:59 Access: (0644/brw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
2008-10-02 20:59 MaZe is right
2008-10-02 20:59 # ls -al sda
2008-10-02 20:59 brw-rw---- 1 root disk 8, 0 2008-09-29 17:52 sda# mknod sda__ b 8 0
2008-10-02 20:59 # ls -al sda__
2008-10-02 20:59 brw-r--r-- 1 root root 8, 0 2008-10-02 20:58 sda__
2008-10-02 20:59 # stat sda
2008-10-02 20:59 Device: eh/14d Inode: 339 Links: 1 Device type: 8,0
2008-10-02 20:59 # stat sda__
2008-10-02 20:59 Device: eh/14d Inode: 844038 Links: 1 Device type: 8,0
2008-10-02 20:59 :D
2008-10-02 20:59 I was first ;-)
2008-10-02 20:59 yup
2008-10-02 21:00 and it'll have an even different inode on a different partition/filesystem
2008-10-02 21:00 hence, different sb,inode pairs referring to the same block dev, but having to have the same page cache
2008-10-02 21:00 that for sure
2008-10-02 21:00 I don't understand the page cache for char dev though
2008-10-02 21:01 the raw char dev is an abomination
2008-10-02 21:01 it behave like a block dev
2008-10-02 21:01 nowadays the correct way is to open the normal (base)
block dev with option O_DIRECT
2008-10-02 21:01 maze, but why does kernel ever have to use the inode from the mknod file directly?
2008-10-02 21:01 oh...
2008-10-02 21:01 why not dereference to the real inode first?
2008-10-02 21:01 flips: dentry?
2008-10-02 21:02 could be a dentry reason, still sounds bogus
2008-10-02 21:02 real inode?
2008-10-02 21:02 even if it's in the dentry
2008-10-02 21:02 it used to be you would map the base dev to a raw dev and then get direct io on the raw dev, which for reasons which escape me wasn't a block dev, but a char dev instead
2008-10-02 21:02 dereference after getting hold of the dentry
2008-10-02 21:02 because bdevs don't have inodes?
2008-10-02 21:02 "if" it's a dentry reason
2008-10-02 21:02 the names can be different
2008-10-02 21:02 so the entries are different
2008-10-02 21:03 there is no real name, no real inode, the only thing there is is the block major, minor pair
2008-10-02 21:03 you want to alias them to the same inode?
2008-10-02 21:03 my sense that this extra level of indirection in ->mapping is being abused by blockdevs is getting stronger
2008-10-02 21:03 you could probably get away with having a blockdevfs
2008-10-02 21:03 which would be the fs with inodes for blockdevs, which you could then return instead of the proper inode from the normal filesystem
2008-10-02 21:03 except
2008-10-02 21:04 inodes also store permissions
2008-10-02 21:04 which might be different between different blockdev entries on the filesystem
2008-10-02 21:04 :D
2008-10-02 21:04 well let's put that one aside
2008-10-02 21:04 return to it later
2008-10-02 21:04 and you couldn't delete them...
2008-10-02 21:04 since you'd be deleting the wrong inode
2008-10-02 21:04 after I've had time to read more around that part of the code ;)
2008-10-02 21:05 so the inode references the entry in the filesystem
2008-10-02 21:05 it's often a mistake to assume that something which is really weird is that way because it has to be
2008-10-02 21:05 [hmm although deletions, really delete dentries, not inodes]
2008-10-02 21:05 dirent, not dentry
2008-10-02 21:06 we reserve the latter name to mean the cache item
2008-10-02 21:06 right nomenclature...
2008-10-02 21:06 just convention
2008-10-02 21:06 I have a feeling you'd still need it for stuff like coda, or more advanced caching network fs'es anyway
2008-10-02 21:07 _maybe_
2008-10-02 21:07 why don't you write a stacking filesystem and see if you're forced to use that feature?
2008-10-02 21:07 ordinary filesystem is too easy for you ;)
2008-10-02 21:08 in truth, I've taken a light run at that myself and found the issues... hurtful
2008-10-02 21:08 our vfs was not designed to be stacked
2008-10-02 21:08 it's very much a fixed number of levels kind of thing
2008-10-02 21:09 every now and then some people show up to improve the stackability, and after pushing uphill for a while they go away again
2008-10-02 21:10 then when fuse came along, most of the motivation for stacking filesystems in kernel went away
2008-10-02 21:10 leaving just nfs... a quasi stacking filesystem... and coda... I think I have very little understanding of
2008-10-02 21:10 for a while there was intermezzo
2008-10-02 21:10 which never quite got working
2008-10-02 21:11 well I don't think it was an official tux3 u session today
2008-10-02 21:11 let's not post logs
2008-10-02 21:12 will resume in earnest next thursday
2008-10-02 21:12 with the friends, right? :D
2008-10-02 21:13 that would be a good place, or any requests?
2008-10-02 21:13 what is the biggest mystery remaining?
2008-10-02 21:13 dentry cache ranks up there
2008-10-02 21:13 path walk
2008-10-02 21:14 ->rename, worth a visit
2008-10-02 21:14 pdflush?
2008-10-02 21:14 it's really vm, but you need to know how it works to write fast filesystem code
2008-10-02 21:15 ACTION will still 'float' till 22 :|
2008-10-02 21:15 hmm, it may even not be generic enough for cool stuff like cow or versioned or other types of opts filesystems
2008-10-02 21:16 what is pdflush?
2008-10-02 21:16 maze, what might not be?
2008-10-02 21:16 pdflush? certainly isn't
2008-10-02 21:16 the current i_mapping pointer
2008-10-02 21:16 ah
2008-10-02 21:16 since you could potentially have
2008-10-02 21:16 yes, you could get into some really demented versioning tricks
2008-10-02 21:16 2 files on different inodes on different filesystems
2008-10-02 21:17 but we will use a very simple one...
2008-10-02 21:17 one version of a file gets loaded into the inode->mapping and that is it
2008-10-02 21:17 and the page cache could potentially be shared between them if they (or parts of them) refer to the same data
2008-10-02 21:17 and shared with the block dev cache that the files are stored on , etc
2008-10-02 21:17 we don't try to share mapping pages between different versions of the same file
2008-10-02 21:17 ACTION is off to bed. Good night to everyone!
2008-10-02 21:17 sharing mapping pages would require deep surgery
2008-10-02 21:17 see you
2008-10-02 21:17 yep, just what I realized
2008-10-02 21:17 good night!
2008-10-02 21:18 save that for linux 2.9
2008-10-02 21:18 but sharing pages between the blockdev and the fs on top of it, and the netfs exported/imported from it, and the various versions and cow files is what should happen
2008-10-02 21:19 why between various versions of cow files?
2008-10-02 21:19 if it lives in one spot on disk, it should only live in one spot in memory
2008-10-02 21:19 why does that matter?
2008-10-02 21:19 for the regions which are identical
2008-10-02 21:19 memory
2008-10-02 21:19 so we waste some cache by duplicating pages, what's the problem?
2008-10-02 21:19 you can get by with a much smaller cache, or make much better use of existing cache
2008-10-02 21:20 we already suck beyond belief for in-cache diff
2008-10-02 21:20 hmm? what do you mean/
2008-10-02 21:20 maybe fix the obvious breakage first
2008-10-02 21:20 ?
2008-10-02 21:20 try diffing two kernel trees
2008-10-02 21:20 and see how much memory you need to keep both 100% in cache
2008-10-02 21:20 it's ballooned from before, not because the tree got bigger
2008-10-02 21:21 [and yes doing all this is hard because of writes and read-only stuff, and when to dupe, when to modify in place, etc]
2008-10-02 21:21 $ time diff -qr linux-2.6.26.5_ linux-2.6.26.5
2008-10-02 21:21 real 0m1.208s
2008-10-02 21:22 hmm
2008-10-02 21:22 but how much cache am I using
2008-10-02 21:22 on, now suppose you want to share cache pages, that means at find_cache_page miss time you need to be able to know the target page is already in some other cache
2008-10-02 21:22 the way to do that is by putting a forwarding pointer in the page cache for the device
2008-10-02 21:22 Cached: 3193700 kB
2008-10-02 21:22 hmm wonder how much of that was the 2 kernel trees?
2008-10-02 21:23 mazem, 4 GB machine?
2008-10-02 21:23 yup
2008-10-02 21:23 try it with 512 MB
2008-10-02 21:23 $ du -hs *
2008-10-02 21:23 323M linux-2.6.26.5
2008-10-02 21:23 323M linux-2.6.26.5_
2008-10-02 21:23 won't work
2008-10-02 21:23 since the kernels take 640M themselves
2008-10-02 21:24 1G then
2008-10-02 21:24 Cached: 2552060 kB
2008-10-02 21:24 after deleting both kernel trees
2008-10-02 21:24 would it have gc'ed all the stuff I deleted?
2008-10-02 21:25 it should not have
2008-10-02 21:25 um
2008-10-02 21:25 no, of course it should have
2008-10-02 21:25 deleted = not cacheable
2008-10-02 21:26 so it seems to have deleted exactly the right amount -> 641MB
2008-10-02 21:27 oh, so the block dev and the file system mounted on top of it have separate page caches, which aren't shared till they hit disk
2008-10-02 21:27 which is why you should never touch the block dev directly, if there's an fs mounted on it
2008-10-02 21:27 not quite
2008-10-02 21:28 (except potentially for reads)
2008-10-02 21:28 they aren't shared, but the set of blocks in the two is disjoint
2008-10-02 21:28 better be
2008-10-02 21:28 uhm
2008-10-02 21:28 I don't think there's any guarantee for that
2008-10-02 21:28 file cache is data, blockdev cache is metadata
2008-10-02 21:28 the filesystem has to make that guarantee
2008-10-02 21:28 oh, you mean if the fs is using the blockdev
2008-10-02 21:29 sure
2008-10-02 21:29 the fs always uses the blockdev
2008-10-02 21:29 well
2008-10-02 21:29 if you do it from userspace you can easily screw that up
2008-10-02 21:29 it always uses the buffer cache
2008-10-02 21:30 you will see code in filesystems to invalidate buffer cache pages when metadata is freed
2008-10-02 21:30 ah
2008-10-02 21:30 in case they later get used for normal data
2008-10-02 21:30 in some caches, clean alias pages are left around, but that is an accident waiting to happen
2008-10-02 21:30 in some cases I meant
2008-10-02 21:30 yes
2008-10-02 21:31 classic badness
2008-10-02 21:31 has bitten many times
2008-10-02 21:31 in some cases it's impossible to avoid aliases
2008-10-02 21:31 namely when one block on a page is data and another is metadata
2008-10-02 21:32 the blocks themselves are not aliased but the pages are
2008-10-02 21:32 right
2008-10-02 21:33 right, this entire page cache system is nice and simple, and has good performance, but can't really represent all the edge cases, or more complex scenarios
2008-10-02
21:33 it's pretty simple minded true 2008-10-02 21:34 so how would you go about answering the question: in which page cache(s) is a given physical block already mapped? 2008-10-02 21:34 if you could change everything? 2008-10-02 21:35 I'm not sure yet, but it's pretty clear that (if possible to get good performance with something like this) we would want the minimum amount of duplication possible 2008-10-02 21:36 ie. if it's in one physical location on disk, it should remain in one (or zero) pages in ram regardless of how many levels it crosses 2008-10-02 21:36 all the way down 2008-10-02 21:36 block dev, virtual block dev, file system, network fs, userspace mmap, possibly userspace read hack opts 2008-10-02 21:37 specifying exactly what happens when you trigger a write to a page, would be non-trivial 2008-10-02 21:38 in some cases you simply allocate a new page with duped data that is not mapped to anything else (a modify in ram only scenario) 2008-10-02 21:38 in others you'd need to allocate space on the filesystem and map to a 'sync to this location on disk' page 2008-10-02 21:38 etc 2008-10-02 21:39 my main worry would be that we could potentially be triggering spurious context switches on writes to read only pages 2008-10-02 21:39 spurious - wrong word - more like 'an excessive number of' that would hurt performance 2008-10-02 21:40 you need to be able to throw pages around 2008-10-02 21:41 a read from block dev through virtual dev (lvm) to fs, would somehow result in the same page being held from all 3 places 2008-10-02 21:41 you're going to have a lot of trouble when data and metadata live on the same page 2008-10-02 21:41 and then a full page write would result in an existing page getting mapped in to all 3 places at the same time (reverse process), while partial writes would flag pages as dirty etc 2008-10-02 21:42 yes, but I'm not sure metadata and data deserve to be treated seperately 2008-10-02 21:42 you don't need context switch when you write to a 
read only page, you can check the page flags explicitly 2008-10-02 21:42 and not take a fault 2008-10-02 21:42 in kernel space sure 2008-10-02 21:42 not so in userspae 2008-10-02 21:42 file data is certainly treated separately from metadata 2008-10-02 21:42 not going to change soon 2008-10-02 21:42 in kernel space I could simply check the counters 2008-10-02 21:43 in memory? inodes dentries, etc, sure 2008-10-02 21:43 but on disk? 2008-10-02 21:43 not so sure 2008-10-02 21:43 tux3 already kind of has less of a distinction than normal 2008-10-02 21:43 between? 2008-10-02 21:44 between a file and it's contents, and metadata (logs, btrees, etc) 2008-10-02 21:44 oh, some metadata is mapped as data 2008-10-02 21:44 I think it should be possible to have all the metadata on disk behave like data, with possible exception of (a few?) superblocks 2008-10-02 21:44 I'm sure we're going to hit some interesting recursions in there at some point 2008-10-02 21:45 sometimes I try to map file index metadata into a page cache and it never seems to work out very well 2008-10-02 21:45 well imagine we have a filesystem 2008-10-02 21:45 it already works 2008-10-02 21:46 now we need to store metadata 2008-10-02 21:46 so we write it out to a logfile in the first filesystem 2008-10-02 21:46 metadata != xattrs 2008-10-02 21:46 and store trees as sparse files in the first filesystem, etc 2008-10-02 21:46 sure 2008-10-02 21:46 that's actually done already 2008-10-02 21:46 now if the first filesystem is the filesystem for which we're storing metadata 2008-10-02 21:46 in lustre 2008-10-02 21:46 we've got a problem... 2008-10-02 21:47 since we get updates on updates 2008-10-02 21:47 however 2008-10-02 21:47 if all we update is up to a specific point 2008-10-02 21:47 and the rest is handled via forward logging 2008-10-02 21:47 ok, so you could make it work, but what is the win? 
2008-10-02 21:47 then so long as you can guarantee that generating X KB of updates generates less than X KB of new updates, then it converges
2008-10-02 21:48 I think it should actually turn out to be pretty simple
2008-10-02 21:48 code wise
2008-10-02 21:48 if not conceptually
2008-10-02 21:48 so which piece of tux3 would go into a file next?
2008-10-02 21:48 no idea
2008-10-02 21:48 the inode table is problematic because of variable sized inodes
2008-10-02 21:48 does not map into a page cache nicely
2008-10-02 21:48 I'm still at the phase, where I'm thinking this should be doable
2008-10-02 21:49 and since we rely on logging during mount anyway, it never has to fully converge
2008-10-02 21:49 otherwise, filesystems with fixed sized inodes could put the inode table in page cache and it would be a win
2008-10-02 21:49 the file system is always dirty
2008-10-02 21:50 tux3 only has three kinds of things that are not in files: 1) inode table 2) file indexes 3) update logs
2008-10-02 21:50 ie. it's always: what's on disk reflects last commit point + forward log which contains the changes which were made to the fs to perform the last commit (and any other changes from userspace in the mean time)
2008-10-02 21:50 ok, so why ain't the update log in a file?
2008-10-02 21:50 I can't see winning on any of those three by mapping to a file
2008-10-02 21:50 ah
2008-10-02 21:51 don't know ;)
2008-10-02 21:51 recursion for one thing
2008-10-02 21:51 log the updates to the log file
2008-10-02 21:51 see the forward log should just be a periodically front-truncated normal file
2008-10-02 21:51 and the win is?
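The convergence claim from 21:47 (X KB of updates generating less than X KB of new updates) can be checked numerically. The sketch below assumes something slightly stronger than the chat states: that each round of updates produces at most a fixed fraction `ratio < 1` of the previous round. Under that assumption the total is bounded by the geometric series X / (1 - ratio). The function name and the `floor_kb` cutoff are hypothetical, chosen only for illustration.

```python
def total_log_traffic(initial_kb, ratio, floor_kb=0.001):
    """Sum the update traffic when logging updates causes further updates.

    initial_kb: size of the first batch of updates, in KB.
    ratio:      assumed fraction of a batch that reappears as new updates.
    floor_kb:   batches smaller than this are treated as finished.
    Returns (total_kb, rounds).
    """
    if not 0 <= ratio < 1:
        raise ValueError("ratio must be in [0, 1) for the series to converge")
    total = 0.0
    pending = float(initial_kb)
    rounds = 0
    while pending >= floor_kb:
        total += pending      # write this batch to the log
        pending *= ratio      # logging it generated a smaller batch
        rounds += 1
    return total, rounds


if __name__ == "__main__":
    total, rounds = total_log_traffic(100, 0.5)
    bound = 100 / (1 - 0.5)
    print(f"{rounds} rounds, {total:.3f} KB written, bound {bound:.1f} KB")
```

With a ratio of 0.5 the total stays under twice the initial batch, and the 21:49 remark that it "never has to fully converge" corresponds to stopping the loop early and leaving the remaining `pending` updates in the forward log.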
2008-10-02 21:51 the win is all we have to support is normal files
2008-10-02 21:52 got to be more of a win than that
2008-10-02 21:52 to make up for the extra problems
2008-10-02 21:52 and except for the initial 'recover during mount' phase it's simple
2008-10-02 21:52 you share code for more stuff, you don't have to (at least theoretically) special case allocation for the forward log, etc
2008-10-02 21:53 since we have not done the log at all yet, if you come up with a convincing win argument, we can do it that way
2008-10-02 21:53 although that might be a bad thing
2008-10-02 21:53 sharing code is a minor plus
2008-10-02 21:53 -!- Kirantpatil(~kiran@122.167.195.107) has joined #tux3
2008-10-02 21:53 ACTION is looking for the big win
2008-10-02 21:53 dinner time
2008-10-02 21:53 ACTION thinks this would be the first file system which would deserve the name
2008-10-02 21:53 when is the next burst of activity on junkfs, or is there nothing interesting left to try?
2008-10-02 21:54 lots of interesting stuff
2008-10-02 21:54 it's also end-of-quarter time
2008-10-02 21:54 that was last week
2008-10-02 21:54 working on-and-off on options
2008-10-02 21:54 one would wish it was done last week ;-)
2008-10-02 21:54 we're scoring on monday, so I want to finish two more things I've left before then
2008-10-02 21:58 -!- Kirantpatil(~kiran@122.167.195.107) has left #tux3
2008-10-02 22:21 -!- amey(~amey@116.73.35.180) has joined #tux3
2008-10-02 22:43 ok, time to finish up the extent drop
2008-10-02 22:43 make it the default
2008-10-02 22:43 start exposing bugs
2008-10-02 23:08 flips: how was the sk8 today
2008-10-02 23:09 was a fine skate in the dark
2008-10-02 23:09 started at sunset
2008-10-02 23:09 just down to the sk8 park and back?
2008-10-02 23:09 up to 3rd st
2008-10-02 23:09 ah
2008-10-02 23:10 musicians on the strand were doing special things
2008-10-02 23:10 "funk you we're playing what we want"
2008-10-02 23:10 i got out at 3pm on the road bike
2008-10-02 23:10 rode up tuna canyon for the first time in months
2008-10-02 23:11 tuna is like entering a different zone all together
2008-10-02 23:11 you get 100 yards off the pch, and you're in the wilderness
2008-10-02 23:11 sounds nice