2008-09-12 00:00 as did I
2008-09-12 00:03 good to do this stuff in public
2008-09-12 00:03 keeps a useful record, and it's a subtle way of poking at the libxattr guys to fix their packages
2008-09-12 00:03 ACTION wonders who the libxattr guys are
2008-09-12 00:04 choose one: 1) redhat 2) suse
2008-09-12 00:05 http://oss.sgi.com/projects/xfs/
2008-09-12 00:05 possible
2008-09-12 00:06 $ rpm -qf `which getfattr `
2008-09-12 00:06 attr-2.4.41-1.fc9.x86_64
2008-09-12 00:06 it was a suse guy who put xattr+acl support into ext3
2008-09-12 00:06 $ rpm -qi attr | grep URL
2008-09-12 00:06 URL : http://oss.sgi.com/projects/xfs/
2008-09-12 00:06 ah, ok
2008-09-12 00:06 -!- stargazr5(~gauravstt@59.95.35.250) has joined #tux3
2008-09-12 00:07 well all the slashdotters seem to agree that the new lucasarts game lacks in the gameplay department
2008-09-12 00:07 it's fun though
2008-09-12 00:08 for a few minutes
2008-09-12 00:08 throwing big chunks of metal at little robots
2008-09-12 00:08 making lightning come out of your fingers and not do much
2008-09-12 00:10 what's it called?
2008-09-12 00:11 force unleashed
2008-09-12 00:11 you play as a sith apprentice
2008-09-12 00:11 darth's personal waterboy
2008-09-12 00:14 flips: tried KOTOR?
2008-09-12 00:14 both parts are good
2008-09-12 00:14 enjoyed kotor
2008-09-12 00:15 force unleashed better ?
2008-09-12 00:16 but I don't want to play any more bioware games
2008-09-12 00:16 too cookie cutter
2008-09-12 00:16 even jade empire felt like kotor
2008-09-12 00:16 and I got spoiled by oblivion
2008-09-12 00:16 everything you couldn't do in kotor, you could do in oblivion
2008-09-12 00:16 nope. haven't played either
2008-09-12 00:16 yeah. oblivion was great.
2008-09-12 00:17 force unleashed is mildly entertaining
2008-09-12 00:17 they got some mechanics right, others badly wrong
2008-09-12 00:17 and it's way linear
2008-09-12 00:18 I'll probably get force unleashed
2008-09-12 00:18 but I don't expect much
2008-09-12 00:18 just filling time until bethesda comes up with something new ;-)
2008-09-12 00:18 :)
2008-09-12 00:19 thanks for your support to the de-duplication idea
2008-09-12 00:19 will get back to you when we have something more concrete
2008-09-12 00:19 welcome, so that's you
2008-09-12 00:20 sure
2008-09-12 00:20 deduplication seems to be a big hot button
2008-09-12 00:20 yeah...me, stargazr5 , kd and another
2008-09-12 00:20 for new fs design
2008-09-12 00:20 ah
2008-09-12 00:20 were you here for the vfs tour today?
2008-09-12 00:21 no....missed it... :( but following it now.
2008-09-12 00:21 good
2008-09-12 00:21 how many years of C has each of you got?
2008-09-12 00:22 look through the logs - there's bound to be some jewels in there
2008-09-12 00:23 about 2 and half years
2008-09-12 00:23 MaZe: thanks. will do.
2008-09-12 00:24 and have you done an OS and/or FS course yet?
2008-09-12 00:24 I think this is for an advanced fs course, right?
2008-09-12 00:24 ACTION reads again
2008-09-12 00:25 yes. OS.
2008-09-12 00:26 how much low level experience do you have? C / assembly interfaces? While it's not really needed, it comes in useful from time to time.
2008-09-12 00:27 [of course, IMHO, assembly is always useful to know... so I may be biased]
2008-09-12 00:27 I agree
2008-09-12 00:27 not as a first language
2008-09-12 00:27 but certainly as one of the first 5
2008-09-12 00:28 it's been years now since I've written any
2008-09-12 00:28 if I do write some, it's likely to be for some strange arch like cell spe
2008-09-12 00:28 oh, I've never written true assembly... it's always been inline, often entire procedures, but never entire programs (unless the entire program was a hundred lines or less)
2008-09-12 00:29 ah, I've written tens of thousands of lines
2008-09-12 00:29 wallowed in it
2008-09-12 00:29 have a bit of experience in assembly...not much..
2008-09-12 00:29 oh, I've written tens of thousands of lines, never all in one piece though
2008-09-12 00:29 got really good at it, then realized there's people much, much better
2008-09-12 00:30 this is part of project that we can do on any cs topic
2008-09-12 00:30 right
2008-09-12 00:30 The biggest pieces I've written were usually either asm-coded bigint adders and the like, or something like a boot sector
2008-09-12 00:30 just reread your post
2008-09-12 00:30 I've done stuff like transcoded knuth's algorithms for infinite precision math from MIX and x86
2008-09-12 00:31 neither of which has a lot of code, but either has huge performance boosts from assembly, or just needs to be in asm
2008-09-12 00:31 making the carries work out is hard ;-)
2008-09-12 00:31 MIX to x86 I mean
2008-09-12 00:31 oh, but that sort of stuff is still not pure asm, you write that in C with good macros and inline asm
2008-09-12 00:31 not me
2008-09-12 00:31 pure
2008-09-12 00:31 there is almost never a reason to write pure asm
2008-09-12 00:32 sure, when the OS isn't linux
2008-09-12 00:32 and the compiler isn't gcc
2008-09-12 00:32 gcc asm syntax blows by the way ;-)
2008-09-12 00:32 true, if compiler != gcc, then hang yourself
2008-09-12 00:32 blows as in bad? or in good?
2008-09-12 00:32 bad
2008-09-12 00:32 sucks chunks
2008-09-12 00:32 the syntax is bad, but it is extremely powerful
2008-09-12 00:33 although it takes quite some getting used to
2008-09-12 00:33 I know, so why not have the syntax be good and be extremely powerful?
2008-09-12 00:33 hehe
2008-09-12 00:33 yeah, well... that's like
2008-09-12 00:33 C
2008-09-12 00:33 the syntax for C also blows
2008-09-12 00:33 kinda yes
2008-09-12 00:33 at&t asm blows orders worse
2008-09-12 00:33 I think that's where it came from
2008-09-12 00:34 well
2008-09-12 00:34 let's not scare the visitors ;-)
2008-09-12 00:34 cranky old hacks
2008-09-12 00:35 a (*b(c d, e (*f)(g)))(h);
2008-09-12 00:35 what kind of syntax is that?
2008-09-12 00:35 next tux3 university?
2008-09-12 00:36 tue at 8 pm pacific
2008-09-12 00:36 tuesday 8 pm
2008-09-12 00:36 right, I tend to forget that timezone
2008-09-12 00:36 will be there this time.
2008-09-12 00:36 ok
2008-09-12 00:36 like the world revolves around silly valley
2008-09-12 00:36 :)
2008-09-12 00:36 :)
2008-09-12 00:36 doesn't it?
2008-09-12 00:36 not sure
2008-09-12 00:36 I didn't use to think so ;)
2008-09-12 00:37 I have some stories about silicon valley and time zones that I can't share ;-(
2008-09-12 00:37 nice to know there's a reason to get you drunk
2008-09-12 00:38 speaking of which
2008-09-12 00:38 a sake would help get this post written
2008-09-12 00:38 or should I just hack
2008-09-12 00:38 hmm
2008-09-12 00:38 I mean, with the sake in hand of course
2008-09-12 00:39 [btw, that declaration above, that's valid C - that's the declaration of signal from the std C library, where a,e=void b=signal c,g,h=int d=sig f=func]
2008-09-12 00:39 talking about timezones... it's lunch time here... I am off..
2008-09-12 00:40 wow
2008-09-12 00:40 me too
2008-09-12 00:41 see you cdk, stargazr5
2008-09-12 00:42 you know I can read that without thinking?
2008-09-12 00:42 that's scary
2008-09-12 00:42 also used to do hex multiply/divide in my head at my geekiest
2008-09-12 00:42 still can do it, more slowly
2008-09-12 00:50 the C syntax above? without thinking? really? that is scary
2008-09-12 00:50 that is like the ugliest part of C...
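For readers who cannot parse the declaration quoted above at sight, here is a minimal sketch of how it decomposes. Only the shape is taken from the log (it matches the standard `signal` prototype); the names `handler_fn`, `set_handler_raw`, `set_handler` and `note_signal` are made up for illustration, and the function just stores one handler rather than doing anything signal-related.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; only the declarator shape mirrors signal(). */

/* "f" in the puzzle: pointer to a function taking int and returning void. */
typedef void (*handler_fn)(int);

static handler_fn current_handler = NULL;

/* Raw spelling, same shape as:  a (*b(c d, e (*f)(g)))(h);
 * a function taking (int, pointer-to-handler) and returning pointer-to-handler. */
void (*set_handler_raw(int sig, void (*func)(int)))(int)
{
    void (*old)(int) = current_handler;
    (void)sig;                 /* the sketch keeps one slot for every sig */
    current_handler = func;
    return old;
}

/* Typedef spelling of exactly the same function type. */
handler_fn set_handler(int sig, handler_fn func)
{
    /* Accepted without a diagnostic because the two types are identical. */
    handler_fn (*same_type)(int, handler_fn) = set_handler_raw;
    assert(same_type != NULL);
    return same_type(sig, func);
}

/* A trivial handler used to exercise the two functions. */
static int last_seen = 0;
static void note_signal(int sig) { last_seen = sig; }
```

Read from the name outwards: `set_handler_raw` is a function of `(int, void (*)(int))`, and what it returns is a pointer to a function of `(int)` returning `void`. The typedef version says the same thing left to right.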
2008-09-12 00:51 arguably
2008-09-12 00:51 personally, I think const is
2008-09-12 00:51 dreamed up by a sadist
2008-09-12 00:51 oh, const is relatively simple to parse though
2008-09-12 00:51 especially if you write it so 'char const *' instead of 'const char *'
2008-09-12 00:51 but a devilishly effective makework project
2008-09-12 00:52 and then read from the back, since C is mostly read from the back/center anyway
2008-09-12 00:52 bottom up reading is a powerful organizing force
2008-09-12 00:52 I so much prefer Pascal syntax for type definitions
2008-09-12 00:52 you just read it left to right
2008-09-12 00:52 me too
2008-09-12 00:53 but pascal as a whole is just plain irritating
2008-09-12 00:53 if the thing's a pointer it says so in the first character
2008-09-12 00:53 I am also offended by ==
2008-09-12 00:53 irritating? a little long-winded, true, but so frickin' easy to understand
2008-09-12 00:53 and friends
2008-09-12 00:53 yes, I prefer the pascal := and = as opposed to = and ==
2008-09-12 00:54 I don't mind != nor <> - doesn't really matter to me
2008-09-12 00:54 the most pleasant language I've worked in is pick basic
2008-09-12 00:54 as modified by a friend of mine
2008-09-12 00:54 not aware of that - does that differ from basic?
2008-09-12 00:54 most of the stupidities gone
2008-09-12 00:54 and missing stuff you need in, like structuring primitives
2008-09-12 00:54 [are you aware modern oo pascal like delphi, like freepascal, is 32bit and has object oriented programming, operator overloading, function overloading, etc...]
2008-09-12 00:55 it's very different from basic
2008-09-12 00:55 totally not anything like msft basic
2008-09-12 00:55 h
2008-09-12 00:55 ehrm, ah
2008-09-12 00:55 I haven't used the latest borland stuff, no
2008-09-12 00:55 msft liked it enough to headhunt the guy as I recall
2008-09-12 00:55 and we got c# :p
2008-09-12 00:56 another flavor of C-that-blows
2008-09-12 00:56 C# is just a flavour of java
2008-09-12 00:56 with the worst of C added in
2008-09-12 00:58 I want a melding of pascal (type declaration syntax, ease of reading code), Java (generics, interfaces [extended]), gnu-ism (inline asm power), C (low level control), C++ (some of the OO, dropping multiple inheritance), not sure what to do with some things [exceptions]
2008-09-12 00:59 let me know when you have code to try
2008-09-12 00:59 make it a very strongly typed language, drop most of the legacy crap, support useful UI candy (type in constants in any base, etc...)
2008-09-12 01:00 don't forget to make it interactive
2008-09-12 01:00 and managed
2008-09-12 01:00 and semicolons optional
2008-09-12 01:00 being able to recompile program blocks and replace them on the fly - yes I've wanted that ;-)
2008-09-12 01:00 managed?
2008-09-12 01:00 likewise parens, including fn call parens
2008-09-12 01:00 a function needs parens
2008-09-12 01:00 managed = can't segfault
2008-09-12 01:00 a procedure doesn't
2008-09-12 01:01 what does that mean?
2008-09-12 01:01 can't segfault?
2008-09-12 01:01 yes.
2008-09-12 01:01 lisp can't segfault in principle
2008-09-12 01:01 java can't either
2008-09-12 01:01 oh, but then it's not low-level
2008-09-12 01:01 I'm really concerned as to why 65% of my memory is in use (not including cache)
2008-09-12 01:01 in principle.
2008-09-12 01:01 konrad, in what context?
2008-09-12 01:01 the above would be a language you could write a kernel in
2008-09-12 01:02 konrad, running mozilla?
2008-09-12 01:02 oh
2008-09-12 01:02 flips: yeah, but that's only eating 600M or something
2008-09-12 01:02 I have 6 gigs
2008-09-12 01:02 I've got 45% in programs on a 4 g machine
2008-09-12 01:02 ncie
2008-09-12 01:02 nice
2008-09-12 01:02 50% cache
2008-09-12 01:02 yeah. I'm concerned.
2008-09-12 01:02 need to get yourself a memory map
2008-09-12 01:02 from proc
2008-09-12 01:02 there must be a tool
2008-09-12 01:03 (and in my case that probably is a gig and a half of firefox3)
2008-09-12 01:03 wicked
2008-09-12 01:03 13457 maze 20 0 103m 12m 9.9m S 0.0 0.3 0:29.40 gnome-power-man
2008-09-12 01:04 103M!
2008-09-12 01:04 3409 root 20 0 1454m 1.1g 28m S 5.6 18.0 531:18.61 Xorg
2008-09-12 01:04 just say gno to gnome
2008-09-12 01:04 but that's still insubstantial relative to 6
2008-09-12 01:05 kmail's using 500M
2008-09-12 01:05 what do you get from cat /proc/meminfo?
2008-09-12 01:05 pastie maybe?
2008-09-12 01:06 http://pastie.caboo.se/271078
2008-09-12 01:10 .6 gig of buffers, woof
2008-09-12 01:11 A gig into swap
2008-09-12 01:11 that's braindamage
2008-09-12 01:11 something is using 2.7 gig of straight memory
2008-09-12 01:11 should not be hard to find
2008-09-12 01:11 well
2008-09-12 01:11 X leaks often
2008-09-12 01:11 if you don't see it in the processes then yes, that is worrisome
2008-09-12 01:12 but you'd see the X usage even when it's leaking
2008-09-12 01:12 you know Shift-M with top?
2008-09-12 01:12 I think that's the one
2008-09-12 01:12 shows your processes in rss order
2008-09-12 01:12 or is it total vm size
2008-09-12 01:12 one of those
2008-09-12 01:12 vm size I think
2008-09-12 01:13 in order, Xorg, firefox, nautilus, gnome-panel, kmail, pidgin, gnome-terminal
2008-09-12 01:13 and some others
2008-09-12 01:13 VmallocTotal: 34359738367 kB
2008-09-12 01:13 that has to be broken
2008-09-12 01:14 VmallocChunk: 34359675895 kB
2008-09-12 01:14 haven't spent much time crawling in vm lately
2008-09-12 01:14 you might want to go onto #mm on this server
2008-09-12 01:15 and complain about that vmalloctotal
2008-09-12 01:15 it's late enough that I can't be arsed
2008-09-12 01:15 works for me
2008-09-12 01:15 alright, I got back a significant chunk of it by ditching firefox
2008-09-12 01:16 down to 51% used
2008-09-12 01:16 still
2008-09-12 01:17 check your anon
2008-09-12 01:17 I don't think firefox was using that
2008-09-12 01:18 600M of it went away after closing firefox
2008-09-12 01:18 leaving 2.1 gig in anon?
2008-09-12 01:18 that's broken
2008-09-12 01:18 yes
2008-09-12 01:19 check top
2008-09-12 01:19 for what?
2008-09-12 01:19 shift-M
2008-09-12 01:19 look at virtual size
2008-09-12 01:19 and rss
2008-09-12 01:19 fes
2008-09-12 01:19 res
2008-09-12 01:20 xorg has 1471m virt 1.1g res, nautilus 767m virt 212m res, kmail 536m virt 146m res
2008-09-12 01:20 top 3
2008-09-12 01:20 fsking pigs
2008-09-12 01:20 wtf is X doing?
2008-09-12 01:21 nautilus...
2008-09-12 01:21 :p
2008-09-12 01:21 shh :)
2008-09-12 01:21 I'd say you'
2008-09-12 01:21 I'd say you've got a simple case of out of control X plus 4 x bloatware
2008-09-12 01:21 sounds about right
2008-09-12 01:22 I'd call the X part a bug
2008-09-12 01:22 the other is just sloth
2008-09-12 01:22 it leaks like crazy
2008-09-12 01:22 write nasty emails to xorg
2008-09-12 01:22 tell them you want your money back
2008-09-12 01:23 kernel is using an unconscionable amount of buffers
2008-09-12 01:23 metadata is supposed to be small. Buffers -> metadata
2008-09-12 01:23 you can go complain on #mm
2008-09-12 01:23 tell peterz to do something ;-)
2008-09-12 01:23 that might be a result of encrypted harddrive
2008-09-12 01:24 probably
2008-09-12 01:24 well
2008-09-12 01:24 dodgy encryption layer
2008-09-12 01:24 right
2008-09-12 01:25 what's the encryption method, craptoloop or dm-crapt?
2008-09-12 01:25 dm-crypt
2008-09-12 01:25 complain on the dm-devel list
2008-09-12 01:25 there
2008-09-12 01:25 got it all sorted ;-)
2008-09-12 01:25 :)
2008-09-12 01:49 incremental refcount block update cost <- I love english
2008-09-12 01:49 don't you maze?
2008-09-12 01:51 all that stuff in the refcount post was actually designed during the skate today
2008-09-12 01:51 the same skate that resulted in "rollerbladers are allowed" in the sk8board park
2008-09-12 01:52 so it was a good skate, all things considered
2008-09-12 01:52 g'night
2008-09-12 01:57 "Meanwhile, Rockbox has performed a valuable service for Debian developers who would otherwise have to struggle to find a project with longer release cycles than their own. " hah
2008-09-12 01:59 :-)
2008-09-12 01:59 what is rockbox?
2008-09-12 02:03 different firmware for your ipod
2008-09-12 02:03 or other 'mp3 player'-class devices
2008-09-12 02:04 (with additional functionality in mind)
2008-09-12 02:09 ACTION talks about b-tree parallelization with flips
2008-09-12 02:09 I was thinking about how something like RCU could be integrated into a b-tree. I don't know the specifics of a b-tree per se other than it's a tree that's flatter and better suited for storage
2008-09-12 02:11 flips: so I was thinking about per inode processing if we decided to parallelize it on that basis
2008-09-12 02:11 we don't have rcu in userspace
2008-09-12 02:11 but we need locking in userspace
2008-09-12 02:11 also: rcu has some scary artifacts
2008-09-12 02:11 file locking would have to be done on a per inode basis
2008-09-12 02:12 I think it's a case of, do some decent spinlock + mutex work, then try rcu
2008-09-12 02:12 not rcu first
2008-09-12 02:12 rcu is weird when it goes wrong
2008-09-12 02:12 so that code would have to be split up in that manner if you do it that way
2008-09-12 02:12 yeah, I know
2008-09-12 02:12 so are spinlocks etc, but the weirdness is a lot easier to grasp
2008-09-12 02:12 you have to validate the data and stuff before using it
2008-09-12 02:12 making sure that it's not stale
2008-09-12 02:12 per inode is too coarse
2008-09-12 02:13 across quiescence periods
2008-09-12 02:13 well, what then ?
2008-09-12 02:13 simple spinlocks and mutexes
2008-09-12 02:13 decide how many
2008-09-12 02:13 what order acquired
2008-09-12 02:13 for how long
2008-09-12 02:13 what granularity of data protected
2008-09-12 02:14 estimate contention
2008-09-12 02:14 decide where rw is appropriate
2008-09-12 02:14 where trylock works
2008-09-12 02:14 then think about per cpu
2008-09-12 02:14 problem here is that rwlocks suck badly since they still depend on an atomic operation, limited scalability which is why some kind of per CPU-ism is useful
2008-09-12 02:14 at first, pretend bouncing costs nothing
2008-09-12 02:15 ok
2008-09-12 02:15 when it's working reliably and bouncing starts to show in the profile (because everything else is so fast) then take anti-bounce measures
2008-09-12 02:15 let's think, how do we want to break this up ?
2008-09-12 02:15 what are we protecting and under what circumstances ?
2008-09-12 02:16 there should be any number of concurrent readers and writers allowed in one file inode at the same time
2008-09-12 02:16 same with the inode table
2008-09-12 02:16 they will be partitioned by subtree
2008-09-12 02:16 higher levels of the tree can have rwlocks
2008-09-12 02:17 I think
2008-09-12 02:17 low level, simple mutex or spinlock is better
2008-09-12 02:17 what parts of the subtree ?
2008-09-12 02:17 =and how are they realted to the inode itself ?
2008-09-12 02:17 and how are they related to the inode itself ?
2008-09-12 02:17 a file index tree descends from the inode table
2008-09-12 02:18 think 50 petabyte file
2008-09-12 02:18 -!- kbingham(~kbingham@193.132.141.186) has joined #tux3
2008-09-12 02:18 well, it depends on what you're guarding, we have to define the relationship first
2008-09-12 02:18 to get the sense of how many read/writes can be in it at the same time
2008-09-12 02:18 let's start from the beginning
2008-09-12 02:18 what happens on a file open ?
2008-09-12 02:18 guarding changes to the index leaf nodes, which is to say, the block pointers, and later extents
2008-09-12 02:18 and define the common operations
2008-09-12 02:18 open, read, write, close
2008-09-12 02:19 on file open we first look in the directory file
2008-09-12 02:19 find the inode number
2008-09-12 02:19 then probe into the inode table
2008-09-12 02:19 that's a flat file, right ?
2008-09-12 02:19 find the inode table block, and the inode in it
2008-09-12 02:19 the directory?
2008-09-12 02:19 currently flat
2008-09-12 02:19 diretory file
2008-09-12 02:19 directory file
2008-09-12 02:19 later will have a btree mapped into the flat file
2008-09-12 02:19 has its own locking considerations
2008-09-12 02:19 I'm assuming that it's a specific inode on the file system
2008-09-12 02:20 what is?
2008-09-12 02:20 we have two structures so far right ?
2008-09-12 02:20 1) directory map file
2008-09-12 02:20 2) b-tree
2008-09-12 02:20 not quite like that
2008-09-12 02:20 tux3 is a two level btree structure
2008-09-12 02:20 top level btree is the inode table
2008-09-12 02:21 from the inode table descend some large number of file index btrees
2008-09-12 02:21 a directory is the leaves of one of those btrees
2008-09-12 02:21 that is, the data blocks
2008-09-12 02:21 the leaves of a file index btree actually contain pointers to data blocks
2008-09-12 02:22 so we go probing around in some directory, taking the same locks as we would for any file
2008-09-12 02:22 ACTION reads
2008-09-12 02:22 that is, locking various levels of the index btree of the directory file
2008-09-12 02:23 once we find a data block we read it into the page cache and drop our locks
2008-09-12 02:23 maybe not all of them, maybe just up to some level
2008-09-12 02:23 well
2008-09-12 02:24 that is a little tricky, because the linux generic_file_read etc functions don't work that way
2008-09-12 02:24 they generally cause the filesystem to walk its index tree over and over again, for each block
2008-09-12 02:24 sucks
2008-09-12 02:24 ACTION is a bit confused
2008-09-12 02:24 ACTION thinks
2008-09-12 02:24 we don't need to worry about that
2008-09-12 02:25 for the moment we only need to be able to dive down into the btree and find a pointer to some data block
2008-09-12 02:25 see inode.c
2008-09-12 02:25 "filemap_blockio"
2008-09-12 02:26 most of the work is done by "probe"
2008-09-12 02:26 probe is where most of the locking action will happen
2008-09-12 02:27 so a file is a b-tree ?
2008-09-12 02:27 which is lower in level to the inode b-tree ?
2008-09-12 02:27 that's the relationship ? correct ?
2008-09-12 02:27 http://tux3.org/tux3?f=6ea2692d2839;file=user/test/btree.c
2008-09-12 02:27 see probe in there
2008-09-12 02:27 a file is _indexed_ by a btree
2008-09-12 02:27 that's in the lower level right ?
2008-09-12 02:28 ACTION looks
2008-09-12 02:28 a file lives in data blocks, that are pointed to by pointers that live in the leaves of a btree, called a data index btree
2008-09-12 02:28 the leaves of that btree are called dleaves
2008-09-12 02:28 see dleaf.c
2008-09-12 02:29 the situation with dleaf.c is pretty simple
2008-09-12 02:29 we can protect an entire dleaf as one logical entity
2008-09-12 02:30 that covers about 500 file data blocks
2008-09-12 02:30 which is an ok granularity
2008-09-12 02:30 top level b-tree right ?
2008-09-12 02:30 what top level btree?
2008-09-12 02:30 a dtree is a second level btree
2008-09-12 02:30 you have an inode b-tree and a data index b-tree, correct ?
2008-09-12 02:30 the top level btree is the inode table
2008-09-12 02:30 right
2008-09-12 02:30 I'm just trying to understand the terminology here
2008-09-12 02:30 ok, good
2008-09-12 02:30 that's what I though
2008-09-12 02:30 thought
2008-09-12 02:30 itree vs dtree
2008-09-12 02:31 good
2008-09-12 02:31 yeah, good terminology
2008-09-12 02:31 thanks
2008-09-12 02:31 itree->dtree
2008-09-12 02:31 right
2008-09-12 02:31 terminology is important
2008-09-12 02:31 agreed
2008-09-12 02:31 it's what I figured you said in the first place, but I had to be sure
2008-09-12 02:31 right, protect a dtree entirely with a lock
2008-09-12 02:31 http://kerneltrap.org/Linux/Tux3_Hierarchical_Structure
2008-09-12 02:32 some of this is wrong now
2008-09-12 02:32 ACTION reads
2008-09-12 02:32 dropped the volume table, moved the free map inside the itree
2008-09-12 02:32 update it before the next tux3 university
2008-09-12 02:32 as a normal file
2008-09-12 02:32 that's hard
2008-09-12 02:32 that's on somebody else's site
2008-09-12 02:32 but I can post something on tux3.org
2008-09-12 02:32 ACTION really appreciates the help in learning this from flips
2008-09-12 02:33 do you have an allocation map that's shared
2008-09-12 02:33 ?
2008-09-12 02:33 at least you can see the inode table / data index table relationship there
2008-09-12 02:33 but it is obscured by the volume table, which I determined to be useless
2008-09-12 02:33 that's a potentially huge problem for contention with regards to the allocator
2008-09-12 02:33 allocation map?
2008-09-12 02:34 there is an allocation bitmap
2008-09-12 02:34 block allocation map
2008-09-12 02:34 which is a normal file
2008-09-12 02:34 well, how do you modify it, say, under heavy delete or data creation pressure ?
2008-09-12 02:34 sb->bitmap in inode.c
2008-09-12 02:34 doesn't it need a lock around it ?
2008-09-12 02:34 currently there is no locking
2008-09-12 02:34 or concurrency
2008-09-12 02:34 soon
2008-09-12 02:35 well, doesn't it need it ?
2008-09-12 02:35 but it is just a normal file
2008-09-12 02:35 lock it with the same granularity
2008-09-12 02:35 there's a lot of activity there so I expect it to be heavily hit
2008-09-12 02:35 sure
2008-09-12 02:35 same granularity as what ?
2008-09-12 02:35 other files too
2008-09-12 02:35 but!
2008-09-12 02:35 there is a difference with tux3
2008-09-12 02:35 ok
2008-09-12 02:35 tux3 has this way of logging changes to the bitmaps
2008-09-12 02:35 it doesn't have to lock, write block, wait
2008-09-12 02:35 ok
2008-09-12 02:35 that kind of thing
2008-09-12 02:35 oh nice
2008-09-12 02:36 so locks on the bitmap are just page cache locks
2008-09-12 02:36 deltas to the allocation map are just appended
2008-09-12 02:36 that is, most likely actually locking pages when we get to kernel
2008-09-12 02:36 or we could lock buffers
2008-09-12 02:36 locking pages is a little faster
2008-09-12 02:36 what about during concurrent access against an online checker that needs to know about all of the appended logs ?
2008-09-12 02:36 yes, deltas to the allocation map are just logged
2008-09-12 02:37 and every now and then we pour a bunch of them into the allocation map and write it out
2008-09-12 02:37 defer those checks until the log has been commit to the disk and then restart it ?
2008-09-12 02:37 committed
2008-09-12 02:37 the allocation map always has the most recent version of the allocation
2008-09-12 02:37 in buffers
2008-09-12 02:37 in memory
2008-09-12 02:37 because, say, you want to verify if data blocks that some indirect mapping is pointing to are allocated or not
2008-09-12 02:37 so an online check, ah, needs to check the disk image
2008-09-12 02:37 not the cached image
2008-09-12 02:37 right?
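The logging idea described above can be sketched roughly as follows. This is an illustration under stated assumptions, not tux3's actual log format: `struct allocmap`, `log_change`, `flush_log` and the fixed sizes are all invented here. The in-memory bitmap is updated immediately, each change is appended as a delta, and the deltas are only occasionally poured into the stand-in for the on-disk image.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes and names; not tux3's real structures. */
enum { NBLOCKS = 64, LOG_MAX = 16 };

struct delta {
    uint32_t block;   /* which block changed state */
    uint8_t set;      /* 1 = allocated, 0 = freed */
};

struct allocmap {
    uint8_t cached[NBLOCKS / 8];  /* in-memory bitmap, always current */
    uint8_t disk[NBLOCKS / 8];    /* stand-in for the on-disk image, lags behind */
    struct delta log[LOG_MAX];    /* appended deltas not yet applied to disk[] */
    unsigned logged;
};

static void put_bit(uint8_t *map, uint32_t block, int set)
{
    uint8_t mask = (uint8_t)(1u << (block & 7));
    if (set)
        map[block >> 3] |= mask;
    else
        map[block >> 3] &= (uint8_t)~mask;
}

static int get_bit(const uint8_t *map, uint32_t block)
{
    return (map[block >> 3] >> (block & 7)) & 1;
}

/* "Pour" the logged deltas into the disk image, in order, then empty the log. */
static void flush_log(struct allocmap *m)
{
    for (unsigned i = 0; i < m->logged; i++)
        put_bit(m->disk, m->log[i].block, m->log[i].set);
    m->logged = 0;
}

/* Record one allocation or free: update memory now, append a delta for later. */
static void log_change(struct allocmap *m, uint32_t block, int set)
{
    assert(block < NBLOCKS);
    if (m->logged == LOG_MAX)     /* log full: apply it before appending */
        flush_log(m);
    put_bit(m->cached, block, set);
    m->log[m->logged].block = block;
    m->log[m->logged].set = (uint8_t)set;
    m->logged++;
}
```

Between flushes `disk[]` is stale on its own; it only matches `cached[]` once the pending deltas are replayed on top of it, which is why a checker reading the disk image would also have to take the log into account.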
2008-09-12 02:38 pretty hard to do otherwise
2008-09-12 02:38 anyway, that's not the immediate problem
2008-09-12 02:38 the immediate problem is just to have fast, concurrent access to everything
2008-09-12 02:38 if it's not and a log is being committed, we should delay it until that log has been committed ?
2008-09-12 02:38 just thinking out loud
2008-09-12 02:38 if what is not?
2008-09-12 02:38 what ? the log ?
2008-09-12 02:39 the log itself
2008-09-12 02:39 "if it's not" you said
2008-09-12 02:39 don't know what "it" is
2008-09-12 02:39 is there a scenario where the online checking of that portion of the disk and ...on that'll never happen
2008-09-12 02:39 because of the atomic commit
2008-09-12 02:39 it should be consistent at that point from previous commits
2008-09-12 02:40 we don't really need to check logs that are being committed to disk and wait for them to complete
2008-09-12 02:40 or do we ?
2008-09-12 02:40 we do
2008-09-12 02:40 because the logs form a promise of what the "real" disk image "should" look like
2008-09-12 02:40 yeah, well then we have to lock them down or something like that
2008-09-12 02:40 so we need to take it into account during checking
2008-09-12 02:40 but checking is far in the future
2008-09-12 02:41 under, say a rwlock lock, reader side
2008-09-12 02:41 at least 3 months
2008-09-12 02:41 probably 4
2008-09-12 02:41 ok
2008-09-12 02:41 worth thinking about
2008-09-12 02:41 let's do beer on it
2008-09-12 02:41 I'll think about it on my next skate, if refcounting is done ;-)
2008-09-12 02:42 going back to the bitmap
2008-09-12 02:42 so, at least each bit has to be protected
2008-09-12 02:42 we do scan, find, change
2008-09-12 02:42 and that scan/find/change to allocate a block has to be under a spinlock
2008-09-12 02:43 flips: am I being helpful or not ?
2008-09-12 02:43 or in userspace, under a pthread mutex
2008-09-12 02:43 or saying stupid irrelevant things ? just checking
2008-09-12 02:43 of course
2008-09-12 02:43 ok
2008-09-12 02:43 I haven't been required to be precise about this up till now
2008-09-12 02:43 or deal with somebody who had written nontrivial locking
2008-09-12 02:43 well, I hope I'm helping
2008-09-12 02:43 yep
2008-09-12 02:43 anyway, the allocation bitmap is a good place to start
2008-09-12 02:44 because there is a pretty simple situation there
2008-09-12 02:44 I think you can definitely isolate a dtree using an individual dtree lock
2008-09-12 02:44 once you know your bitmap block isn't going away
2008-09-12 02:44 well, one lock per dtree is way too crude
2008-09-12 02:44 that's good, but you have to think about the upward relationship between that and the itree
2008-09-12 02:44 actually you don't
2008-09-12 02:44 you can treat it as individual blocks
2008-09-12 02:44 is the lock against the dtree sufficient to protect the link in the itree pointing to it ?
2008-09-12 02:45 stuff like that
2008-09-12 02:45 you lock your way down through the btree until you get to the datablock, lock the data block and let everything else go
2008-09-12 02:45 ok, let's define what a read would look like through that.
2008-09-12 02:45 ok, right
2008-09-12 02:45 you look up the inode in the itree
2008-09-12 02:45 you get it
2008-09-12 02:45 what next ?
2008-09-12 02:45 it points to a dtree
2008-09-12 02:45 well that's a good point
2008-09-12 02:45 so you want to delete a data block
2008-09-12 02:45 that means you have to lock the data block
2008-09-12 02:46 so I think that's clear, right?
2008-09-12 02:46 yes
2008-09-12 02:46 that means, the read has to be off it
2008-09-12 02:46 reader
2008-09-12 02:46 what do you lock then ? the dtree or some part of the dtree ?
2008-09-12 02:46 you lock the block
2008-09-12 02:46 what would region locking look like ?
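The scan/find/change step on the allocation bitmap mentioned earlier, done under a pthread mutex for the userspace case, might look like the sketch below. `struct bitmap`, `bitmap_alloc` and `bitmap_free` are hypothetical names, and the struct stands in for a single bitmap buffer with its own lock; nothing here is taken from the tux3 source.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical: one small bitmap buffer protected by one mutex. */
enum { MAP_BLOCKS = 64 };

struct bitmap {
    pthread_mutex_t lock;
    uint8_t bits[MAP_BLOCKS / 8];
};

static void bitmap_init(struct bitmap *map)
{
    pthread_mutex_init(&map->lock, NULL);
    for (unsigned i = 0; i < sizeof map->bits; i++)
        map->bits[i] = 0;
}

/* Scan for a clear bit, set it, return its block number; -1 if the map is full.
 * The whole scan/find/change runs under the mutex so two allocators can never
 * both claim the same bit. */
static long bitmap_alloc(struct bitmap *map)
{
    long found = -1;

    pthread_mutex_lock(&map->lock);
    for (long block = 0; block < MAP_BLOCKS; block++) {
        uint8_t mask = (uint8_t)(1u << (block & 7));
        if (!(map->bits[block >> 3] & mask)) {
            map->bits[block >> 3] |= mask;
            found = block;
            break;
        }
    }
    pthread_mutex_unlock(&map->lock);
    return found;
}

/* Clear one bit under the same lock. */
static void bitmap_free(struct bitmap *map, long block)
{
    assert(block >= 0 && block < MAP_BLOCKS);
    pthread_mutex_lock(&map->lock);
    map->bits[block >> 3] &= (uint8_t)~(1u << (block & 7));
    pthread_mutex_unlock(&map->lock);
}
```

The lock is held only for the duration of the scan and the bit flip, so it never covers any I/O.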
2008-09-12 02:46 region locking looks like locking a subtree node
2008-09-12 02:46 then you have to wait for _every_ other lock to go away
2008-09-12 02:47 not a good idea
2008-09-12 02:47 why do you want to lock a region?
2008-09-12 02:48 posix semantics or something like that
2008-09-12 02:48 totally different locking level
2008-09-12 02:48 can't you lock a range in the file under posix ?
2008-09-12 02:48 waaay up in the vfs
2008-09-12 02:48 layered, independent
2008-09-12 02:49 also, the linux posix locking code blows
2008-09-12 02:49 coarse grained as hell
2008-09-12 02:49 true, but you can bypass it
2008-09-12 02:49 single fucking lock
2008-09-12 02:49 yup, and a linear list
2008-09-12 02:49 blows
2008-09-12 02:49 but you still do it at the same level
2008-09-12 02:49 that's not our concern now
2008-09-12 02:50 or maybe I'm just stuck in the wrong mindset
2008-09-12 02:50 could be
2008-09-12 02:50 anyway, at least we can let that suck exactly as it always has
2008-09-12 02:50 we won't lose a benchmark showdown for that reason
2008-09-12 02:50 who uses posix locks anyway? ;)
2008-09-12 02:52 there is a case where you want to lock a region
2008-09-12 02:52 cluster fs
2008-09-12 02:52 but that's not us
2008-09-12 02:52 yet
2008-09-12 02:53 ok
2008-09-12 02:53 let's continue
2008-09-12 02:53 how do you lock the data block ?
2008-09-12 02:53 I guess for tux3 we can think of a single block as our unit of locking
2008-09-12 02:53 this is for a read remember...
2008-09-12 02:53 in kernel?
2008-09-12 02:53 take the block lock
2008-09-12 02:53 it's a bitspin lock as I recall
2008-09-12 02:54 same with the page lock
2008-09-12 02:54 it's fast enough for this purpose
2008-09-12 02:54 well, it's all something to think about
2008-09-12 02:54 in userspace
2008-09-12 02:54 pthread mutex, we will put one in each buffer
2008-09-12 02:54 that's pretty nasty
2008-09-12 02:54 so you lock the mutex in the buffer
2008-09-12 02:54 because?
2008-09-12 02:54 how big of a file chunk are we deleting ?
2008-09-12 02:55 ah, delete
2008-09-12 02:55 well, nice thing about truncate is, we don't have to wait for it
2008-09-12 02:55 we can just mark the inode as "truncated" and we're done
2008-09-12 02:56 we don't even have to update the inode
2008-09-12 02:56 just promise to in our log
2008-09-12 02:56 or do we need to lock during the read as well ?
2008-09-12 02:56 and take our sweet time, walking through the dtree, taking locks, freeing blocks
2008-09-12 02:56 on a block basis
2008-09-12 02:56 we need to lock on read, yes
2008-09-12 02:56 on a block basis
2008-09-12 02:56 which ? what does the lock hierarchy look like
2008-09-12 02:56 ?
2008-09-12 02:57 just long enough to enter the block into the cache
2008-09-12 02:57 do we lock the itree ? dtree ? what ?
2008-09-12 02:57 we work our way down the levels of the two trees, taking locks and releasing them
2008-09-12 02:57 we only hold a lock long enough to know that we can see the next object in cache
2008-09-12 02:57 do we take reader locks along the way ?
2008-09-12 02:57 if we don't see it in cache, drop everything, read it in, start over from the top
2008-09-12 02:58 simple minded algorithm
2008-09-12 02:58 a starting point
2008-09-12 02:58 and hold them across the entire operation ?
2008-09-12 02:58 let's define this
2008-09-12 02:58 only across the operation of finding the next level down in the cache
2008-09-12 02:58 soon as we find that, we lock it, release the parent
2008-09-12 02:58 be specific
2008-09-12 02:58 make sense?
2008-09-12 02:58 I thought that was specific
2008-09-12 02:59 hold the itree lock until we get the specific dtree ?
2008-09-12 02:59 there is no itree lock 2008-09-12 02:59 no, more specific :) 2008-09-12 02:59 ok, lock the root of the itree 2008-09-12 02:59 ok 2008-09-12 02:59 that is, look for it in cache 2008-09-12 02:59 ok 2008-09-12 02:59 if it's not there, issue a read 2008-09-12 02:59 block until it is 2008-09-12 02:59 to load the portion of the itree, right ? 2008-09-12 02:59 then block until we own the read lock 2008-09-12 02:59 have a read lock 2008-09-12 03:00 nope 2008-09-12 03:00 wait... 2008-09-12 03:00 to probe down where we want to go 2008-09-12 03:00 let me summarize 2008-09-12 03:00 yes, that stops everybody 2008-09-12 03:00 probing the itree 2008-09-12 03:00 well 2008-09-12 03:00 it stops writers 2008-09-12 03:00 yes 2008-09-12 03:00 because we have a read lock on the root 2008-09-12 03:00 right 2008-09-12 03:00 we aren't going to keep it long 2008-09-12 03:00 that would be unfriendly 2008-09-12 03:00 yes 2008-09-12 03:01 so what we do is, we find the next index block down inthe inode table index tree 2008-09-12 03:01 check its in cache 2008-09-12 03:01 if so, take a read lock on it 2008-09-12 03:01 if not, drop the root lock, issue a read, block on it, start again at the root 2008-09-12 03:01 obviously this may never terminate ;-) 2008-09-12 03:02 but we have other problems if it doesn't 2008-09-12 03:02 so we look up a inode; reader lock the itree; if it's not there issue a read to load that in, release all of the above locks until that block's wait queue wakes; read that block, get that dtree link 2008-09-12 03:02 while holding the itree reader lock 2008-09-12 03:02 we don't read lock the itree 2008-09-12 03:02 correct ? 2008-09-12 03:02 we read lock the root of the itree 2008-09-12 03:02 big difference 2008-09-12 03:02 ok 2008-09-12 03:02 so let's say the itree has seven levels of index 2008-09-12 03:02 big itree 2008-09-12 03:02 what's the difference ? 
2008-09-12 03:02 ok 2008-09-12 03:03 we start by locking the root 2008-09-12 03:03 then we lock level one index, and drop the root lock 2008-09-12 03:03 then lock level 2 index block, and drop the level 1 2008-09-12 03:03 and so on 2008-09-12 03:03 down to level 7 2008-09-12 03:03 then we start the same process on the dtree 2008-09-12 03:03 make sense? 2008-09-12 03:03 or propagate downwards, releasing a lock 2008-09-12 03:03 kind of scary 2008-09-12 03:03 ok, it's not that scary 2008-09-12 03:04 big reason: we will normally keep hitting the same inode table block several times 2008-09-12 03:04 so we keep a "cursor" 2008-09-12 03:04 right, advance the cursor as needed 2008-09-12 03:04 what about rebalancing operations ? how does this affect it ? 2008-09-12 03:04 got to worry about how cursors interact with write locks on the itree 2008-09-12 03:04 but then 2008-09-12 03:04 that's why we're talking about it 2008-09-12 03:05 I don't know how to manipulate it other than with a big coarse grained lock at this time 2008-09-12 03:05 ok when you want to rebalance, delete, insert, split, whatever, you need a write lock 2008-09-12 03:05 on the parent and on the blocks being changed 2008-09-12 03:05 so you do the same thing 2008-09-12 03:05 I simply don't know enough about b-trees to know how to downward propagate the lock 2008-09-12 03:05 cursor 2008-09-12 03:05 me neither 2008-09-12 03:05 haven't done this before 2008-09-12 03:06 it's just brainwork though 2008-09-12 03:06 no magic 2008-09-12 03:06 ok, that's a big deal 2008-09-12 03:06 the expert on tree locking that I know of is peterz 2008-09-12 03:06 what kind of tree?
2008-09-12 03:06 he's done all sorts of shit 2008-09-12 03:06 radix tree and other things 2008-09-12 03:06 I'll ping him 2008-09-12 03:06 higly concurrent trees 2008-09-12 03:06 radix tree is pretty simple 2008-09-12 03:06 highly concurrent trees 2008-09-12 03:06 compared to a filesystem index 2008-09-12 03:07 he's the best person for the job that I know of 2008-09-12 03:07 yes, I'll point peterz at it 2008-09-12 03:07 you're not bad 2008-09-12 03:07 you're asking the right questions 2008-09-12 03:07 he might not have time, but I don't think that your current track of fine graining the system upfront is the best solution 2008-09-12 03:07 ah 2008-09-12 03:08 we have another knob we can tweak 2008-09-12 03:08 you should consider seriously per cpu-ing it if possible or faking it userspace 2008-09-12 03:08 there is also a refcount on each buffer 2008-09-12 03:08 we can get highly concurrent reads with RCU, that's a given 2008-09-12 03:08 per-cpuing it before figuring out how to do it with normal locks would not be wise 2008-09-12 03:08 1) walk 2) run 2008-09-12 03:08 it's just a matter of how we can modify it to apply to your current atomic log at that time 2008-09-12 03:09 ok, see that recount comment 2008-09-12 03:09 maybe the use of an atomic counter would help to version the logs for both RCU tree nodes and the atomic disk log 2008-09-12 03:09 very important 2008-09-12 03:09 forget rcu 2008-09-12 03:09 rcu is braindamage 2008-09-12 03:10 when we want it rcu'd, we'll hand it to the rcu guys 2008-09-12 03:10 it can be but it's also a bad ass algorithm 2008-09-12 03:10 it's a real use of time 2008-09-12 03:10 depends on the kind of guarantees you need 2008-09-12 03:10 not consistent with getting a solid prototype up 2008-09-12 03:10 ok 2008-09-12 03:10 I guess basic thread safety is first 2008-09-12 03:10 anyway, the question, what happens when the itree geometry needs to change 2008-09-12 03:11 so we have all these readers walking down the tree, that's great 
2008-09-12 03:11 and they release their locks as they go so somebody can come behind and maybe go down a different subtree 2008-09-12 03:11 very nice already 2008-09-12 03:11 but 2008-09-12 03:11 how do you change the tree geometry? 2008-09-12 03:11 well 2008-09-12 03:12 you can actuall do it when there are readers buzzing away inside subtrees that you're moving around 2008-09-12 03:12 that is cool 2008-09-12 03:12 you just need to write lock the parent and read lock the children, so you know that the tasks ahead of you have gotten off the parent 2008-09-12 03:13 then you can change the parent 2008-09-12 03:13 make sense? 2008-09-12 03:13 you can for example, split the parent 2008-09-12 03:13 and then you may have to change the parent's parent 2008-09-12 03:13 well 2008-09-12 03:13 fun 2008-09-12 03:14 ACTION reads 2008-09-12 03:14 you might have to check the path to find out how high up the splits will go, and get write locks on that whole chain 2008-09-12 03:14 I was just talking to peterz 2008-09-12 03:14 asked him the same questions we were talking about here 2008-09-12 03:14 he's not going to be around much since he's headed to plumber's 2008-09-12 03:14 the comparison of answers must be fascinating 2008-09-12 03:15 I think that itree node deletion needs to be tied to file handle semantics somehow 2008-09-12 03:15 tell him good luck with the plumbing, there is a lot of shit in those pipes 2008-09-12 03:15 na, forget that 2008-09-12 03:15 good 2008-09-12 03:15 it's way wrong ;) 2008-09-12 03:16 peterz used a lock in a linked list node to protect link modification 2008-09-12 03:16 so think about why your deleting an itree node 2008-09-12 03:16 yeah, you're completely removing it 2008-09-12 03:16 but why? 2008-09-12 03:16 it's not really what you want 2008-09-12 03:17 I'm agreening with you 2008-09-12 03:17 it's because you're coalescing the itree, and why are you doing that? 
2008-09-12 03:17 agreeing 2008-09-12 03:17 I know 2008-09-12 03:17 I'm doing rhetoric 2008-09-12 03:17 ok 2008-09-12 03:17 so you're doing that because you've just delete masses of files and you want to tighten up the inode table tree a little 2008-09-12 03:17 actually, this is quite optional 2008-09-12 03:18 we don't really need to do that 2008-09-12 03:18 particularly if we tend to reuse the same inode numbers in the not too distant future 2008-09-12 03:18 it's only if we are determined to use completely different ones, for no good reason, that we need to fiddle the geometry of the itree on delete 2008-09-12 03:19 anway 2008-09-12 03:19 let's assume that we do want to be tidy and coalesce the itree frequently, even if we are not required to 2008-09-12 03:20 that means merging nodes in general 2008-09-12 03:20 just what I was talking about 2008-09-12 03:20 well 2008-09-12 03:20 no, I was thinking of splitting above 2008-09-12 03:20 merging is more forgiving as far as locking the access path goes 2008-09-12 03:20 tough problem 2008-09-12 03:21 I'd go with something simple first 2008-09-12 03:21 not really, you only need to write lock the parent and the two blocks being merged 2008-09-12 03:21 this is too much for a prototype 2008-09-12 03:21 but the first thing to do is to define specifically how a coarse grained set of locks would work on it 2008-09-12 03:21 it is traditional to start with a single global lock on each btree 2008-09-12 03:21 and then propagate downwards 2008-09-12 03:21 and find out how badly that sucks 2008-09-12 03:21 right 2008-09-12 03:21 we know in advance it sucks too much 2008-09-12 03:21 so why bother? 
2008-09-12 03:21 we have lockstat so we can get a good idea of how sucky it is and it will be sucky 2008-09-12 03:21 maybe in next week's prototype 2008-09-12 03:21 then that's it 2008-09-12 03:22 ok 2008-09-12 03:22 the biggest focus at this time is to get your prototype working fully 2008-09-12 03:22 true 2008-09-12 03:22 with concurrency 2008-09-12 03:22 in user space 2008-09-12 03:22 had a slight change of philosophy there 2008-09-12 03:22 the best thing we can do is make provisions to do fine grained locking or per cpu-ification in the future easily 2008-09-12 03:22 when the fuse stuff landed 2008-09-12 03:22 not solve the entire problem upfront 2008-09-12 03:22 right 2008-09-12 03:22 well 2008-09-12 03:23 I don't think you mean by per-cpu what I mean 2008-09-12 03:23 but let's ask interesting questions and get help from folks like peterz 2008-09-12 03:23 per-cpu to me means replicating the relevant data per-cpu 2008-09-12 03:23 that's a big mess 2008-09-12 03:23 last resort 2008-09-12 03:23 all that 2008-09-12 03:24 no, 2008-09-12 03:24 it's about avoiding locking in the first place during an inode operation 2008-09-12 03:24 like how?
2008-09-12 03:24 as much locking as possible under that operation 2008-09-12 03:25 like making the entire read path as per cpu as possible 2008-09-12 03:25 vfs is going to take i_sem, can't tell it not to 2008-09-12 03:25 we have to stick to the part we own 2008-09-12 03:25 yeah 2008-09-12 03:25 and are responsible for 2008-09-12 03:25 which is our indexing structures 2008-09-12 03:25 so we're already taking an inode lock of some sort 2008-09-12 03:25 different inode 2008-09-12 03:26 there's the struct inode, which the vfs locks, and the image of the inode on an inode table block, which we lock 2008-09-12 03:41 sleeping time 2008-09-12 03:46 ok 2008-09-12 03:46 night 2008-09-12 03:46 later flips sleep well 2008-09-12 03:46 ACTION is going to be up still 2008-09-12 03:47 night 2008-09-12 08:44 -!- tim_dimm(~timothyhu@adsl-67-114-40-138.dsl.scrm01.pacbell.net) has joined #tux3 2008-09-12 09:10 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-09-12 09:26 -!- eli(~elicriffi@66.249.86.209) has joined #tux3 2008-09-12 10:24 -!- kd(kdpict@118.94.54.179) has joined #tux3 2008-09-12 11:15 -!- kushal(kdpict@118.94.54.179) has joined #tux3 2008-09-12 11:39 -!- pgquiles(~pgquiles@229.Red-83-49-101.dynamicIP.rima-tde.net) has joined #tux3 2008-09-12 13:05 flips: 2008-09-12 13:05 05:09 < peterz> bh: not too hard - I send you a paper on that iirc 2008-09-12 13:05 05:09 < giel> not too hard implementation-wise, or time/space wise? 2008-09-12 13:05 05:10 < giel> complexity theory! 2008-09-12 13:05 05:10 < peterz> implementation wise :-) 2008-09-12 13:05 05:10 < peterz> the btree space/time considerations don't change 2008-09-12 13:05 05:11 < peterz> the thing that's hardest about the fine grain locking is the optimistic locking approach 2008-09-12 13:05 05:11 < peterz> you'd have to work out where upwards traversal stops on your way down 2008-09-12 13:05 which channel? 
2008-09-12 13:05 he's exactly right 2008-09-12 13:06 woke up thinking about precisely that 2008-09-12 13:06 this morning 2008-09-12 13:10 #offtopic2 2008-09-12 13:10 but he's traveling right now 2008-09-12 13:11 to KS and Plumbers 2008-09-12 13:11 it's not offtopic ;-) 2008-09-12 13:11 I'll invite peterz here 2008-09-12 13:11 better than #offtopic 2008-09-12 13:12 KS is going to be buzzing about tux3 ;-) 2008-09-12 13:12 lots of trash talking from the trash talkers 2008-09-12 13:13 KS has degenerated to mostly wanking 2008-09-12 13:13 very little tech gets done there any more 2008-09-12 13:13 just climbers getting fact time 2008-09-12 13:13 are they ? 2008-09-12 13:13 regarding buzz ? 2008-09-12 13:13 ? 2008-09-12 13:14 face time I mean 2008-09-12 13:14 oh are they a bunch of wankers now ? this is a publically logged channel keep in mind :) 2008-09-12 13:14 ah right 2008-09-12 13:14 well that's a public comment 2008-09-12 13:14 ACTION giggles 2008-09-12 13:14 not trying to make friends, eh ? :) 2008-09-12 13:14 never have gotten along well with wankers 2008-09-12 13:15 just me 2008-09-12 13:15 yeah, well, I can do with less political wanking and more changes into the kernel 2008-09-12 13:15 yep 2008-09-12 13:15 opensolaris helps focus on that 2008-09-12 13:15 not enough yet 2008-09-12 13:16 specificallly a couple of things I've been planning for years but never had the time to really do 2008-09-12 13:16 linux is losing "customers" to opensolaris 2008-09-12 13:16 it's a fact 2008-09-12 13:16 oh really ? 2008-09-12 13:16 not desktoppers, but datacenters 2008-09-12 13:16 backrooms 2008-09-12 13:16 the guys with money 2008-09-12 13:17 flips: have peterz resend you the paper, I don't know where it is for at the moment 2008-09-12 13:17 paper? 
2008-09-12 13:18 the paper on the topic regarding trees and locking 2008-09-12 13:18 would be nice 2008-09-12 13:18 ask him for it 2008-09-12 13:18 sure 2008-09-12 13:19 we have to do some minor changes to btree.c I think 2008-09-12 13:19 because it currently climbs the path when it has to split 2008-09-12 13:19 it has to descend instead, and it has to drop locks before doing that 2008-09-12 13:20 so it may find that somebody else has changed the object is was looking at when it gets back down 2008-09-12 13:20 the object can't be deleted fortunately 2008-09-12 13:20 because its caller must hold a reference 2008-09-12 13:20 it'll have to be locked in-roder 2008-09-12 13:20 in-order 2008-09-12 13:20 on the cache image 2008-09-12 13:20 what ever that is 2008-09-12 13:20 so that is the rule: you have to hold a ref on the cached object before you can delete the disk object 2008-09-12 13:21 and the ref count of the cached object must be equal to one 2008-09-12 13:21 good example of unwritten lore about the kernel 2008-09-12 13:21 books don't tell you that 2008-09-12 13:21 but anybody who is allowed to touch core vfs knows that 2008-09-12 13:21 fs hackers have to know it do, and often don't 2008-09-12 13:21 know it too that is 2008-09-12 13:22 locking order for a btree is simple 2008-09-12 13:22 root-to-leaf 2008-09-12 13:22 left-to-right if that granularity matters, which it doesn't 2008-09-12 13:22 so just root-to-leaf 2008-09-12 13:23 but resize_btree goes leaf-to-root, doesn't work 2008-09-12 13:31 ok, time to stop cleaning up and write some refcounting code 2008-09-12 13:31 no comments back on my post last night 2008-09-12 13:31 I thought folks would chew on that 2008-09-12 13:32 it's really core to tux3 performance in general 2008-09-12 13:32 not just atom refcounting 2008-09-12 13:33 hi. i just copied and pasted the irclogs from the tux3 university sessions. As I haven't seen them on the mailing list as of yet, should I post them? 
2008-09-12 13:33 sure 2008-09-12 13:34 complete with all the swearing ;-) 2008-09-12 13:34 this is real life university 2008-09-12 13:34 i'll make sure of it :) 2008-09-12 13:34 might replace some of the bad words with @%$@# 2008-09-12 13:34 or not 2008-09-12 13:34 guess not :) 2008-09-12 13:34 whatever you think is right ;) 2008-09-12 13:34 nice nick 2008-09-12 13:35 well, it was datapunk when I was like 15 or so 2008-09-12 13:35 also good 2008-09-12 13:35 and as they tend to get shorte it now resembles a trekkie name 2008-09-12 13:35 but thanks 2008-09-12 13:35 you have piercings? or just virtual piercings? 2008-09-12 13:35 just virtual 2008-09-12 13:36 :) 2008-09-12 13:36 some of my best friends in berlin had some interesting piercings 2008-09-12 13:36 but for example, harald avoids it 2008-09-12 13:36 works better in the boardroom 2008-09-12 13:36 I was going to look at the reasons for it, but is the problem with deleting files known? 2008-09-12 13:37 frist I heard of it 2008-09-12 13:37 go ahead on it 2008-09-12 13:37 well, i don't particularly like them 2008-09-12 13:37 I wasn't very careful when I put that in 2008-09-12 13:37 ok, will do, after a little algebra session 2008-09-12 13:37 see you later 2008-09-12 13:37 wo wohnst du? 2008-09-12 13:37 karlsruhe 2008-09-12 13:37 if you know it 2008-09-12 13:37 ah cool 2008-09-12 13:37 near SAS 2008-09-12 13:37 sure, been there a few times 2008-09-12 13:38 quite 2008-09-12 13:38 quiet 2008-09-12 13:38 just like the name 2008-09-12 13:38 lots of geeks in the area 2008-09-12 13:38 yep, they are 2008-09-12 13:38 CS is pretty strong 2008-09-12 13:38 suse not far away 2008-09-12 13:38 ibm 2008-09-12 13:38 not sas 2008-09-12 13:38 um 2008-09-12 13:38 um 2008-09-12 13:38 sap? 2008-09-12 13:38 right 2008-09-12 13:38 where I've been too 2008-09-12 13:39 there's a great guy there 2008-09-12 13:39 gotten around a lot? 
2008-09-12 13:39 drei jahre in Deutscheland 2008-09-12 13:39 um, 6 jahre 2008-09-12 13:40 that would be 6 Jahre 2008-09-12 13:40 getting rusty 2008-09-12 13:40 for work i guess? 2008-09-12 13:40 and fun 2008-09-12 13:40 well, i've only been to the usa for 11 months 2008-09-12 13:40 and that was for school... and fun 2008-09-12 13:41 that's about enough to be honest 2008-09-12 13:41 berlin is a lot more fun 2008-09-12 13:41 and less tense 2008-09-12 13:41 contrary to popular belief 2008-09-12 13:41 only been there a few times, mostly during the ccc congresses 2008-09-12 13:41 but it certainly is fun 2008-09-12 13:41 geek hotbed 2008-09-12 13:42 ok gotta go, will be back in an hour or so 2008-09-12 13:42 hottest hotbed in europe imho 2008-09-12 13:42 bis spater dann 2008-09-12 13:42 und zu weit weg :) 2008-09-12 13:42 bis dann 2008-09-12 13:42 ACTION is getting really rusty ;) 2008-09-12 13:44 I just had a thought 2008-09-12 13:44 we should schedule an official tux3 cabal meeting for Oct 31 2008-09-12 13:44 on irc, plus a real location in LA 2008-09-12 13:47 might be a good time 2008-09-12 14:12 -!- tim_dimm(~timothyhu@adsl-67-114-40-138.dsl.scrm01.pacbell.net) has joined #tux3 2008-09-12 14:12 hi tim_dimm 2008-09-12 14:12 coming up for air? 2008-09-12 14:12 hi flips 2008-09-12 14:12 trying to 2008-09-12 14:12 manage a quick skate today? 2008-09-12 14:13 still in sacramento 2008-09-12 14:13 oh right 2008-09-12 14:13 Pi and Persey are still in the ICU 2008-09-12 14:13 and it would be inadvisable anyway 2008-09-12 14:13 how's that? 
2008-09-12 14:13 you got a week with no weeks under you to look forward to 2008-09-12 14:13 heh 2008-09-12 14:13 can't justify skating, unless it is to the nursery 2008-09-12 14:14 with no wheels under you I meant 2008-09-12 14:14 I can justify it if the heart rate is up enough 2008-09-12 14:14 getting bad typoitis here 2008-09-12 14:14 full word typos now 2008-09-12 14:14 happens when the volume of code goes stratospheric 2008-09-12 14:14 just read through the first of two tux3 U sessions 2008-09-12 14:14 it hung together pretty well 2008-09-12 14:14 going to now 2008-09-12 14:15 not too much pure bs 2008-09-12 14:15 some content 2008-09-12 14:15 what time of day did you start? 2008-09-12 14:15 today? 2008-09-12 14:15 or last night? 2008-09-12 14:15 8 pm tue and thur 2008-09-12 14:15 no, for the U 2008-09-12 14:15 k 2008-09-12 14:15 will be regular 2008-09-12 14:15 as far as I can manage 2008-09-12 14:15 sounds like a great tool for building community 2008-09-12 14:15 we had one inner linux guru here thursday 2008-09-12 14:16 eric biederman 2008-09-12 14:16 linux cluster guy 2008-09-12 14:16 the linux cluster guy 2008-09-12 14:16 and of course natalie is an inner linux gal 2008-09-12 14:16 googling... 2008-09-12 14:16 you'll get a few hits ;-) 2008-09-12 14:17 you get that email this am about SE Linux? 2008-09-12 14:17 223K to be exact 2008-09-12 14:18 no 2008-09-12 14:18 let me check 2008-09-12 14:18 oh 2008-09-12 14:18 yes 2008-09-12 14:19 right direction? 2008-09-12 14:19 knew about apparmor, suse's better answer to selinux 2008-09-12 14:19 uses the same kernel hooks 2008-09-12 14:19 yes 2008-09-12 14:19 I'm not sure how much apparmor is being worked on right now 2008-09-12 14:19 it's another one of those good projects that gets beaten up by something sloppier but more devs 2008-09-12 14:20 apparently, Novell canned all the engineers working on it in '07 2008-09-12 14:20 bh...Got any intel on apparmor? 2008-09-12 14:20 ? 
2008-09-12 14:20 ah 2008-09-12 14:20 ACTION pokes bh 2008-09-12 14:21 heh 2008-09-12 14:21 be nice to know what happened there 2008-09-12 14:21 see who's maintaining it even 2008-09-12 14:21 somebody always maintains os projects 2008-09-12 14:21 they never die... except for evms 2008-09-12 14:21 RIP 2008-09-12 14:21 well 2008-09-12 14:22 lvm3 will rise ;-) 2008-09-12 14:22 we're about a month away from serious lvm3 development 2008-09-12 14:22 http://www.novell.com/linux/security/apparmor/selinux_comparison.html 2008-09-12 14:22 kickoff 2008-09-12 14:22 tim_dimm, there's a proposal to have a public tux3 cabal meeting on Oct 31 2008-09-12 14:22 physically located at a certain garage I'm thinking of 2008-09-12 14:23 and on the web/net 2008-09-12 14:23 what think you? 2008-09-12 14:23 I'm there barring spit-up, diaper changes and burping sessions 2008-09-12 14:23 barring? 2008-09-12 14:23 should be irrespective of 2008-09-12 14:23 uh, poor choice of words 2008-09-12 14:23 anything except burping 2008-09-12 14:23 farting? 2008-09-12 14:23 not good enough 2008-09-12 14:23 too early to teeth 2008-09-12 14:24 how do you spell that- teeeethhh 2008-09-12 14:24 you know 2008-09-12 14:24 dana had one the first week 2008-09-12 14:24 was hell for anna 2008-09-12 14:24 i bet 2008-09-12 14:24 she quickly learned how to punish mommy with it 2008-09-12 14:25 so began a somewaht tense relationship ;) 2008-09-12 14:25 still? 
2008-09-12 14:25 ;-) 2008-09-12 14:25 of course 2008-09-12 14:25 but detent has set in, mutual respect, mommy love, all that 2008-09-12 14:25 this apparently lasts till about 9 YO 2008-09-12 14:26 with luck 2008-09-12 14:26 guess they grow out of it 2008-09-12 14:26 tween is the new teen 2008-09-12 14:26 anyway 2008-09-12 14:26 we better stop talking like that 2008-09-12 14:26 or all the devs willrun away screaming 2008-09-12 14:27 tux3 and child-rearing 2008-09-12 14:27 and skating 2008-09-12 14:27 you will have lots of time to learn C while you're burping 2008-09-12 14:27 and rocking 2008-09-12 14:27 right, on that note I think I'll go skate now 2008-09-12 14:27 oh, big news 2008-09-12 14:27 k 2008-09-12 14:28 all ears 2008-09-12 14:28 (eyes) 2008-09-12 14:28 skateboarders clapped for my move yesterday 2008-09-12 14:28 nice- what was it? 2008-09-12 14:28 pronounced: "ok rollerbladers are allowed now" 2008-09-12 14:28 nothing much 2008-09-12 14:28 grind? 2008-09-12 14:28 skated up on the little vert wall on one skate, tapped the top with the other, skated down on one skate 2008-09-12 14:29 nice 2008-09-12 14:29 been grinding and getting nodes 2008-09-12 14:29 also skating down the top of the grinding wall 2008-09-12 14:29 very skinny 2008-09-12 14:29 tough to stay on 2008-09-12 14:29 it has an S curve at the end 2008-09-12 14:29 careful, grinds lead to crashes which leads to wrist injuries 2008-09-12 14:29 not much, but enough to drop you off 2008-09-12 14:29 my grinds aren't really grinds 2008-09-12 14:30 just slding down the rail onthe side of my skate 2008-09-12 14:30 one foot 2008-09-12 14:30 no danger 2008-09-12 14:30 I need to get protection before doing anything more 2008-09-12 14:30 makes lots of noise 2008-09-12 14:30 attracts attention ;) 2008-09-12 14:31 found the head of the U logs 2008-09-12 14:31 reading now 2008-09-12 14:31 have fun 2008-09-12 14:31 loads 2008-09-12 15:08 i just noticed: someone (?) said that dentries were 132 bytes. 
on my system it says 200. Normal deviations? 2008-09-12 15:09 or just a different kernel version? 2008-09-12 15:10 I'm reading the logs right now, see 8 references to dentries. none mention how many bytes 2008-09-12 15:11 second.05:29 < RazvanM> dentry 253015 253576 132 29 1 : tunables 120 60 8 : slabdata 8744 8744 0 2008-09-12 15:12 my bad- I searched for dentries 2008-09-12 15:12 not dentry 2008-09-12 15:13 well, not really important. just something i was wondering about 2008-09-12 15:14 data, 64 bit kernel? 2008-09-12 15:15 grossly big aren't they 2008-09-12 15:15 filename "foo" turns into a 200 byte dentry, and that's far from all the cache gobbling for that little guy 2008-09-12 15:15 yes, it is. right on both accounts. 2008-09-12 15:16 that's what makes sysfs such an idiotic idea 2008-09-12 15:16 take tiny little ascii strings which are already bloated way beyond the binary rep, and blow them up into gigantic, slow, awkward things 2008-09-12 15:17 then implement it badly on top of that 2008-09-12 15:17 and have a crappy internal and external interface 2008-09-12 15:17 bugs 2008-09-12 15:17 unstable api 2008-09-12 15:17 and you have the piece of shit we see today 2008-09-12 15:17 just thought I'd share that ;-) 2008-09-12 15:50 sk8 oclock 2008-09-12 15:54 ACTION is back 2008-09-12 15:54 I know nothing about apparmor 2008-09-12 15:57 kernel klink I presume ;-) 2008-09-12 15:57 (hogan's hero's) 2008-09-12 15:58 diaper-30 2008-09-12 15:58 l8tr 2008-09-12 18:17 nuther cuppa 2008-09-12 18:17 should be good enough to get refcounting implemented 2008-09-12 18:32 ok I see october 31st is a friday 2008-09-12 18:32 that means that the tux3 cabal meeting has to be a party 2008-09-12 18:32 might have to scale this up 2008-09-12 19:04 #define REFCOUNT_TABLE_BLOCK (1ULL << 28) 2008-09-12 19:04 #define REFCOUNT_HIGH_BLOCK (REFCOUNT_TABLE_BLOCK + (1ULL << 21)) 2008-09-12 19:04 #define UNATOM_TABLE_BLOCK (REFCOUNT_TABLE_BLOCK + (1ULL << 23)) 2008-09-12 19:10 -!- 
Aks(~ankitsriv@123.237.71.198) has joined #tux3 2008-09-12 19:16 Hi all, I am new to this project and want to know abt versioned pointers. 2008-09-12 19:16 welcome 2008-09-12 19:16 read the post yet? 2008-09-12 19:17 http://lwn.net/Articles/288896/ 2008-09-12 19:17 thanks for the link 2008-09-12 19:18 enjoy 2008-09-12 19:22 atom = dir->sb->atomgen++; /* use refcount for allocation */ 2008-09-12 19:22 if (!ext2_create_entry(dir, name, len, atom, 0)) { 2008-09-12 19:22 unsigned block = ATOM_REFCOUNT_BLOCK + (atom >> (dir->sb->blockbits - 1)); 2008-09-12 19:22 struct buffer *buffer = bread(dir->map, block); 2008-09-12 19:22 *(u16 *)buffer->data += 1; 2008-09-12 19:22 brelse(buffer); 2008-09-12 19:22 return atom; 2008-09-12 19:22 } 2008-09-12 19:23 got to put in the carry bit handling 2008-09-12 19:29 mistake in that code 2008-09-12 19:30 ACTION challenges tux3 readers to find it 2008-09-12 19:30 it's in the block calc 2008-09-12 19:32 ACTION steps out for a bit 2008-09-12 19:32 oh, and I have to do endian conversion 2008-09-12 19:32 almost forgot 2008-09-12 20:20 atom >> (dir->sb->blockbits - 1) - that looks weird, although I'm not actually sure what exactly blockbits is, either way the -1 smells wrong 2008-09-12 20:21 ah, never mind 2008-09-12 20:21 *(u16 *)buffer->data += 1 <- this is wrong since this is first u16 in block 2008-09-12 20:23 lacks [atom & ((1 << (dir->sb->blockbits)) - 1] instead of the prefix '*' 2008-09-12 20:23 ie. it should be ((u16 *)buffer->data)[atom & ((1 << (dir->sb->blockbits)) - 1]++; 2008-09-12 20:24 ((u16 *)buffer->data)[atom & ((1 << dir->sb->blockbits) - 1)]++; 2008-09-12 20:24 parentheses got mixed up 2008-09-12 20:25 from which I guess blockbits on a 4K filesystem is lg2(4Ki) = 12 2008-09-12 20:26 at which point the first comment about it smelling funny is irrelevant (I thought blockbits was the number of bits in a block, ie. 4Ki * 8 for a 4KiB block) 2008-09-12 20:26 so you're using bread/brelse, not bios? 
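For readers following the refcount arithmetic, here is a standalone sketch of mapping an atom number to a (block, index) pair, assuming 16-bit counters packed back to back in blocks of `1 << blockbits` bytes. Under that assumption each block holds `1 << (blockbits - 1)` counters, so the block shift and the index mask both use `blockbits - 1`. The names `refcount_slot` and `refcount_inc` are hypothetical, the table base is just a parameter, and little-endian on-disk order is an assumption of this sketch rather than anything settled above.

```c
#include <assert.h>
#include <stdint.h>

struct slot {
    uint64_t block;     /* block holding the counter */
    unsigned index;     /* which 16-bit counter within that block */
};

/* Locate the 16-bit counter for atom in a table starting at block base. */
static struct slot refcount_slot(uint64_t base, unsigned blockbits, uint32_t atom)
{
    unsigned shift = blockbits - 1;    /* log2(counters per block) */
    struct slot s = { base + (atom >> shift), atom & ((1u << shift) - 1) };
    return s;
}

/* Byte-wise accessors so the result does not depend on host endianness. */
static uint16_t le16_load(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static void le16_store(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

/*
 * Increment counter number index inside one block's data.
 * Returns 1 if the counter wrapped to zero, meaning a carry
 * has to be recorded somewhere else (for example a high-bits table).
 */
static int refcount_inc(uint8_t *data, unsigned index)
{
    uint16_t v = (uint16_t)(le16_load(data + 2 * index) + 1);
    le16_store(data + 2 * index, v);
    return v == 0;
}
```

With `blockbits` = 12, a block holds 2048 counters, so atom 2053 lands in the second block of the table at index 5.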
2008-09-12 20:26 does brelse bwrite? 2008-09-12 20:27 I guess it must work like some sort of in kernel mmap 2008-09-12 20:27 hence the no nead for explicit write back 2008-09-12 20:27 at which point I guess we rely on cpu page dirty bits to actually know whether we need to write back 2008-09-12 20:31 unless bread/brelse, are actually operations on blocks within a file, which is suggested by the dir->map first parameter 2008-09-12 20:31 hmm, isn't it clear I have no bloody idea what I'm talking about yet? 2008-09-12 20:31 and I'm talking into a blackhole... 2008-09-12 20:31 start talking to yourself... then you know you're going crazy. 2008-09-12 20:41 -!- Kirantpatil(~kiran@122.167.202.116) has joined #tux3 2008-09-12 21:08 maze, it's all messed up, actually 2008-09-12 21:08 rev on the way 2008-09-12 21:08 you are right about the lacks 2008-09-12 21:08 maze, I should just have asked you to write it ;-) 2008-09-12 21:08 ACTION will be back 2008-09-12 21:09 which lacks 2008-09-12 21:09 don't leave ;-) I'm here 2008-09-12 21:09 what am I right about? I was guessing above? 
2008-09-12 21:10 ofcourse with u16* in their it's all only host-endian compatible 2008-09-12 21:10 s/their/there 2008-09-12 21:11 I think stick to 1 byte counters - deal away with endianness in all but 0.5% of the cases 2008-09-12 21:12 eh, I'm not sure this entire effort is worth it 2008-09-12 21:12 I think you just need to support a half-dozen hard coded atoms, a small list of atoms for selinux, and store the rest as strings 2008-09-12 21:12 ultimately just gzipping the xattr block may be the easiest 2008-09-12 21:13 for everything besides selinux/acl 2008-09-12 21:13 [still parsing through the binary acl encoding, to see if it can be faked] 2008-09-12 21:20 -!- nataliep(~nataliep@207.47.98.129.static.nextweb.net) has joined #tux3 2008-09-12 21:33 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-12 21:35 although linux currently seems to use something more like 32 [4 byte header] + (16 [tag/type] + 16[...rwx] + 32[default=-1]) * 4 + (16 [tag/type] + 16[...rwx] + 32[uid/gid]) * [# of exceptions] bits 2008-09-12 21:35 either way, while these are normally small - or not even present - they can grow arbitrally large 2008-09-12 21:37 hmm 2008-09-12 21:37 interesting questions 2008-09-12 21:37 do filesystems in linux implement selinux and acls, or do they just implement xattr - would think just xattr, but... there are ext2/3/4... etc acl.h 2008-09-12 21:38 oh, directories get doubled entries, one being default 2008-09-12 21:38 the other being actual 2008-09-12 21:39 the 4 byte header is the version (lendian 2) 2008-09-12 22:00 back 2008-09-12 22:00 maze, it's not much effort 2008-09-12 22:00 . 2008-09-12 22:00 and it exactly emulates the ascii xattr behaviour, with superior compression 2008-09-12 22:01 now I just need to write it right ;) 2008-09-12 22:01 yes, but there are some issues there 2008-09-12 22:01 there are? 
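The ACL xattr layout being worked out above (a 4-byte version header followed by 8-byte entries of 16-bit tag, 16-bit permission bits, and 32-bit uid/gid) can be written down as a small encoder. The names here are made up for illustration; the field sizes, version 2, and tag values are meant to mirror the kernel's POSIX ACL xattr format as I understand it, so treat them as assumptions to check against the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ACL_XATTR_VERSION 2u            /* little-endian, 4-byte header */
#define ACL_UNDEFINED_ID  0xffffffffu   /* the "-1" id on non-exception entries */

enum {
    TAG_USER_OBJ = 0x01, TAG_USER  = 0x02, TAG_GROUP_OBJ = 0x04,
    TAG_GROUP    = 0x08, TAG_MASK  = 0x10, TAG_OTHER     = 0x20
};

struct acl_entry {
    uint16_t tag;     /* one of the TAG_ values */
    uint16_t perm;    /* rwx bits */
    uint32_t id;      /* uid or gid, or ACL_UNDEFINED_ID */
};

/* Encoded size: 4-byte header plus 8 bytes per entry. */
static size_t acl_xattr_size(size_t entries)
{
    return 4 + 8 * entries;
}

static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    put_le16(p, (uint16_t)(v & 0xffff));
    put_le16(p + 2, (uint16_t)(v >> 16));
}

/* Serialize n entries into out; returns the number of bytes written. */
static size_t acl_encode(uint8_t *out, const struct acl_entry *e, size_t n)
{
    uint8_t *p = out;
    put_le32(p, ACL_XATTR_VERSION);
    p += 4;
    for (size_t i = 0; i < n; i++, p += 8) {
        put_le16(p, e[i].tag);
        put_le16(p + 2, e[i].perm);
        put_le32(p + 4, e[i].id);
    }
    return (size_t)(p - out);
}
```

A minimal ACL with a mask (four entries) comes to 36 bytes, and each named user or group adds 8 more, which is why these values are usually small but have no fixed upper bound.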
2008-09-12 22:01 for example: selinux is a quad of four values 2008-09-12 22:01 I thought I was just about done 2008-09-12 22:01 if you have a lot of valid states for each of them 2008-09-12 22:01 then the total number of states for the entire quad can blossom 2008-09-12 22:01 those are attr bodies 2008-09-12 22:02 we're working on attr names at the moment 2008-09-12 22:02 ah, and see... this is the tricky part 2008-09-12 22:02 you kind of have to look at some of them at the same time 2008-09-12 22:02 some of... selinux acls? 2008-09-12 22:03 so basically, once you strip out the selinux and extended acl xattrs (that's 3 different xattr strings), all that's left is barely used by anyone 2008-09-12 22:03 ok I see what you're saying 2008-09-12 22:03 sorry, didn't read carefully 2008-09-12 22:03 you're running out way ahead of me 2008-09-12 22:03 as usual 2008-09-12 22:03 it may not be worth optimizing that... 2008-09-12 22:03 well it can be optimized at the selinux level 2008-09-12 22:04 exactly. 2008-09-12 22:04 maybe not quite as efficiently 2008-09-12 22:04 maybe more 2008-09-12 22:04 so selinux basically needs to be optimized 2008-09-12 22:04 let's let them tell us 2008-09-12 22:04 acls need to be optimized 2008-09-12 22:04 probably 2008-09-12 22:04 and then all the rest needs to be (maybe?) optimized if we feel like it 2008-09-12 22:04 so I suggest that once xattrs are working properly, we invite the selinux folks to come over and do an audit 2008-09-12 22:04 and what needs to be optimized is not the headers (ie. the security.something= part) 2008-09-12 22:04 but the bodies 2008-09-12 22:05 ie. the part after the = 2008-09-12 22:05 both 2008-09-12 22:05 imho 2008-09-12 22:05 the part before the = sign is trivial, there are about 9 values to compress as atoms, leave the rest as strings 2008-09-12 22:05 the heads are much less variable, therefore so much easier to optimize 2008-09-12 22:05 agreed. 
2008-09-12 22:05 the problem, though, isn't so much what to optimize 2008-09-12 22:06 so... bodies 2008-09-12 22:06 not that hard 2008-09-12 22:06 but how to store this, and where... 2008-09-12 22:06 but maybe not appropriate at this level 2008-09-12 22:06 we'll see 2008-09-12 22:06 and which parts to store on disk where 2008-09-12 22:06 it's kind of important to plan this out correctly to begin with, because this is on-disk format, not in memory 2008-09-12 22:06 you know, we'd probably get more mileage out of giving the selinux guys a way to run their own dictionary 2008-09-12 22:06 just like our atom dictionary 2008-09-12 22:06 exactly... 2008-09-12 22:06 and have an api for it 2008-09-12 22:06 kay 2008-09-12 22:06 we'll propose it 2008-09-12 22:06 I'm convinced now we need 4 dicts for selinux (one per quad) 2008-09-12 22:07 but first let them have a look at the basics 2008-09-12 22:07 and a dictionary for acls 2008-09-12 22:07 and a dictionary for 'other' xattrs headers 2008-09-12 22:07 it's my understanding they're usually disappointed with performance etc of just basic xattrs 2008-09-12 22:07 so basically what we need is some extensible dict interface 2008-09-12 22:07 trying not to let tux3 fall into that 2008-09-12 22:07 which falls into the log nicely 2008-09-12 22:07 right 2008-09-12 22:07 and it's app specific 2008-09-12 22:07 we need to export an api, not for dicts 2008-09-12 22:07 not four dicts 2008-09-12 22:08 the api would be the normal add/remove/list xattr api that everybody uses 2008-09-12 22:08 what's important is how we store it internally in the fs 2008-09-12 22:08 plus a way of dividing it into four 2008-09-12 22:08 that's easy - split on : 2008-09-12 22:08 bleah 2008-09-12 22:08 no parsing 2008-09-12 22:08 in the fs 2008-09-12 22:09 mechanism, not policy 2008-09-12 22:09 yeah, well, that's gonna have to happen, unless you don't want to split the quads 2008-09-12 22:09 which could potentially explode the dicts 2008-09-12 22:09 no parsing ;-)
2008-09-12 22:09 notice, tux3 has no parsing 2008-09-12 22:09 if we want we can use zlib 2008-09-12 22:09 or something more global 2008-09-12 22:09 you're missing the point here ;-) 2008-09-12 22:09 zlib is nice and all that 2008-09-12 22:10 but selinux xattrs are used on every frickin file access 2008-09-12 22:10 still 2008-09-12 22:10 they have to be blazing fast 2008-09-12 22:10 no parsing ;-) 2008-09-12 22:10 we need a better solution 2008-09-12 22:10 you can make the parsing in such a way that it'll still work even if it doesn't parse 2008-09-12 22:10 preferably one that performs even better than stupid ascii colon separated strings 2008-09-12 22:10 ah, but the ascii colon seperated strings are the api 2008-09-12 22:11 you have to do it that way 2008-09-12 22:11 sucky api 2008-09-12 22:11 unless we rip through all of the selinux code in the kernel 2008-09-12 22:11 anyway, linux does not have an acl api 2008-09-12 22:11 selinux does 2008-09-12 22:11 different 2008-09-12 22:11 the vfs layer provides an xattr api 2008-09-12 22:11 that's what we have to implement 2008-09-12 22:11 right, only that 2008-09-12 22:11 well 2008-09-12 22:11 we don't have to parse 2008-09-12 22:11 however internally we have to make it deal with the common cases quickly 2008-09-12 22:12 we can compress on byte pair if we like 2008-09-12 22:12 byte pair of what? you're getting strings in the api? 
2008-09-12 22:12 byte pairs is a typical compression method 2008-09-12 22:12 16 bit values work better with a dict than 8 or 48 2008-09-12 22:12 for example 2008-09-12 22:12 and the common cases are going to be reading (and to a lesser extent writing) selinux xattrs and less often (but still very often) extended acls 2008-09-12 22:13 verging on premature optimization here 2008-09-12 22:13 no no, don't think lzw compression - that doesn't buy us anything here 2008-09-12 22:13 the selinux guys will cream all over if xattrs just work fine 2008-09-12 22:13 what needs to be done is we need to explicitly remove the selinux/acl from the xattr code and not treat them in the fs as xattrs at all 2008-09-12 22:13 treat them like you treat inode permissions 2008-09-12 22:14 put them directly in the inode 2008-09-12 22:14 have you measured the actual disbribution of unique quads? 2008-09-12 22:14 I thought you did that 2008-09-12 22:14 yes 2008-09-12 22:14 and it came out very tight 2008-09-12 22:14 exactly 2008-09-12 22:14 tightly clustered too 2008-09-12 22:14 so what's the problem 2008-09-12 22:14 but that's not something we can guarantee on a prod system 2008-09-12 22:14 just atomize the common ones 2008-09-12 22:14 store the weirdos literally 2008-09-12 22:14 agreed. 2008-09-12 22:14 ok 2008-09-12 22:14 so let's do it 2008-09-12 22:15 hmm, how to put this 2008-09-12 22:15 you don't want the xattr_get(selinux_xattr) 2008-09-12 22:15 to have to parse the entire xattr block for the inode 2008-09-12 22:16 it doesn't 2008-09-12 22:16 it only looks in the xcache 2008-09-12 22:16 but xattrs of other types, can be pretty much unique per file... 
2008-09-12 22:16 (above md5/sha1 hash case) 2008-09-12 22:17 right 2008-09-12 22:17 so I guess we're going to check a hash of the xattr 2008-09-12 22:17 htree style 2008-09-12 22:17 so having the two very differently performing/characteristic concepts in one place will most likely break performance 2008-09-12 22:18 it's not very hard to look for likely atomize candidates I think 2008-09-12 22:18 depends on what the symbols of the alphabet are 2008-09-12 22:18 and how deeply you're parsing 2008-09-12 22:18 I say, just put everything in the dict 2008-09-12 22:18 why not? 2008-09-12 22:18 has to be stored somewhere 2008-09-12 22:19 security.selinux="unconfined_u:object_r:default_t:s0\000" 2008-09-12 22:19 system.posix_acl_access="0sAgAAAAEABwD/////AgAEAGQAAAAEAAUA/////xAABQD/////IAAFAP////8=" 2008-09-12 22:19 user.hash="sdfsdfjhsdjfhsdjkfhdjskahfjkdsahkj" 2008-09-12 22:19 the dict is as good a place as any 2008-09-12 22:19 how would you atomize the above? 2008-09-12 22:19 ext2_find_entry 2008-09-12 22:19 later, htree_find_entry 2008-09-12 22:20 sucky compression in the example 2008-09-12 22:21 so how will the dict deal with a few dozen entries with millions of occurrences, a few hundred with tens of thousands, and a few million entries with one (to a couple) occurrence(s) each 2008-09-12 22:21 that's a real world scenario straight off my laptop 20G drive 2008-09-12 22:21 millions of occurrences, what's the problem?
2008-09-12 22:21 few million, easy 2008-09-12 22:21 that's what htree does 2008-09-12 22:21 handles millions of entries 2008-09-12 22:21 really fast 2008-09-12 22:22 uhm, I think my problem is I'm not convinced it's fast enough, when it could be O(1) 2008-09-12 22:22 o(1) is always good 2008-09-12 22:22 the millions of unique entries are blossoming the tree 2008-09-12 22:22 but damn fast is damn fast 2008-09-12 22:23 it's a btree 2008-09-12 22:23 slowing down accesses for the millions of entries 2008-09-12 22:23 it likes to blossom 2008-09-12 22:23 it says "go ahead, make my day" 2008-09-12 22:23 right, but are common entries stored nearer the root? 2008-09-12 22:23 never 2008-09-12 22:23 no - because it's a btree 2008-09-12 22:23 right 2008-09-12 22:23 so you've got o(depth) lookups 2008-09-12 22:23 very flat 2008-09-12 22:23 usually only two levels 2008-09-12 22:23 precisely what you want to avoid 2008-09-12 22:23 for a few million entries 2008-09-12 22:23 depth is smaller than you think 2008-09-12 22:24 much smaller 2008-09-12 22:24 but you're thinking of access speed from a disk io performance 2008-09-12 22:24 outlook 2008-09-12 22:24 nope 2008-09-12 22:24 we need to be fast in ram 2008-09-12 22:24 cpu speed 2008-09-12 22:24 that's what it works at 2008-09-12 22:24 dirops are cpu bound 2008-09-12 22:24 not disk bound 2008-09-12 22:25 ok, I'm firmly of the opinion we need 2 different dicts/htrees at the minimum 2008-09-12 22:25 I agree: 1) heads 2) bodies 2008-09-12 22:25 but I know you want to parse and segment 2008-09-12 22:25 one small one for the stuff which is known to exist all over the place (selinux/acl bodies) 2008-09-12 22:25 I don't think we should, the selinux guys should 2008-09-12 22:25 but 2008-09-12 22:25 the other for non-standard bodies 2008-09-12 22:25 I'll keep an open mind 2008-09-12 22:25 we would need to change the kernel vfs interface of selinux - I don't see that happening 2008-09-12 22:26 and then we'd need to keep around the old one
for other fs'es anyway 2008-09-12 22:26 well xattrs already have namespaces 2008-09-12 22:26 part of the abpi 2008-09-12 22:26 api 2008-09-12 22:26 braindamaged part 2008-09-12 22:26 and even if we don't split, we'll still get perf boosts 2008-09-12 22:26 (don't split on :) 2008-09-12 22:26 that's a colon ) 2008-09-12 22:26 I thought it was a smile :) 2008-09-12 22:26 well it was a ':' than a ')' 2008-09-12 22:26 ;-) 2008-09-12 22:27 ok, immediate goal is to fix the brain damage in my refcounting 2008-09-12 22:27 if you have a dentry and inode already in memory 2008-09-12 22:27 sorry about the pile of poo I posed ;) 2008-09-12 22:27 how long does it take to fetch the xattrs for that inode? 2008-09-12 22:27 order of magnitude 2008-09-12 22:27 oh that's another thing... we can easily put a small hash in front of the atom dict 2008-09-12 22:28 very easily 2008-09-12 22:28 sub microsecond 2008-09-12 22:28 I guess 2008-09-12 22:28 [because with the correct implementation the above is less than 50 cycles] 2008-09-12 22:28 sure, and a microsecond is about 3,000 2008-09-12 22:28 60 times slower 2008-09-12 22:29 so there's something to be gained 2008-09-12 22:29 I thinjk we gain most of it from putting a hash in front of the dirops 2008-09-12 22:29 hash of what? 2008-09-12 22:29 so we end up with level 1, level 2 2008-09-12 22:29 hash of the thing we're atomizing 2008-09-12 22:29 just keep the common ones there 2008-09-12 22:29 let the cold ones drop off 2008-09-12 22:30 ok, now you've gotten ahead of me... 2008-09-12 22:30 it's a linux meme 2008-09-12 22:30 dentry hash as an example of that 2008-09-12 22:30 ok, how about, first a question: what exactly is an atom (example?) 2008-09-12 22:30 sucky example 2008-09-12 22:30 [how big is an atom] 2008-09-12 22:30 it's just a small integer with a name 2008-09-12 22:30 and a refcount 2008-09-12 22:30 ok, the names, can we have an example? 
2008-09-12 22:30 the name of an atom is up to 255 chars (tradition) 2008-09-12 22:31 names are unrestricted 2008-09-12 22:31 pascal strings ;-) 2008-09-12 22:31 right 2008-09-12 22:31 in fact they are pascal strings 2008-09-12 22:31 that's what ext2 uses 2008-09-12 22:31 and what I always use 2008-09-12 22:31 ok, so how would you atomize (where would the atom boundaries be) in the above 3 line xattr example I posted? 2008-09-12 22:32 beats the crap out of shitty C strings 2008-09-12 22:32 strlen is fast ;-) 2008-09-12 22:32 sucks compared to looking up a byte 2008-09-12 22:32 strlen does cacheline damage 2008-09-12 22:32 "considered harmful to cache lines" 2008-09-12 22:32 I meant strlen is fast on pascal strings 2008-09-12 22:32 right 2008-09-12 22:33 I'd atomize the whole 3 line xattr 2008-09-12 22:33 and store the atom 2008-09-12 22:33 the whole thing? 2008-09-12 22:33 and have a limit of 2^48 atoms 2008-09-12 22:33 ok, on my drive, you'd have all refcounts = 1 2008-09-12 22:33 right now, it's 2^32 atoms 2008-09-12 22:33 if we do bodies, might want to widen that 2008-09-12 22:33 sure 2008-09-12 22:33 who cares 2008-09-12 22:34 selinux does 2008-09-12 22:34 refcounts take hardly any space 2008-09-12 22:34 2 bytes each 2008-09-12 22:34 now, as soon as something _does_ collide, you know right away 2008-09-12 22:34 that is 2008-09-12 22:34 match another body 2008-09-12 22:34 well 2008-09-12 22:34 anyway 2008-09-12 22:34 it's premature 2008-09-12 22:35 xattrs have to work 2008-09-12 22:35 or nobody cares how well we compress acls 2008-09-12 22:35 yes, but they have to be treated seperately 2008-09-12 22:35 here - I'll write up how it should be done IMHO 2008-09-12 22:35 I know, you want to tokenize 2008-09-12 22:35 the xattr string 2008-09-12 22:35 and compress that way 2008-09-12 22:36 so why don't they tokenize? 
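The description of an atom given above (a small integer with a name and a refcount, the name being a length-prefixed string of up to 255 characters) can be sketched as follows. This is a minimal in-memory illustration, not tux3's code: the table size, the 32-bit refcount and the linear search are assumptions made to keep the sketch short, and the linear search merely stands in for the htree-style lookup discussed in the chat.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ATOMS 64   /* arbitrary size chosen for this sketch */

/* Length-prefixed ("pascal") string: one length byte, up to 255 chars,
 * so the length is read from a single byte instead of scanning for NUL. */
struct pstr { uint8_t len; char text[255]; };

struct atom_entry { struct pstr name; uint32_t refcount; };
struct atom_table { struct atom_entry entry[MAX_ATOMS]; unsigned count; };

/* Return the atom number for name and take a reference, creating the
 * atom if it does not exist yet.  Returns -1 if the name is too long
 * or the table is full. */
static int atom_get(struct atom_table *t, const char *name)
{
    assert(t && name);
    size_t len = strlen(name);
    if (len > 255)
        return -1;
    for (unsigned i = 0; i < t->count; i++) {
        struct atom_entry *e = &t->entry[i];
        if (e->name.len == len && memcmp(e->name.text, name, len) == 0) {
            e->refcount++;
            return (int)i;
        }
    }
    if (t->count == MAX_ATOMS)
        return -1;
    struct atom_entry *e = &t->entry[t->count];
    e->name.len = (uint8_t)len;
    memcpy(e->name.text, name, len);
    e->refcount = 1;
    return (int)t->count++;
}
```

An inode would then store only the small atom number in place of a name such as security.selinux, and the refcount tells the filesystem when a name is no longer used by any inode.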
2008-09-12 22:36 yes please 2008-09-12 22:36 and let's invite the selinuxen to read that post 2008-09-12 22:37 builtin:[security.selinux] sedict1:[unconfined_u] sedict2:[object_r] sedict3:[default_t] sedict4:[s0] 2008-09-12 22:37 builtin:[system.posix_acl_access] acl_dict:[0sAgAAAAEABwD/////AgAEAGQAAAAEAAUA/////xAABQD/////IAAFAP////8] 2008-09-12 22:37 builtin:[user.] user_dict:[hash] user_dict:[sdfsdfjhsdjfhsdjkfhdjskahfjkdsahkj] 2008-09-12 22:37 I'm still researching the subject 2008-09-12 22:37 and then you want to store the first two directly within the inode 2008-09-12 22:37 got to fix my mess here now 2008-09-12 22:37 that's for one file? 2008-09-12 22:38 sedict1=2=3=4 could be the same dict, potentially could be the same dict as the acl_dict, potentially the same as builtin 2008-09-12 22:38 yes 2008-09-12 22:38 we store all of them directly in the inode 2008-09-12 22:38 on disk 2008-09-12 22:38 and cache them in memory 2008-09-12 22:38 when the inode is loaded 2008-09-12 22:38 so what's the size of an inode on disk? 2008-09-12 22:38 variable 2008-09-12 22:38 maximum? 2008-09-12 22:39 from about 40 bytes to unlimited 2008-09-12 22:39 current limitation is an inode table block 2008-09-12 22:39 but that will go away 2008-09-12 22:40 hmm, I need to start writing a junk fs 2008-09-12 22:40 to get a better feeling for the kernel interfaces 2008-09-12 22:40 you'll get an excellent chance very soon 2008-09-12 22:40 we're going to start by porting a junk fs to kernel 2008-09-12 22:40 next tuesday we'll do that 2008-09-12 22:41 really?
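The tokenization proposed at 22:37 (one dictionary entry per field of the selinux quad) needs the context string split at its colons. A hedged sketch of that step follows; it illustrates one side of the debate only, since the other side argues for no parsing in the fs at all. The struct and function names are hypothetical. The split stops after the third colon on the assumption that the last field may itself contain colons.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Four fields of a context string, as pointers into the original text. */
struct quad { const char *field[4]; size_t len[4]; };

/* Split "user:role:type:level" at the first three colons.
 * Returns 0 on success, or -1 if fewer than four fields are present,
 * in which case a caller could fall back to storing the whole value
 * as a single literal string. */
static int split_quad(const char *ctx, struct quad *q)
{
    assert(ctx && q);
    const char *p = ctx;
    for (int i = 0; i < 3; i++) {
        const char *colon = strchr(p, ':');
        if (!colon)
            return -1;
        q->field[i] = p;
        q->len[i] = (size_t)(colon - p);
        p = colon + 1;
    }
    q->field[3] = p;            /* remainder, colons included */
    q->len[3] = strlen(p);
    return 0;
}
```

Each of the four pieces could then be looked up in its own dictionary, or in one shared dictionary, and a value that does not split cleanly still round-trips as a plain string.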
2008-09-12 22:41 hmm 2008-09-12 22:41 promise 2008-09-12 22:41 you said you wanted me to pick up the pace 2008-09-12 22:41 pick it up we shall 2008-09-12 22:41 hope we don't lose anybody ;) 2008-09-12 22:41 I'm beginning to think an fs should actually have (at least) two layers 2008-09-12 22:42 basically the frontend (UI / interface with vfs) and the backend (interface with block devices) 2008-09-12 22:42 with the ability for the middle to be network separated 2008-09-12 22:42 what about the inodes in the middle? 2008-09-12 22:42 you need a clean api in the middle that deals correctly with coherency issues 2008-09-12 22:43 but I think this is the only way to get a well performing net fs 2008-09-12 22:43 you are entirely correct, and that is how tux3 is structured 2008-09-12 22:43 it has the cache level and the block level 2008-09-12 22:43 they are separated cleanly... better be 2008-09-12 22:43 or it simply won't work 2008-09-12 22:44 the backend is then a get/set/lock/unlock/notify system 2008-09-12 22:44 well 2008-09-12 22:44 kinda 2008-09-12 22:44 the back end is more like async messages 2008-09-12 22:44 loosely 2008-09-12 22:44 very loosely 2008-09-12 22:44 yeah, that kind of describes what I'm thinking 2008-09-12 22:45 hard to phrase really 2008-09-12 22:45 especially since it's still unclear to me ;-) 2008-09-12 22:45 the fact it has to be implemented as two separate pieces is now clear to me, with an interface layer that uses tcp-ip 2008-09-12 22:45 BUT 2008-09-12 22:45 can short-circuit the network stack on local host 2008-09-12 22:46 great, it was never clear to me ;) 2008-09-12 22:46 it just came out like that 2008-09-12 22:46 did it itself 2008-09-12 22:46 here's an example: 2008-09-12 22:47 application -> user space -> kernel space -> vfs layer -> client file system layer -> send rpc call -> network stack -> receive rpc call -> dispatch -> server file system layer -> block device layer 2008-09-12 22:47 and that's only half the loop 2008-09-12 22:47 for
an nfs 2008-09-12 22:47 now if the tcp/ip network stack layer does cookies or UUID to identify that it's talking to itself, then it can zip it up to 2008-09-12 22:48 client fs layer -> direct dispatch -> server fs layer 2008-09-12 22:48 and of course there's the return path 2008-09-12 22:48 that's the kind of thinking that originally led to nfs 2008-09-12 22:48 actually, the reverse of that 2008-09-12 22:48 and it has to be part sync, part async, part notify 2008-09-12 22:48 we had your second one 2008-09-12 22:48 and some genius decided it could easily be hacked to be the first one 2008-09-12 22:48 anyway 2008-09-12 22:49 it's not a NFS 2008-09-12 22:49 the real problem is now how to minimize the latency and data sent across the net in the middle 2008-09-12 22:49 it will become a cluster fs before it becomes an nfs 2008-09-12 22:49 and that relies on doing cache coherency and read/write (various types of) and notifications of changes/lock/lock-breaking correctly 2008-09-12 22:49 yes 2008-09-12 22:50 anyway...
enough about my plans to conquer the world 2008-09-12 22:50 which _nobody_ in the oss world has succeeded in doing well 2008-09-12 22:50 probably also not in the proprietary world either 2008-09-12 22:50 anyway, you need to plan the entire fs from the ground up with the assumption all the clients (even the local host) are remote 2008-09-12 22:50 since we can't see the code or try it we don't know 2008-09-12 22:50 that way you don't need to deal with the local host specially 2008-09-12 22:50 well 2008-09-12 22:51 you don't have to put in remote hooks from the beginning 2008-09-12 22:51 [except for the dispatch optimization] 2008-09-12 22:51 you just have to be aware of where problems can be created 2008-09-12 22:51 you have to design it as if they were there 2008-09-12 22:51 you might not code it quite like that 2008-09-12 22:51 although I think you should 2008-09-12 22:51 where it costs nothing, yes 2008-09-12 22:51 even if the net code is a shim .h file 2008-09-12 22:51 that's seldom the case 2008-09-12 22:52 but the real problem is, you're answering a demand that doesn't exist 2008-09-12 22:52 people have been optimizing for the wrong situation ;-) 2008-09-12 22:52 you're hoping your nfs will be so much more amazing, everybody will use it instead of sucky nfs 2008-09-12 22:52 what demand do you think that is? 2008-09-12 22:52 but you're likely to be amazed and disappointed 2008-09-12 22:52 truthfully?
2008-09-12 22:52 there's very little demand for a good nfs outside of hpc 2008-09-12 22:52 and they like lustre 2008-09-12 22:52 I don't care about who uses it or not ;-) 2008-09-12 22:53 prefectly happy 2008-09-12 22:53 they just want it to be more reliable and faster 2008-09-12 22:53 I just like good design 2008-09-12 22:53 well 2008-09-12 22:53 just writing a dlm to support it will keep you busy for months 2008-09-12 22:53 if you know _exactly_ what to do 2008-09-12 22:53 yeah, I know, not a good way to design it ;-) 2008-09-12 22:53 if you want to make money 2008-09-12 22:53 but oh well 2008-09-12 22:54 you can do it 2008-09-12 22:54 if you already have something working 2008-09-12 22:54 that people want 2008-09-12 22:54 and are willing to bribe you to make even more like what they want 2008-09-12 22:54 eh, dlm's aren't that hard if you have a clean api and don't have to deal with prior borkage 2008-09-12 22:54 well 2008-09-12 22:54 "not hard" translates into several months, trust me 2008-09-12 22:54 problem is the leakage of breakage from outside 2008-09-12 22:54 but prove me wrong 2008-09-12 22:55 I would like to have a good dlm 2008-09-12 22:55 in fact 2008-09-12 22:55 would you be kind enough to post a design note on dlm? 2008-09-12 22:55 because I'd like to cluster tux3 2008-09-12 22:55 nope, because I don't have the design yet ;-) 2008-09-12 22:55 by this time next year 2008-09-12 22:55 well 2008-09-12 22:55 when? 
2008-09-12 22:55 I think I'm going to try writing a junk fs this weekend 2008-09-12 22:55 unless stuff burns (I'm oncall) 2008-09-12 22:55 great 2008-09-12 22:56 and we'll see how well I understand the kernel apis 2008-09-12 22:56 I'll check in with you on saturday when you're 50% done 2008-09-12 22:56 you'll figure them out fast 2008-09-12 22:56 and I'm going to start writing directly in kernel space, because that's the entire purpose of the exercise ;-) 2008-09-12 22:56 little painful to get some of the crap to behave 2008-09-12 22:56 I believe debugging is probably easiest in kvm? 2008-09-12 22:56 uml 2008-09-12 22:56 far and away 2008-09-12 22:56 why? 2008-09-12 22:57 can you strace ? 2008-09-12 22:57 just: make defconfig ARCH=um && make linux ARCH=um; ./linux ubd0=/my/rootfs 2008-09-12 22:57 that's it 2008-09-12 22:57 you can gdb it 2008-09-12 22:57 ah 2008-09-12 22:57 takes a little coaxing 2008-09-12 22:58 where are the logs stored for this channel btw? 2008-09-12 22:58 checking tux3.org 2008-09-12 22:58 linked from shapor's page I think, which is linked from tux3.org 2008-09-12 22:58 http://shapor.com/tux3/irclogs/current.txt 2008-09-12 22:58 hehe, uptodate to the second 2008-09-12 22:59 ok, you will be having fun writing lots of new fs code and I will be slaving away finishing xattrs 2008-09-12 22:59 you got the better deal 2008-09-12 23:00 $ wget -q -O - "http://shapor.com/tux3/irclogs/current.txt" | cut -b18- | sed -rn 's@^<([^>]*)>.*@\1@p' | sort | uniq -c | sort -nr | head -n 9 2008-09-12 23:00 6427 flips 2008-09-12 23:00 1199 shapor 2008-09-12 23:00 908 MaZe 2008-09-12 23:00 818 bh 2008-09-12 23:00 616 konrad 2008-09-12 23:00 369 tim_dimm 2008-09-12 23:00 113 vandenoever 2008-09-12 23:00 104 RazvanM 2008-09-12 23:00 96 flipz 2008-09-12 23:00 interesting stats there 2008-09-12 23:01 how the hell I'm number 3 on that list I'll never know... 
2008-09-12 23:01 you're moving up fast 2008-09-12 23:01 fast typer 2008-09-12 23:01 just have to press the enter key enough :) 2008-09-12 23:01 well, yeah 2008-09-12 23:01 and wiggle those fingers 2008-09-12 23:01 now everybody is just typing stuff to boost their ranking ;-) 2008-09-12 23:02 nice example of sed chickentracks 2008-09-12 23:02 I am a sed-maniac 2008-09-12 23:02 you can cut and paste your code examples if you need a quick boost 2008-09-12 23:03 right, well, I also have a copy of spore waiting for me... 2008-09-12 23:03 wonder if it's any good 2008-09-12 23:04 let me know 2008-09-12 23:04 and I should reinstall my desktop with the next version of ubuntu 2008-09-12 23:04 my 4 year old can't wait to get her hands on pure 2008-09-12 23:04 the quad racing game 2008-09-12 23:04 from disney 2008-09-12 23:04 demon is much fun 2008-09-12 23:04 demo 2008-09-12 23:05 I'll probably take a few spins around the italian track after I do the next refcount iter 2008-09-12 23:05 folks 2008-09-12 23:05 unsigned attomoff = (atom << 1) & (-1 << blockbits); 2008-09-12 23:05 hey bh 2008-09-12 23:05 re above: new fs code includes xattrs eventually ;-) 2008-09-12 23:05 wow, when I need a reason not to code, one quickly arrives 2008-09-12 23:05 good luck with that ;) 2008-09-12 23:06 for me, a week on xattrs alone 2008-09-12 23:06 maybe you're faster 2008-09-12 23:06 uhm that code you posted looks wrong 2008-09-12 23:06 missing ~ 2008-09-12 23:06 yeah 2008-09-12 23:06 unsigned attomoff = (atom << 1) & ~(-1 << blockbits); 2008-09-12 23:06 that's why I pasted it ;) 2008-09-12 23:06 better than a compiler 2008-09-12 23:06 oh, sorry 2008-09-12 23:06 didn't realize it was a quiz 2008-09-12 23:06 heh 2008-09-12 23:07 no it was me actually fucking up 2008-09-12 23:07 in a way 2008-09-12 23:07 oh paste in wrong window? 
2008-09-12 23:07 and in a way, not 2008-09-12 23:07 no 2008-09-12 23:07 I pasted it, you saw the bug 2008-09-12 23:07 nice 2008-09-12 23:07 heh 2008-09-12 23:08 unsigned block = ATOM_REFCOUNT_BLOCK + ((atom >> blockbits) << 1); 2008-09-12 23:09 unsigned block = ATOM_REFCOUNT_BLOCK + (atom >> (blockbits - 1)); 2008-09-12 23:09 the above is wrong 2008-09-12 23:09 again ;-) 2008-09-12 23:09 since it's always even 2008-09-12 23:09 are you running an IQ test or something? 2008-09-12 23:10 a stupidity test on myself 2008-09-12 23:10 atom is not always even 2008-09-12 23:10 why would it be? 2008-09-12 23:10 block is 2008-09-12 23:11 ((atom >> blockbits) << 1) <- always even 2008-09-12 23:11 the code you pasted results in the block's even/oddness being constant 2008-09-12 23:11 yeah, gosh 2008-09-12 23:11 I'd assume you don't want that 2008-09-12 23:11 well actually I think I do 2008-09-12 23:11 the even block is for the low 16 bits 2008-09-12 23:11 the odd for the high 2008-09-12 23:12 then you're still 1 off 2008-09-12 23:12 probably 2008-09-12 23:12 since what you described is true with double 8 bits 2008-09-12 23:12 not with double 16 bits 2008-09-12 23:12 I was definitely conflating things and being fuzzy 2008-09-12 23:12 that's why you will get your xattrs written in _less_ than a week 2008-09-12 23:13 unsigned block = ATOM_REFCOUNT_BLOCK + ((atom >> (blockbits - 1)) << 1); 2008-09-12 23:13 is what you want if you want double-blocks of u16s with low and high blocks 2008-09-12 23:13 yes 2008-09-12 23:13 thanks 2008-09-12 23:13 although that deserves a comment ;-) 2008-09-12 23:13 see, now I'm getting semantically addressable code 2008-09-12 23:14 I loosely indicate the semantics, and the code comes back 2008-09-12 23:14 what do you mean?
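The block arithmetic argued over above can be written out as a small commented function. This is a sketch of the interleaved layout as described in the chat (16-bit counters, low halves in an even block, high halves in the odd block that follows), not the code that ended up in tux3. The base value given to ATOM_REFCOUNT_BLOCK and the struct are made up for the illustration.

```c
#include <assert.h>

#define ATOM_REFCOUNT_BLOCK 16u   /* hypothetical base block number */

struct refcount_loc {
    unsigned low_block;    /* block holding the low 16 bits of the count */
    unsigned high_block;   /* block holding the high 16 bits */
    unsigned offset;       /* byte offset of the u16 within either block */
};

static struct refcount_loc refcount_locate(unsigned atom, unsigned blockbits)
{
    assert(blockbits >= 1 && blockbits < 32);
    /* A block of 2^blockbits bytes holds 2^(blockbits - 1) u16 counters. */
    unsigned pair = atom >> (blockbits - 1);
    struct refcount_loc loc;
    /* Each pair of blocks covers the same atoms: even = low, odd = high.
     * The parentheses matter because + binds tighter than <<. */
    loc.low_block = ATOM_REFCOUNT_BLOCK + (pair << 1);
    loc.high_block = loc.low_block + 1;
    /* Two bytes per counter, wrapped to the block size.  The unsigned
     * mask avoids left-shifting a negative value. */
    loc.offset = (atom << 1) & ((1u << blockbits) - 1);
    return loc;
}
```

With 4096-byte blocks (blockbits = 12), atoms 0 through 2047 share the first low/high pair of blocks and atom 2048 starts the next pair at offset 0.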
2008-09-12 23:14 that's the one we want I think 2008-09-12 23:14 I think you'd actually want to spread it in a different way for performance reasons 2008-09-12 23:15 first all the low blocks, then all the high blocks 2008-09-12 23:15 since the high blocks will almost never be updated 2008-09-12 23:15 thus you want low blocks to be sequential on disk 2008-09-12 23:15 16 bit count with carry into the high block is a very nice balance between getting lots of atoms onto a block and not carrying too often 2008-09-12 23:15 there's no advantage to separating it that way 2008-09-12 23:15 that I know of 2008-09-12 23:15 a _very_ small advantage in the radix tree lookup 2008-09-12 23:16 but it was my first idea, what you said 2008-09-12 23:16 and I might stick with that indeed 2008-09-12 23:16 I think having two different ATOM_REFCOUNT_BLOCK would make for cleaner easier to understand code 2008-09-12 23:16 two different? 2008-09-12 23:16 ATOM_REFCOUNT_{LOW,HIGH}_BLOCK 2008-09-12 23:17 sure 2008-09-12 23:17 it's that way now 2008-09-12 23:17 before you posted that expression 2008-09-12 23:17 I haven't actually looked at the code ;-) 2008-09-12 23:17 just buggy 2008-09-12 23:17 ok 2008-09-12 23:17 give me an hour 2008-09-12 23:17 and you get to look at working, tested code 2008-09-12 23:17 I program much better when I'm not chatting 2008-09-12 23:18 ok, I think I'm gonna finally head home from work, and maybe reboot into mac os x and start up spore - and stop bothering you ;-) 2008-09-12 23:18 wow 2008-09-12 23:18 didn't realize you're camping out there 2008-09-12 23:18 but why not 2008-09-12 23:18 infinite junk food 2008-09-12 23:18 sweet sound of vacuum cleaners in the distance 2008-09-12 23:19 plus 30" screen 2008-09-12 23:19 not to mention my place is a studio 2008-09-12 23:19 that can be fixed 2008-09-12 23:19 just put in a ticket 2008-09-12 23:19 has an empty fridge (ok, I have spaghetti) 2008-09-12 23:19 you'll have one at home 2008-09-12 23:19 yah 2008-09-12 23:19 is a 
total mess 2008-09-12 23:19 you need to get that ramyun 2008-09-12 23:20 get it on the way home 2008-09-12 23:20 and is lacking (even after more than 2 years) basic amenities like a desk 2008-09-12 23:20 ouch 2008-09-12 23:20 just order one online 2008-09-12 23:20 ikea 2008-09-12 23:20 have it in 3 days 2008-09-12 23:20 because I've never found a good way to fit one in 2008-09-12 23:20 some assembly required 2008-09-12 23:20 that's the trick 2008-09-12 23:20 so to be fair, even back in Poland, when I had a desk, I spent most time with laptop on lap on bed 2008-09-12 23:20 and move to phoenix where you can have a real house ;) 2008-09-12 23:20 which is one of the reasons I've never bothered 2008-09-12 23:21 how do you say "hello" in polish? 2008-09-12 23:21 hallo? 2008-09-12 23:21 Dzień Dobry. 2008-09-12 23:21 is Good Day 2008-09-12 23:21 hello is 'halo' 2008-09-12 23:21 characters didn't work in xchat 2008-09-12 23:21 but that's kind of like pulled in from english 2008-09-12 23:22 that's the unaccented version of above? 2008-09-12 23:22 Dzien' Dobry 2008-09-12 23:22 n with accent like / 2008-09-12 23:22 got it 2008-09-12 23:22 like sheen dobry 2008-09-12 23:22 kinda 2008-09-12 23:22 with more bite 2008-09-12 23:22 and 'halo' is more like a phone greeting when you pick up then really hello 2008-09-12 23:23 ok, I'll try it on a pole tomorrow and see if it works 2008-09-12 23:23 you're more likely to use 'hej' (hey) between friends in person, and dzien dobry for more formal purposes and 'halo' when picking up the phone 2008-09-12 23:23 hmm 2008-09-12 23:23 let me write it down more phonetically 2008-09-12 23:23 ugh 2008-09-12 23:24 dobry = [hard d] [short o] [hard b] [hard trilled r] [short i/y] 2008-09-12 23:25 oh yeah I can say dobry 2008-09-12 23:25 even with the trill 2008-09-12 23:25 so dzien, is dzie - en - two syllables 2008-09-12 23:26 tshee - ehn 2008-09-12 23:26 ? 
2008-09-12 23:26 were dzi is a consonant, e is a vowel and n' is a soft nasal n 2008-09-12 23:26 nice 2008-09-12 23:26 okay maybe more like one syllable, kind of hard to say because soft vowels (and there's two of them here) are kind of syllable like 2008-09-12 23:27 so the i in dzi takes the sound dz and makes it soft 2008-09-12 23:27 good enough to try 2008-09-12 23:27 I'll get corrected soon enough 2008-09-12 23:27 which is exactly the purpose of the accent on the n (accent on consonants is written as an i infront of a vowel - hence the dz "i" en' 2008-09-12 23:27 that's really dz'en' 2008-09-12 23:28 ah 2008-09-12 23:28 and dz is a single letter 2008-09-12 23:28 wow, more complex than say czech 2008-09-12 23:28 that just happens to be written with two 2008-09-12 23:28 since we ran out of latin characters 2008-09-12 23:29 hence dz rz sz cz ch should basically be treated as single letters 2008-09-12 23:29 and then there's sounds like drz brz and so on, which are almost like a single letter 2008-09-12 23:29 ugh 2008-09-12 23:29 with a trill? 2008-09-12 23:30 while you can theoretically read it as b-rz and d-rz, almost everybody pronounces is it quickly where it kind of melds into one 2008-09-12 23:30 and there's no 'r' in their ;-) 2008-09-12 23:30 since rz is actually 'z with a dot' ie. 
ż 2008-09-12 23:30 which is the first letter of my last name 2008-09-12 23:30 which is almost like the j in french 'je' 2008-09-12 23:30 so more like 'zh' 2008-09-12 23:31 IC 2008-09-12 23:31 it's actually very consistant 2008-09-12 23:31 very few words can't be read correctly if you know the relatively short set of rules 2008-09-12 23:31 there are very few exceptions 2008-09-12 23:31 I'll learn those rules 2008-09-12 23:31 over time 2008-09-12 23:32 slavic languages are fun 2008-09-12 23:32 great langues for being cynical in 2008-09-12 23:32 from what I have seen 2008-09-12 23:32 an example being frozen (zmarznięty), where the syllable split is zmar-znię-ty [the ę is nasal e, or e with a , ogonek accent - like french c with cedilla, but flipped other way] 2008-09-12 23:32 whoops 2008-09-12 23:32 none of those chars are working on braindead xchat 2008-09-12 23:32 so even though you see rz it ain't zh 2008-09-12 23:33 those were all the same character 2008-09-12 23:33 time for a less braindead chat client 2008-09-12 23:33 probably because I'm writing in utf8 2008-09-12 23:33 yes 2008-09-12 23:33 although they show up fine in pidgin 2008-09-12 23:33 but xchat should grok that 2008-09-12 23:33 xchat probably does 2008-09-12 23:33 xchat sucks at everything by default 2008-09-12 23:33 I'd guess your terminal is not utf8 or something 2008-09-12 23:33 I wouldn't assume taht 2008-09-12 23:33 it doesn't run in a terminal 2008-09-12 23:35 oh, it doesn't? 2008-09-12 23:35 oh right the x ;-) 2008-09-12 23:36 what are we on here? 
2008-09-12 23:36 -!- maze_pallas(~elbereth@216-239-45-4.google.com) has joined #tux3 2008-09-12 23:36 oftc 2008-09-12 23:36 let's see if my xchat works 2008-09-12 23:36 ąćęłńóśźż 2008-09-12 23:36 works 4 me 2008-09-12 23:36 ąćęłńóśźż 2008-09-12 23:36 yup 2008-09-12 23:37 ĄĆĘŁŃÓŚŹŻ 2008-09-12 23:37 xchat does utf8 great here 2008-09-12 23:37 ĄĆĘŁŃÓŚŹŻ 2008-09-12 23:37 well 2008-09-12 23:37 right 2008-09-12 23:37 I have to set something 2008-09-12 23:37 or upgrade 2008-09-12 23:37 but you probably need to have proper locale 2008-09-12 23:37 oh 2008-09-12 23:37 xchat_2.6.1-0ubuntu2_i386.deb here 2008-09-12 23:37 I have xchat 2.8.4 2008-09-12 23:37 freshly installed 2008-09-12 23:37 I thought unicode was supposed to be independent of locale 2008-09-12 23:38 -!- maze_(~maze@216-239-45-4.google.com) has joined #tux3 2008-09-12 23:38 testing ;-) 2008-09-12 23:38 ???????? 2008-09-12 23:38 nope 2008-09-12 23:38 2.6.8-0.3 2008-09-12 23:38 etch 2008-09-12 23:38 :p 2008-09-12 23:38 try LC_ALL=en_US.utf-8 xchat as the startup command 2008-09-12 23:39 -!- maze__(~maze@216-239-45-4.google.com) has joined #tux3 2008-09-12 23:39 testing 2008-09-12 23:39 ĄĆĘŁŃÓŚŹŻ ąćęłńóśźż 2008-09-12 23:39 yep works 2008-09-12 23:39 it's gotten crowded in here 2008-09-12 23:40 I guess I don't really know what that means... 2008-09-12 23:40 amazed at all the maze clones 2008-09-12 23:40 you better get home 2008-09-12 23:40 I need to hear about spore 2008-09-12 23:40 but obviously something is broken if xchat's wire encoding depends on the locale 2008-09-12 23:41 maybe there's a switch or something 2008-09-12 23:42 I'm giving up on getting xchat to show those chars 2008-09-12 23:42 ah no idea 2008-09-12 23:42 just quit 2008-09-12 23:42 one day I will do a gui for irssi 2008-09-12 23:42 and restart with LC_ALL=en_US.utf-8 xchat 2008-09-12 23:42 keep promising myself 2008-09-12 23:42 decloned. 2008-09-12 23:43 going to try the above? or shall I head home? 
2008-09-12 23:44 head home 2008-09-12 23:44 ok will do so then 2008-09-12 23:44 I'll have it working when you get there 2008-09-12 23:44 and you can start your fs 2008-09-12 23:44 or your spore review 2008-09-12 23:45 pick up some ramyun on the way in case it turns into a long one 2008-09-12 23:45 ok, might do so, there's a store on route after all... 2008-09-12 23:48 hey 2008-09-12 23:48 hi 2008-09-12 23:48 yeah, just got some crude gdb script to scan through all system threads and print out their state 2008-09-12 23:48 more to come 2008-09-12 23:49 this is to help with core examinations 2008-09-12 23:49 which it seems that nobody does under Linux 2008-09-12 23:49 true 2008-09-12 23:49 I have never 2008-09-12 23:49 should 2008-09-12 23:50 it's not a criticism, just a new tool to help with debugging 2008-09-12 23:50 I'm getting a bunch of nfsd threads in state 4 which is a bit odd 2008-09-12 23:51 it's about time I learned summa that fu 2008-09-12 23:53 summa ? 2008-09-12 23:53 =some of 2008-09-12 23:53 ok 2008-09-12 23:53 yap 2008-09-12 23:53 that's all we did at netapp practically 2008-09-12 23:53 for better or worse 2008-09-12 23:53 gotta get me summa data 2008-09-12 23:53 gotta get me summa dat