Hi,
I am not experienced in kernel development, so please be patient.
After exploring the latest (2.19.2) sources, it appears that
there is no huge page support for the 32-bit powerpc platform. I deduced this by
starting from the 0x300 exception vector in head_32.S and comparing it with
head_64.S. It appears that the only sensible path for hashing in a huge page
(on 64-bit ppc) is:
0x300: data_access -> do_hash_page -> hash_page -> hash_huge_page
Unfortunately, on 32-bit, all paths that do anything
useful end up in create_hpte() in hash_low_32.S. I noticed someone on
this mailing list claiming huge page support for the IBM 44x core. Is it
possible to make that general enough to cover ppc32 as a whole?
Another issue I have is the absence of control over hardware-specific
memory attributes such as WIMG. More concretely, I am interested in
being able to allocate off the heap in such a way as to explicitly turn the
M (coherency) bit off (independently of SMP or non-SMP mode). This is needed
because some multicore PowerPC platforms (e.g. 745x) perform an extra address
broadcast on each store miss on a cacheline in order to guarantee cache
coherency. This degrades performance for store-bound programs.
I understand that hashing pages as non-cache-coherent exposes the
data they contain to cache coherency paradoxes.
Nevertheless, since I am working on a high-performance library, I am prepared
to shift the coherency guarantees to the library, which is supposed to be the
one managing the data flow between memory and CPU caches intelligently.
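To make the WIMG part concrete, here is a minimal sketch of the bit manipulation I have in mind. All names in it (WIMG_*, PTE_WIMG_SHIFT, pte_set_wimg and so on) are made up for illustration and are not the kernel's own definitions. It assumes the classic 32-bit hashed PTE layout, where the WIMG field sits in the low word three bits above the two PP bits and the reserved bit.

```c
#include <assert.h>
#include <stdint.h>

/* 4-bit WIMG field, written W I M G from most to least significant bit.
 * Illustrative names only, not kernel definitions. */
enum {
    WIMG_W = 0x8, /* write-through */
    WIMG_I = 0x4, /* caching inhibited */
    WIMG_M = 0x2, /* memory coherence required */
    WIMG_G = 0x1  /* guarded */
};

/* Assumed position of the WIMG field in the low word of a classic
 * 32-bit hashed PTE (above PP and one reserved bit). */
#define PTE_WIMG_SHIFT 3

/* Replace the WIMG field of a PTE low word with the given 4-bit value. */
uint32_t pte_set_wimg(uint32_t pte_lo, unsigned wimg)
{
    assert(wimg <= 0xF);
    pte_lo &= ~((uint32_t)0xF << PTE_WIMG_SHIFT);
    return pte_lo | ((uint32_t)wimg << PTE_WIMG_SHIFT);
}

/* Extract the 4-bit WIMG field from a PTE low word. */
unsigned pte_get_wimg(uint32_t pte_lo)
{
    return (pte_lo >> PTE_WIMG_SHIFT) & 0xF;
}

/* Clear only the M bit, leaving W, I and G untouched. */
uint32_t pte_clear_coherent(uint32_t pte_lo)
{
    return pte_lo & ~((uint32_t)WIMG_M << PTE_WIMG_SHIFT);
}

/* True when the M bit is off, whatever W, I and G are. */
int wimg_is_noncoherent(unsigned wimg)
{
    return (wimg & WIMG_M) == 0;
}
```

The point of pte_clear_coherent() is that it touches M alone, so whatever caching policy the kernel already picked for W, I and G would stay as it is.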
So, I have 2 main questions:
1) What’s so special about ppc32 that it didn’t get huge page support
matching what ppc64 has? Who is responsible for fixing it, or willing to?
2) Is it appropriate to provide a syscall mechanism (parallel to sys_brk,
sys_mmap, and sys_shmget) to add WIMG settings?
Overall, the vision here is to be able (from the user side, on powerpc32)
to call:
shmid = shmget(2, LENGTH,
               SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W | POWERPC_NONCOHERENT);
shmaddr = shmat(shmid, ADDR, SHMAT_FLAGS);
and get a segment mapped with wimg=0bxx0x
(actually, I assume all x’s are 0). This would be very nice!
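For what it is worth, the user side of this could be sketched roughly as follows. POWERPC_NONCOHERENT does not exist in any kernel I know of; the value below is an arbitrary placeholder that merely avoids the existing shmget flag bits, and build_shmget_flags(), stock_flags() and create_segment() are hypothetical helper names. The fallback #defines carry the Linux values and only kick in if the headers hide those flags.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Fallbacks with the Linux values, in case the headers hide them. */
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
#ifndef SHM_R
#define SHM_R 0400
#endif
#ifndef SHM_W
#define SHM_W 0200
#endif

/* HYPOTHETICAL flag: not defined by any kernel. The value is an
 * arbitrary placeholder chosen not to overlap the flags above. */
#define POWERPC_NONCOHERENT 0x00100000

/* Compose the shmget() flags for a huge page segment, optionally
 * requesting the proposed non-coherent mapping. */
int build_shmget_flags(int want_noncoherent)
{
    int flags = SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W;

    if (want_noncoherent)
        flags |= POWERPC_NONCOHERENT;
    return flags;
}

/* Drop the hypothetical bit so that the call is meaningful on a
 * kernel that knows nothing about it. */
int stock_flags(int flags)
{
    return flags & ~POWERPC_NONCOHERENT;
}

/* Create the segment on an unpatched kernel; returns the shmid, or -1
 * with errno set (for example when no huge pages are reserved). */
int create_segment(key_t key, size_t length, int want_noncoherent)
{
    return shmget(key, length, stock_flags(build_shmget_flags(want_noncoherent)));
}
```

On a kernel carrying the proposed extension, create_segment() would pass the flags through unmodified instead of calling stock_flags().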
Thank you,
-Ilya
P.S. As a side note, the kernel sources (especially the assembly ones) are
pretty difficult to read for people outside the kernel hacker “circle”
because of the lack of comments. For example, what in the
whole world is “paca”?