Monday, October 28, 2013

Clock in Win32

#include <windows.h>
#include <stdio.h>

int main(void)
{
    // GetTickCount resolution is typically about 10-16 ms
    DWORD start = GetTickCount();
    // ... measured code here ...
    printf("Time used: %lu ms\n", GetTickCount() - start);

    // If higher accuracy is needed, use the high-resolution performance counter
    LARGE_INTEGER lp_s, lp_e, freq;
    QueryPerformanceFrequency(&freq);   // counter ticks per second
    QueryPerformanceCounter(&lp_s);
    // ... measured function here ...
    QueryPerformanceCounter(&lp_e);
    printf("Time used for func: %I64d ticks\n", lp_e.QuadPart - lp_s.QuadPart);
    printf("Time used for func: %f sec\n",
           (double)(lp_e.QuadPart - lp_s.QuadPart) / (double)freq.QuadPart);
    return 0;
}
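Converting the tick difference to seconds through floating point works, but an integer-only conversion keeps exact microseconds. A minimal sketch, assuming the tick difference and the frequency come from QueryPerformanceCounter and QueryPerformanceFrequency as above: it splits the count into whole seconds and a remainder so that multiplying by 1,000,000 cannot overflow for long intervals. The helper name ticks_to_us is hypothetical, and the code itself is portable C.

```c
#include <stdint.h>

/* Converts a performance-counter tick difference to microseconds using
 * integer arithmetic only. 'freq' is the counter frequency in ticks per
 * second (on Windows, the value reported by QueryPerformanceFrequency).
 * Splitting into whole seconds plus a remainder keeps the intermediate
 * product (rem_ticks * 1000000) small, since rem_ticks < freq. */
static int64_t ticks_to_us(int64_t ticks, int64_t freq)
{
    int64_t whole_sec = ticks / freq;   /* complete seconds */
    int64_t rem_ticks = ticks % freq;   /* leftover ticks, less than one second */
    return whole_sec * 1000000 + (rem_ticks * 1000000) / freq;
}
```

For example, with a 10 MHz counter, a difference of 25,000,000 ticks converts to 2,500,000 microseconds.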

Monday, October 21, 2013

Byte value in bitstream

00 00 00 01 67 … (SPS)

00 00 00 01 65 … (IDR, I-frame)
00 00 00 01 45 … (IDR, I-frame)
00 00 00 01 25 … (IDR, I-frame)

00 00 00 01 61 … (non-IDR slice, e.g. P-frame)
00 00 00 01 41 … (non-IDR slice, e.g. P-frame)
00 00 00 01 21 … (non-IDR slice, e.g. P-frame)
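The byte right after the 00 00 00 01 (or 00 00 01) start code is the H.264/AVC NAL unit header: 1 forbidden_zero_bit, 2 bits of nal_ref_idc, and 5 bits of nal_unit_type. So 0x65, 0x45 and 0x25 all have type 5 (IDR slice) and differ only in nal_ref_idc (3, 2, 1); 0x61, 0x41 and 0x21 are type 1 (non-IDR slice); and 0x67 is type 7 (SPS). A minimal sketch, assuming an H.264 Annex B byte stream; the function names are hypothetical, and it ignores emulation-prevention bytes inside the payload.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of the one-byte H.264/AVC NAL unit header. */
typedef struct {
    int forbidden_zero_bit;  /* 1 bit, must be 0 */
    int nal_ref_idc;         /* 2 bits, 0 = not used for reference */
    int nal_unit_type;       /* 5 bits: 1 = non-IDR slice, 5 = IDR slice, 7 = SPS, 8 = PPS */
} NalHeader;

static NalHeader parse_nal_header(uint8_t b)
{
    NalHeader h;
    h.forbidden_zero_bit = (b >> 7) & 0x01;
    h.nal_ref_idc        = (b >> 5) & 0x03;
    h.nal_unit_type      = b & 0x1F;
    return h;
}

/* Returns the offset of the NAL header byte that follows the next start
 * code at or after 'from', or -1 if there is none. Searching for 00 00 01
 * finds both the 3-byte and the 4-byte (00 00 00 01) start code. */
static long find_next_nal(const uint8_t *buf, size_t len, size_t from)
{
    for (size_t i = from; i + 3 < len; i++) {  /* buf[i + 3] must exist */
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
            return (long)(i + 3);
    }
    return -1;
}
```

Calling find_next_nal repeatedly, starting each search at the previous result, walks the stream NAL unit by NAL unit.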

Thursday, October 10, 2013

Test

This is a test from Writer.

Tuesday, May 21, 2013

Edit command-line history in Linux

Sometimes I use a long find command to search. When I later need to search for a different pattern, I used to run "!find", cancel it, and then modify the last command. That is not a decent method. Two better solutions:
• Type !find:p. The command line is printed but not executed (it is still added to the history), so press arrow UP and modify it.
• Press Ctrl+R, then type "find" to locate the command line in the history, and use the arrow keys to edit it.
--
Sometimes you want to replace one parameter with another: ^old^new reruns the previous command with the first "old" replaced by "new". Or, if you know the history number n, !n reruns that entry (!n:p only prints it).

Monday, May 13, 2013

New OpenCV book: OpenCV Computer Vision with Python

There are several OpenCV C++ books on the market; this one uses Python. I am really glad that someone wrote a book describing how to implement computer vision algorithms using Python. One of the most difficult parts of OpenCV is installing the library, and this book starts with highly detailed installation instructions for Linux, Windows, and Mac OS. The author shows readers different I/O operations, such as reading and writing image and video files and accessing cameras. The book also discusses the much-needed GUI utilities and APIs. One very useful section of the book explains how to train Haar cascades, which are stored in XML files.

Wednesday, April 24, 2013

Find command in Linux

Usually I only use a couple of these options, but it's good to have a list for looking them up.

Thursday, April 18, 2013

HEVC feature highlights

The High Efficiency Video Coding (HEVC) standard is designed to achieve multiple goals, including coding efficiency, ease of transport system integration and data loss resilience, as well as implementability using parallel processing architectures.

• Coding tree units (CTUs) and coding tree block (CTB) structure: The core of the coding layer in previous standards was the macroblock, containing a 16×16 block of luma samples and, in the usual case of 4:2:0 color sampling, two corresponding 8×8 blocks of chroma samples, whereas the analogous structure in HEVC is the coding tree unit (CTU), which has a size selected by the encoder and can be larger than a traditional macroblock. The CTU consists of a luma CTB and the corresponding chroma CTBs and syntax elements. The size L×L of a luma CTB can be chosen as L = 16, 32, or 64 samples, with the larger sizes typically enabling better compression. HEVC then supports a partitioning of the CTBs into smaller blocks using a tree structure and quadtree-like signaling.
• Coding units (CUs) and coding blocks (CBs): The quadtree syntax of the CTU specifies the size and positions of its luma and chroma CBs. The root of the quadtree is associated with the CTU. Hence, the size of the luma CTB is the largest supported size for a luma CB. The splitting of a CTU into luma and chroma CBs is signaled jointly. One luma CB and ordinarily two chroma CBs, together with associated syntax, form a coding unit (CU). A CTB may contain only one CU or may be split to form multiple CUs, and each CU has an associated partitioning into prediction units (PUs) and a tree of transform units (TUs).
• Prediction units and prediction blocks (PBs): The decision whether to code a picture area using interpicture or intrapicture prediction is made at the CU level. A PU partitioning structure has its root at the CU level. Depending on the basic prediction-type decision, the luma and chroma CBs can then be further split in size and predicted from luma and chroma prediction blocks (PBs). HEVC supports variable PB sizes from 64×64 down to 4×4 samples.
• TUs and transform blocks: The prediction residual is coded using block transforms. A TU tree structure has its root at the CU level. The luma CB residual may be identical to the luma transform block (TB) or may be further split into smaller luma TBs. The same applies to the chroma TBs. Integer basis functions similar to those of a discrete cosine transform (DCT) are defined for the square TB sizes 4×4, 8×8, 16×16, and 32×32. For the 4×4 transform of luma intrapicture prediction residuals, an integer transform derived from a form of discrete sine transform (DST) is alternatively specified.
• Motion compensation: Quarter-sample precision is used for the MVs, and 7-tap or 8-tap filters are used for interpolation of fractional-sample positions (compared to six-tap filtering of half-sample positions followed by linear interpolation for quarter-sample positions in H.264/MPEG-4 AVC). Similar to H.264/MPEG-4 AVC, multiple reference pictures are used. For each PB, either one or two motion vectors can be transmitted, resulting either in unipredictive or bipredictive coding, respectively. As in H.264/MPEG-4 AVC, a scaling and offset operation may be applied to the prediction signal(s) in a manner known as weighted prediction.
• Intrapicture prediction: The decoded boundary samples of adjacent blocks are used as reference data for spatial prediction in regions where interpicture prediction is not performed. Intrapicture prediction supports 33 directional modes (compared to eight such modes in H.264/MPEG-4 AVC), plus planar (surface fitting) and DC (flat) prediction modes. The selected intrapicture prediction modes are encoded by deriving most probable modes (e.g., prediction directions) based on those of previously decoded neighboring PBs.
• Quantization control: As in H.264/MPEG-4 AVC, uniform reconstruction quantization (URQ) is used in HEVC, with quantization scaling matrices supported for the various transform block sizes.
• Entropy coding: Context adaptive binary arithmetic coding (CABAC) is used for entropy coding. This is similar to the CABAC scheme in H.264/MPEG-4 AVC, but has undergone several improvements to improve its throughput speed (especially for parallel-processing architectures) and its compression performance, and to reduce its context memory requirements.
• In-loop deblocking filtering: A deblocking filter similar to the one used in H.264/MPEG-4 AVC is operated within the interpicture prediction loop. However, the design is simplified in regard to its decision-making and filtering processes.
• Sample adaptive offset (SAO): A nonlinear amplitude mapping is introduced within the interpicture prediction loop after the deblocking filter. Its goal is to better reconstruct the original signal amplitudes by using a look-up table that is described by a few additional parameters that can be determined by histogram analysis at the encoder side.
• Tiles: The option to partition a picture into rectangular regions called tiles has been specified. The main purpose of tiles is to increase the capability for parallel processing rather than provide error resilience. Tiles are independently decodable regions of a picture that are encoded with some shared header information. Tiles can additionally be used for the purpose of spatial random access to local regions of video pictures. A typical tile configuration of a picture consists of segmenting the picture into rectangular regions with approximately equal numbers of CTUs in each tile. Tiles provide parallelism at a more coarse level of granularity (picture/subpicture), and no sophisticated synchronization of threads is necessary for their use.
• Wavefront parallel processing: When wavefront parallel processing (WPP) is enabled, a slice is divided into rows of CTUs. The first row is processed in an ordinary way, the second row can begin to be processed after only two CTUs have been processed in the first row, the third row can begin to be processed after only two CTUs have been processed in the second row, and so on. The context models of the entropy coder in each row are inferred from those in the preceding row with a two-CTU processing lag. WPP provides a form of processing parallelism at a rather fine level of granularity, i.e., within a slice. WPP may often provide better compression performance than tiles (and avoid some visual artifacts that may be induced by using tiles).

-- All from the IEEE paper "Overview of the High Efficiency Video Coding (HEVC) Standard"
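The CTB-to-CB quadtree described in the notes can be sketched as a recursive split. This is a simplified illustration, not the HM reference encoder: the split decision is a caller-supplied callback (a real encoder chooses splits by rate-distortion cost), and picture-boundary handling is omitted. The names split_ctb, always_split, never_split and split_top_left are hypothetical.

```c
#include <stddef.h>

/* Caller-supplied split decision (hypothetical; a real encoder compares the
 * rate-distortion cost of splitting against not splitting). */
typedef int (*split_fn)(int x, int y, int size, void *ctx);

/* Recursively partitions the square block at (x, y) into coding blocks.
 * A block is split into four equal quadrants while it is larger than
 * min_cb and the decision function asks for a split.
 * Returns the number of leaf CBs. */
static int split_ctb(int x, int y, int size, int min_cb, split_fn decide, void *ctx)
{
    if (size > min_cb && decide(x, y, size, ctx)) {
        int h = size / 2;
        return split_ctb(x,     y,     h, min_cb, decide, ctx)
             + split_ctb(x + h, y,     h, min_cb, decide, ctx)
             + split_ctb(x,     y + h, h, min_cb, decide, ctx)
             + split_ctb(x + h, y + h, h, min_cb, decide, ctx);
    }
    return 1;  /* leaf: one coding block of size x size */
}

static int always_split(int x, int y, int size, void *ctx)
{
    (void)x; (void)y; (void)size; (void)ctx;
    return 1;
}

static int never_split(int x, int y, int size, void *ctx)
{
    (void)x; (void)y; (void)size; (void)ctx;
    return 0;
}

/* Splits only blocks anchored at the CTB's top-left corner. */
static int split_top_left(int x, int y, int size, void *ctx)
{
    (void)size; (void)ctx;
    return x == 0 && y == 0;
}
```

A 64×64 CTB split all the way down to 8×8 gives 64 CBs; splitting only the top-left quadrant at each level gives 3 + 3 + 4 = 10 CBs.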
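For the 4×4 case, the integer DCT-like core matrix and the DST-based matrix used for 4×4 luma intra residuals are small enough to show. The sketch below applies one 1-D pass with the coefficient values as I understand them from the HEVC design. A complete forward transform also applies the second (column) pass and the intermediate scaling shifts, which are omitted here; forward_1d is a hypothetical name.

```c
#include <stdint.h>

/* 4x4 integer core transform (scaled approximation of a DCT-II). */
static const int DCT4[4][4] = {
    { 64,  64,  64,  64 },
    { 83,  36, -36, -83 },
    { 64, -64, -64,  64 },
    { 36, -83,  83, -36 },
};

/* 4x4 DST-based transform, used for 4x4 luma intrapicture residuals. */
static const int DST4[4][4] = {
    { 29,  55,  74,  84 },
    { 74,  74,   0, -74 },
    { 84, -29, -74,  55 },
    { 55, -84,  74, -29 },
};

/* One 1-D forward pass: out[k] = sum over n of m[k][n] * in[n].
 * No normalizing shift is applied here. */
static void forward_1d(const int m[4][4], const int32_t in[4], int32_t out[4])
{
    for (int k = 0; k < 4; k++) {
        int32_t acc = 0;
        for (int n = 0; n < 4; n++)
            acc += m[k][n] * in[n];
        out[k] = acc;
    }
}
```

A flat input row puts all the energy of the DCT into the first coefficient, while the DST, whose basis starts small and grows, spreads it out. That matches residuals of intra prediction, which tend to grow with distance from the reference boundary.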
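The 8-tap (half-sample) and 7-tap (quarter-sample) luma interpolation filters can be illustrated in one dimension. The coefficients below are the HEVC luma filter taps as I understand them (each set sums to 64). The single rounding shift and 8-bit clipping are a simplification of the real process, which keeps higher intermediate precision between the horizontal and vertical stages. interp_luma_1d is a hypothetical name.

```c
#include <stdint.h>

/* Luma interpolation taps, indexed by fractional position in quarter samples.
 * Each row is applied to ref[pos - 3] .. ref[pos + 4]. */
static const int LUMA_FILTER[4][8] = {
    {  0, 0,   0, 64,  0,   0, 0,  0 },  /* integer position: plain copy */
    { -1, 4, -10, 58, 17,  -5, 1,  0 },  /* 1/4 sample: 7 taps */
    { -1, 4, -11, 40, 40, -11, 4, -1 },  /* 1/2 sample: 8 taps */
    {  0, 1,  -5, 17, 58, -10, 4, -1 },  /* 3/4 sample: 7 taps */
};

/* Interpolates the 8-bit sample at position pos + frac/4 (frac in 0..3).
 * The caller must ensure ref[pos - 3] .. ref[pos + 4] are valid; real
 * decoders pad the reference picture to guarantee this. */
static uint8_t interp_luma_1d(const uint8_t *ref, int pos, int frac)
{
    int sum = 0;
    for (int k = 0; k < 8; k++)
        sum += LUMA_FILTER[frac][k] * ref[pos - 3 + k];

    int v = sum + 32;           /* rounding offset; taps sum to 64 */
    if (v < 0)
        return 0;               /* clip negative overshoot */
    v >>= 6;                    /* divide by 64 */
    return (uint8_t)(v > 255 ? 255 : v);
}
```

On a linear ramp 0, 10, 20, ..., the half-sample position between 40 and 50 comes out as 45, as expected.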
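For the "approximately equal numbers of CTUs in each tile" configuration, HEVC defines a uniform-spacing rule that distributes the remainder so tile sizes differ by at most one CTU. A sketch assuming that rule as I understand it from the spec; size_in_ctbs and uniform_tile_sizes are hypothetical names, and the 1920×1080 picture with 64×64 CTUs is only an example.

```c
/* Picture width or height in CTUs; a partial CTU at the edge still counts. */
static int size_in_ctbs(int samples, int ctb_size)
{
    return (samples + ctb_size - 1) / ctb_size;
}

/* Fills sizes[0..num_tiles-1] with the widths (or heights) in CTUs of
 * num_tiles tile columns (or rows) across pic_ctbs CTUs, using uniform
 * spacing: each boundary is placed at floor(i * pic_ctbs / num_tiles). */
static void uniform_tile_sizes(int pic_ctbs, int num_tiles, int sizes[])
{
    for (int i = 0; i < num_tiles; i++)
        sizes[i] = ((i + 1) * pic_ctbs) / num_tiles - (i * pic_ctbs) / num_tiles;
}
```

For a 1920×1080 picture with 64×64 CTUs (30×17 CTUs), four tile columns come out 7, 8, 7, 8 CTUs wide, and three tile rows come out 5, 6, 6 CTUs tall.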
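The two-CTU lag of WPP can be quantified with a toy schedule. The model assumes one thread per CTU row, unlimited threads, and every CTU taking exactly one time step (a simplification). Under those assumptions, CTU (r, c) can start once its left neighbor (r, c-1) and the above-right CTU (r-1, c+1) are finished, so an H×W slice needs W + 2(H-1) steps instead of H·W. wpp_steps is a hypothetical name.

```c
#define WPP_MAX_COLS 512

/* Returns the number of time steps needed to finish a rows x cols slice
 * under the toy WPP model above, or -1 if the size is out of range. */
static int wpp_steps(int rows, int cols)
{
    int prev[WPP_MAX_COLS] = {0};  /* finish times of the row above */
    int cur[WPP_MAX_COLS];         /* finish times of the current row */

    if (rows <= 0 || cols <= 0 || cols > WPP_MAX_COLS)
        return -1;

    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            int ready = (c > 0) ? cur[c - 1] : 0;   /* left neighbor done */
            if (r > 0) {
                /* above-right CTU; the last column waits for the CTU above */
                int up = prev[(c + 1 < cols) ? c + 1 : cols - 1];
                if (up > ready)
                    ready = up;
            }
            cur[c] = ready + 1;                     /* one step per CTU */
        }
        for (int c = 0; c < cols; c++)
            prev[c] = cur[c];
    }
    return prev[cols - 1];
}
```

For a 17-row by 30-column slice (1080p with 64×64 CTUs), the model gives 62 steps instead of 510 sequential ones. Real speedups are lower because CTU costs vary and thread counts are limited.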

Wednesday, February 06, 2013

Search in Emacs with grep

Search pattern in different directories and different files:

M-x grep RETURN
grep -RnisIw --exclude-dir=bin MPEG2DEC ../

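-I is supposed to ignore binary files, but for some reason I still have to add "--exclude-dir=bin". (The other flags: -R recursive, -n show line numbers, -i ignore case, -s suppress error messages about unreadable files, -w match whole words only.)
../ searches all directories under the parent directory; use ./ to search the current directory.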

To search only .c files:
grep -RnisIw --exclude-dir=bin --include='*.c' MPEG2DEC ../

Or, to exclude all .h files:
grep -RnisIw --exclude='*.h' MPEG2DEC ../

M-x rgrep works too, but it offers fewer options.
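Combined, the -i and -w flags in the commands above mean "match this word in any case, but not as part of a longer identifier". A word constituent is a letter, digit, or underscore, so MPEG2DEC_init does not match. A small C sketch of that rule for a fixed-string pattern; grep itself handles regular expressions, and line_matches_iw is a hypothetical name.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Word constituent for -w: letter, digit, or underscore. */
static bool is_word_char(char ch)
{
    return isalnum((unsigned char)ch) || ch == '_';
}

/* Case-insensitive comparison of n characters (the -i part). */
static bool equal_icase(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return false;
    return true;
}

/* Returns true if 'word' occurs in 'line' as a whole word, ignoring case.
 * Every occurrence is tried, so a rejected match inside a longer identifier
 * does not hide a later whole-word match on the same line. */
static bool line_matches_iw(const char *line, const char *word)
{
    size_t n = strlen(word), len = strlen(line);
    if (n == 0)
        return false;
    for (size_t i = 0; i + n <= len; i++) {
        if (!equal_icase(line + i, word, n))
            continue;
        bool left_ok  = (i == 0) || !is_word_char(line[i - 1]);
        bool right_ok = (i + n == len) || !is_word_char(line[i + n]);
        if (left_ok && right_ok)
            return true;
    }
    return false;
}
```

Without the whole-word check, searching for MPEG2DEC in a large code base would also report every MPEG2DEC_xxx identifier.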

Monday, February 04, 2013

touch command

The touch command updates a file's timestamps and creates the file if it does not exist, which makes it the easiest way to create new, empty files.

touch filename1 filename2
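Roughly speaking, touch opens the file with O_CREAT (creating it empty if it is missing, without truncating an existing file) and then sets its access and modification times to the current time. A minimal POSIX sketch of that behavior; touch_file is a hypothetical name, and real touch implementations support more options and handle more edge cases.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates 'path' if it does not exist (existing contents are kept), then
 * sets its access and modification times to now. Returns 0 on success. */
static int touch_file(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0666);  /* mode is reduced by umask */
    if (fd < 0) {
        perror(path);
        return -1;
    }
    int rc = futimens(fd, NULL);  /* NULL times = set both timestamps to now */
    if (rc != 0)
        perror(path);
    close(fd);
    return rc == 0 ? 0 : -1;
}
```

Calling touch_file on a new path leaves a 0-byte file. Calling it on an existing file only refreshes the timestamps.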

Sunday, February 03, 2013

TI C6000 SYS/BIOS (3)

• Debug builds are slow: for example, an LDH (load halfword) is followed by a NOP 4 (four no-operation cycles) instead of being pipelined.
• A function call takes 6 cycles, and a return takes 6 cycles.
• Optimization levels: -o1 (local, within a single block), -o2 (function, across blocks), -o3 (file, across functions), -pm -o3 (program, across files).
• uint32_t is a good choice because it has the same size on every system, whereas 'int' and 'long' can have different sizes on different systems.
• The keyword 'restrict' tells the compiler that, within this scope, the pointer is not aliased. (Aliasing means one memory location can be accessed in two ways, e.g. through two different pointers.)
• Avoid inserting asm(...) statements into C code. If assembly is necessary, write it in a separate asm file and call it from C.
• Cycle counts for different optimization methods: Debug, no opt, -g: 925k cycles; Release, -o2, no -g: 33k; optimized: 20k; optimized with MUST_ITERATE: 17k; optimized with MUST_ITERATE and restrict: 7k; DSPLib: 8k.
• Each cache line has a valid bit and a tag, and the index bits of the address select the line. Data in the cache is reused, so caches work well for code with temporal and spatial locality, but not for random access patterns.
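• For y = \sum_n a_n x_n, the address of a is 8000 and the address of x is 8010. This causes trouble: because the trailing digits are '0', both arrays map to the same cache index, so in a direct-mapped cache a_n and x_n keep evicting each other.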
• A direct-mapped (1-way) cache associates each memory block with exactly one cache line, so every address in the memory map has a single, unique cache index. This suits L1P (level-1 program cache); a 2-way set-associative cache suits L1D.
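The dot product y = \sum_n a_n x_n from these notes can be written with 'restrict' pointers in portable C99. The 16-bit inputs and 32-bit accumulator are assumptions, the MUST_ITERATE pragma is TI-specific and so appears only in a comment, and this code does not reproduce the cycle counts listed above.

```c
#include <stdint.h>

/* Dot product with non-aliasing inputs. 'restrict' promises the compiler
 * that a and x never refer to overlapping memory, so it may reorder and
 * overlap loads from both arrays (e.g. software-pipeline the loop). If the
 * promise is broken, the behavior is undefined. */
static int32_t dot_product(const int16_t *restrict a,
                           const int16_t *restrict x, int n)
{
    int32_t sum = 0;
    /* With TI's compiler, a trip-count hint of the form
     * #pragma MUST_ITERATE(min, max, multiple) would go just before this loop. */
    for (int i = 0; i < n; i++)
        sum += (int32_t)a[i] * x[i];  /* 16x16 -> 32-bit multiply-accumulate */
    return sum;
}
```

The design choice is the contract rather than the arithmetic: the loop body is identical with or without 'restrict', but only with it may the compiler assume a store through one pointer cannot change what the other reads.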
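The index conflict described in these notes can be reproduced with a toy cache model. The parameters (32-byte lines, 64 sets), the 32-bit element size, and the addresses are illustrative assumptions, not the actual C6000 L1D geometry. In the conflicting case, a and x are placed exactly one cache size (2048 bytes) apart so that a[n] and x[n] always map to the same index.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32u  /* bytes per cache line (assumed) */
#define NUM_SETS  64u  /* number of sets, i.e. index values (assumed) */
#define MAX_WAYS  2u

typedef struct {
    bool     valid[NUM_SETS][MAX_WAYS];
    uint32_t tag[NUM_SETS][MAX_WAYS];
    unsigned victim[NUM_SETS];  /* way to replace next (exact LRU for <= 2 ways) */
    unsigned ways;              /* 1 = direct-mapped, 2 = 2-way set-associative */
    unsigned misses;
} Cache;

static void cache_access(Cache *c, uint32_t addr)
{
    uint32_t block = addr / LINE_SIZE;
    uint32_t set   = block % NUM_SETS;  /* index bits */
    uint32_t tag   = block / NUM_SETS;  /* remaining high bits */

    for (unsigned w = 0; w < c->ways; w++) {
        if (c->valid[set][w] && c->tag[set][w] == tag) {
            c->victim[set] = (w + 1) % c->ways;  /* other way becomes LRU */
            return;                               /* hit */
        }
    }
    c->misses++;                                  /* miss: fill the victim way */
    unsigned v = c->victim[set];
    c->valid[set][v] = true;
    c->tag[set][v]   = tag;
    c->victim[set]   = (v + 1) % c->ways;
}

/* Counts misses for the access pattern of y = sum a[n] * x[n] with 32-bit
 * elements. 'ways' must be 1 or 2. */
static unsigned dot_product_misses(unsigned ways, uint32_t a_base,
                                   uint32_t x_base, unsigned n)
{
    Cache c;
    memset(&c, 0, sizeof c);
    c.ways = ways;
    for (unsigned i = 0; i < n; i++) {
        cache_access(&c, a_base + 4u * i);  /* load a[i] */
        cache_access(&c, x_base + 4u * i);  /* load x[i] */
    }
    return c.misses;
}
```

With 64 elements, the direct-mapped cache misses on all 128 accesses when a starts at 0x8000 and x at 0x8800, because each load evicts the line the other array needs next. A 2-way cache, or moving x to 0x8100, drops this to the 16 compulsory misses.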