Negative Lookbehinds - Fortune [possible satire]

<GordonFreeman> hi
<rindolf> Hi GordonFreeman
<GordonFreeman> grep -Po '(?<=<a )(?<! href=)(?<= href=["]*)[^">]+' <<< '<a gfasg href=asdf>'
<GordonFreeman> grep: lookbehind assertion is not fixed length
<rindolf> GordonFreeman: grep is PCRE - it's not Perl.
<rindolf> perlbot: pcre
<Altreus> GordonFreeman: don't use regex for HTML
<perlbot> rindolf: PCRE is not Perl. It lacks several features of Perl regexes. Don't bother asking for help with a PCRE pattern in a Perl channel as the answers will not be relevant. Try #regex, or the channel for your language. See also http://en.wikipedia.org/wiki/PCRE#Differences_from_Perl and LPBD.
<GordonFreeman> but this should work i think.
<mauke> no, it shouldn't
<GordonFreeman> though it fails at the second lookbehind ...
<mauke> no, it doesn't
<GordonFreeman> and fails at "* too
<GordonFreeman> (grep -Po '<a +.* +href="*[^" >]+' | grep -Po '(?=<a ).*' | grep -Po '(?<= href=)["]*[^" >]+') <<< '<a gfasg href=asdf><a fgfgg="hi> " href="link" >'
<GordonFreeman> this works.
<mauke> GordonFreeman: dude.
<anno> don't paste!
<GordonFreeman> hi mauke
<apeiron> where's mauke's car?
<rindolf> apeiron: :-)
<mauke> it's a cdr
<Altreus> I watched that the other day
<rindolf> pkrumins: what's up?
<Altreus> I don't really know why
<mauke> GordonFreeman: go to a channel where that is on-topic
<GordonFreeman> mauke<< like?
<mauke> no idea
<Altreus> where on earth is parsing HTML with regexes on topic?
<GordonFreeman> ahem ok
<Altreus> except ##php lolol
<GordonFreeman> well i think one can see its logical and it works like this
<rindolf> GordonFreeman: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
<shorten> rindolf's url is at http://xrl.us/bf4jh6
<apeiron> GordonFreeman, also, -P isn't perl.
<thrig> Altreus: some special level of hell, between the angry ghosts and the hungry ghosts
<rindolf> perlbot: html
<apeiron> the grep docs lie to you.
<perlbot> rindolf: Don't parse or modify html with regular expressions! See one of HTML::Parser's subclasses: HTML::TokeParser, HTML::TokeParser::Simple, HTML::TreeBuilder(::Xpath)?, HTML::TableExtract etc. If your response begins "that's overkill. i only want to..." you are wrong. http://en.wikipedia.org/wiki/Chomsky_hierarchy and http://xrl.us/bf4jh6 for why not to use regex on HTML
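
[Editor's sketch] The parser-based approach perlbot recommends could look roughly like this. It is written in Python rather than Perl, which is an assumption made for illustration: the standard library's html.parser.HTMLParser stands in for HTML::Parser, and the class name HrefCollector is hypothetical. It is fed the same test string GordonFreeman used above.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for
        # attributes without a value, such as "gfasg" in the test string.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


collector = HrefCollector()
collector.feed('<a gfasg href=asdf><a fgfgg="hi> " href="link" >')
collector.close()
print(collector.hrefs)
```

The tokenizer deals with the quoted "hi> " attribute itself, which is the case the chained grep commands had to work around by hand.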
<LeoNerd> Altreus: Why, surely in #html-parsing-by-regexp
<Altreus> if you want perl regex use ack
<Altreus> surely
<rindolf> LeoNerd: sounds like programmers' hell.
<anno> perl regex doesn't support variable-length lookbehind either
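
[Editor's sketch] The fixed-length restriction is not specific to grep -P. Python's re module, used here purely as a stand-in, also rejects a lookbehind whose width can vary. A common workaround, shown as an assumption and not as anything proposed in the channel, is to match the variable-width prefix normally and keep only a capture group. The pattern below is illustrative and still shares the general weakness of regex-on-HTML (for example, a quoted ">" inside an earlier attribute).

```python
import re

sample = '<a gfasg href=asdf>'

# A lookbehind containing "* has no fixed width, so compiling it fails.
try:
    re.compile(r'(?<= href="*)[^" >]+')
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Workaround: consume the prefix as ordinary pattern text and capture
# only the attribute value in group 1.
href_pattern = re.compile(r'<a\s+(?:[^>]*?\s)?href="?([^" >]+)')
hrefs = href_pattern.findall(sample)

print(lookbehind_ok, hrefs)
```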
<Altreus> apeiron: actually it says it's highly experimental and hence not working
<Altreus> it could well be Perl and not PCRE when finished :)
<Altreus> not that "perl regex" is a defined term, the speed Perl is moving
<yrlnry> That's why you should never use Perl's builtin regexes. Just write your own package, it's sure to be more reliable.
<rindolf> yrlnry: :-)
<talexb> Heh.
<LeoNerd> use re::engine::vim;
<rindolf> yrlnry++
<Altreus> LeoNerd: is it core?
<yrlnry> HOP has a nice implementation. It works by generating a list of every string matched by the regex, and looking to see if your target string is in the list.
<LeoNerd> I can't help thinking that may not be optimal in terms of CPU or memory usage
<talexb> yrlnry, no doubt they have a Cray working on generating the list ..
<yrlnry> LeoNerd: Depends; unlike Perl regexes, it has no trouble handling languages higher up the Chomsky hierarchy
<yrlnry> It is guaranteed to return the right answer for any recursive language, and guaranteed to return correct 'matched' answers for any recursively enumerable language.
<LeoNerd> Oh sure...
<LeoNerd> In terms of CS guarantees it's very nice
<yrlnry> So if you are in a big hurry to get the wrong answer...
<LeoNerd> But I live in the practical pragmatic world
<LeoNerd> E.g. Parser::MGC is horribly slow at backtracking and whatnot, but I write parsers in it because those are still fast for "reasonably" sized inputs, parsers are fast to write, and I like having lots of side-effects and dynamic logic -in- Perl
<Altreus> Unfortunately my universe doesn't have infinite processing speeds and data storage
<anno> a universe with infinite processing speed would have processed you by now
<Altreus> and
<Altreus> would have processed my grandchildren too
<yrlnry> This algorithm doesn't need infinite speed or storage.
<yrlnry> It works slowly, but finitely.
<Altreus> what
<yrlnry> The infinite list is lazily generated and you never have more than one of its elements in memory at any time.
<rindolf> yrlnry: is it sorted by length?
<yrlnry> You will learn this sort of technique after you have been programming in Perl for eight months or so.
<Altreus> how do you know when it doesn't match
<Altreus> yrlnry: :D
<yrlnry> rindolf: it is sorted by length, and lexicographically among strings of the same length.
<rindolf> yrlnry: ah.
<yrlnry> Of course, you cannot do the length-sorting thing for arbitrary languages, but for regex languages there is no trouble.
<yrlnry> http://hop.perl.plover.com/book/pdf/06InfiniteStreams.pdf
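
[Editor's sketch] The idea yrlnry describes can be illustrated as follows: list a regex's strings by length, then lexicographically within a length, and stop once the listed strings are longer than the target. This is not the code from HOP, which merges lazy streams in Perl. It is a simplified Python sketch that computes the matched strings one length at a time, and the tuple-based regex representation and all function names are hypothetical.

```python
from functools import lru_cache
from itertools import count, islice

# Hypothetical mini-AST for regexes, as nested tuples:
#   ("lit", "a")    a single character
#   ("alt", r, s)   r|s
#   ("cat", r, s)   r followed by s
#   ("star", r)     r*


@lru_cache(maxsize=None)
def strings_of_length(regex, n):
    """Return the frozenset of strings of exactly length n that regex matches."""
    kind = regex[0]
    if kind == "lit":
        return frozenset([regex[1]]) if n == 1 else frozenset()
    if kind == "alt":
        return strings_of_length(regex[1], n) | strings_of_length(regex[2], n)
    if kind == "cat":
        # Split the n characters between the left and right parts.
        return frozenset(
            left + right
            for k in range(n + 1)
            for left in strings_of_length(regex[1], k)
            for right in strings_of_length(regex[2], n - k)
        )
    if kind == "star":
        if n == 0:
            return frozenset([""])
        # The first repetition takes k >= 1 characters, so n shrinks each step.
        return frozenset(
            head + tail
            for k in range(1, n + 1)
            for head in strings_of_length(regex[1], k)
            for tail in strings_of_length(regex, n - k)
        )
    raise ValueError(f"unknown node: {kind!r}")


def enumerate_language(regex):
    """Lazily yield matched strings, shortest first, sorted within each length."""
    for n in count():
        yield from sorted(strings_of_length(regex, n))


def matches(regex, target):
    """Walk the length-ordered listing; give up once strings would be too long."""
    for n in range(len(target) + 1):
        for s in sorted(strings_of_length(regex, n)):
            if s == target:
                return True
    return False


# (ab)*|c
regex = ("alt", ("star", ("cat", ("lit", "a"), ("lit", "b"))), ("lit", "c"))
first_four = list(islice(enumerate_language(regex), 4))
print(first_four, matches(regex, "abab"), matches(regex, "aba"))
```

The length ordering is what answers the "how do you know when it doesn't match" question: once every string up to the target's length has been listed without a hit, no later string can equal the target.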
<LeoNerd> Eh..
<LeoNerd> I dunno. I just dislike purely RE-based parsing
<LeoNerd> I much prefer code doing it
<GordonFreeman> why can't perl regexp do variable length lookbehind matching?
<Altreus> See originally I ignored you because it sounded like you were talking shit
<LeoNerd> Limit of the implementation
<Altreus> mainly because it is possible to construct a regex with an infinite range that nevertheless won't match a particular string
<anno> GordonFreeman: who knows? looks like it's hard to implement with the given engine
<mauke> GordonFreeman: unclear semantics and no one's bothered to write the code
<GordonFreeman> i see
<Altreus> Plus, there's a fucking lot of Unicode to create strings out of
<LeoNerd> It's not "hard" to implement. It's impossible given the algorithm being used
<mauke> LeoNerd: why impossible?
<yrlnry> LeoNerd: I don't think that's true. It could be done using a recursive call to the regex engine now that that is possible.
<GordonFreeman> but lookbehind is cool
<LeoNerd> Oooh.. yes.. I suppose it could do that now
<GordonFreeman> its like a reverse regexp that can be excluded
<anno> vim re's do it
<LeoNerd> vim uses a different type of engine
<anno> right
<yrlnry> Altreus: I was talking shit. After eight months you get a license to do that.
<mauke> really?
<Altreus> yrlnry: but there's a pdf
<yrlnry> where's a PDF?
<Altreus> 17:10 < yrlnry> http://hop.perl.plover.com/book/pdf/06InfiniteStreams.pdf
<yrlnry> Yes.
<Altreus> I didn't open it or anything
<mauke> no one opens PDFs
<yrlnry> PDFs are for cowards and Slavs.
<Altreus> but it lent enough credence to your words that I decided to believe your spurious claims
<Altreus> Actually someone did a test the other day
<yrlnry> Oh, does "talking shit" mean "making up nonsense"? Then I was not talking shit.
<Altreus> He linked someone to articles supporting his viewpoint and they changed their mind
<yrlnry> It is in section 6.5, "regex string generation".
<Altreus> but one of the articles was an argument against himself
<Altreus> Showing that it is enough to cite your sources to be believed; not many people will actually bother to check them
<Altreus> yrlnry: what do you normally think "talking shit" means?
<Altreus> are you confusing it with shooting the shit
<yrlnry> I'm not sure.
<Altreus> are you foreign
<yrlnry> Yes.
<Altreus> ok then
<mauke> hahaha
Channel: #perl
Network: Freenode
Tagline: Negative Lookbehind Regexes for matching HTML
Published: 2011-11-24